OK, the output of strace -o trace -ff -ttt ./pipes.rb produces two
files:
ftp://ftp\_20140119\_17561:eLNy4r\+fAN27@ftp.dna.ku.dk
Thank you!
I have yet to figure out what they mean.
I can share a few observations:
The parent creates three pipes:
$ fgrep pipe2 trace.1*
trace.18386:1390135219.046131 pipe2([3, 4], O_CLOEXEC) = 0
trace.18386:1390135219.046295 pipe2([5, 6], O_CLOEXEC) = 0
trace.18386:1390135219.112926 pipe2([7, 8], O_CLOEXEC) = 0
The client reads only from FD 3 and 5:
$ fgrep read trace.18387
1390135375.259220 read(3, "!", 1024) = 1
1390135375.259410 read(3, 0x7fb63c823020, 1024) = -1 EAGAIN (Resource
temporarily unavailable)
1390135375.259663 read(5, 0x7fb63c823020, 1024) = -1 EAGAIN (Resource
temporarily unavailable)
1390135375.261880 read(3, "!", 1024) = 1
1390135375.261910 read(3, 0x7fb63c823020, 1024) = -1 EAGAIN (Resource
temporarily unavailable)
1390135375.261932 read(5, 0x7fb63c823020, 1024) = -1 EAGAIN (Resource
temporarily unavailable)
The parent writes to 2 (stderr) and 4 and 8
$ egrep -o 'write\([0-9]+' trace.18386 | sort -u
write(2
write(4
write(8
But to 4 only at the end
$ fgrep 'write(4' trace.18386
1390135375.258855 write(4, "!", 1) = 1
1390135375.261804 write(4, "!", 1) = 1
Main output from file "big.tab" goes to 8:
$ fgrep -c 'write(8' trace.18386
1924
Now, as you can see 8 is connected to 7 but since the child never
reads from 7 no data is transferred to it.
If you look at the ordering of pipe2 and clone you see that the client
is created before the last pipe is opened:
$ egrep '^[0-9]+\.[0-9]+ (pipe|clone)' trace.18386
1390135219.046131 pipe2([3, 4], O_CLOEXEC) = 0
1390135219.046295 pipe2([5, 6], O_CLOEXEC) = 0
1390135219.046513 clone(child_stack=0x7fb63c384ff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7fb63c3859d0, tls=0x7fb63c385700,
child_tidptr=0x7fb63c3859d0) = 18387
1390135219.112926 pipe2([7, 8], O_CLOEXEC) = 0
So maybe it cannot read from FD 7 which it would need to do to read
what the parent writes to FD 8. But see at the end.
The strange thing is that almost all write operations are successful
because as many bytes are written as indicated:
$ egrep -c 'write\(8,.* ([0-9]+)\) = \1' trace.18386
1922
$ egrep -c 'write\(8,.* ([0-9]+)\) =' trace.18386
1924
$ diff -U5 <(egrep 'write\(8,.* ([0-9]+)\) =' trace.18386) <(egrep
'write\(8,.* ([0-9]+)\) = \1' trace.18386)
--- /dev/fd/63 2014-01-19 14:33:25.426820557 +0100
+++ /dev/fd/62 2014-01-19 14:33:25.426820557 +0100
@@ -1918,7 +1918,5 @@
1390135219.160905 write(8, "\332\0
4559\t4602\t3_cpVRyxwXsN1/2\t592"..., 35) = 35
1390135219.160929 write(8, "\332\0
4563\t4597\t5_X4035ywXsN1/1\t833"..., 35) = 35
1390135219.160954 write(8, "\332\0
4565\t4599\t1_bWkzlxwXsN1/1\t442"..., 35) = 35
1390135219.160980 write(8, "\332\0
4568\t4611\t1_TQZWsxwXsN1/2\t540"..., 35) = 35
1390135219.161004 write(8, "\332\0
4570\t4613\t3_gQoPtxwXsN1/2\t196"..., 35) = 35
-1390135219.161029 write(8, "\2764571\t4605\t5_GQbZ3ywXsN1/1\t7\t-\n",
31) = ? ERESTARTSYS (To be restarted)
-1390135375.259425 write(8, "\2764571\t4605\t5_GQbZ3ywXsN1/1\t7\t-\n",
31) = ? ERESTARTSYS (To be restarted)
The last two ones happen because of the signal I believe:
1390135219.161029 write(8, "\2764571\t4605\t5_GQbZ3ywXsN1/1\t7\t-\n",
31) = ? ERESTARTSYS (To be restarted)
1390135375.258745 --- SIGINT (Interrupt) @ 0 (0) ---
1390135375.258855 write(4, "!", 1) = 1
1390135375.259234 rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
1390135375.259425 write(8, "\2764571\t4605\t5_GQbZ3ywXsN1/1\t7\t-\n",
31) = ? ERESTARTSYS (To be restarted)
1390135375.259784 --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
1390135375.259836 rt_sigreturn(0x1a) = -1 EINTR (Interrupted system call)
All in all 65266 bytes are written:
$ awk '/write\(8,.* = [0-9]+/ {sum+=$NF} END{print sum}' trace.18386
65266
This is 270 bytes short of 64k. I have no idea whether you happened to
kill the parent before it got into the stuck state. You could check
your pipe buffer size with "ulimit -p". On linux this gives buffer
size in 512 byte blocks. On my machine it returns 8 which is 4k which
also happens to be the memory page size but of course it may be
different on your system.
Maybe you repeat the test and wait a bit longer to see whether the
parent gets into some kind of blocking state.
I looked a bit closer at flags passed to clone() and it seems the
child is a thread only:
1390135219.046513 clone(child_stack=0x7fb63c384ff0,
flags=CLONE_VM (same memory space)
CLONE_FS (same filesystem information)
CLONE_FILES (share the same file descriptor table.)
CLONE_SIGHAND (share the same table of signal handlers)
CLONE_THREAD (same thread group as the calling process)
CLONE_SYSVSEM (share a single list of System V semaphore undo values
CLONE_SETTLS (The newtls argument is the new TLS (Thread Local
Storage) descriptor.)
CLONE_PARENT_SETTID (Store child thread ID at location ptid in parent
and child memory)
CLONE_CHILD_CLEARTID (Erase child thread ID at location ctid in
child memory when the child exits,)
So, since both share the same FD table the client _could_ read from FD
7 but apparently nobody does. So apparently you have a pipe which is
written to but nobody reads from it. At some point in time the writer
then must block since buffers are limited - always.
Kind regards
robert
···
On Sun, Jan 19, 2014 at 1:55 PM, Martin Hansen <lists@ruby-forum.com> wrote:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/