Non blocking read and thread

Hello everybody,
I noticed yesterday that any non-blocking socket behaves as if it were in
blocking mode when it is read inside a thread. Is this on purpose, or is it a bug?

To be clearer, the script below will not block, as expected:

require 'socket'
require 'fcntl'   # needed for Fcntl::F_SETFL and Fcntl::O_NONBLOCK

# Client thread: connects after 2 seconds and sends one line.
Thread.new do
	sleep 2
	client = TCPSocket.new('localhost', '1999')
	puts "send a message"
	client.puts "message"
end
server = TCPServer.new("0.0.0.0","1999")
sock = server.accept
sock.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
puts "try to read it..."
sock.read(100)
puts "done !"

But this one will:

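require 'socket'
require 'fcntl'   # needed for Fcntl::F_SETFL and Fcntl::O_NONBLOCK

# Client thread: connects after 2 seconds and sends one line.
Thread.new do
	sleep 2
	client = TCPSocket.new('localhost', '1999')
	puts "send a message"
	client.puts "message"
end
server = TCPServer.new("0.0.0.0","1999")
# Same accept and read as before, but now inside a second thread.
thr = Thread.new do
	sock = server.accept
	sock.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
	puts "try to read it..."
	sock.read(100)
	puts "done !"
end
thr.join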

Just in case, I’m running ruby 1.8.1 on Suse 9.0 with a 2.4.21 kernel

yannick

  Thread.new do
    sleep 2
    client = TCPSocket.new('localhost', '1999')
    puts "send a message"
    client.puts "message"
    client.close   # added: the reader then sees EOF
  end

Guy Decoux

Thank you for your answer, but that's not really what I meant. I know
that closing the socket unblocks the read anyway, but I would like to
know why there is a difference between reading a non-blocking IO in the
main thread and reading it in another thread.

Yannick


-----Original Message-----
From: ts [mailto:decoux@moulon.inra.fr]
Sent: Friday, March 26, 2004 8:52 PM
To: ruby-talk ML
Cc: ruby-talk@ruby-lang.org
Subject: Re: non blocking read and thread

> Thank you for your answer, but that's not really what I meant. I know
> that closing the socket unblocks the read anyway, but I would like to
> know why there is a difference between reading a non-blocking IO in the
> main thread and reading it in another thread.

Re-read your script: you have 2 threads, the main thread waiting for
`thr' to finish, and `thr' waiting to read a character or EOF.

They are blocked: even if the read is non-blocking, control is given
to the main thread (which waits for the termination of `thr'), which gives
control to `thr' (which tries to read a character), which gives
control back to the main thread (which waits for the termination of `thr'), etc.

Guy Decoux

You are right. After further tests, it looks like a thread scheduling
problem. It could perhaps be solved by making the thread critical, but
then the read only returns the EAGAIN error. Well, it seems I'm stuck here;
better to find another way around :-)

thx
Yannick


-----Original Message-----
From: ts [mailto:decoux@moulon.inra.fr]
Sent: Friday, March 26, 2004 9:58 PM
To: ruby-talk ML
Cc: ruby-talk@ruby-lang.org
Subject: Re: non blocking read and thread

In article 000001c4134c$0cbe5920$0300a8c0@boulba,
“yannick” yannick@dazzlebox.com writes:

> You are right. After further tests, it looks like a thread scheduling
> problem. It could perhaps be solved by making the thread critical, but
> then the read only returns the EAGAIN error. Well, it seems I'm stuck here;
> better to find another way around :-)

I recommend sysread instead of IO#read with nonblocking mode.
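
As a minimal sketch of that recommendation (not code from this thread): sysread performs a single read and returns as soon as any data is available instead of waiting for the full 100 bytes, and it reports end of file by raising EOFError. The helper name read_available and the use of UNIXSocket.pair in place of a TCP connection are assumptions made to keep the example self-contained; it was written against a current Ruby, not 1.8.1.

```ruby
require 'socket'

# Hypothetical helper: returns up to +max+ bytes as soon as any data is
# available, or nil once the peer has closed the connection.
def read_available(io, max = 100)
  io.sysread(max)
rescue EOFError
  nil
end

reader, writer = UNIXSocket.pair   # stand-in for the accepted TCP socket
writer.syswrite("message\n")
p read_available(reader)           # => "message\n" (8 bytes, not 100)
writer.close
p read_available(reader)           # => nil (peer closed)
```

Because sysread bypasses Ruby's IO buffer, it should not be mixed with buffered methods on the same object.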

However, your first script is interesting because it goes from
multi-threaded to single-threaded after sock.read(100) has started.

When there are two or more threads, Ruby calls select(2) before each
IO operation for thread scheduling. But Ruby doesn’t call select(2)
when there is only one thread.
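
What such a readiness check amounts to can be sketched at the Ruby level with IO.select and a zero timeout, which corresponds to a select(2) call with a {0, 0} timeval. This is only an illustration: the interpreter does the equivalent internally in C, the helper name readable_now? is hypothetical, and a pipe is used instead of a socket for brevity.

```ruby
# Hypothetical helper: a zero-timeout poll. IO.select returns nil when
# nothing is ready before the timeout expires.
def readable_now?(io)
  !IO.select([io], nil, nil, 0).nil?
end

r, w = IO.pipe
p readable_now?(r)    # => false (nothing written yet)
w.syswrite("message\n")
p readable_now?(r)    # => true
```

A scheduler that sees "not readable" can run another thread instead of entering read(2); that is the situation in which a non-blocking descriptor never gets the chance to return EAGAIN.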

Your first script invokes the following system calls.

% strace -e read,write,select ruby tst1

     select(0, [], [], [], {0, 0}) = 0 (Timeout)
     select(6, [5], [], [], {1, 999368}) = 0 (Timeout)
     select(6, [5], [], [], {0, 3115}) = 0 (Timeout)
     select(6, [5], [], [], {0, 0}) = 0 (Timeout)
     select(9, [5], [7], [], NULL) = 2 (in [5], out [7])
     write(1, "try to read it...\n", 18) = 18
(*1) select(9, [8], [], [], {0, 0}) = 0 (Timeout)
     write(1, "send a message\n", 15) = 15
     select(9, [8], [7], [], NULL) = 1 (out [7])
     write(7, "message", 7) = 7
     select(9, [8], [7], [], NULL) = 2 (in [8], out [7])
     select(9, [8], [], [], {0, 0}) = 1 (in [8], left {0, 0})
     write(7, "\n", 1) = 1
(*2) read(8, "message\n", 1024) = 8
(*3) read(8, 0x4001a000, 1024) = -1 EAGAIN (Resource temporarily unavailable)
     write(1, "done !\n", 7) = 7

The system calls corresponding to sock.read(100) are (*1), (*2) and (*3).

(*1) Checks whether sock is readable. Since it is not readable yet,
     the thread blocks and Ruby switches to the other thread.
(*2) Reads 8 bytes from sock: "message\n".
(*3) Fails with EAGAIN, which terminates sock.read(100).

What is interesting is that select is not called before (*3).
If select were called at (*3), it would block forever.
This means that there is no thread other than the main thread at that point.
In fact, the client thread has finished before (*2).

So sock.read(100) blocks at first (*1), in select, because the program is multi-threaded.
But it does not block at the end (*3), because the program is single-threaded by then.

I think this behavior is very confusing.
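
Part of the background is that IO#read(100) is defined to keep reading until it has 100 bytes or reaches EOF, which is why a second read(2) follows the 8-byte one in the trace. A small sketch of that contract, assuming a current Ruby (whose internals differ from 1.8.1) and a pipe instead of a socket; read_until_eof_demo is a hypothetical name:

```ruby
# Hypothetical demo: the writer sends 8 bytes, pauses, then closes.
# read(100) does not return after the first 8 bytes; it returns only
# once the writer's close produces EOF.
def read_until_eof_demo
  r, w = IO.pipe
  writer = Thread.new do
    w.write("message\n")
    sleep 0.2
    w.close            # EOF is what finally lets read(100) return
  end
  data = r.read(100)
  writer.join
  r.close
  data
end

p read_until_eof_demo   # => "message\n"
```

This is also why closing the client socket, as suggested earlier in the thread, unblocks the reader.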


Tanaka Akira

> I recommend sysread instead of IO#read with nonblocking mode.

can you please elaborate on this? why exactly?


> I think this behavior is very confusing.

if it is confusing to you, we are all doomed! your explanations are really
insightful and i for one really appreciate you taking the time to enlighten us.

maybe dave could have you write the new pickaxe thread section! ;-)

-a


On Sat, 27 Mar 2004, Tanaka Akira wrote:

===============================================================================

EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
PHONE :: 303.497.6469
ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
URL :: Solar-Terrestrial Physics Data | NCEI
TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================

Thank you for the tip; using sysread does fix this problem. But after
further tests, I found another problem. Apparently, as Ara.T.Howard
pointed out before, Kernel#select and IO#sysread seem to be a bad mix.
Every socket that Kernel#select returns seems to be buffered, as this code
shows:

require 'socket'
require 'fcntl'   # needed for Fcntl::F_SETFL and Fcntl::O_NONBLOCK
Thread.new do
	sleep 2
	client = TCPSocket.new('localhost', '1999')
	loop do
		sleep 1
		client.syswrite "message"
	end
end
server = TCPServer.new("0.0.0.0","1999")
thr = Thread.new do
	socks = []
	loop do
		rs,wd,ed = select(socks+[server],nil,nil)
		rs.each do |sock|
			if sock == server
				n_sock = server.accept
				n_sock.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
				socks.push n_sock
				# This call will not raise an IOError
				# because this socket is not buffered yet.
				puts n_sock.sysread(100)
			elsif sock.closed? || sock.eof?
				socks.delete sock
			else
				# This call, however, will raise an IOError.
				puts sock.sysread(100)
			end
		end
	end
end
thr.join

Is this normal behavior? If so, is there an option that makes Kernel#select
return non-buffered IO objects?

Yannick


-----Original Message-----
From: Tanaka Akira [mailto:akr@m17n.org]
Sent: Saturday, March 27, 2004 2:26 AM
To: ruby-talk ML
Subject: Re: non blocking read and thread

In article 000001c413d5$e0ec97a0$0300a8c0@boulba,
“yannick” yannick@dazzlebox.com writes:

> Thank you for the tip; using sysread does fix this problem. But after
> further tests, I found another problem. Apparently, as Ara.T.Howard
> pointed out before, Kernel#select and IO#sysread seem to be a bad mix.
> Every socket that Kernel#select returns seems to be buffered, as this code
> shows:

The problem is not Kernel#select but IO#eof?.
IO#eof? is one of the buffered IO methods.

EOF should be detected by rescuing the EOFError raised by sysread.

    require 'socket'
    require 'fcntl'
    Thread.new do
            sleep 2
            client = TCPSocket.new('localhost', '1999') 
            loop do
                    sleep 1
                    client.syswrite "message"
            end
    end
    server = TCPServer.new("0.0.0.0","1999")
    thr = Thread.new do
            socks = []
            loop do
                    rs,wd,ed = select(socks+[server],nil,nil)
                    rs.each do |sock|
                            if sock == server
                                    n_sock = server.accept
                                    n_sock.fcntl(Fcntl::F_SETFL, Fcntl::O_NONBLOCK)
                                    socks.push n_sock
                                    #This call will not raise an IOError
                                    #because this socket is not buffered yet.
                                    puts n_sock.sysread(100)
                            elsif sock.closed?
                                    socks.delete sock
                            else
                                    #This call however will raise an IOError.
                                    begin
                                            puts sock.sysread(100)
                                    rescue EOFError
                                            sock.close
                                            socks.delete sock
                                    end
                            end
                    end
            end
    end
    thr.join
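
The IOError itself can be reproduced in isolation. The sketch below assumes a current Ruby and uses a pipe rather than a socket; the helper name sysread_after_eof_check is hypothetical. Calling eof? pulls pending bytes into Ruby's own read buffer, and sysread then refuses to bypass that buffer.

```ruby
# Hypothetical helper: mixes a buffered call (eof?) with sysread and
# reports what happens.
def sysread_after_eof_check(io)
  io.eof?            # buffered method: may read bytes into Ruby's buffer
  io.sysread(100)    # unbuffered read on the same object
rescue EOFError      # EOFError is a subclass of IOError, so rescue it first
  :eof
rescue IOError => e
  e.message          # the text of the IOError raised by sysread
end

r, w = IO.pipe
w.syswrite("message")
p sysread_after_eof_check(r)   # prints the IOError message
```

Dropping the eof? test and rescuing EOFError, as in the script above, keeps every read on the unbuffered path.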

Tanaka Akira

In article Pine.LNX.4.44.0403261105410.15230-100000@fattire.ngdc.noaa.gov,
“Ara.T.Howard” ahoward@fattire.ngdc.noaa.gov writes:

>> I recommend sysread instead of IO#read with nonblocking mode.
>
> can you please elaborate on this? why exactly?

Because nonblocking sucks.

(1) The nonblocking flag on an fd may be set or unset unexpectedly by another
    process if the fd is shared.
(2) Much code is unaware of nonblocking mode.
    For example:
    (a) Ruby's thread scheduler doesn't assume fds may be nonblocking. [ruby-talk:66196]
    (b) stdio is unaware of nonblocking mode.
        In particular, it is very hard to fix data loss on nonblocking writes with stdio.
        [ruby-talk:93917]

Although we can fix Ruby, we cannot fix other programs.
Setting the nonblocking flag may cause problems in other processes because of (1).

See the Emacs/SSH case:
http://groups.google.com/groups?th=e4df2fdc1f4f4950
http://sources.redhat.com/ml/bug-glibc/2002-08/threads.html#00041
http://sources.redhat.com/ml/bug-glibc/2002-08/threads.html#00186

and DJB's opinion:
http://cr.yp.to/unix/nonblock.html
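
Point (1) can be illustrated within a single process, as a sketch rather than anything posted in the thread: O_NONBLOCK belongs to the open file description, so every descriptor that shares it (through dup here, or through fork or inheritance in the multi-process case) sees the change. The helper names nonblocking? and set_nonblocking are hypothetical, and a POSIX system is assumed.

```ruby
require 'fcntl'

# Hypothetical helper: true if O_NONBLOCK is set on io's file description.
def nonblocking?(io)
  (io.fcntl(Fcntl::F_GETFL) & Fcntl::O_NONBLOCK) != 0
end

# Hypothetical helper: sets or clears O_NONBLOCK, preserving other flags.
def set_nonblocking(io, on)
  flags = io.fcntl(Fcntl::F_GETFL)
  flags = on ? (flags | Fcntl::O_NONBLOCK) : (flags & ~Fcntl::O_NONBLOCK)
  io.fcntl(Fcntl::F_SETFL, flags)
end

r, w = IO.pipe
shared = r.dup              # second descriptor, same open file description
set_nonblocking(r, false)
p nonblocking?(shared)      # => false
set_nonblocking(r, true)    # change the flag through r only
p nonblocking?(shared)      # => true: the other descriptor changed too
```

Unlike the scripts earlier in the thread, which pass Fcntl::O_NONBLOCK alone to F_SETFL, the helper reads the current flags first so that other status flags are not cleared.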


Tanaka Akira