Nonblocking IO read

Tom_Pollard · 1 November 2006 02:41

D'oh! (typo) That should have been

   status = POpen4::popen4('cat -') do |stdout, stderr, stdin|
     stdin.puts 'hello world'
     stdin.close
     puts "stdout: #{stdout.read.strip}"
     puts "stderr: #{stderr.read.strip}"
   end

Tom

···

On Oct 31, 2006, at 9:37 PM, Tom Pollard wrote:

If you're working in Unix you could try

status = POpen4::poen4('cat -') do |stdout, stderr, stdin|

Ara.T.Howard6 · 1 November 2006 07:10

hi francis-

it's exactly things like eventmachine that make me say using nbio is archaic -
i don't need to handle the complexities of nbio when powerful abstractions
like it exist!

for instance, consider the OP's original question: basically he wanted to
timeout if stdout was not produced in a certain amount of time. i certainly
wouldn't delve into the complexities of nbio to handle this, but instead might
do something along the lines of

   harp:~ > cat a.rb
   require 'open4'
   require 'timeout'

   def might_take_too_long cmd, stdin, timeout = 42
     stdout, stderr = '', ''
     Timeout::timeout(timeout) do
       open4.popen4(cmd) do |cid,i,o,e|
         i.write stdin and i.close
         o.each{|line| stdout << line} and e.each{|line| stderr << line}
       end
     end
     [ stdout, stderr ]
   end

   p might_take_too_long('ruby', 'sleep 1 and p 42', 2)
   p might_take_too_long('ruby', 'sleep 42 and p 42', 2)

   harp:~ > ruby a.rb
   ["42\n", ""]
   /home/ahoward//lib/ruby/1.8/timeout.rb:43:in `might_take_too_long': execution expired (Timeout::Error)
   <snip>

now, timeout might use any manner of select and/or nbio - i don't care. my
point is just that, for 90-99% of the problems people seem to want to solve
with nbio, it is way too low level a mechanism to skin the cat. i prefer to
stand on the shoulders of powerful abstractions, like eventmachine or threads
or whatever, and avoid the painful details of EAGAIN, race conditions (it
wasn't ready so i... oops, now it is ready), and stdio buffers being lost.

for the record, i have code which uses nbio, io/nonblock, etc. it's simply my
observation that a majority of the posts to this list about nbio have two
issues:

   - nothing else can really be done while waiting for io, i.e. nbio isn't
     needed for the problem at hand

   - something else could simultaneously be done, but the problem could be
     more elegantly solved with threads, queues, readpartial, or something even
     more abstract
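
The second case can be sketched roughly as follows, assuming a plain IO.pipe stands in for whatever stream is being read. The names pump, chunks and received are made up for illustration. A reader thread blocks in readpartial and hands each chunk to a Queue, so the main thread never touches EAGAIN or nonblocking flags.

```ruby
require 'thread' # Queue; already loaded on modern Rubies

reader, writer = IO.pipe
chunks = Queue.new

# Reader thread: readpartial blocks until some data is available,
# returns whatever is there, and raises EOFError at end of stream.
pump = Thread.new do
  begin
    loop { chunks << reader.readpartial(4096) }
  rescue EOFError
    chunks << :eof
  end
end

# The main thread is free to do other work; here it just produces data.
writer.write('first ')
writer.write('second')
writer.close

# Drain the queue until the reader thread signals end of stream.
received = String.new
while (chunk = chunks.pop) != :eof
  received << chunk
end
pump.join

puts received
```

The queue is the only point of contact between the two threads, which is what keeps the blocking read out of the main flow.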

the rest, of course, really do need nbio. a quick search of archives though
is an eye opener into the difficulties of mixing nbio and stdio, check out
these threads, specifically the posts by tanaka akira, who knows about 10
billion times more about nbio and ruby than i do

http://groups-beta.google.com/group/comp.lang.ruby/browse_frm/thread/47b0e296cbe9c410/23e22cc9e56a3200?lnk=gst&q=non+blocking+io&rnum=1#23e22cc9e56a3200

http://groups-beta.google.com/group/comp.lang.ruby/browse_frm/thread/116231c43621a61b/775af5d9900aa716?lnk=gst&q=non+blocking+io&rnum=6#775af5d9900aa716

his insight is enough to raise 'too low-level' warnings in my mind! at least
until using nbio from a ruby script doesn't require such deep understanding of
ruby's internals

kind regards.

-a

···

On Wed, 1 Nov 2006, Francis Cianfrocca wrote:

Ara, I'm curious to know why you think the usage of nbio is archaic or
indicative of a design flaw. I wrote the EventMachine library in order to
make certain kinds of Ruby programs easier to write, and it uses nbio
pervasively. I'm fairly sure libevent does, also. EM works with almost any
kind of descriptor, except for certain things on Windows (like _pipe()
descriptors, which are nonselectable, and some of the native Windows objects
that don't support a socket-like API).

Up until recently (late May 2006), Ruby didn't have complete support for
nbio on all descriptor types, which is one of the reasons that EventMachine
includes a compiled C++ extension. EM interoperates perfectly well with Ruby
threads,* which is another important reason to use nbio.

--
my religion is very simple. my religion is kindness. -- the dalai lama

Robert_K1 · 1 November 2006 16:30

Tom Pollard wrote:

Anyway, you would only need nonblocking IO if you wanted to read bits of the stderr stream before the command exited, but that doesn't sound like what you want.

Actually this is not correct: if there is a lot written to stderr then you need to read that concurrently. If you do not do that then the process will block on some stderr write operation that fills up the pipe and you get a deadlock because your code waits for process termination.
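
A minimal sketch of the concurrent read described here, assuming a modern Ruby whose standard-library Open3.popen3 yields a wait thread (this is not the POpen4 API used earlier in the thread). The helper name capture_both and the 200,000-byte payload are made up for illustration. Each pipe gets its own reader thread, so the child cannot stall on a full stderr pipe while the parent is busy with stdout.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run cmd (an argv array), feed it input, and drain
# stdout and stderr at the same time so neither pipe can fill up.
def capture_both(cmd, input = '')
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }
    stdin.write(input)
    stdin.close
    # Thread#value joins the thread and returns the block's result.
    [out_reader.value, err_reader.value, wait_thr.value]
  end
end

# The child writes far more to stderr than a pipe buffer holds, then
# writes a short message to stdout.
child = [RbConfig.ruby, '-e', 'STDERR.write("x" * 200_000); STDOUT.write("done")']
out, err, status = capture_both(child)

puts "stdout: #{out.bytesize} bytes, stderr: #{err.bytesize} bytes, success: #{status.success?}"
```

Reading stdout to completion first and only then reading stderr would be the pattern that risks the deadlock described above.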

Kind regards

robert

Bill_Kelly · 1 November 2006 17:59

The select() function (Kernel#select, in Ruby) is the standard way to do IO multiplexing under Unix. You give it a list of file handles on which you're waiting to be able to read or write, and it returns when one or more of them is ready. Windows provides a select() function, too, but it only supports sockets (not pipes or ordinary file IO.) The Win32 API generally doesn't provide non-blocking IO methods, because it's assumed you'll use threads if you want to do something else while waiting for IO.

right. but ruby's threads are green and, on windows, block the entire process when waiting on IO so not an option here.

I thought Ruby internally uses non-blocking I/O in order to avoid that a green thread reading something blocks every other thread: am I wrong?
Or is this true just under unix?

Ruby uses select() internally, and Windows doesn't support
select() on pipes, just sockets.
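
For reference, a small sketch of select() on a pipe, assuming a Unix-like system (per the discussion above, the same call on a Windows pipe is not expected to behave this way). The variable names are illustrative only.

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so select times out and returns nil.
idle = IO.select([reader], nil, nil, 0.1)

writer.write('ping')

# Now the read end is reported as ready, and a nonblocking read succeeds.
ready, = IO.select([reader], nil, nil, 1)
data = ready.first.read_nonblock(4096)

puts "idle: #{idle.inspect}, data: #{data.inspect}"
```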

Regards,

Bill

···

From: "Gabriele Marrone" <gabriele.marrone@gmail.com>

On 01 Nov 06, at 01:33, ara.t.howard@noaa.gov wrote:

Francis_Cianfrocca · 1 November 2006 18:35

I think part of the problem here is that on Windows, functions like _pipe()
were added primarily as hacks to make it easier to migrate programs from
Unix, and never completely implemented. It may be too much to ask for them
to play nice in the sandbox with select().

···

On 11/1/06, Bill Kelly <billk@cts.com> wrote:

Windows will be a problem. Admittedly, I haven't tried
ruby 1.8.5 yet, which has new nonblock_* methods. However,
my expectation is that you'll only get nonblocking behavior
on windows from sockets, not from pipes.

On Windows, calling select() on a pipe always returns
immediately with "data ready to read", regardless of whether
there's any data there or not.

This has been the bane of my existence on Windows ruby for
5 or 6 years. I do IPC on Windows ruby using TCP over
loopback, instead of pipes, in order to get nonblocking
semantics. (That still doesn't help for reading from the
console, though... (search the archives for 'kbhit' for a
partial solution there...))

One of these years, I'd like to chat with a Windows guru
and ask how he/she would recommend making a select() that
works on both sockets and pipes on Windows. Ruby could
*really* use one.

Regards,

Bill

Ara.T.Howard6 · 1 November 2006 18:42

maybe

Apache Portable Runtime: Poll Routines

-a

···

On Thu, 2 Nov 2006, Bill Kelly wrote:


Ara.T.Howard6 · 1 November 2006 18:46

try this on windows

   harp:~ > cat a.rb
   t = Thread.new{ loop{ STDERR.puts Time.now.to_f } }
   STDIN.gets

-a

···

On Thu, 2 Nov 2006, Gabriele Marrone wrote:

I thought Ruby internally uses non-blocking I/O in order to avoid that a
green thread reading something blocks every other thread: am I wrong? Or is
this true just under unix?


Francis_Cianfrocca · 1 November 2006 07:23

Thanks for clarifying, Ara. Having put a lot of work into domesticating nbio
for Ruby programs, all I can say is I agree with you.

I still don't feel like I understand what the OP really is trying to do.
Feels like it shouldn't be such a hard problem. Maybe nbio is just a red
herring in this case (although it does appear in the first sentence of the
OP).

···

On 11/1/06, ara.t.howard@noaa.gov <ara.t.howard@noaa.gov> wrote:


Tom_Pollard · 1 November 2006 20:38

I guess I can see that, though I can't think of a program that I'd expect to be able to generate enough stderr output to clog a pipe. In any case, my response would be to merge stdout and stderr, rather than use non-blocking IO. If you're just reading one stream while the command is executing, you don't need to worry about blocking. I'm certainly with Ara in recommending that if you can avoid non-blocking IO, you should.

At the risk of starting an unrelated discussion ("stderr considered harmful"), my feeling has long been that stderr is misused by most people, and that the only context in which it makes any sense is for small commandline tools that you expect to use in a pipeline. For apps like that, it's helpful to keep error messages out of your stdout stream. For most apps, however, I don't think it makes any sense to write error messages to a separate file. Error messages should be written to the app's main log file or output file, where the user will be looking for their results. That way, nonfatal error messages also appear naturally in the proper sequence with other output. I work with a lot of scientist programmers who don't think much about issues like this and (typically) write their error messages to stderr just because it's there. I'm not sure that's relevant to the OP's situation or not. (Probably not.)

Tom

···

On Nov 1, 2006, at 11:30 AM, Robert Klemme wrote:

Tom Pollard wrote:

Anyway, you would only need nonblocking IO if you wanted to read bits of the stderr stream before the command exited, but that doesn't sound like what you want.

Actually this is not correct: if there is a lot written to stderr then you need to read that concurrently. If you do not do that then the process will block on some stderr write operation that fills up the pipe and you get a deadlock because your code waits for process termination.

Robert_James · 5 November 2006 00:45

Yep, that's how I came across this problem initially.

Doing that (replace 'cat -' with my command) hung indefinitely.
Commenting out the stderr line fixed it. I assume that it was waiting
for something to write to stderr before progressing.

Now, you are correct that the external process had terminated. Why
that didn't close stderr and move on I do not know. More importantly -
is there a way to do what Tom is suggesting - that is, have Ruby move
on the second the external process terminates - that will work on
Windows as well?

(As an aside, kudos to the developers of popen4 - it's really great.)

Tom Pollard wrote:

···

On Oct 31, 2006, at 9:37 PM, Tom Pollard wrote:
> If you're working in Unix you could try
>
> status = POpen4::poen4('cat -') do |stdout, stderr, stdin|

D'oh! (typo) That should have been

   status = POpen4::popen4('cat -') do |stdout, stderr, stdin|
     stdin.puts 'hello world'
     stdin.close
     puts "stdout: #{stdout.read.strip}"
     puts "stderr: #{stderr.read.strip}"
   end

Tom

Vidar_Hokstad · 5 November 2006 10:00

The problem is that all of these "powerful abstractions" are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby's threading is so
abysmal.

Try writing a network server that needs to handle a high number of
concurrent connections, and you'll quickly find "select()" taking most
of your CPU if you use a model that makes use of threading and blocking
IO - your only real choice to get decent performance out of Ruby for
that kind of app is multiplexing the processing manually using nbio
(which is what Ruby is trying to do behind the scenes, but fails
miserably at doing effectively once the number of threads gets high
enough) or fork instead, which has its own problems if you need to
share significant state.

This is from personal experience - I currently have a guy on my team
rewriting an important backend process because we started running into
those exact issues.

Even when Ruby's threading is sorted out so we won't run into these
problems, nbio will be vital for high performance network programming -
well done nbio reduces the number of syscalls, and thereby context
switches enormously.
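
A rough sketch of what "multiplexing the processing manually" can look like, assuming a toy echo service on the loopback interface. The helper serve_once and all variable names are made up for illustration, and a real server would also track write readiness instead of using a blocking write.

```ruby
require 'socket'

# Hypothetical helper: one pass of a select()-driven echo loop.
def serve_once(server, clients, timeout = 0.5)
  readable, = IO.select([server] + clients, nil, nil, timeout)
  return unless readable

  readable.each do |io|
    if io == server
      # A pending connection makes the listening socket readable.
      clients << server.accept_nonblock
    else
      begin
        io.write(io.read_nonblock(4096)) # echo back whatever arrived
      rescue IO::WaitReadable
        # Spurious wakeup; try again on the next pass.
      rescue EOFError
        clients.delete(io)
        io.close
      end
    end
  end
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
clients = []
peer = TCPSocket.new('127.0.0.1', server.addr[1])
peer.write('hello')

reply = nil
10.times do
  serve_once(server, clients)
  next unless IO.select([peer], nil, nil, 0.1)
  reply = peer.read_nonblock(4096)
  break
end

puts "echoed: #{reply.inspect}"
```

One thread services every connection, so the cost per pass is a single select() call rather than one blocked thread per client.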

Vidar

···

ara.t.howard@noaa.gov wrote:

it's exactly things like eventmachine that make me say using nbio is archaic -
i don't need to handle the complexities of nbio when powerful abstractions
like it exist!

Ara.T.Howard6 · 1 November 2006 21:02

Tom Pollard wrote:

Anyway, you would only need nonblocking IO if you wanted to read bits of
the stderr stream before the command exited, but that doesn't sound like
what you want.

Actually this is not correct: if there is a lot written to stderr then you
need to read that concurrently. If you do not do that then the process
will block on some stderr write operation that fills up the pipe and you
get a deadlock because your code waits for process termination.

I guess I can see that, though I can't think of a program that I'd expect to be able to generate enough stderr output to clog a pipe.

did you check out my recent post (switched subjects) - it takes a surprisingly
small amount (4242 lines of output does it easily)!

In any case, my response would be to merge stdout and stderr, rather than
use non-blocking IO. If you're just reading one stream while the command is
executing, you don't need to worry about blocking. I'm certainly with Ara
in recommending that if you can avoid non-blocking IO, you should.

no argument there... but

At the risk of starting an unrelated discussion ("stderr considered
harmful"), my feeling has long been that stderr is misused by most people,
and that the only context in which it makes any sense is for small
commandline tools that you expect to use in a pipeline. For apps like that,
it's helpful to keep error messages out of your stdout stream. For most
apps, however, I don't think it makes any sense to write error messages to a
separate file. Error messages should be written to the app's main log file
or output file, where the user will be looking for their results. That way,
nonfatal error messages also appear naturally in the proper sequence with
other output. I work with a lot of scientist programmers who don't think
much about issues like this and (typically) write their error messages to
stderr just because it's there. I'm not sure that's relevant to the OP's
situation or not. (Probably not.)

i'm in the same boat as you (writing for scientists) and have found it's
nearly __always__ the case that a program can produce something useful in
stdout and therefore always log to stderr so that programs can be used in
pipes.

mostly i agree though.

cheers.

-a

···

On Thu, 2 Nov 2006, Tom Pollard wrote:

On Nov 1, 2006, at 11:30 AM, Robert Klemme wrote:


Robert_K1 · 1 November 2006 22:10

Tom Pollard wrote:

Tom Pollard wrote:

Anyway, you would only need nonblocking IO if you wanted to read bits of the stderr stream before the command exited, but that doesn't sound like what you want.

Actually this is not correct: if there is a lot written to stderr then you need to read that concurrently. If you do not do that then the process will block on some stderr write operation that fills up the pipe and you get a deadlock because your code waits for process termination.

I guess I can see that, though I can't think of a program that I'd expect to be able to generate enough stderr output to clog a pipe.

A typical pipe buffer size is 4k which can get filled pretty fast.
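
The actual capacity varies by platform, so here is a small sketch for measuring it, assuming a Unix-like system where write_nonblock works on pipes. It keeps writing until the kernel refuses to take more, without ever reading from the other end.

```ruby
reader, writer = IO.pipe
chunk = 'x' * 1024
capacity = 0

begin
  # Keep writing until the pipe is full; write_nonblock returns the
  # number of bytes actually accepted.
  loop { capacity += writer.write_nonblock(chunk) }
rescue IO::WaitWritable
  # The pipe is full. A blocking writer would hang here until the
  # other end reads something.
end

puts "pipe accepted #{capacity} bytes before filling up"
```

A child process writing to an unread stderr pipe stalls at exactly this point.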

In any case, my response would be to merge stdout and stderr, rather than use non-blocking IO. If you're just reading one stream while the command is executing, you don't need to worry about blocking.

Merging both from outside the subprocess is certainly possible from Ruby (via a shell), but I am not sure whether there is a portable solution (one of the popenN methods?).
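
One candidate on Rubies newer than the ones discussed in this thread is Open3.capture2e from the standard library, which runs a command without a shell and returns stdout and stderr merged into one string. This is a sketch only; how well it behaves on Windows is not verified here, and the child script is made up for illustration.

```ruby
require 'open3'
require 'rbconfig'

# The child writes one line to each stream.
script = 'STDOUT.puts "to stdout"; STDOUT.flush; STDERR.puts "to stderr"'

# capture2e returns the merged output and the exit status.
merged, status = Open3.capture2e(RbConfig.ruby, '-e', script)

puts merged
puts "success: #{status.success?}"
```

With only one stream to read, the parent cannot deadlock on an unread second pipe.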

I'm certainly with Ara in recommending that if you can avoid non-blocking IO, you should.

I second that.

At the risk of starting an unrelated discussion ("stderr considered harmful"), my feeling has long been that stderr is misused by most people, and that the only context in which it makes any sense is for small commandline tools that you expect to use in a pipeline. For apps like that, it's helpful to keep error messages out of your stdout stream. For most apps, however, I don't think it makes any sense to write error messages to a separate file. Error messages should be written to the app's main log file or output file, where the user will be looking for their results. That way, nonfatal error messages also appear naturally in the proper sequence with other output. I work with a lot of scientist programmers who don't think much about issues like this and (typically) write their error messages to stderr just because it's there. I'm not sure that's relevant to the OP's situation or not. (Probably not.)

All true. But if you do not know what the program does or how it is implemented you better deal with potential output to stderr (either by merging, see above, or by making sure that stderr and stdout are read) because otherwise the consequences might be somewhat catastrophic. And this could also mean that your program at some point in the future simply stops working because another piece of software has changed.

Kind regards

robert

···

On Nov 1, 2006, at 11:30 AM, Robert Klemme wrote:

Ara.T.Howard6 · 5 November 2006 03:56

Yep, that's how I came across this problem initially.

Doing that (replace 'cat -' with my command) hung indefinitely.
Commenting out the stderr line fixed it. I assume that it was waiting
for something to write to stderr before progressing.

Now, you are correct that the external process had terminated. Why
that didn't close stderr and move on I do not know. More importantly -
is there a way to do what Tom is suggesting - that is, have Ruby move
on the second the external process terminates - that will work on
Windows as well?

check out systemu

(As an aside, kudos to the developers of popen4 - it's really great.)

i am 99% positive that the implementation of popen4 does not play well with
windows and it may be impossible to make it do so. the systemu package i just
released is my attempt at an alternate implementation. give it a whirl and
let me know how it goes.

regards.

-a

···

On Sun, 5 Nov 2006, S. Robert James wrote:

Bill_Kelly · 5 November 2006 12:38

it's exactly things like eventmachine that make me say using nbio is archaic -
i don't need to handle the complexities of nbio when powerful abstractions
like it exist!

The problem is that all of these "powerful abstractions" are
ridiculously slow compared to a well written nbio approach for many
types of applications. Particularly as long as Ruby's threading is so
abysmal.

Sounds like you might want to actually take a look at eventmachine,
then.

Regards,

Bill

···

From: "Vidar Hokstad" <vidar.hokstad@gmail.com>

ara.t.howard@noaa.gov wrote:
