Reading from an external process with IO.popen

Hello all,

I'm trying to wrap my head around IO.popen with some simple examples
that send data to and read data from an external process. I've created
a sample case in the shell like this:

$ { echo hello ; sleep 2 ; echo world; } | cat
hello
world

I've written the same in ruby like so, which works:

$ cat foo.rb
#!/usr/bin/env ruby
if $0 == __FILE__
  cat = IO.popen("cat", "w+")
  cat.puts("hello, ")
  puts(cat.gets)
  sleep 2
  cat.puts("world")
  puts(cat.gets)
end

$ ./foo.rb
hello
world

However, if I change the cat command to a sed command, the ruby
version no longer works. The command-line equivalent does work, but
the ruby version waits forever and has to be interrupted:

$ { echo hello ; sleep 2 ; echo world; } | sed -ne p
hello
world

$ cat foo.rb
#!/usr/bin/env ruby
if $0 == __FILE__
  cat = IO.popen("sed -ne p", "w+")
  cat.puts("hello, ")
  puts(cat.gets)
  sleep 2
  cat.puts("world")
  puts(cat.gets)
end

$ ./foo.rb
./foo.rb:6:in `gets': Interrupt
from ./foo.rb:6

Why does ruby work in the first case but wait forever in the second?

Using this version of ruby:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i486-linux]

Thanks in advance for any pointers to references.

Regards,
- Robert

That's probably because you do not close the write end of the pipe in
Ruby code. Also, it's better to place the reading portion in a
separate thread in order to prevent deadlocks. And, please use the
block form of IO.popen which is more robust.

Try this pattern:

IO.popen("cat", "w+") do |cat|
  # background output
  t = Thread.new { cat.each {|l| puts l} }

  # main work
  cat.puts "hello, "
  sleep 2
  cat.puts "world"

  # terminate processing:
  cat.close_write
  t.join
end
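
Applied to the sed command from the question, the same pattern might look like the sketch below. It collects sed's output in an array (the name `lines` is only illustrative) instead of printing from the reader thread, and it leaves out the sleep because the timing does not matter once the write end is closed.

```ruby
# Sketch: the reader-thread pattern applied to "sed -ne p".
lines = []

IO.popen("sed -ne p", "w+") do |sed|
  # Reader thread: drains sed's output so neither side blocks on a full pipe.
  reader = Thread.new { sed.each_line { |l| lines << l } }

  sed.puts "hello, "
  sed.puts "world"

  # Closing the write end sends EOF; sed then flushes whatever it buffered.
  sed.close_write
  reader.join
end

lines.each { |l| puts l }
```

Nothing is read in lock step here: the output only has to arrive by the time the write end is closed, which is why sed's buffering no longer causes a hang.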

Kind regards

robert

···

2009/11/5 Robert Citek <robert.citek@gmail.com>:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Perhaps, but what if I don't want to close the pipe? That is, I would
like to keep the pipe open so that I can send some data, read some
data and work on it, send some more data, read some more data and work
on it, etc., as if the process were a service, e.g. a database. I am
trying to code the equivalent of a call and response. My examples
using cat and sed are just stand-ins for the real program.

BTW, the cat example works as expected, but the one using sed doesn't.
That is, there is no output from sed until the pipe closes.
There seems to be some buffering going on. I'm guessing it's from the
Ruby side since I don't see this when run from the shell. But that's
just a guess.

Of course, it's entirely possible that IO.popen is not the "right" way
to tackle this and I have not discovered the Ruby way, yet.

Again, any pointers in the right direction are greatly appreciated.

Regards,
- Robert

···

On Fri, Nov 6, 2009 at 5:15 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

> 2009/11/5 Robert Citek <robert.citek@gmail.com>:
>
>> However, if I change the cat command to a sed command, the ruby
>> version no longer works. The command-line equivalent does work, but
>> the ruby version waits forever and has to be interrupted:
>
> That's probably because you do not close the write end of the pipe in
> Ruby code.

> Perhaps, but what if I don't want to close the pipe? That is, I would
> like to keep the pipe open so that I can send some data, read some
> data and work on it, send some more data, read some more data and work
> on it, etc., as if the process were a service, e.g. a database. I am
> trying to code the equivalent of a call and response. My examples
> using cat and sed are just stand-ins for the real program.

If the program you are using does not cooperate, you're out of luck.
For example, if it allocates a huge read buffer, you might have to
send hundreds of lines before it even starts processing the first one.
I have no idea how the implementation of sed that you are using does
it. But think of sort, for example: you _cannot_ get any output before
the last line has been written and the write end of the pipe has been
closed.
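
The sort case can be sketched in a few lines. The input words are made up for illustration; the point is only that reading has to wait until the write end has been closed.

```ruby
# Sketch: sort cannot emit anything until it has seen all of its input.
sorted = IO.popen("sort", "r+") do |sort|
  sort.puts "pear"
  sort.puts "apple"
  sort.puts "mango"

  # A sort.gets at this point would block: sort is still waiting for EOF.
  sort.close_write   # signal end of input
  sort.read          # now the complete, sorted output can be read
end

puts sorted
```

With only three short lines there is no risk of filling the pipe, so no reader thread is needed here; for large inputs the threaded pattern is the safer choice.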

> BTW, the cat example works as expected, but the one using sed doesn't.
> That is, there is no output from sed until the pipe closes.
> There seems to be some buffering going on. I'm guessing it's from the
> Ruby side since I don't see this when run from the shell. But that's
> just a guess.

The shell closes the pipe as well. It is sed that is doing the
buffering, and you have no control over it unless it provides an
option to control this.
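
GNU sed happens to provide such an option: -u (--unbuffered). The sketch below assumes GNU sed is the sed on the PATH; other implementations may reject the flag or spell it differently. The timeout is only a guard so that a sed that still buffers produces an error instead of a hang.

```ruby
require "timeout"

# Assumes GNU sed: -u (--unbuffered) makes sed flush its output after each line.
replies = []

IO.popen("sed -u -n p", "r+") do |sed|
  sed.sync = true  # make sure Ruby does not hold back our writes either

  ["hello", "world"].each do |msg|
    sed.puts msg
    # Guard against a sed that still buffers: give up instead of hanging.
    replies << Timeout.timeout(5) { sed.gets }
  end

  sed.close_write
end

puts replies
```

Each gets returns right after the matching puts, so the pipe can stay open for as many request and reply rounds as needed.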

> Of course, it's entirely possible that IO.popen is not the "right" way
> to tackle this and I have not discovered the Ruby way, yet.

No, it's the right way, but your expectations cannot be met in all cases.

Kind regards

  robert

···

On 11/06/2009 04:01 PM, Robert Citek wrote:

>> BTW, the cat example works as expected, but the one using sed doesn't.
>> That is, there is no output from sed until the pipe closes.
>> There seems to be some buffering going on. I'm guessing it's from the
>> Ruby side since I don't see this when run from the shell. But that's
>> just a guess.
>
> The shell closes the pipe as well. It is sed that is doing the buffering
> and you have no control over it unless it provides an option to control
> this.

Yes, it appears that the external program is controlling the
buffering. When I tried the same process with the program I really
wanted to use, IO.popen worked pretty much the way I wanted it to.
The pattern was this:

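foo = IO.popen("external_program", "w+")
while data = gets
  # prepare data here
  foo.puts(data)
  newdata = []
  # end_of_record? stands in for the program's end-of-chunk check
  while (line = foo.gets) && !end_of_record?(line)
    newdata << line
  end
  # process newdata here
end
foo.close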

Turns out that the program I used has a signal to signify the end of a
chunk of data. So the program knows when I am finished sending data
and it can start crunching away. And I know when I can stop reading
data from the pipe and begin processing it. This saves the time of
repeatedly having to open and close the pipe.
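
A self-contained sketch of that exchange follows. The real program is not named in this thread, so a one-line Ruby child stands in for it here; the "END" marker, the upcasing and the variable names are all assumptions made for illustration.

```ruby
require "rbconfig"

# Hypothetical stand-in for the real program: it upcases each input line,
# then prints "END" (an assumed end-of-record marker), flushing every write.
child_script = 'STDOUT.sync = true; STDIN.each_line { |l| puts l.upcase; puts "END" }'

results = []

IO.popen([RbConfig.ruby, "-e", child_script], "r+") do |svc|
  svc.sync = true

  ["first", "second"].each do |request|
    svc.puts request                 # call

    record = []
    while (line = svc.gets)          # response: read up to the marker
      break if line.chomp == "END"
      record << line.chomp
    end
    results << record
  end

  svc.close_write                    # lets the child see EOF and exit
end

p results
```

This only works because the child flushes after each record. A program that buffers its output until EOF would leave the inner loop waiting, as in the sed example.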

>> Of course, it's entirely possible that IO.popen is not the "right" way
>> to tackle this and I have not discovered the Ruby way, yet.
>
> No, it's the right way but your expectations cannot be met in all cases.

It's nice to know that I'm at least on the right track, or on one of
many possible right tracks.

Thanks for your help.

Regards,
- Robert

···

On Fri, Nov 6, 2009 at 12:50 PM, Robert Klemme <shortcutter@googlemail.com> wrote:
