Does IO.read block?

Hi All,

I have a setup where I am writing to a pipe in one process and reading from it in another. I am closing the ends I'm not using and so on, but I have a new problem. I recently read that I was just getting lucky when it came to each IO.write completing in full (probably due to Ruby's green-thread implementation in 1.8 and its 'nice' scheduling). Now that the OS has taken over scheduling, it interrupts things whenever it damn well feels like it, so I believe some of my write calls aren't completing. I've therefore set up a loop like this:

rescue Exception => error
  ex_string = Marshal.dump(error)
  ex_size = ex_string.bytesize
  bytes_written = 0
  while bytes_written < ex_size
    bytes_written += write_end.write(ex_string[bytes_written..-1])
  end
  ensure
    write_end.close
  end
end

In the other process I currently make just one call to read_end.read(). My question is: is this guaranteed to get all of the bytes written through multiple calls to write, or do I need to set up a similar loop on the read end, like:

while !(read_end.eof?)
    string += read_end.read
end

Or does read_end.read block until it finds EOF?

Can anyone tell me what happens when read_end.read is re-scheduled partway through? Or is that guaranteed to finish?
I'm running this on Linux, so POSIX rules probably apply.
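
For concreteness, here is a minimal sketch of the two read-side strategies I am weighing (this assumes read_end comes from IO.pipe as above and that the writer eventually closes its end; the readpartial variant is untested on my side):

# Variant 1: IO#read with no length argument reads until EOF,
# i.e. until every copy of the write end has been closed.
data = read_end.read

# Variant 2: defensive loop that accumulates whatever is currently
# available until EOF is signalled.
data = ""
begin
  loop { data << read_end.readpartial(4096) }
rescue EOFError
  # all writers have closed their end; data holds everything written
end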

Thanks in advance,
Michael

···


I have a setup where I am writing to a pipe in one process and reading from it in another. I am closing the ends I'm not using and so on, but I have a new problem. I recently read that I was just getting lucky when it came to each IO.write completing in full (probably due to Ruby's green-thread implementation in 1.8 and its 'nice' scheduling). Now that the OS has taken over scheduling, it interrupts things whenever it damn well feels like it, so I believe some of my write calls aren't completing. I've therefore set up a loop like this:

rescue Exception => error
  ex_string = Marshal.dump(error)
  ex_size = ex_string.bytesize
  bytes_written = 0
  while bytes_written < ex_size
    bytes_written += write_end.write(ex_string[bytes_written..-1])
  end
  ensure
    write_end.close

You do not seem to open the stream in this context, so why do you close it here? Or do you have a pattern like

write_end = ... # open
begin
   ...
rescue
   ...
ensure
   write_end.close
end

  end
end

IMHO you can simplify that to

rescue Exception => error
   Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.

In the other process I currently make just one call to read_end.read(). My question is: is this guaranteed to get all of the bytes written through multiple calls to write, or do I need to set up a similar loop on the read end, like:

while !(read_end.eof?)
    string += read_end.read
end

Or does read_end.read block until it finds EOF?

Yes, that's what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

But again, I would just do

obj = Marshal.load(read_end)

Which, again is more efficient since you do not need the additional buffer and looping is done in C code.
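
For illustration, here is a minimal, self-contained sketch of the whole round trip (assuming MRI on a platform with fork; the names and the deliberately failing job are placeholders, not your actual code):

read_end, write_end = IO.pipe

pid = fork do
  read_end.close                     # child only writes
  begin
    raise "boom"                     # stand-in for the real job; it always fails here
  rescue Exception => error
    Marshal.dump(error, write_end)   # serialize straight onto the pipe
  ensure
    write_end.close                  # gives the parent its EOF
  end
end

write_end.close                      # parent only reads
payload = read_end.read              # blocks until the child closes its write end
error = payload.empty? ? nil : Marshal.load(payload)
read_end.close
Process.wait(pid)
puts "child failed with: #{error.inspect}" if error

I read the whole pipe into a string here only so that the "nothing was written" case is easy to detect; Marshal.load(read_end) works just as well when you know that something was written.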

Can anyone tell me what happens when read_end.read is re-scheduled partway through? Or is that guaranteed to finish?

I am not sure what you mean by "rescheduled". Even if the OS preempts execution of this process or thread it will continue at the same location of the stream and semantics of the methods do not change.

Kind regards

  robert

···

On 15.03.2009 23:02, Michael Malone wrote:

Robert Klemme wrote:

rescue Exception => error
  ex_string = Marshal.dump(error)
  ex_size = ex_string.bytesize
  bytes_written = 0
  while bytes_written < ex_size
    bytes_written += write_end.write(ex_string[bytes_written..-1])
  end
  ensure
    write_end.close

You do not seem to open the stream in this context, so why do you close it here? Or do you have a pattern like

That code is inside the block passed to fork(), so write_end needs to be closed so that the parent process reads an EOF. The pipe is opened prior to the fork() call, so the fd is copied into the child process; if the child did not close its copy, the parent would block indefinitely waiting for the EOF.

IMHO you can simplify that to

rescue Exception => error
  Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.

Thanks, I've applied that.

But again, I would just do

obj = Marshal.load(read_end)

Which, again is more efficient since you do not need the additional buffer and looping is done in C code.

I need the additional buffer because an exception is not always thrown, so attempting to unmarshal an empty string causes problems.

I am not sure what you mean by "rescheduled". Even if the OS preempts execution of this process or thread it will continue at the same location of the stream and semantics of the methods do not change.

Yes, I meant preempted.

Kind regards

    robert

My problem still exists. :( I didn't get into the specifics because I haven't proved where it lies. I just got excited when I saw that as a potential problem area and figured it needed fixing because it would bite me sooner or later.

Back to the drawing board until I have a better question...

Thanks,
Michael

···


Robert Klemme wrote:

···

On 15.03.2009 23:02, Michael Malone wrote:

Or does read_end.read block until it finds EOF?

Yes, that's what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

The docs shouldn't be read as the Bible; they may not be very precise. If IO#read is similar to C's read(), then it also returns once it has read all the currently available bytes. See "man 2 read".
--
Posted via http://www.ruby-forum.com/.

IMHO you can simplify that to

rescue Exception => error
  Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.

Thanks, I've applied that.

But again, I would just do

obj = Marshal.load(read_end)

Which, again is more efficient since you do not need the additional buffer and looping is done in C code.

I need the additional buffer because an exception is not always thrown, so attempting to unmarshal an empty string causes problems.

Wait, I just re-read your email. The looping is done in C. So that implies I do need to loop on the read_end.read() call? Efficiency is not a massive concern for me - correctness is.

···


Robert Klemme wrote:

rescue Exception => error
  ex_string = Marshal.dump(error)
  ex_size = ex_string.bytesize
  bytes_written = 0
  while bytes_written < ex_size
    bytes_written += write_end.write(ex_string[bytes_written..-1])
  end
  ensure
    write_end.close

You do not seem to open the stream in this context, so why do you close it here? Or do you have a pattern like

That code is inside the block passed to fork(), so write_end needs to be closed so that the parent process reads an EOF. The pipe is opened prior to the fork() call, so the fd is copied into the child process; if the child did not close its copy, the parent would block indefinitely waiting for the EOF.

Ah, I see.

IMHO you can simplify that to

rescue Exception => error
  Marshal.dump(error, write_end)
end

Which should also be more efficient since the looping is done in C code.

Thanks, I've applied that.

But again, I would just do

obj = Marshal.load(read_end)

Which, again is more efficient since you do not need the additional buffer and looping is done in C code.

I need the additional buffer because an exception is not always thrown, so attempting to unmarshal an empty string causes problems.

Well, you could just serialize nil or the result of the computation in the OK case. That simplifies handling in the calling process plus you can get the result as a Ruby object. You then only need to test whether what you got is an instance of Exception or not.
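
Sketched out, the child then always writes exactly one object and the parent always reads exactly one (illustrative only; the trivial job stands in for your block):

read_end, write_end = IO.pipe

pid = fork do
  read_end.close
  result = begin
    40 + 2                 # stand-in for the real job
  rescue Exception => e
    e                      # on failure the exception itself becomes the result
  end
  Marshal.dump(result, write_end)
  write_end.close
end

write_end.close
outcome = Marshal.load(read_end)    # always exactly one object to read
read_end.close
Process.wait(pid)
raise outcome if outcome.is_a?(Exception)
puts "job returned #{outcome.inspect}"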

My problem still exists. :( I didn't get into the specifics because I haven't proved where it lies. I just got excited when I saw that as a potential problem area and figured it needed fixing because it would bite me sooner or later.

One problem that could hit you is that you might not get the concurrency right: you need to make sure that stderr and stdout are read concurrently (if there can be output on both) because if the pipe on one of them fills up your child process is blocked. Maybe your issue lies in that area.

Back to the drawing board until I have a better question...

:-)

Wait, I just re-read your email. The looping is done in C. So that implies I do need to loop on the read_end.read() call? Efficiency is not a massive concern for me - correctness is.

No, you do not need to loop for IO#read since the docs say it will read all the way to the end of the stream when leaving out the length argument.

Kind regards

  robert

···

On 16.03.2009 00:17, Michael Malone wrote:

In this particular case the docs are correct to my knowledge. In other words IO#read *will* block until the full stream is read (or an error condition occurs of course).

Cheers

  robert

···

On 16.03.2009 22:12, Albert Schlef wrote:

Robert Klemme wrote:

On 15.03.2009 23:02, Michael Malone wrote:

Or does read_end.read block until it finds EOF?

Yes, that's what it does:
http://www.ruby-doc.org/core/classes/IO.html#M002295

The docs shouldn't be read as the Bible; they may not be very precise. If IO#read is similar to C's read(), then it also returns once it has read all the currently available bytes. See "man 2 read".

One problem that could hit you is that you might not get the concurrency right: you need to make sure that stderr and stdout are read concurrently (if there can be output on both) because if the pipe on one of them fills up your child process is blocked. Maybe your issue lies in that area.

I have set STDOUT.sync = true. Is that enough? It should be, because I don't have any output originating from the child processes or any child threads, only the parent/calling thread. But I've been wrong before...

Though I think I have narrowed the problem _very_ slightly. The thread/process pair that blocks indefinitely blocks on the read_end.read() call. Which is weird and annoying. My next task will be to set up some logging to figure out whether there should have been an exception to chuck down the pipe. (In my test case I have a random number of jobs, of which every second one raises an exception.)

Thanks a lot for your help! It's certainly possible that this is going to be my "Stupidly Difficult Bug When I Was Near The Beginning Of My First Programming Job" that everyone seems to have.

Michael

···


One problem that could hit you is that you might not get the concurrency right: you need to make sure that stderr and stdout are read concurrently (if there can be output on both) because if the pipe on one of them fills up your child process is blocked. Maybe your issue lies in that area.

I have set STDOUT.sync = true. Is that enough?

No, definitely not! This covers the *sending* side, but the blocking issue I described is caused by the receiving side not reading all the data. Assume an application as a child process that frequently writes to stdout and stderr. The parent process at the other end of those two (!) pipes only reads the stdout pipe. Now what happens is this: once the pipe's buffer (OS and configuration dependent, often 4k) fills up, the write call blocks and the child hangs - indefinitely.
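
A sketch of the receiving side done safely, assuming pr and pe are the stdout/stderr pipes from your earlier snippet and that the parent has already closed pr[1] and pe[1] (one drain thread per pipe, so neither can fill up):

out_thread = Thread.new { pr[0].read }   # drains the child's stdout to EOF
err_thread = Thread.new { pe[0].read }   # drains the child's stderr to EOF

child_out = out_thread.value             # Thread#value waits for the thread
child_err = err_thread.value             # and returns what it read
Process.wait(pid)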

It should be, because I don't have any output originating from the child processes or any child threads, only the parent/calling thread. But I've been wrong before...

Though I think I have narrowed the problem _very_ slightly. The thread/process pair that blocks indefinitely blocks on the read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client terminates?

My next task will be to set up some logging to figure out whether there should have been an exception to chuck down the pipe. (In my test case I have a random number of jobs, of which every second one raises an exception.)

What I usually do in these situations, which can be a bit tricky: I start out by writing a simple test scenario and modify it step by step until it resembles the problem I want to solve _and_ still does not exhibit the abnormal behavior.

Thanks a lot for your help! It's certainly possible that this is going to be my "Stupidly Difficult Bug When I Was Near The Beginning Of My First Programming Job" that everyone seems to have.

LOL

  robert

···

On 16.03.2009 21:00, Michael Malone wrote:

Robert Klemme wrote:

One problem that could hit you is that you might not get the concurrency right: you need to make sure that stderr and stdout are read concurrently (if there can be output on both) because if the pipe on one of them fills up your child process is blocked. Maybe your issue lies in that area.

I have set STDOUT.sync = true. Is that enough?

No, definitely not! This covers the *sending* side, but the blocking issue I described is caused by the receiving side not reading all the data. Assume an application as a child process that frequently writes to stdout and stderr. The parent process at the other end of those two (!) pipes only reads the stdout pipe. Now what happens is this: once the pipe's buffer (OS and configuration dependent, often 4k) fills up, the write call blocks and the child hangs - indefinitely.

It should be, because I don't have any output originating from the child processes or any child threads, only the parent/calling thread. But I've been wrong before...

Though I think I have narrowed the problem _very_ slightly. The thread/process pair that blocks indefinitely blocks on the read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client terminates?

        pw = IO::pipe
        pr = IO::pipe
        pe = IO::pipe
        read_end,write_end = IO.pipe
        pid = fork() do
           begin
              pw[1].close
              STDIN.reopen(pw[0])
              pw[0].close

              pr[0].close
              STDOUT.reopen(pr[1])
              pr[1].close

              pe[0].close
              STDERR.reopen(pe[1])
              pe[1].close

That's the code I used to make sure STDOUT, STDIN and STDERR weren't causing full buffer hangs. I borrowed it from a process detach method elsewhere, so it might not be exactly what I want. However, it had no effect. The problem still exists.

Righto, here's the deal: the fork call doesn't seem to work, but it doesn't fail either (a valid pid is returned). If I do a puts on either side of the fork() (with and without the STDOUT redirection), then in the process that blocks, the puts just inside the fork() doesn't run. This suggests the fork() call fails, yet it doesn't return nil like the docs say.

An interesting effect I noticed, is if I remove my call to Process.wait(pid) then it displays almost identical behaviour! It runs perfectly some times, and occasionally just hangs. However, if I'm not collecting child processes, then I would expect to hit my ulimit at the same place every time and for the fork() call to fail noisily. The docs don't mention anything on this. Is it possible that my child processes aren't being collected correctly, thus the rlimit is being hit, but Ruby *believes* the process is collected, so its internal rlimit is not reached, so it calls fork() regardless? Is that at all sane? Though presumably, the external fork() call *would* fail, causing the internal call to fail. And yes, I'm speaking as if the two are different, but they might not be. I suppose it doesn't really make sense to duplicate it. In fact, the call directly after the block passed to fork() doesn't execute. Is it possible that my Thread is hanging on the fork() call? What would that indicate? How could I fix it?

No straw will be left un-grasped at!

Michael

···

On 16.03.2009 21:00, Michael Malone wrote:


Robert Klemme wrote:

One problem that could hit you is that you might not get the concurrency right: you need to make sure that stderr and stdout are read concurrently (if there can be output on both) because if the pipe on one of them fills up your child process is blocked. Maybe your issue lies in that area.

I have set STDOUT.sync = true. Is that enough?

No, definitely not! This covers the *sending* side, but the blocking issue I described is caused by the receiving side not reading all the data. Assume an application as a child process that frequently writes to stdout and stderr. The parent process at the other end of those two (!) pipes only reads the stdout pipe. Now what happens is this: once the pipe's buffer (OS and configuration dependent, often 4k) fills up, the write call blocks and the child hangs - indefinitely.

It should be, because I don't have any output originating from the child processes or any child threads, only the parent/calling thread. But I've been wrong before...

Though I think I have narrowed the problem _very_ slightly. The thread/process pair that blocks indefinitely blocks on the read_end.read() call. Which is weird and annoying.

And you are sure you close stdout in the client or the client terminates?

        pw = IO::pipe
        pr = IO::pipe
        pe = IO::pipe
        read_end,write_end = IO.pipe

Why do you open four pipes when three are sufficient?

        pid = fork() do
           begin
              pw[1].close
              STDIN.reopen(pw[0])
              pw[0].close

              pr[0].close
              STDOUT.reopen(pr[1])
              pr[1].close

              pe[0].close
              STDERR.reopen(pe[1])
              pe[1].close

That's the code I used to make sure STDOUT, STDIN and STDERR weren't causing full buffer hangs.

This code by no means does anything about buffer hangs; it just makes sure that stdin, stdout and stderr are redirected to pipes.

Btw, do you also close the other ends in the parent?
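
I.e. something along these lines in the parent right after fork() returns (sketch based on the pw/pr/pe arrays from your snippet):

pw[0].close   # parent only writes to the child's stdin via pw[1]
pr[1].close   # parent only reads the child's stdout via pr[0]
pe[1].close   # parent only reads the child's stderr via pe[0]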

I borrowed it from a process detach method elsewhere, so it might not be exactly what I want. However, it had no effect. The problem still exists.

I do not know whether you plan to exec your forked process but if so you can make your life much easier by using Open3. Then you can do

# from memory
require 'open3'
Open3.popen "cmd args" do |in,out,err|
   out.puts "foo"
end

Righto, here's the deal: the fork call doesn't seem to work, but it doesn't fail either (a valid pid is returned). If I do a puts on either side of the fork() (with and without the STDOUT redirection), then in the process that blocks, the puts just inside the fork() doesn't run. This suggests the fork() call fails, yet it doesn't return nil like the docs say.

I am not sure I can follow your logic. Failure of fork can be recognized by an exception (in the unlikely case that your OS cannot manage more processes or won't allow you to have another one).

I have to agree that your reasoning is not fully clear to me. Using output operations to determine failure or success of a fork call seems to be using the wrong tool for the job.

An interesting effect I noticed, is if I remove my call to Process.wait(pid) then it displays almost identical behaviour! It runs perfectly some times, and occasionally just hangs. However, if I'm not collecting child processes, then I would expect to hit my ulimit at the same place every time and for the fork() call to fail noisily. The docs don't mention anything on this. Is it possible that my child processes aren't being collected correctly, thus the rlimit is being hit, but Ruby *believes* the process is collected, so its internal rlimit is not reached, so it calls fork() regardless?

Could just mean that you create new processes faster than old ones are closed.
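
If you suspect that, one quick check is to reap finished children without blocking while the pool runs (sketch; Process::WNOHANG is standard):

begin
  # collect every child that has already exited, but never block
  while pid = Process.waitpid(-1, Process::WNOHANG)
    puts "reaped child #{pid}, status #{$?.exitstatus.inspect}"
  end
rescue Errno::ECHILD
  # no children left to wait for
end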

Is that at all sane? Though presumably, the external fork() call *would* fail, causing the internal call to fail. And yes, I'm speaking as if the two are different, but they might not be. I suppose it doesn't really make sense to duplicate it. In fact, the call directly after the block passed to fork() doesn't execute. Is it possible that my Thread is hanging on the fork() call? What would that indicate? How could I fix it?

I'd rather ask: what is the problem you are trying to solve? I still lack the big picture - we're talking about details here all the time but it's difficult to stay on track when the context is missing.

Cheers

  robert

···

On 17.03.2009 00:03, Michael Malone wrote:

On 16.03.2009 21:00, Michael Malone wrote:

I borrowed it from a process detach method elsewhere, so it might not be exactly what I want. However, it had no effect. The problem still exists.

I do not know whether you plan to exec your forked process but if so you can make your life much easier by using Open3. Then you can do

# from memory
require 'open3'
Open3.popen "cmd args" do |in,out,err|
  out.puts "foo"
end

I'm not running a command as such; I'm running a block of code. Also, I have come to the conclusion that I actually don't care about the output in the child process. Is there an easy way to redirect it to /dev/null?

An interesting effect I noticed, is if I remove my call to Process.wait(pid) then it displays almost identical behaviour! It runs perfectly some times, and occasionally just hangs. However, if I'm not collecting child processes, then I would expect to hit my ulimit at the same place every time and for the fork() call to fail noisily. The docs don't mention anything on this. Is it possible that my child processes aren't being collected correctly, thus the rlimit is being hit, but Ruby *believes* the process is collected, so its internal rlimit is not reached, so it calls fork() regardless?

Could just mean that you create new processes faster than old ones are closed.

This is almost certainly true. I didn't think of that. And I finally worked out how to use ulimit, realising the limit is per user and in the realm of 26000 processes. You're right, it probably isn't that.

I'd rather ask: what is the problem you are trying to solve? I still lack the big picture - we're talking about details here all the time but it's difficult to stay on track when the context is missing.

Sorry, I was trying to ask specific questions rather than asking the good folks on ruby-talk to debug my script for me entirely. I am writing a thread pool which accepts jobs as blocks; by calling run_process(&block) the job is run in an entirely new process, with the thread from the pool calling fork(), creating a pipe to transfer any exception objects back to the parent (hence the fourth pipe), and then waiting to collect the child process's status. The main use of this is compiling concurrently with the thread pool, then linking concurrently with the process option.
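
Roughly, the shape of run_process I have in mind is something like this (an illustrative sketch only, not my actual code):

def run_process(&block)
  read_end, write_end = IO.pipe
  pid = fork do
    read_end.close
    begin
      block.call
    rescue Exception => error
      Marshal.dump(error, write_end)
    ensure
      write_end.close
    end
  end
  write_end.close
  payload = read_end.read          # EOF once the child closes write_end
  read_end.close
  Process.wait(pid)
  raise Marshal.load(payload) unless payload.empty?
end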

Michael

···


I borrowed it from a process detach method elsewhere, so it might not be exactly what I want. However, it had no effect. The problem still exists.

I do not know whether you plan to exec your forked process but if so you can make your life much easier by using Open3. Then you can do

# from memory
require 'open3'
Open3.popen "cmd args" do |in,out,err|
  out.puts "foo"
end

I'm not running a command as such; I'm running a block of code. Also, I have come to the conclusion that I actually don't care about the output in the child process. Is there an easy way to redirect it to /dev/null?

Please take a look at http://pastie.org/419160 for an answer to that as well as an experimental script.
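
In short, the /dev/null part is just a reopen of the standard streams inside the child, something like this (sketch):

pid = fork do
  STDOUT.reopen("/dev/null", "w")   # discard anything written to stdout
  STDERR.reopen("/dev/null", "w")   # and to stderr
  # ... run the block here ...
end
Process.wait(pid)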

I'd rather ask: what is the problem you are trying to solve? I still lack the big picture - we're talking about details here all the time but it's difficult to stay on track when the context is missing.

Sorry, I was trying to ask specific questions rather than asking the good folks on ruby-talk to debug my script for me entirely. I am writing a thread pool which accepts jobs as blocks; by calling run_process(&block) the job is run in an entirely new process, with the thread from the pool calling fork(), creating a pipe to transfer any exception objects back to the parent (hence the fourth pipe), and then waiting to collect the child process's status. The main use of this is compiling concurrently with the thread pool, then linking concurrently with the process option.

But in that case you only need a single pipe, because all you want to transport back is the result of the calculation, correct? And you could leave stderr untouched so that any error messages you do not expect still go to the console.

Another alternative would be to use DRb for the communication.
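
A minimal sketch of the DRb route (illustrative only; the URI, port, job id and class name are made up for the example):

require 'drb/drb'

# parent: expose a collector object that children can call back into
class ResultCollector
  def report(job_id, outcome)
    # store the outcome; re-raise exceptions later, etc.
  end
end
DRb.start_service("druby://localhost:8787", ResultCollector.new)

# forked child: obtain a proxy and report either the result or the exception
collector = DRbObject.new_with_uri("druby://localhost:8787")
begin
  result = 40 + 2                    # stand-in for the real job
  collector.report(42, result)
rescue Exception => e
  collector.report(42, e)
end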

Cheers

  robert

···

On 17.03.2009 21:09, Michael Malone wrote:

Here's another pastie that demonstrates a common error causing a deadlock:

http://pastie.org/419533

Cheers

robert

···

--
remember.guy do |as, often| as.you_can - without end

Robert Klemme wrote:

Here's another pastie that demonstrates a common error causing a deadlock:

http://pastie.org/419533

Cheers

robert

Thanks, but I have a new theory. I broke everything down to be as atomic as possible, which included pulling in the synchronised Queue from thread.rb and modifying it slightly to lock the mutex and unlock it only when necessary, so that blocks were passed as parameters as little as possible (since they are far from being an atomic operation). And very nearly immediately, Ruby's deadlock detection kicked in. What happens is this:

In the main thread I create the jobs and put them on the queue. Depending on how things run (which is why it only happens sometimes), if the worker threads are scheduled more favourably they will complete but not die (as they are blocking on the queue, waiting for new jobs), meaning that every thread except the main thread is asleep. I then call join() in the main thread, so the main thread sleeps before the others have a chance to wake up and finish, at which point Ruby detects a deadlock, even though its next operation should be to wake up another thread. At this point a "Bug Report: Unlocking mutex must be NULL" message is generated by the interpreter, which then hastily exits.
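
For what it's worth, the shutdown pattern I am moving towards looks roughly like this (sketch only; the real pool is more involved), so that every worker is woken up and exits before the main thread joins:

require 'thread'

queue   = Queue.new
workers = 4.times.map do
  Thread.new do
    while (job = queue.pop) != :shutdown   # :shutdown is the sentinel
      job.call
    end
  end
end

10.times { |i| queue << lambda { puts "job #{i}" } }
workers.size.times { queue << :shutdown }  # one sentinel per worker
workers.each { |t| t.join }                # no thread is left asleep at join time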

As soon as I have a smaller, specific case I will be filing a bug report. Thanks for all your help, but I think I have it now.

Michael

···


As soon as I have a smaller, specific case I will be filing a bug report.

But make sure that you file the bug report against the right piece of software. Could well be an error in _your_ code. :-)

Thanks for all your help, but I think I have it now.

You're welcome!

Cheers

  robert

···

On 18.03.2009 19:59, Michael Malone wrote: