ARGF.eof? behavior

Hi folks,

In Ruby 1.8, I know that:

$ ruby -e 'while !ARGF.eof?; puts ARGF.readline; end' /tmp/foo /tmp/bar

prints every line in /tmp/foo, but not /tmp/bar. However, in Ruby 1.9:

$ ruby1.9 -e 'p ARGF.eof?' /tmp/foo
true

Which means that lines from neither /tmp/foo nor /tmp/bar would be printed
in the first example. Is this an expected change in behavior? Seems to be
consistent for both 1.9.1p0 and the 1.9.2 svn trunk I just compiled.

If it is, it's not that big of a deal, except I'm not sure how to "switch
ARGF" to the next file without calling ARGF.gets or ARGF.readline.

For example, is there a method to complete the following code, so as to
print lines from all files listed in ARGV?

  while !ARGV.empty?
    # ARGF.some_method_to_advance_to_next_file

    while !ARGF.eof?
      puts ARGF.readline
    end
  end

I know I can use ARGF.each, gets, or readline. However, I'm really calling
a parsing method that takes ARGF as an argument and calls readline on my
behalf. I'd like to be able to distinguish between EOFErrors due to reaching
EOF before parsing (no more data records), and EOFErrors due to reaching
EOF during parsing (an incomplete data record).

Thanks!

Hi,

2009/7/24 Mike Kasick <mkasick-rt@club.cc.cmu.edu>:

Hi folks,

In Ruby 1.8, I know that:

$ ruby -e 'while !ARGF.eof?; puts ARGF.readline; end' /tmp/foo /tmp/bar

prints every line in /tmp/foo, but not /tmp/bar. However, in Ruby 1.9:

$ ruby1.9 -e 'p ARGF.eof?' /tmp/foo
true

Which means that lines from neither /tmp/foo nor /tmp/bar would be printed
in the first example. Is this an expected change in behavior? Seems to be
consistent for both 1.9.1p0 and the 1.9.2 svn trunk I just compiled.

If it is, it's not that big of a deal, except I'm not sure how to "switch
ARGF" to the next file without calling ARGF.gets or ARGF.readline.

For example, is there a method to complete the following code, so as to
print lines from all files listed in ARGV?

  while !ARGV.empty?
    # ARGF.some_method_to_advance_to_next_file

    while !ARGF.eof?
      puts ARGF.readline
    end
  end

You can use ARGF.each for both 1.8.x and 1.9.x.

Try
$ ruby -e 'ARGF.each{|l|puts l}' /tmp/foo /tmp/bar

Regards,

Park Heesob

Right, I understand that this works in this particular example. Perhaps
a more in-depth example would illustrate the problem better.

Presume I have a method, "parse", that parses data records from an IO
stream. It looks something like this:

  def parse(io)
    first = io.readline
    # ... code to validate first
    second = io.readline
    # ...
    third = io.readline
    # ...

    ParsedThing.new(first, second, third)
  end

The idea is to call "parse ARGF" only when I know there's data left in
the stream to be parsed. Otherwise, if I get an EOFError, its meaning is
ambiguous--there could be an incomplete data record (i.e., could parse
"first" and "second", but got an EOF while reading "third"), or there
could be no more records in the file.

ARGF.each isn't going to work since ARGF is being used as an external
iterator by the parse method, and calling ARGF.gets/readline outside the
parse method strips the first line of a record. I'm looking for a
non-destructive file advance operation, if that makes sense.
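
To make the ambiguity concrete, here is a minimal sketch of the pattern I'm
after, using StringIO as a stand-in for a single input stream. The names
Record, parse_record, and parse_all are made up for illustration (Record
plays the role of ParsedThing above), and the sketch deliberately ignores
the question of advancing ARGF between files:

```ruby
require 'stringio'

# Hypothetical three-line record, standing in for ParsedThing.
Record = Struct.new(:first, :second, :third)

# Reads exactly three lines. IO#readline raises EOFError if the stream
# ends before all three have been read.
def parse_record(io)
  Record.new(io.readline.chomp, io.readline.chomp, io.readline.chomp)
end

# Parses records until the stream is exhausted. Because eof? is checked
# before every call, an EOFError raised inside parse_record can only mean
# a truncated record, never "no more records".
def parse_all(io)
  records = []
  until io.eof?
    begin
      records << parse_record(io)
    rescue EOFError
      raise ArgumentError,
            "incomplete record after #{records.size} complete record(s)"
    end
  end
  records
end
```

With six lines of input this should yield two records; with four lines the
second call to parse_record hits EOF partway through, and the error is
reported as an incomplete record rather than a normal end of input.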

On Fri, Jul 24, 2009 at 11:42:00AM +0900, Heesob Park wrote:

You can use ARGF.each for both 1.8.x and 1.9.x.

Try
$ ruby -e 'ARGF.each{|l|puts l}' /tmp/foo /tmp/bar

Two things:

- Turns out Ruby 1.9's ARGF.eof? behavior was a bug, now fixed in svn
  trunk. A workaround is to call "ARGF.file" (or another ARGF accessor)
  before the while loop.

- ARGF.skip is the non-destructive file advance operation that I was
  looking for. ARGF.close works too, but can also close $stdin which may
  not be preferred. Problem is that neither currently works when followed
  by ARGF.eof?. I submitted a patch to fix that to the issue tracker.

Unfortunately this means that the behavior of ARGF with regard to
close/eof/skip changes somewhat between patchlevels on Ruby 1.8.7 & 1.9.1.

To answer my original question (and for the benefit of anyone else looking
to do something similar), the following code includes the appropriate
workarounds to work on, I believe, all 1.8/1.9 versions:

  # Print (or whatever) every line from all files listed in ARGV or $stdin.

  loop do
    current = ARGF.file

    while !ARGF.eof?
      puts ARGF.readline # Or whatever.
    end

    ARGF.skip.file == current and break
  end

On Fri, Jul 24, 2009 at 12:20:36PM +0900, Mike Kasick wrote:

I'm looking for a non-destructive file advance operation, if that makes
sense.