Reading and EOF on Win2K vs. OS X

I'm writing a book on Ruby for testers. The first example is writing a souped-up "tail -f" program that'll be useful for exploratory testing. Here's the very first implementation:

open_file = File.open('the-log')
loop do
   puts open_file.readlines
   sleep 1
end

This works on Windows (Win2K, Ruby 1.8.1). If another program is writing steadily to 'the-log', this program steadily prints out new lines.

It does not work on OS X (and, I'm guessing, other versions of Unix). Once readlines hits the end of file, it never returns anything but []. I believe this happens because readlines() eventually ends up in io.c/appendline and calls getc(3). Because feof(3), once set, is sticky until clearerr(3) is called, any additions to the file are ignored.

(sysread, which ends up calling read(2), avoids this consequence of stdio behavior.)
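A minimal sketch of the sysread route follows. It assumes the log file only ever grows, and `read_appended` and `follow` are hypothetical helper names, not Ruby APIs. IO#sysread raises EOFError when read(2) reports end of file, so the helper treats that exception as "nothing more for now" rather than as a permanent condition.

```ruby
# Read everything appended since the last call, bypassing stdio.
# Returns a (binary) String, empty if nothing new has arrived yet.
def read_appended(io, chunk_size = 4096)
  data = String.new
  loop do
    # sysread goes straight to read(2); don't mix it with buffered
    # reads such as gets or readlines on the same IO.
    data << io.sysread(chunk_size)
  end
rescue EOFError
  # End of file *for now*: a later call can still see new data.
  data
end

# A "tail -f" style loop built on read_appended; runs until interrupted.
def follow(path, interval = 1)
  File.open(path) do |file|
    loop do
      print read_appended(file)
      $stdout.flush
      sleep interval
    end
  end
end
```

Calling `follow('the-log')` would behave like the first implementation, with no seek trick needed, because each poll issues a fresh read(2).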

There's a workaround: do something that calls clearerr(3) by side effect. This is the first one I found, though there may be more:

  open_file.pos = open_file.pos
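Here is a sketch of the first implementation with that workaround folded in. The helper name `new_lines` is hypothetical. The reassignment works because IO#pos= performs a seek, and a successful fseek(3) clears the stream's end-of-file indicator; on a Ruby that doesn't go through stdio it is simply a harmless move to the current offset.

```ruby
# Return any lines appended to open_file since the last call.
def new_lines(open_file)
  # Seek to where we already are; the side effect clears a sticky EOF.
  open_file.pos = open_file.pos
  open_file.readlines
end

# The original loop, now using the helper; runs until interrupted.
def tail(path)
  open_file = File.open(path)
  loop do
    puts new_lines(open_file)
    sleep 1
  end
end
```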

I don't want to have to explain that to a tester just learning programming. I can hide this inside a required file that overrides File#readlines, but it's still kludgy and some set of readers will discover the kludge quickly.
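A sketch of what such a required file might look like, assuming a current Ruby (3.x) where Module#prepend is public and keyword arguments pass cleanly through `super`; on Ruby 1.8 the same idea would need alias_method instead. The file name `sticky_eof_fix.rb` and the module name are made up for illustration.

```ruby
# sticky_eof_fix.rb (hypothetical name): once required, File#readlines
# clears any sticky end-of-file state before reading.
module ClearEofBeforeReadlines
  def readlines(*args, **opts)
    self.pos = pos   # seek to the current offset; fseek(3) clears EOF
    super            # pass the caller's original arguments to IO#readlines
  end
end

File.prepend(ClearEofBeforeReadlines)
```

Prepending keeps the original IO#readlines reachable through `super`, so limits, separators and keywords such as `chomp:` still work unchanged.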

It doesn't seem like the behavior should be inconsistent on Unix and Windows - you can't write a portable program that depends on either behavior. Of the two behaviors, the Windows one seems better to me: it lets you do more things more easily.

An alternative would be to have an IO#clear_eof message. Because of the #ifdefness of io.c, that's a safer route.

RCR?

-----
Brian Marick, independent consultant
Mostly on agile methods with a testing slant
www.exampler.com, www.testing.com/cgi-bin/blog
Book in progress: www.exampler.com/book

To my recollection, 'tail' is written with 3 different strategies for different OSes. I do not believe a simple semantic change will fill the bill. It really is an application-level function.
Dan
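Treating "follow a growing file" as application logic, rather than relying on IO's EOF semantics, could look like the sketch below. This is one possible polling strategy, stated as an assumption and not taken from any particular tail(1) source: remember how many bytes have been consumed, check the file's size on each poll, and read only the bytes past that offset. `poll_appended` is a hypothetical name.

```ruby
# Return [new_bytes, new_offset] for a file that may grow or be truncated.
def poll_appended(path, offset)
  size = File.size(path)
  offset = 0 if size < offset   # file shrank: assume truncation, start over
  return ["", offset] if size == offset
  # Read exactly the bytes past offset; `|| ""` guards against the file
  # shrinking between the size check and the read.
  data = File.binread(path, size - offset, offset) || ""
  [data, offset + data.bytesize]
end
```

A loop would call it once a second, print the returned bytes, and carry the returned offset into the next call. Reopening the file on every poll sidesteps the sticky-EOF question entirely and notices truncation, at the cost of an open per poll.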

On May 6, 2005, at 12:21, Brian Marick wrote the original message at the top of this thread.

In article <3c110c14125e11f40819bf4bd8401b85@visibleworkings.com>,
  Brian Marick <marick@visibleworkings.com> writes:

> I'm writing a book on Ruby for testers. The first example is writing a
> souped-up "tail -f" program that'll be useful for exploratory testing.
> Here's the very first implementation:
>
> open_file = File.open('the-log')
> loop do
>    puts open_file.readlines
>    sleep 1
> end
>
> This works on Windows (Win2K, Ruby 1.8.1). If another program is
> writing steadily to 'the-log', this program steadily prints out new
> lines.
>
> It does not work on OS X (and, I'm guessing, other versions of Unix).

It depends on the stdio implementation.

Some stdio implementations, such as 4.4BSD's, keep returning EOF once
EOF has occurred. In the same situation, other implementations, such
as glibc and Solaris 2.8, don't return EOF if the file has grown.

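#include <stdio.h>

int main(void)
{
  FILE *f, *t;

  /* Create (or truncate) tst.tmp so it starts out empty. */
  if ((f = fopen("tst.tmp", "w")) == NULL) {
    perror("fopen");
    return 1;
  }
  fclose(f);

  /* Open it for reading. On the empty file, getc returns EOF (-1 here)
     and sets t's end-of-file indicator. */
  if ((t = fopen("tst.tmp", "r")) == NULL) {
    perror("fopen");
    return 1;
  }

  printf("feof: %d ", feof(t));
  printf("getc: %d\n", getc(t));

  /* Append one byte ('a', 97) through a separate stream. */
  if ((f = fopen("tst.tmp", "a")) == NULL) {
    perror("fopen");
    return 1;
  }
  putc('a', f);
  fclose(f);

  /* The indicator is still set. Does getc(t) see the new byte
     (prints 97) or keep reporting EOF (prints -1)? */
  printf("feof: %d ", feof(t));
  printf("getc: %d\n", getc(t));

  fclose(t);

  return 0;
}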

NetBSD 2.0:
feof: 0 getc: -1
feof: 1 getc: -1

Debian GNU/Linux (sarge):
feof: 0 getc: -1
feof: 1 getc: 97

Solaris 2.8:
feof: 0 getc: -1
feof: 16 getc: 97

I expect OS X just behaves like NetBSD 2.0.
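The same experiment can be run from Ruby to see which behavior a given Ruby and platform combination shows. This is a sketch: `reads_after_eof?` is a hypothetical helper, and it overwrites whatever path it is given. A result of false corresponds to the sticky EOF seen on NetBSD 2.0 above; true corresponds to the glibc and Solaris behavior.

```ruby
# Mirror of the C test: hit EOF on one handle, append through another,
# then check whether the first handle picks up the new byte.
def reads_after_eof?(path)
  File.write(path, "")                      # create or truncate, like fopen("w")
  File.open(path) do |t|
    t.getc                                  # empty file: returns nil, EOF reached
    File.open(path, "a") { |f| f.putc("a") }
    t.getc == "a"                           # true if the appended byte is visible
  end
end
```

Running `puts reads_after_eof?('tst.tmp')` prints the result for the Ruby at hand.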

> Once readlines hits the end of file, it never returns anything but [].
> I believe this happens because readlines() eventually ends up in
> io.c/appendline and calls getc(3). Because feof(3), once set, is sticky
> until clearerr(3) is called, any additions to the file are ignored.

Some stdio implementations read file contents even if feof(3) is true.

> There's a workaround: do something that calls clearerr(3) by side
> effect. This is the first one I found, though there may be more:
>
>   open_file.pos = open_file.pos

> I don't want to have to explain that to a tester just learning
> programming. I can hide this inside a required file that overrides
> File#readlines, but it's still kludgy and some set of readers will
> discover the kludge quickly.

Agreed.

> It doesn't seem like the behavior should be inconsistent on Unix and
> Windows - you can't write a portable program that depends on either
> behavior. Of the two behaviors, the Windows one seems better to me: it
> lets you do more things more easily.

Ruby 1.9 should work well even on 4.4BSD because Ruby 1.9 doesn't rely
on stdio buffering.

For Ruby 1.8, how about the following patch?

Index: io.c

===================================================================
RCS file: /src/ruby/io.c,v
retrieving revision 1.246.2.72
diff -u -r1.246.2.72 io.c
--- io.c 28 Feb 2005 02:45:19 -0000 1.246.2.72
+++ io.c 7 May 2005 09:43:30 -0000
@@ -742,7 +742,6 @@
     GetOpenFile(io, fptr);
     rb_io_check_readable(fptr);

- if (feof(fptr->f)) return Qtrue;
     if (READ_DATA_PENDING(fptr->f)) return Qfalse;
     READ_CHECK(fptr->f);
     TRAP_BEG;
@@ -1045,7 +1044,6 @@
     off_t siz = BUFSIZ;
     off_t pos;

- if (feof(fptr->f)) return 0;
     if (fstat(fileno(fptr->f), &st) == 0 && S_ISREG(st.st_mode)
#ifdef __BEOS__
   && (st.st_dev > 3)
@@ -1086,7 +1084,10 @@
   rb_str_unlocktmp(str);
   if (n == 0 && bytes == 0) {
       if (!fptr->f) break;
- if (feof(fptr->f)) break;
+ if (feof(fptr->f)) {
+ clearerr(fptr->f);
+ break;
+ }
       if (!ferror(fptr->f)) break;
       rb_sys_fail(fptr->path);
   }
@@ -1275,7 +1276,6 @@

     GetOpenFile(io, fptr);
     rb_io_check_readable(fptr);
- if (feof(fptr->f)) return Qnil;
     if (len == 0) return str;

     rb_str_locktmp(str);
@@ -1288,6 +1288,7 @@
     if (n == 0) {
   if (!fptr->f) return Qnil;
   if (feof(fptr->f)) {
+ clearerr(fptr->f);
       rb_str_resize(str, 0);
       return Qnil;
   }
@@ -1365,6 +1366,7 @@
         rb_sys_fail(fptr->path);
     continue;
       }
+ clearerr(fptr->f);
#ifdef READ_DATA_PENDING_PTR
       return c;
#endif
@@ -1440,6 +1442,9 @@
       return Qtrue;
   }
     } while (c != EOF);
+ if (!ferror(f)) {
+ clearerr(f);
+ }
     return Qfalse;
}

@@ -1806,6 +1811,7 @@
         rb_sys_fail(fptr->path);
     continue;
       }
+ clearerr(f);
       break;
   }
   rb_yield(INT2FIX(c & 0xff));
@@ -1851,6 +1857,7 @@
     rb_sys_fail(fptr->path);
       goto retry;
   }
+ clearerr(f);
   return Qnil;
     }
     return INT2FIX(c & 0xff);
--
Tanaka Akira