DATA object and processes; unexpected problem

I have several forked processes simultaneously accessing the data
area with the following function, and it all goes horribly wrong.

def extract(dest,name,mode)
 User.asroot {
   start=0
   fn = path(dest,name)
   File.open(fn,File::CREAT|File::WRONLY,mode) { |file|
     DATA.rewind
     DATA.each { |line|
       if start==0
         start=1 if line =~ %r{^=begin #{name} -*}
       else
         break if line =~ %r{^=end -*}
         yield line if block_given?
         file.puts line
       end
     }
   }
 }
end

Each process gets partial or corrupt data. Now it obviously has
something to do with the DATA object being shared between processes
somehow (there is presumably a file handle in there somewhere), but how
can I fix it?

Andrew Walrond

Hi,

At Thu, 12 Jun 2003 09:31:37 +0900, Andrew Walrond wrote:

Each process gets partial or corrupt data. Now it obviously has
something to do with the DATA object being shared between processes
somehow (there is presumably a file handle in there somewhere), but
how can I fix it?

Shared file descriptors share the position even between
processes. You can duplicate DATA.


Nobu Nakada
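
The point about shared positions can be checked with a small script. This is only a sketch, not code from the thread: the method name shared_offset_demo and the file contents are made up for illustration, it assumes a Unix-like system where fork is available, and it uses sysread so that Ruby's own read buffer does not blur the picture.

```ruby
require "tempfile"

# Illustrative only: shows which handles share a file position.
def shared_offset_demo
  Tempfile.create("offset-demo") do |tmp|
    tmp.write("one\ntwo\nsix\n")   # three 4-byte lines
    tmp.flush

    shared = File.open(tmp.path, "r")
    copy   = shared.dup                  # duplicate of the same descriptor
    fresh  = File.open(tmp.path, "r")    # independent open of the same file

    # The child inherits the descriptor and consumes the first line.
    pid = fork do
      shared.sysread(4)
      exit!(0)
    end
    Process.wait(pid)

    result = {
      shared: shared.sysread(4),  # continues where the child stopped
      copy:   copy.sysread(4),    # continues where `shared` stopped
      fresh:  fresh.sysread(4)    # has its own position, starts at the top
    }
    [shared, copy, fresh].each(&:close)
    result
  end
end

result = shared_offset_demo
puts "shared=#{result[:shared].inspect} copy=#{result[:copy].inspect} fresh=#{result[:fresh].inspect}"
```

In this sketch the parent, the child and the dup'd handle all advance one position, and only the handle from a separate File.open starts from the beginning.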

Yes I know, so the first thing I tried was

def extract(dest,name,mode)

 data = DATA.dup

 User.asroot {
   start=0
   fn = path(dest,name)
   File.open(fn,File::CREAT|File::WRONLY,mode) { |file|
     data.rewind
     data.each { |line|
       if start==0
         start=1 if line =~ %r{^=begin #{name} -*}
       else
         break if line =~ %r{^=end -*}
         yield line if block_given?
         file.puts line
       end
     }
   }
 }

end

Which didn’t work.

Further investigation proved to be confusing. Can anyone explain what's
going on?
(I’ve appended the output from each line to make it easier)

#!/bin/ruby -w
#Hello cat
#Hello dog
#hello canary

puts DATA.gets #-> one Ok, as expected.
puts DATA.gets #-> two Ok, as expected.

a=DATA.dup
puts a.gets #-> nil Huh? I was expecting ‘three’
a.rewind # Try rewinding…
puts a.gets #-> #!/bin/ruby -w Ok, I can live with that

b=DATA.dup
puts b.gets #-> nil Hmmm. Ok, same as before
b.rewind # Rewind…
puts b.gets #-> #!/bin/ruby -w As expected

# Mix it up a bit…

puts a.gets #-> #Hello cat Ok
puts a.gets #-> #Hello dog Yep
puts b.gets #-> #Hello cat Ok
puts a.gets #-> #Hello canary Great!

# Ok, let's simplify it a bit

c=DATA.dup
c.rewind
d=DATA.dup
d.rewind
puts c.gets #-> #!/bin/ruby -w As expected
puts d.gets #-> nil What the hell is going on here?

# Try again, reordering stuff a bit…

c=DATA.dup
d=DATA.dup
c.rewind
puts c.gets #-> #!/bin/ruby -w As expected
d.rewind
puts d.gets #-> #!/bin/ruby -w It works! But why???

__END__
one
two
three

nobu.nokada@softhome.net wrote:

Shared file descriptors share the position even between
processes. You can duplicate DATA.

Hi,

Further investigation proved to be confusing. Can anyone explain what's
going on?
(I’ve appended the output from each line to make it easier)

It looks like a position mismatch between stdio and the IO descriptor.
This would be a bug, but as a workaround try calling #seek before #dup.

#!/bin/ruby -w
#Hello cat
#Hello dog
#hello canary

puts DATA.gets #-> one Ok, as expected.
puts DATA.gets #-> two Ok, as expected.

DATA.seek(0, IO::SEEK_CUR)

a=DATA.dup
puts a.gets #-> nil Huh? I was expecting ‘three’
a.rewind # Try rewinding…
puts a.gets #-> #!/bin/ruby -w Ok, I can live with that

DATA.seek(0, IO::SEEK_CUR)

b=DATA.dup
puts b.gets #-> nil Hmmm. Ok, same as before
b.rewind # Rewind…
puts b.gets #-> #!/bin/ruby -w As expected

But this doesn’t help with the case below, so more investigation is
needed.

At Thu, 12 Jun 2003 17:52:02 +0900, Andrew Walrond wrote:

Ok, lets simplify it a bit

c=DATA.dup
c.rewind
d=DATA.dup
d.rewind
puts c.gets #-> #!/bin/ruby -w As expected
puts d.gets #-> nil What the hell is going on here?


Nobu Nakada
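
The mismatch between the buffered position and the descriptor's position can be observed directly. What follows is a sketch under assumptions, not code from the thread: buffer_sync_demo is an illustrative name, the second IO object created with IO.for_fd exists only to ask the kernel for the real offset, and the exact "before" value depends on the interpreter's read-buffer size. The thread dates from Ruby's stdio-based IO; current interpreters buffer reads themselves, but the same effect appears.

```ruby
require "tempfile"

# Illustrative only: compares the descriptor's real offset before and
# after a no-op seek on a buffered handle.
def buffer_sync_demo
  Tempfile.create("buffer-demo") do |tmp|
    tmp.write("one\ntwo\nsix\n")
    tmp.flush

    f = File.open(tmp.path, "r")
    # Unbuffered view of the same descriptor; autoclose: false leaves
    # closing the descriptor to `f`.
    raw = IO.for_fd(f.fileno, "r", autoclose: false)

    f.gets                                   # reads one line, buffers more
    before = raw.sysseek(0, IO::SEEK_CUR)    # descriptor is past that line
    f.seek(0, IO::SEEK_CUR)                  # drop the buffer, resync
    after = raw.sysseek(0, IO::SEEK_CUR)     # descriptor is after line one

    f.close
    { before: before, after: after }
  end
end

offsets = buffer_sync_demo
puts "before sync: #{offsets[:before]}, after sync: #{offsets[:after]}"
```

A handle duplicated while the descriptor is still at the "before" offset starts from there, which would explain a gets that returns nil straight after dup.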

OK;

#!/bin/ruby -w
#Hello cat
#Hello dog
#hello canary

puts DATA.gets #-> one Ok, as expected.
puts DATA.gets #-> two Ok, as expected.

DATA.seek(0, IO::SEEK_CUR)
a=DATA.dup
puts a.gets #-> three Yes!
a.rewind # Try rewinding…
puts a.gets #-> #!/bin/ruby -w Yes!

DATA.seek(0, IO::SEEK_CUR)
b=DATA.dup
puts b.gets #-> three Yes!
b.rewind # Rewind…
puts b.gets #-> #!/bin/ruby -w As expected

So, as you suggested, the first part works fine with the sync.
I’ll leave it with you then :)

Andrew Walrond

nobu.nokada@softhome.net wrote:

Hi,

It looks like a position mismatch between stdio and the IO descriptor.
This would be a bug, but as a workaround try calling #seek before #dup.

Hi,

At Fri, 13 Jun 2003 20:30:48 +0900, Andrew Walrond wrote:

So, as you suggested, the first part works fine with the sync.
I’ll leave it with you then :)

The remaining case was normal. As I wrote at [ruby-talk:73295],

Shared file descriptors share the position even between
processes.

Therefore, when the first fd reaches EOF, the second fd points at EOF
too. This is the UNIX I/O model.

However, the seek before dup should probably at least be done
automatically.

Index: io.c
===================================================================
RCS file: /cvs/ruby/src/ruby/io.c,v
retrieving revision 1.213
diff -u -2 -p -r1.213 io.c
--- io.c 7 Jun 2003 15:33:40 -0000 1.213
+++ io.c 22 Jun 2003 01:07:46 -0000
@@ -2491,8 +2491,12 @@ rb_io_init_copy(dest, io)
     if (orig->f2) {
         io_fflush(orig->f2, orig);
+        fseeko(orig->f, 0L, SEEK_CUR);
     }
     else if (orig->mode & FMODE_WRITABLE) {
         io_fflush(orig->f, orig);
     }
+    else {
+        fseeko(orig->f, 0L, SEEK_CUR);
+    }
 
     /* copy OpenFile structure */


Nobu Nakada
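
For the original problem of several forked processes extracting sections, one way to avoid the shared position altogether is to give each process its own handle on the script instead of going through DATA. This is a sketch under assumptions, not something proposed in the thread: section_lines and fork_safe_demo are hypothetical names, the User.asroot and path helpers from the original extract are left out, and fork is assumed to be available.

```ruby
require "tempfile"

# Hypothetical helper: returns the lines of the "=begin NAME ---" ...
# "=end ---" section that follows __END__ in +script+. It opens a
# private handle, so no position is shared with other processes.
def section_lines(script, name)
  lines = []
  File.open(script, "r") do |f|
    f.each_line { |line| break if line.chomp == "__END__" }  # skip the code
    inside = false
    f.each_line do |line|
      if !inside
        inside = true if line =~ /^=begin #{Regexp.escape(name)} -*/
      else
        break if line =~ /^=end -*/
        lines << line
      end
    end
  end
  lines
end

# Forks three children that each extract the same section and send it
# back to the parent through a pipe.
def fork_safe_demo
  Tempfile.create(["demo", ".rb"]) do |tmp|
    tmp.write("puts 'code'\n__END__\n=begin motd ---\nhello\nworld\n=end ---\n")
    tmp.flush

    children = 3.times.map do
      reader, writer = IO.pipe
      pid = fork do
        reader.close
        writer.write(section_lines(tmp.path, "motd").join)
        writer.close
        exit!(0)
      end
      writer.close
      [pid, reader]
    end

    children.map do |pid, reader|
      text = reader.read
      reader.close
      Process.wait(pid)
      text
    end
  end
end

fork_safe_demo.each { |text| print text }
```

In the real script the argument would be the script's own path (for example $0), and each child pays for one extra open in exchange for not depending on how DATA is buffered or duplicated.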