What is the best way to edit a file to eliminate a line using Ruby?

This sounds like an easy task, but I'm certain that I have yet to find the most elegant solution.

I have a text file which I want to process using Ruby in order to update it. I want to remove the single line which matches a regexp for which I have a definition. I'd prefer not to explicitly use temporary files. However (and this is important), I also don't want to risk losing data through corruption if the Ruby process is killed unexpectedly, and I definitely don't want any other process to read a file other than the one with or without the line I'm deleting.

Is there something in a library which would make this task easy?

Steve [RubyTalk] wrote:

This sounds an easy task, but I'm certain that I'm yet to find the most
elegant solution.

I have a text file which I want to process using ruby in order to update
it. I want to remove the single line which matches a regexp for which I
have a definition. I'd prefer not to explicitly use temporary files -
however (and this is important) I also don't want to risk losing data
through corruption if the ruby process is killed unexpectedly... and I
definitely don't want a file other than the one with/without the line
I'm deleting to be read by any other process.

Is there something in a library which would make this task easy?

ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt

William James wrote:

ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt

Coo... that's a new one to me... very nifty.

Unfortunately, I misled you... I want to transform a file from within a CGI script, which means I need to use standard out to generate other feedback to the user. Is a similar facility available within a Ruby program without executing a new Ruby process?

steve_rubytalk wrote:

William James wrote:

ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt

Coo... that's a new one to me... very nifty.

Unfortunately, I misled you... I want to transform a file from within a
cgi script... which means I need to use standard out to generate other
feedback to the user. Is a similar facility available within a ruby
program without executing a new ruby process?

File.new( "stuff.txt" ) do | f |
  f.each do |line|
    print unless line =~ /foo/
  end
end

Or if you needed to rewrite it to a different file

File.new( "stuff.txt" ) do | in |
  File.new( "newstuff.txt", "w" ) do |out|
    in.each { | line | out.print line unless line =~ /foo/ }
  end
end


Mike Fletcher wrote:

steve_rubytalk wrote:
> William James wrote:
>> ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt
> Coo... that's a new one to me... very nifty.
>
> Unfortunately, I misled you... I want to transform a file from within a
> cgi script... which means I need to use standard out to generate other
> feedback to the user. Is a similar facility available within a ruby
> program without executing a new ruby process?

File.new( "stuff.txt" ) do | f |
  f.each do |line|
    print unless line =~ /foo/
  end
end

IO.foreach('stuff.txt') { |s| print s unless s =~ /foo/ }

Mike Fletcher wrote:

steve_rubytalk wrote:

William James wrote:

ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt

Coo... that's a new one to me... very nifty.

Unfortunately, I misled you... I want to transform a file from within a
cgi script... which means I need to use standard out to generate other
feedback to the user. Is a similar facility available within a ruby
program without executing a new ruby process?

File.new( "stuff.txt" ) do | f |
  f.each do |line|
    print unless line =~ /foo/
  end
end

Or if you needed to rewrite it to a different file

File.new( "stuff.txt" ) do | in |
  File.new( "newstuff.txt", "w" ) do |out|
    in.each { | line | out.print line unless line =~ /foo/ }
  end
end

File.new doesn't take a block. Use File.open. Also, "in" is a keyword, so the above code produces a syntax error. With fixes:

# Like grep -v.

File.open( "stuff.txt" ) do |input|
   File.open( "newstuff.txt", "w" ) do |output|
     input.each { |line| output.print line unless line =~ /foo/ }
   end
end

Mike Fletcher wrote:

File.new( "stuff.txt" ) do | in |
  File.new( "newstuff.txt", "w" ) do |out|
    in.each { | line | out.print line unless line =~ /foo/ }
  end
end

That's remarkably similar to my current rough-and-ready approach, the one I consider inelegant. (N.B. the example above doesn't address the problem of atomically replacing stuff.txt with newstuff.txt.) I was thinking that something like this would be preferable:

FileModify.open('stuff.txt') { |mfile| mfile.delete(/foo/) }

Of course, I've just invented FileModify off the top of my head, and I imagine it being 'transactional' - i.e. any exception arising in the block would prevent any change to stuff.txt. I'd prefer not to go around re-inventing the wheel if FileModify (or something similar) already exists. I don't need it to be desperately scalable or quick - on the other hand, reliability _is_ a key concern and I'd prefer to use the neatest possible syntax.

Steve [RubyTalk] wrote:

Mike Fletcher wrote:

File.new( "stuff.txt" ) do | in |
  File.new( "newstuff.txt", "w" ) do |out|
    in.each { | line | out.print line unless line =~ /foo/ }
  end
end

That's remarkably similar to my current rough-n-ready approach - the one I consider inelegant...(N.B. the example above doesn't address the problem of atomically replacing stuff.txt with newstuff.txt.) I was thinking that something like this would be preferable:

FileModify.open('stuff.txt') { |mfile| mfile.delete(/foo/) }

Of course, I've just invented FileModify off the top of my head, and I imagine it being 'transactional' - i.e. any exception arising in the block would prevent any change to stuff.txt. I'd prefer not to go around re-inventing the wheel if FileModify (or something similar) already exists. I don't need it to be desperately scalable or quick - on the other hand, reliability _is_ a key concern and I'd prefer to use the neatest possible syntax.

I have done some things like:

def write filename, data
   File.open( filename, 'w' ){ |file| file.write data }
end

begin
   data = IO.read( 'stuff.txt' )
   write 'stuff.txt', data.gsub( /^foo(\n|$)/, '' )
rescue
   write 'stuff.txt.orig', data
end

Zach

zdennis wrote:

def write filename, data
  File.open( filename, 'w' ){ |file| file.write data }
end

begin
  data = IO.read( 'stuff.txt' )
  write 'stuff.txt', data.gsub( /^foo(\n|$)/, '' )
rescue
  write 'stuff.txt.orig', data
end

You'd probably want the rescue statement to be a "rescue Exception" so you catch any/all errors...

Zach
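The difference between a bare "rescue" and "rescue Exception" can be checked with a small sketch. The helper name caught_by_bare_rescue? is made up for this illustration; it only relies on the fact that a bare rescue handles StandardError and its subclasses.

```ruby
# Hypothetical helper for illustration: reports whether a bare `rescue`
# (which only handles StandardError and its subclasses) catches error_class.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, 'demo'
  rescue
    # Reached only for StandardError and its subclasses.
    return true
  end
rescue Exception
  # Anything that slipped past the bare rescue ends up here.
  false
end
```

With this, caught_by_bare_rescue?(Errno::ENOSPC) is true (a full disk raises a StandardError subclass), while caught_by_bare_rescue?(Interrupt) and caught_by_bare_rescue?(SystemExit) are false. Neither form of rescue gets a chance to run if the process is killed with SIGKILL or the machine loses power.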

Ok, slightly more elegant...

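class File
   # Yields a writable handle plus the original contents. If the block
   # raises, tries to write the original contents back.
   def self.modify filename
     if block_given?
       data = IO.read filename
       begin
         file = File.open filename, 'w'
         yield file, data
       rescue Exception => ex
         # Close the half-written handle before restoring the original data.
         file.close if file && !file.closed?
         File.open( filename, 'w' ){ |f| f.write data }
       ensure
         file.close if file && !file.closed?
       end
     end
   end
end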

File.modify( 'stuff.txt' ) do |writable_file, original_file_contents|
   writable_file.write original_file_contents.gsub( /foo(\n|$)/, '' )
end

Hope this works better...

Zach

zdennis wrote:

Hope this works better...

It will still permanently and irrecoverably lose data if the process terminates (e.g. a process hard limit is exceeded, an administrator kills the process explicitly, or there is an old-fashioned power cut) just after starting to write the updated file... so I wouldn't consider it sufficiently robust for my purposes.
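The window for data loss opens as soon as the file is opened in 'w' mode, before anything is written. A minimal sketch, using a made-up helper name (truncation_demo) and a scratch file in a temporary directory:

```ruby
require 'tmpdir'

# Hypothetical demo helper: returns [size_before, size_during], where
# size_during is measured while the file is open for writing but before
# any new data has been written to it.
def truncation_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'stuff.txt')
    File.write(path, "keep me\nfoo\n")
    before = File.size(path)
    # Opening with 'w' truncates the file immediately.
    during = File.open(path, 'w') { File.size(path) }
    [before, during]
  end
end
```

The second element of the result is 0: the old contents are already gone from disk at that point, so a kill between the open and the write leaves an empty file.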

zdennis wrote:

def write filename, data
  File.open( filename, 'w' ){ |file| file.write data }
end

begin
  data = IO.read( 'stuff.txt' )
  write 'stuff.txt', data.gsub( /^foo(\n|$)/, '' )
rescue
  write 'stuff.txt.orig', data
end

You'd probably want the rescue statement to be a "rescue Exception" so you catch any/all errors...

Both versions look dangerous to me.

1. If an exception is raised on opening 'stuff.txt' to read then an attempt will be made to truncate the file (or to overwrite it with whatever happened to be in data previously). [This could be avoided by reading before begin.]

2. If a disk becomes full (or nearly full) during the write operation then the rescue will likely not be able to write all the unmodified data back, hence permanently losing valuable information.

I need a more robust approach than this. :)

Steve [RubyTalk] wrote:

zdennis wrote:

Hope this works better...

It will still permanently and irrecoverably lose data if the process terminates (e.g. a process hard limit is exceeded, an administrator kills the process explicitly, or there is an old-fashioned power cut) just after starting to write the updated file... so I wouldn't consider it sufficiently robust for my purposes.

You could modify for your needs:

require 'fileutils'

class File
   def self.modify filename
     if block_given?
       data = IO.read filename
       FileUtils.mv filename, "#{filename}.orig"
       begin
         file = File.open filename, 'w'
         yield file, data
       ensure
         file.close if file && !file.closed?
       end
     end
   end
end

File.modify( 'stuff.txt' ) do |writable_file, original_file_contents|
   writable_file.write original_file_contents.gsub( /foo(\n|$)/, '' )
end

Zach

Steve [RubyTalk] wrote:

zdennis wrote:

def write filename, data
  File.open( filename, 'w' ){ |file| file.write data }
end

begin
  data = IO.read( 'stuff.txt' )
  write 'stuff.txt', data.gsub( /^foo(\n|$)/, '' )
rescue
  write 'stuff.txt.orig', data
end

You'd probably want the rescue statement to be a "rescue Exception" so you catch any/all errors...

Both versions look dangerous to me.

1. If an exception is raised on opening 'stuff.txt' to read then an attempt will be made to truncate the file (or to overwrite it with whatever happened to be in data previously). [This could be avoided by reading before begin.]

2. If a disk becomes full (or nearly full) during the write operation then the rescue will likely not be able to write all the unmodified data back, hence permanently losing valuable information.

I need a more robust approach than this. :)

Understood. I don't know the full extent of your issue. There seem to be a lot of possible points at which the *power goes out*. It could happen during any system process, not just Ruby's.

If this helps lead you to an elegant implementation, great! Otherwise...maybe it will steer you away from a potential disaster! good luck!

Zach

I don't think you can get away without a tempfile and still get safe "in-place" modifications. It looks to me like the best compromise would be to

- read in the original
- write the modified file to a temp (use Ruby's 'tempfile' which, I think, should create a temp with secure permissions)
- use the most atomic OS facility you can to copy the modified file atop the original

On many platforms this might map to Ruby's File.rename or FileUtils.mv, I'm not sure...

HTH,
- alan
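The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing library method: delete_matching_lines is a made-up name, the temp file is placed in the same directory as the original so that the rename stays on one filesystem, and the rename is only expected to be atomic on POSIX-style filesystems.

```ruby
require 'tempfile'

# Hypothetical helper (not part of Ruby's standard library): rewrites +path+
# without the lines matching +pattern+, swapping the result into place with
# a single rename so readers see either the old file or the new one.
def delete_matching_lines(path, pattern)
  # Same directory as the target, so the rename does not cross filesystems.
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  begin
    File.foreach(path) { |line| temp.write(line) unless line =~ pattern }
    temp.fsync                                  # flush the data to disk before the swap
    temp.chmod(File.stat(path).mode & 0o7777)   # Tempfile is created with mode 0600
    temp.close
    File.rename(temp.path, path)                # replaces the original in one step
  ensure
    # Closes and removes the temp file if it is still there (e.g. after an
    # error); harmless once the rename has succeeded.
    temp.close!
  end
end
```

Calling delete_matching_lines('stuff.txt', /foo/) leaves stuff.txt untouched if anything fails before the rename. Whether the result survives a power cut also depends on the filesystem and on syncing the containing directory, which this sketch does not attempt.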

Alan Chen wrote:

I don't think you can get away without a tempfile and still get safe "in-place" modifications. It looks to me like the best compromise would be to

- read in the original
- write the modified file to a temp (use Ruby's 'tempfile' which, I think, should create a temp with secure permissions)
- use the most atomic OS facility you can to copy the modified file atop the original

On many platforms this might map to Ruby's File.rename or FileUtils.mv, I'm not sure...

Yup... that seems pretty reasonable to me too, though I have to say I'm surprised that I seem to be defining something to do this rather than just using a library component. It's exactly the sort of thing I'd have previously been sure someone would have contributed.

Steve

I think most of us have faith that, in general, the computer will not lose power in the middle of an operation.

Which explains why I had to re-type a half-hour's worth of wiki page editing when my company's building lost power a few days ago. :)

On Dec 2, 2005, at 4:55 AM, Steve [RubyTalk] wrote:

Yup... that seems pretty reasonable to me too....though I have to say I'm surprised that I seem to be defining something to do this rather than just using a library component. It's exactly the sort of thing I'd have previously been sure someone would have contributed.