This sounds like an easy task, but I'm certain that I have yet to find the most elegant solution.
I have a text file which I want to update using Ruby. I want to remove the single line which matches a regexp that I have already defined. I'd prefer not to use temporary files explicitly. However (and this is important), I also don't want to risk losing or corrupting data if the Ruby process is killed unexpectedly, and I definitely don't want any other process to read anything other than the complete file, either with or without the line I'm deleting.
Is there something in a library which would make this task easy?
Unfortunately, I misled you... I want to transform a file from within a CGI script, which means I need to use standard output to generate other feedback to the user. Is a similar facility available within a Ruby program without executing a new Ruby process?
File.new( "stuff.txt" ) do | f |
f.each do |line|
print unless line =~ /foo/
end
end
Or if you needed to rewrite it to a different file
File.new( "stuff.txt" ) do | in |
File.new( "newstuff.txt", "w" ) do |out|
in.each { | line | out.print line unless line =~ /foo/ }
end
end
steve_rubytalk wrote:
> William James wrote:
>> ruby -i.bak -ne 'print if $_ !~ /foo/' stuff.txt
> Coo... that's a new one to me... very nifty.
>
> Unfortunately, I misled you... I want to transform a file from within a
> cgi script... which means I need to use standard out to generate other
> feedback to the user. Is a similar facility available within a ruby
> program without executing a new ruby process?
IO.foreach('stuff'){|s| print s unless s =~ /foo/}
File.new doesn't take a block. Use File.open. Also, "in" is a keyword, so the above code produces a syntax error. With fixes:
# Like grep -v.
File.open( "stuff.txt" ) do |input|
  File.open( "newstuff.txt", "w" ) do |output|
    input.each { |line| output.print line unless line =~ /foo/ }
  end
end
That's remarkably similar to my current rough-and-ready approach, the one I consider inelegant. (N.B. the example above doesn't address the problem of atomically replacing stuff.txt with newstuff.txt.) I was thinking that something like this would be preferable:
Of course, I've just invented FileModify off the top of my head, and I imagine it being 'transactional' - i.e. any exception arising in the block would prevent any change to stuff.txt. I'd prefer not to go around re-inventing the wheel if FileModify (or something similar) already exists. I don't need it to be desperately scalable or quick - on the other hand, reliability _is_ a key concern and I'd prefer to use the neatest possible syntax.
I have done some things like:
def write filename, data
  File.open( filename, 'w' ){ |file| file.write data }
end
begin
  data = IO.read( 'stuff.txt' )
  write 'stuff.txt', data.gsub( /^foo(\n|$)/, '' )
rescue
  write 'stuff.txt.orig', data
end
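class File
  # Yields a writable handle on filename plus the file's original contents.
  # On any exception, the original contents are written back.
  def self.modify filename
    if block_given?
      data = IO.read filename
      begin
        file = File.open filename, 'w'
        yield file, data
      rescue Exception
        # Close the half-written handle first, then restore the original data.
        file.close if file && !file.closed?
        File.open( filename, 'w' ){ |f| f.write data }
      ensure
        file.close if file && !file.closed?
      end
    end
  end
end
File.modify( 'stuff.txt' ) do |writable_file, original_file_contents|
  writable_file.write original_file_contents.gsub( /foo(\n|$)/, '' )
end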
It will still permanently and irrecoverably lose data if the process terminates (e.g. a process hard limit is exceeded, an administrator kills the process explicitly, or there is an old-fashioned power cut) just after starting to write the updated file... so I wouldn't consider it sufficiently robust for my purposes.
You'd probably want the rescue statement to be a "rescue Exception" so you catch any/all errors...
Both versions look dangerous to me.
1. If an exception is raised on opening 'stuff.txt' to read, then an attempt will be made to truncate the file (or to overwrite it with whatever happened to be in data previously). [This could be avoided by reading before begin.]
2. If a disk becomes full (or nearly full) during the write operation, then the rescue will likely not be able to write all the unmodified data back, hence permanently losing valuable information.
You could modify it for your needs:
require 'fileutils'
class File
  # Keeps the original as "#{filename}.orig" and writes the new contents
  # to a fresh file under the old name.
  def self.modify filename
    if block_given?
      data = IO.read filename
      FileUtils.mv filename, "#{filename}.orig"
      begin
        file = File.open filename, 'w'
        yield file, data
      ensure
        # file is nil if File.open itself failed.
        file.close if file && !file.closed?
      end
    end
  end
end
File.modify( 'stuff.txt' ) do |writable_file, original_file_contents|
  writable_file.write original_file_contents.gsub( /foo(\n|$)/, '' )
end
I need a more robust approach than this.
Understood. I don't know the full extent of your issue. It appears you can run into a lot of possibilities regarding where the *power goes out*. It could happen during any system process, not just Ruby's.
If this helps lead you to an elegant implementation, great! Otherwise... maybe it will steer you away from a potential disaster! Good luck!
I don't think you can get away without a tempfile and get safe "in-place" modifications. It looks to me like the best compromise would be to
- read in the original
- write the modified file to a temp (use Ruby's 'tempfile' which, I think, should create a temp with secure permissions)
- use the most atomic OS facility you can to copy the modified atop the original
On many platforms this might map to Ruby's File.rename or FileUtils.mv, I'm not sure...
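For what it's worth, the three steps above could be sketched roughly as follows. This is only a sketch under some assumptions: atomic_delete_lines is a made-up name (not something from the standard library), the temp file is created in the same directory as the original so that the rename never crosses filesystems, and File.rename is assumed to replace the target atomically, which holds on POSIX systems but may not elsewhere.

```ruby
require 'tempfile'

# Hypothetical helper, not part of any library.
# Yields each line of +path+; lines for which the block returns a truthy
# value are dropped. The original file is only replaced once the new
# contents have been completely written.
def atomic_delete_lines(path)
  dir  = File.dirname(path)
  # Same directory as the original, so the rename stays on one filesystem.
  temp = Tempfile.new(File.basename(path), dir)
  begin
    File.foreach(path) { |line| temp.write(line) unless yield(line) }
    temp.flush
    temp.fsync                                  # push the data to disk before swapping
    temp.chmod(File.stat(path).mode & 0o7777)   # keep the original permissions
    temp.close
    File.rename(temp.path, path)                # assumed atomic (POSIX rename)
  ensure
    # Closes the handle and removes the temp file if it is still there,
    # e.g. because the block raised before the rename.
    temp.close!
  end
end
```

It would be called as atomic_delete_lines('stuff.txt') { |line| line =~ /foo/ }. If the block raises, stuff.txt is left untouched. If the process is killed outright, the worst case should be a stray temp file next to the original, since other processes only ever see the old or the new stuff.txt. Surviving a power cut with full certainty would probably also need an fsync on the containing directory, which is left out here.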
Yup... that seems pretty reasonable to me too....though I have to say I'm surprised that I seem to be defining something to do this rather than just using a library component. It's exactly the sort of thing I'd have previously been sure someone would have contributed.
I think most of us have faith that, in general, the computer will not lose power in the middle of an operation.
Which explains why I had to re-type a half-hour's worth of wiki page editing when my company's building lost power a few days ago.
On Dec 2, 2005, at 4:55 AM, Steve [RubyTalk] wrote:
> Yup... that seems pretty reasonable to me too....though I have to say I'm surprised that I seem to be defining something to do this rather than just using a library component. It's exactly the sort of thing I'd have previously been sure someone would have contributed.