Regexing a file's contents without reading the whole thing?

I see that it is possible currently to parse through a file without
reading the whole thing into RAM, a la

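File.open('a', 'r') do |a|
  # each_line reads one line at a time (IO#lines was removed in Ruby 3.0);
  # the block form of File.open also closes the file when done
  a.each_line do |line|
    if line =~ /some regex/
      ...
    end
  end
end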

But what if I want to do something like
a = File.read('a').scan(/some regex/)

without reading the whole file into memory? Is that possible?

Thanks.
-r

--
Posted via http://www.ruby-forum.com/.

Roger Pack wrote:

But what if I want to do something like
a = File.read('a').scan(/some regex/)

is that possible?

Thanks.
-r

File.foreach('/usr/share/dict/words').grep(/ruby/i)

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

If you know that matches will never cross line breaks you can do

a = []
File.foreach("a") do |line|
  line.scan(/regex/) do |m|
    a << m
  end
  # alternative to the inner scan block (use one or the other):
  # a.concat(line.scan(/regex/))
end

If matches can cross line breaks the whole story becomes more
complicated, and your solution with File.read is probably the simplest
way to do it (if the files aren't too large).
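
For what it's worth, here is a rough sketch of what "more complicated" can look like: read the file in fixed-size chunks and hold back a tail of the buffer so that a match straddling a chunk boundary is not lost. This is not from the thread. The helper name scan_in_chunks and its chunk_size / max_match parameters are made up for illustration, and it only works under the assumptions stated in the comments (bounded match length, ASCII-only pattern, no anchors or lookbehind that depend on data already dropped).

```ruby
# Hypothetical helper (not from the thread). Assumes no match is longer than
# max_match bytes, that the pattern is ASCII-only (the data is read as
# binary), and that it has no anchors or lookbehind that would be affected
# by dropping already-scanned data.
def scan_in_chunks(path, regex, chunk_size: 64 * 1024, max_match: 4096)
  matches = []
  buffer = String.new # empty binary string; io.read(n) also returns binary
  File.open(path, "rb") do |io|
    until io.eof?
      buffer << io.read(chunk_size)
      # A match starting at or before safe_start already has max_match bytes
      # after it, so data that arrives later cannot change it.
      safe_start = io.eof? ? buffer.size : buffer.size - max_match
      pos = 0
      while pos <= buffer.size && (m = regex.match(buffer, pos)) &&
            m.begin(0) <= safe_start
        matches << m[0]
        # Step past empty matches so the loop always advances.
        pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
      end
      # Keep only the tail that may still start an unfinished match.
      buffer = buffer[[pos, safe_start].max..-1] || String.new
    end
  end
  matches
end
```

Something like scan_in_chunks('a', /some regex/) would then hold roughly chunk_size + max_match bytes in memory at a time instead of the whole file. Unlike String#scan, this sketch always returns the full match text (m[0]), even if the pattern has capture groups.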

Kind regards

robert

2009/11/30 Roger Pack <rogerpack2005@gmail.com>:

But what if I want to do something like
a = File.read('a').scan(/some regex/)

is that possible?

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

The library which makes this possible is sequence. I'm coding this
from memory, so I'm likely to get something wrong, but the equivalent
in sequence looks more or less like this:

require 'rubygems'
require 'sequence'
require 'sequence/file'

seq=Sequence.new(File.open('a'))
seq.scan_until(/some regex/)

Keep the following in mind:
1) Sequence#scan works like StringScanner#scan, not String#scan.
2) The pattern to be matched must have a maximum length (4k by default, I
think; it can be changed).
3) If your pattern is guaranteed not to contain a newline, you're better
off reading line by line, as Robert said.
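
To make point 1 concrete, here is a small sketch of the difference between the two scan methods. It uses the standard library's StringScanner on an in-memory string rather than sequence itself, since I can only vouch for the stdlib API; the sample text and variable names are just for illustration.

```ruby
require "strscan"

text = "foo bar foo"

# String#scan searches the whole string and collects every match.
all_matches = text.scan(/foo/)

scanner = StringScanner.new(text)
# StringScanner#scan only matches at the current position (0 here),
# so looking for "bar" fails and the position does not move.
anchored = scanner.scan(/bar/)
# scan_until searches forward and returns everything up to and
# including the match, advancing the position past it.
skipped = scanner.scan_until(/bar/)
position = scanner.pos
```

So with a scanner-style API you get one match per call and have to loop (typically with scan_until) to collect all of them, whereas String#scan hands back the whole array at once.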

On 11/30/09, Roger Pack <rogerpack2005@gmail.com> wrote:

But what if I want to do something like
a = File.read('a').scan(/some regex/)

is that possible?