[NEWBIE] Extract all occurrences from a text

Hi!
I just can’t figure out how to extract all matches from a text file.
I have a file that looks like this:

some text i don't want

Data from 21.4.2004
Data from 22.4.2004
....

I have a pattern /\d{4,}/\\d{2,}/\\d{2,}/ which matches everything in the links' targets, but I don't want just the first match or the last one. I searched Google and looked in "Programming Ruby" but couldn't find a solution. If you know how this works, please tell me!

Thanks in advance!

Michael


File.read('filename').scan(/\d{4,}\/\d{2,}\/\d{2,}/)
should return an array of all matches.

emmanuel
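A minimal sketch of the scan approach on an in-memory string rather than a file. The sample text and the slash-separated date format in the link targets are assumptions, since the original file's markup is not shown in the post.

```ruby
# Hypothetical sample text; the real file's contents are not shown in the thread.
text = <<~TEXT
  some text i don't want
  <a href="2004/04/21/index.html">Data from 21.4.2004</a>
  <a href="2004/04/22/index.html">Data from 22.4.2004</a>
TEXT

# String#scan returns every non-overlapping match, not just the first one.
dates = text.scan(/\d{4}\/\d{2}\/\d{2}/)

p dates
```

With a real file, the string would come from File.read instead of the heredoc.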

You could simply do:

lines = []
open(filename) do |f|
  lines = f.grep your_regexp
end

That said, your regexp does not actually make sense to me.
I believe you want dates, so it should be
/\d{4}\/\d{2}\/\d{2}/
It seems you escaped the wrong characters (\ instead of /).
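One thing worth seeing side by side: grep returns the whole lines that match, while the question asks for the matches themselves. The sketch below uses a StringIO as a hypothetical stand-in for the opened file, and the pattern and sample lines are assumptions.

```ruby
require "stringio"

# Hypothetical stand-in for the opened file.
io = StringIO.new("no date here\nsee 2004/04/21/a.html\nsee 2004/04/22/b.html\n")
pattern = /\d{4}\/\d{2}\/\d{2}/

# Without a block, grep keeps each whole line that matches the pattern.
matching_lines = io.grep(pattern)

# With a block, grep collects the block's result for each matching line;
# String#[] with a regexp returns only the first matched part of the line.
io.rewind
matched_parts = io.grep(pattern) { |line| line[pattern] }
```

If a single line can contain several matches, scan on each line would be needed instead of line[pattern].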

···

On Fri, 23 Apr 2004 19:53:43 +0900, Michael Weller michael@gutschi.de wrote:


My favourite method for extracting regexps and processing them is gsub()
with block:

File.foreach("fileName.html") do |line|
  line.gsub(/yourPattern/) do |match|
    # ... do something with match
  end
end
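A small sketch of the gsub-with-block idea on a single string. The sample line and the pattern are hypothetical; the block is called once per match, so it can be used to collect them.

```ruby
text = "see 2004/04/21/a.html and 2004/04/22/b.html"  # hypothetical sample line
pattern = /\d{4}\/\d{2}\/\d{2}/                        # hypothetical pattern

found = []
# gsub calls the block once per match; returning the match itself leaves
# the resulting string identical to the input.
unchanged = text.gsub(pattern) do |match|
  found << match
  match
end
```

When the replaced string is not needed, text.scan(pattern) with a block visits the same matches without building a new string.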

Thanks for your responses! I knew there must be a simple way…
(actually the pattern is a bit different, I just wrote an example
pattern without testing!)

Michael


Hi –

gabriele renzi surrender_it@remove.yahoo.it writes:


you could simply do:

lines = []
open(filename) do |f|
  lines = f.grep your_regexp
end

Or even more simply:

lines = open(filename).grep(regex)

David


David A. Black
dblack@wobblini.net

you could simply do:

lines = []

This line has no effect; the Array will be discarded
at the next expression.

open(filename) do |f|
  lines = f.grep your_regexp
end

Or even more simply:

lines = open(filename).grep(regex)

Doesn't that leave the file open?

···

On Fri, 23 Apr 2004 05:08:00 -0700, David Alan Black wrote:

Just try it ;-P

If you do not initialize a variable outside of a block, it is local to
the block and is discarded when the block ends.

This will change in Ruby 2 and work like you expect (I expect the
behaviour it has at the moment).
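A minimal sketch of the scoping rule described above, as it behaves when run on a current Ruby. The variable names and the one-element array used to run the blocks are made up for the illustration.

```ruby
# Assigned before the block, so the block updates the outer variable.
lines = []
[1].each { lines = ["from the block"] }

# First assigned inside the block, so it is local to the block
# and no longer exists once the block has finished.
[1].each { inner = ["from the block"] }
inner_visible = !defined?(inner).nil?

p lines
p inner_visible
```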

···

On Fri, 23 Apr 2004 15:43:24 +0200, Kristof Bastiaensen kristof@vleeuwen.org wrote:


Hi –

Kristof Bastiaensen kristof@vleeuwen.org writes:

···

On Fri, 23 Apr 2004 05:08:00 -0700, David Alan Black wrote:

Or even more simply:

lines = open(filename).grep(regex)

doesn’t that leave the file open?

Sigh – yes. That’s a recurrent mental glitch I’ve got.

David


David A. Black
dblack@wobblini.net

Of course you are right. Maybe initializing it to nil would
be less expensive and make the purpose more clear.

···

On Fri, 23 Apr 2004 13:52:10 +0000, gabriele renzi wrote:


"David Alan Black" dblack@wobblini.net wrote in message
news:m3ekqeu4le.fsf@wobblini.net

Hi –

Kristof Bastiaensen kristof@vleeuwen.org writes:

Or even more simply:

lines = open(filename).grep(regex)

doesn’t that leave the file open?

Sigh – yes. That’s a recurrent mental glitch I’ve got.

File.readlines does the trick correctly:

lines = File.readlines(filename).grep(regexp)

Only drawback: it builds an intermediate array of every line, which burns
a lot of memory if the file is huge. In those cases, better do:

lines = File.open(filename) do |io|
  l = []

  while (line = io.gets)
    l << line if regexp =~ line
  end

  l
end

Regards

robert
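A runnable sketch of the line-by-line variant, assuming a current Ruby where File.foreach without a block returns an enumerator. The temporary file, its contents, and the pattern are hypothetical and only there so the example stands on its own.

```ruby
require "tempfile"

regexp = /\d{4}\/\d{2}\/\d{2}/  # hypothetical pattern

# Hypothetical sample file so the sketch runs on its own.
sample = Tempfile.new("sample")
sample.write("no date here\nsee 2004/04/21/a.html\n")
sample.close

# File.foreach reads one line at a time and closes the file when it is
# done, so only the matching lines end up in the result array.
lines = File.foreach(sample.path).grep(regexp)

sample.unlink
p lines
```

This keeps the one-liner shape of the readlines version without first loading every line into an array.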
···

On Fri, 23 Apr 2004 05:08:00 -0700, David Alan Black wrote: