Efficient regex scanning

Hello there,

I want to extract all the words from a file, so I wrote the
following code:

file = ARGV[0]
File.open('output','w') {|f|
        IO.read(file).scan(/\w+/).each{|w| f.print w}
}

The problem with this code is that it stores all the words in an array,
which wastes memory when the file is large.
Is there a better way to do it?
Something like IO.read(file).each_scan { foo }

Thanks
Christos

Trochalakis Christos wrote:

> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }

Scan takes a block form:

ri String.scan

        IO.read(file).scan(/\w+/) {|w| f.print w}
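A minimal sketch of the difference, assuming a short in-memory string in place of the file contents (the names text, words, out and result are made up for illustration). Without a block, scan collects every match into an array; with a block, it yields each match as it is found and returns the original string. Note that IO.read still loads the whole file into memory either way.

```ruby
require 'stringio'

text = "the quick brown fox"   # stand-in for IO.read(file)

# Without a block: every match is collected into an array.
words = text.scan(/\w+/)

# With a block: each match is yielded as it is found, so no array
# of words is built. The call returns the receiver string.
out = StringIO.new             # stand-in for the output file
result = text.scan(/\w+/) { |w| out.puts w }
```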

Cheers

--
Ola Bini (http://ola-bini.blogspot.com) JRuby Core Developer
Developer, ThoughtWorks Studios (http://studios.thoughtworks.com)

"Yields falsehood when quined" yields falsehood when quined.

Trochalakis Christos wrote:

> The problem with this code is that it stores all the words in an array
> which is not so good in terms of efficiency.
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }

Does just using a block with scan do what you need?

IO.read(file).scan(/\w+/) { |word| f.print word }

http://www.ruby-doc.org/core/classes/String.html#M000827

best,
Dan

Hi --

On Wed, 6 Jun 2007, Trochalakis Christos wrote:

> The problem with this code is that it stores all the words in an array
> which is not so good in terms of efficiency.
> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }

You could do something like this (untested, and reversing your logic
somewhat):

   File.open(file).each {|line| f.print(line.scan(/\w+/)) }

(You might want to join them with a space or something so they don't
all run together.)
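A rough sketch of that idea with the words joined by spaces, assuming a hypothetical helper named copy_words (not from the original post) that takes any object responding to each_line and any object responding to puts. Only one line's worth of words is held in memory at a time.

```ruby
# Hypothetical helper: write the words of each input line to `out`,
# separated by single spaces, one output line per non-empty input line.
def copy_words(input, out)
  input.each_line do |line|
    words = line.scan(/\w+/)          # array for this line only
    out.puts words.join(' ') unless words.empty?
  end
end

# Possible usage; the block forms of File.open close both files:
#
#   File.open(ARGV[0]) do |input|
#     File.open('output', 'w') { |f| copy_words(input, f) }
#   end
```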

David

--
Q. What is THE Ruby book for Rails developers?
A. RUBY FOR RAILS by David A. Black (http://www.manning.com/black)
    (See what readers are saying! http://www.rubypal.com/r4rrevs.pdf)
Q. Where can I get Ruby/Rails on-site training, consulting, coaching?
A. Ruby Power and Light, LLC (http://www.rubypal.com)

Trochalakis Christos wrote:

> Is there a better way to do it?
> Something like IO.read(file).each_scan { foo }

Here's a thought. Note that it doesn't handle //m regexen. Like David's and Robert's solutions, it doesn't read the whole file at once. (I guess one could check for pat.options & Regexp::MULTILINE, and read the whole IO in that case.)

class IO
   def scan pat
     if block_given?
       each {|line| line.scan(pat) {|s| yield s} }
     else
       read.scan(pat)
     end
   end
end

File.open(filename) do |f|
   f.scan(/\w+/) {|word| puts word}
end
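The MULTILINE check mentioned in the parenthetical could look roughly like this. It is a sketch only, written as a standalone method with the made-up name scan_io instead of reopening IO. One assumption to keep in mind: a pattern without /m can still match across a line break (for example /\s+/), so scanning line by line is only equivalent for patterns that never match a newline.

```ruby
# Hypothetical helper, not part of Ruby's IO API.
def scan_io(io, pat, &block)
  multiline = (pat.options & Regexp::MULTILINE) != 0
  if block && !multiline
    # Scan one line at a time so the whole input is never held in memory.
    io.each_line { |line| line.scan(pat, &block) }
  else
    # /m pattern, or no block given: read everything and scan it in one go.
    io.read.scan(pat, &block)
  end
end
```

It accepts anything that responds to each_line and read, so it can be tried on a StringIO as well as on an open File.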

--
       vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Thanks a lot!
I suppose I should have checked first :)

On Jun 6, 2:00 pm, Ola Bini <ola.b...@gmail.com> wrote:

> Scan takes a block form:
>
>         IO.read(file).scan(/\w+/) {|w| f.print w}

You're not closing the IO. I know it's not an issue for a small script but...

I'd do this:

ARGF.each {|line| puts line.scan(/\w+/) }

:)

Kind regards

  robert

On 06.06.2007 13:08, dblack@wobblini.net wrote:

> You could do something like this (untested, and reversing your logic
> somewhat):
>
>   File.open(file).each {|line| f.print(line.scan(/\w+/)) }

Hi --

On Wed, 6 Jun 2007, Robert Klemme wrote:

> You're not closing the IO. I know it's not an issue for a small script but...

It's not a complete script; I was only showing one line. At the very
least it's not going to run unless you assign something to f :)

David
