No speedup...!

Hello,

The Code:

====================================
def look_for_begin
   while line = gets
     if line =~ /^begin/
       puts line
       # return
     end
   end
end

ARGF.each { look_for_begin }

I have files with uuencoded and yencoded
data, and some text-only files: 188 files in all,
with a total size of about 16 MB.

The tool needs 3.6 seconds to look for /^begin/
in all files.
When I use exceptions, or break, or return (see the
comment above) to stop reading a file after a /^begin/
was found, I get no speedup!

I tried Perl, OCaml and C, and all are a lot faster.
OK, if Ruby is slower, so be it... I can live
with that.
But what I can NOT accept is that the code needs the same
time with and without the statements that stop the
further reading of the files!

That seems very strange to me!

Can someone explain this to me?

Thanks In Advance,
              Oliver

ARGF.each iterates over the LINES of the files passed on the
command line, not over the files.
Try ARGV for the filenames.

You can see the behaviour if you add
  puts "new file"
before the while loop.

···

On 8/25/06, Oliver Bandel <oliver@first.in-berlin.de> wrote:


====================================
def look_for_begin
   while line = gets
     if line =~ /^begin/
       puts line
       # return
     end
   end
end

ARGF.each { look_for_begin }

The problem is that you've got two loops here. ARGF.each calls
look_for_begin once for each line of each file passed in. Then within
look_for_begin, it has another loop that runs until there are no more
lines to process. So what happens is this: without the return
statement, the look_for_begin function is called once, and its while
loop runs through all of the lines until there are no more to
process. The function is not called again, because the ARGF.each loop
terminates immediately, because all lines have been read.

If you put in the return, the while loop runs until it finds the first
"begin". Then the function returns. Then the ARGF.each loop reads the
next line and calls look_for_begin again, which picks up where it left
off, continuing with the lines after the one where "begin" was found.

So, either way, every line of every file gets read. The
only difference you cause by adding and removing the return statement
is whether the lines are read in one call to look_for_begin,
or over multiple calls.
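
The two-loop behaviour described above can be reproduced without any files. The following is only a sketch: LineStream is a hypothetical stand-in for ARGF (not part of Ruby), built on the assumption that each and gets advance one shared read position, and count_calls is an illustrative helper. It also shows a side effect of the original code: the line that each reads itself is handed to the block and never tested against the pattern.

```ruby
# LineStream is a hypothetical stand-in for ARGF: #each and #gets
# both advance the same read position.
class LineStream
  def initialize(lines)
    @lines = lines.dup
  end

  # Returns the next unread line, or nil when nothing is left.
  def gets
    @lines.shift
  end

  # Yields every remaining line, one at a time.
  def each
    while (line = gets)
      yield line
    end
  end
end

# Mirrors `ARGF.each { look_for_begin }`. With early_return set, the inner
# loop stops at each match, like the commented-out `return`.
def count_calls(lines, early_return)
  stream = LineStream.new(lines)
  stats = { calls: 0, consumed: 0, found: [] }
  stream.each do |_line_read_by_each|   # this line is never checked
    stats[:consumed] += 1
    stats[:calls] += 1
    while (line = stream.gets)
      stats[:consumed] += 1
      if line =~ /^begin/
        stats[:found] << line
        break if early_return
      end
    end
  end
  stats
end

sample = ["text\n", "begin 644 a\n", "x\n", "y\n",
          "text\n", "begin 644 b\n", "z\n"]

without_return = count_calls(sample, false)  # calls == 1, consumed == 7
with_return    = count_calls(sample, true)   # calls == 3, consumed == 7
puts "without return: #{without_return[:calls]} call(s), #{without_return[:consumed]} lines read"
puts "with return:    #{with_return[:calls]} call(s), #{with_return[:consumed]} lines read"
```

In both runs all seven lines are consumed; only the number of calls changes, which is consistent with the unchanged run time reported in the original post.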

I think what you wanted to do is use ARGV.each instead of ARGF.each, to
iterate over the list of file names, and pass each file name into the
look_for_begin function. Within the function, you'd process only the
lines in that file. In other words, like this:

def look_for_begin(fn)
  IO.foreach(fn) do |line|
    if line =~ /^begin/
      puts line
      return
    end
  end
end

ARGV.each {|fn| look_for_begin(fn) }

Oliver Bandel wrote:

I have files with uuencoded and yencoded
data, and some text-only files: 188 files in all,
with a total size of about 16 MB.

If you have enough RAM to slurp whole files:

while text = gets( nil )
  # text contains the entire contents of one file.
  if text =~ /^begin.*/
    puts "In #{ $FILENAME }, found:"
    puts $&
  end
end

Oliver Bandel wrote:

Hello,

The Code:

====================================
def look_for_begin
   while line = gets
     if line =~ /^begin/
       puts line
       # return
     end
   end
end

ARGF.each { look_for_begin }

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Karl von Laudermann wrote:

I think what you wanted to do is use ARGV.each instead of ARGF.each, to
iterate over the list of file names, and pass each file name into the
look_for_begin function. Within the function, you'd process only the
lines in that file. In other words, like this:

def look_for_begin(fn)
  IO.foreach(fn) do |line|
    if line =~ /^begin/
      puts line
      return
    end
  end
end

ARGV.each {|fn| look_for_begin(fn) }

I think Oliver wanted to iterate over all lines of the files whose names were given as command line arguments. Something like:

ARGF.each do |line|
   if line =~ /^begin/
     puts line
     break
   end
end

Kind regards

  robert

Hi --

···

On Sat, 26 Aug 2006, William James wrote:

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

   puts ARGF.find {|s| /^begin/.match(s) }

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
   ----> SEE SPECIAL DEAL FOR RUBY/RAILS USERS GROUPS! <-----
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
Ruby for Rails => book, Ruby for Rails
http://www.rubycentral.org => Ruby Central, Inc.

dblack@wobblini.net wrote:

Hi --

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

  puts ARGF.find {|s| /^begin/.match(s) }

[...]

Both of these look as if they would look for *all*
occurrences of "begin", not just the first one.

I am also thinking of looking only at the first 1000 lines or so...

Ciao,
    Oliver

P.S.: But I have now also found files with more than one
       uuencoded section inside...
       ... so maybe reading the files completely could also make sense...
       (I hadn't found such files before, so I thought it would make
        sense to read only up to the first occurrence of /^begin/)
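
For files that hold more than one uuencoded section, a per-file scan that collects every matching line might look like the sketch below. The method name begin_lines is made up for illustration. Binary mode is an assumption added here so that raw yEnc bytes do not trigger encoding errors on current Ruby versions; it was not part of the thread.

```ruby
# Returns every line of the file at `path` that starts with "begin".
# Assumption: binary mode ("rb") keeps raw yEnc bytes from raising
# encoding errors when the regexp is applied.
def begin_lines(path)
  found = []
  File.foreach(path, mode: "rb") do |line|
    found << line if line =~ /^begin/
  end
  found
end
```

It could be driven from the command line with something like ARGV.each { |path| puts begin_lines(path) }. Unlike the early-exit variants, it always reads each file to the end.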

No, don't use match, it is slow: [ruby-talk:204747]
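
Whether the difference matters on a given Ruby version is easy to measure. The sketch below uses made-up test data (LINES) and illustrative method names; the timings in [ruby-talk:204747] have not been re-verified here, and results will vary with the interpreter and the machine.

```ruby
require "benchmark"

# Made-up test data: 50,000 lines, one in every thousand starting with "begin".
LINES = Array.new(50_000) do |i|
  i % 1000 == 0 ? "begin 644 file#{i}\n" : "M#{'x' * 60}\n"
end

# Counts matching lines using String#=~.
def count_with_operator(lines)
  lines.count { |s| s =~ /^begin/ }
end

# Counts matching lines using Regexp#match, which builds a MatchData
# object for every successful match.
def count_with_match(lines)
  lines.count { |s| /^begin/.match(s) }
end

Benchmark.bm(8) do |bm|
  bm.report("=~")    { count_with_operator(LINES) }
  bm.report("match") { count_with_match(LINES) }
end
```

Both methods return the same count, so only the reported times differ.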

···

On Aug 26, 2006, at 4:11 AM, dblack@wobblini.net wrote:

On Sat, 26 Aug 2006, William James wrote:

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

  puts ARGF.find {|s| /^begin/.match(s) }

--
Eric Hodel - drbrain@segment7.net - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com

Oliver Bandel wrote:

> Hi --
>
>
>> puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}
>
>
> Or maybe:
>
> puts ARGF.find {|s| /^begin/.match(s) }

No, this only finds one instance. Mine finds the first
in each file.

[...]

Both of these look as if they would look for *all*
occurrences of "begin", not just the first one.

You know too little of Ruby to tell what the code will do
just by looking at it. Try both if you want to know what
they will do.

I am also thinking of looking only at the first 1000 lines or so...

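ARGV.each do |f|
  count = 0
  IO.foreach(f) do |line|
    # Print the first matching line and stop reading this file.
    if line =~ /^begin/
      print line
      break
    end
    # Give up on this file after 1000 lines without a match.
    count += 1
    break if count == 1000
  end
end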
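
The same limit can also be written with Enumerable methods. This is a sketch only, with first_begin as an illustrative name. The trade-off is that it always reads up to `limit` lines before searching, while the loop above stops reading at the first match.

```ruby
# Returns the first line starting with "begin" among the first `limit`
# lines of the file at `path`, or nil if there is none.
def first_begin(path, limit = 1000)
  File.open(path, "rb") do |file|
    # each_line without a block gives an Enumerator; first(limit) reads
    # at most `limit` lines from it.
    file.each_line.first(limit).find { |line| line =~ /^begin/ }
  end
end
```

A possible call is ARGV.each { |path| line = first_begin(path); print line if line }.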


Hi --

···

On Sun, 27 Aug 2006, Oliver Bandel wrote:

dblack@wobblini.net wrote:

Hi --

On Sat, 26 Aug 2006, William James wrote:

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

  puts ARGF.find {|s| /^begin/.match(s) }

[...]

Both of these look as if they would look for *all*
occurrences of "begin", not just the first one.

Well, if you know how Enumerable#find works, then they look like they
find the first one :-) (Though, as William pointed out, my code
answers the wrong question, because it only finds one for all the
files instead of one for each.)
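
A small sketch of both points, using in-memory arrays as hypothetical stand-ins for two files (FILES and the method names are made up for illustration): Enumerable#find stops at the first match, and one find over all lines gives a different answer from one find per file.

```ruby
# Hypothetical stand-ins for the lines of two files.
FILES = {
  "a.txt" => ["text\n", "begin 644 a\n", "begin 644 a2\n"],
  "b.txt" => ["begin 644 b\n", "more\n"]
}

# One find over all lines of all files: a single result in total.
def first_overall(files)
  files.values.flatten.find { |s| s =~ /^begin/ }
end

# One find per file: one result (or nil) for each file.
def first_per_file(files)
  files.values.map { |lines| lines.find { |s| s =~ /^begin/ } }
end

# find stops as soon as the block returns a true value.
checked = 0
FILES["a.txt"].find { |s| checked += 1; s =~ /^begin/ }
puts checked                  # prints 2: the third line is never examined

puts first_overall(FILES)     # prints "begin 644 a"
puts first_per_file(FILES)    # prints "begin 644 a" and "begin 644 b"
```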

David


Snipped & adapted for
   http://rubygarden.org:3000/Ruby/page/show/RubyOptimization

John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : john.carter@tait.co.nz
New Zealand

"We have more to fear from
  The Bungling of the Incompetent
  Than from the Machinations of the Wicked." (source unknown)

···

On Mon, 28 Aug 2006, Eric Hodel wrote:

No, don't use match, it is slow: [ruby-talk:204747]

Hi --

···

On Sun, 27 Aug 2006, William James wrote:

puts ARGV.map{|f|IO.readlines(f).find{|s|s=~/^begin/}}

Or maybe:

  puts ARGF.find {|s| /^begin/.match(s) }

No, this only finds one instance. Mine finds the first
in each file.

Whoops; so it does.

David
