I have written a simple script to analyze a log file and (just for fun) I have written
exactly the same thing in Python to see the difference, and …
Python is almost twice as fast doing the same job (???)
You can see the attached files for the sources.
Environment: P4 1.8 GHz, 256 MB, WinXP Pro. Python 2.2.1, Ruby 1.7.2-4 (the Pragmatic distribution).
The analyzed file is about 420 MB; Python processes it in about 60 sec and Ruby in about 115
sec.
Do you have any suggestions for how to speed up the Ruby code?
r = Regexp.new( "xxx\xxxxxxxx", Regexp::IGNORECASE )
outf = File.new( "out.log", "w" )  # this line is missing from the post; the file name is assumed
t = Time.new
puts t
File.new( "WEB.log" ).each_line do
  |line|
  next if line.index( "#" ) == 0
  pom = line.split( "\t" )
  if pom[1] =~ r
    outf.write( line + "\n" )
  end
end
te = Time.new
puts te
print "Time: "
puts (te - t).to_s
outf.close()
On line 07 you check whether the first character is a # and skip the
line if it is. Do this instead…
next if line[0]==35
…where 35 is the ASCII value of '#'. On line 10, change to…
outf.puts(line)
…where puts writes the line and adds a line ending only if one is
missing. Your code adds two objects (strings) together, which may
affect speed.
Also, on lines 08 and 09 you split the line on tabs, which creates an
array. It may be faster (albeit uglier) to do this:
08 t1 = line.index("\t")+1; pom = line[t1...line.index("\t", t1)]
09 if pom =~ r
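Put together, the suggestions above might look like the sketch below. It is written for current Ruby, where `line[0]` returns a one-character string rather than the integer 35, so `start_with?` stands in for the ASCII comparison. The names `second_field` and `filter_log` are made up for illustration and are not from the original script.

```ruby
# Return the second tab-separated field of +line+ without building the
# full array that String#split would create.
# Returns nil when the line contains no tab at all.
def second_field(line)
  first_tab = line.index("\t")
  return nil if first_tab.nil?
  start = first_tab + 1
  # Stop at the next tab if there is one, otherwise at the end of the line
  # (in that case the field still carries the trailing newline).
  stop = line.index("\t", start) || line.length
  line[start...stop]
end

# Copy every non-comment line whose second field matches +pattern+.
# +input+ and +output+ are any IO-like objects (File, StringIO, ...).
def filter_log(input, output, pattern)
  input.each_line do |line|
    next if line.start_with?("#")   # cheap check, no scan of the whole line
    field = second_field(line)
    next if field.nil?
    # puts adds a newline only when the line does not already end in one.
    output.puts(line) if field =~ pattern
  end
end
```

With real files this could be called as `File.open("WEB.log") { |i| File.open("out.log", "w") { |o| filter_log(i, o, r) } }`, where the output name is again only an assumption.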
-rich
-----Original Message-----
From: Tomas Brixi [mailto:tomas_brixi@yahoo.com]
Sent: Friday, September 20, 2002 12:23 PM
To: ruby-talk ML
Subject: Speed up suggestions
There are several things you can do to speed up the Ruby code, and
probably the Python code too. These are really general ideas:
Don’t search the whole line for "#" when you only need to know whether
the line starts with one; on a non-comment line, index scans to the end.
Don’t split the whole line if you only want the second field.
I had to tweak my script to work on a different file, splitting on
colons and such. But here are my changes:
outside the loop:
  comment_re = /^#/
inside the loop:
  next if line =~ comment_re
  if line =~ /[^:]+:([^:]+)/ then
    pom = $1
    if pom =~ r
      outf.write( line + "\n" )
    end
  end
This speeds up my run by more than half, for an admittedly small and
unscientific sample. But the changes will speed up Python as well. I
suggest you look at the “big language shootout” at http://www.bagley.org/~doug/shootout/ for some speed differences
between Python and Ruby. Python is only moderately faster than Perl or
Ruby; they all have their own pros and cons (for example, Ruby’s method
dispatch blows away Python’s and Perl’s, but Ruby is not as fast in
numerics).
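For anyone who wants to try the regex-capture idea on their own data, here is a self-contained sketch of it for colon-separated lines. The constant names and the `matching_lines` helper are invented for this example, and the patterns are anchored with `\A`, which is a small change from the snippet above.

```ruby
# Compiled once, outside the loop.
COMMENT_RE = /\A#/                 # line starts with '#'
FIELD_RE   = /\A[^:]+:([^:]+)/     # capture the second colon-separated field

# Return the lines of +input+ that are not comments and whose second
# field matches +pattern+. +input+ is any object that responds to each_line.
def matching_lines(input, pattern)
  matches = []
  input.each_line do |line|
    next if line =~ COMMENT_RE
    m = FIELD_RE.match(line)       # nil when the line has no second field
    next unless m
    matches << line if m[1] =~ pattern
  end
  matches
end
```

Using the MatchData object instead of `$1` keeps the helper free of global state, which is a style choice and not something the original post does.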
WOW! I just found something out that is pretty interesting… too bad I
don’t understand why yet. By sheer luck I noticed in my email that I
had not moved the field regex above out of the loop, so that it would
be instantiated only once. So I did it. And it slowed down, a lot!
Here’s the profile:
BEFORE:
time = 0.03045900 sec
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 50.00     0.01      0.01        1     7.81    15.62  IO#each_line
 50.00     0.02      0.01       86     0.09     0.09  String#=~
AFTER:
time = 0.04492900 sec
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 66.67     0.03      0.03        1    31.25    46.88  IO#each_line
 33.33     0.05      0.02      127     0.12     0.12  String#=~
So I made a new version and got rid of all the regex instances (diff
follows) and the profile looks like:
time = 0.00842100 sec
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.00     0.01      0.01        1     7.81     7.81  IO#each_line
(We are now so small and fast that the profile has no relevant data;
the file should be bigger for better sampling. Subsequent runs show
the sampling jumping all over, but the total time is the same.)
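One way to check this kind of observation on a larger sample is the standard `benchmark` library. The sketch below compares a regex literal inside the loop, a regex held in a constant, and a plain string test. The sample data, sizes, and method names are invented for illustration; timings depend heavily on the Ruby version, so this is not expected to reproduce the 2002 numbers.

```ruby
require "benchmark"

# Synthetic input: every tenth line is a comment (made-up data).
LINES = Array.new(50_000) do |i|
  i % 10 == 0 ? "# comment #{i}\n" : "host:field#{i}:rest\n"
end

HOISTED_RE = /^#/   # regex object created once, outside any loop

# Regex literal written in place, inside the block.
def count_literal(lines)
  lines.count { |line| line =~ /^#/ }
end

# Same pattern, referenced through the constant.
def count_hoisted(lines)
  lines.count { |line| line =~ HOISTED_RE }
end

# No regex at all.
def count_no_regex(lines)
  lines.count { |line| line.start_with?("#") }
end

Benchmark.bm(10) do |bm|
  bm.report("literal")  { count_literal(LINES) }
  bm.report("hoisted")  { count_hoisted(LINES) }
  bm.report("no regex") { count_no_regex(LINES) }
end
```

All three methods return the same count, so any difference in the report comes from how the comment test is done and not from the amount of work.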
On Tue, Sep 24, 2002 at 12:22:56AM +0900, Vincent Foley wrote:
In other numbers, when I searched for any 6-letter word, here are the
time outputs:
--
[ Wojtek gminick Walczak ][ http://gminick.linuxsecurity.pl/ ]
[ gminick (at) hacker.pl ][ gminick (at) underground.org.pl/ ]
NOW, can someone tell me why regexen in place are much faster than
instantiated regexen?
Can somebody tell me why the plural of “regex” is “regexen”?
German influence?
--
Giuseppe “Oblomov” Bilotta
“E la storia dell’umanità, babbo?”
“Ma niente: prima si fanno delle cazzate,
poi si studia che cazzate si sono fatte”
(Altan)
(“And what about the history of the human race, dad?”
“Oh, nothing special: first they make some foolish things,
then you study which foolish things have been made”)