My CPU Hates Me

Ari_Brown · 6 July 2007 18:04

Pattern matching problem. This time, it doesn't print out anything and just soaks up my CPU. I tried slowly adding more and more for it to do, and it worked great -- until TABLE7. Then it just soaks up my CPU and makes me cry. At first, when nothing was printing, I added $stdout.flush to make it print. But it didn't print! This makes me think that it's something in the when part.

What's going on?

Help!

lines.each do |line|
   case line
   when /^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/
     TABLE1.puts("\"#{$1}\",\"#{$2}\",\"#{$3}\",\"#{$4}\",\"#{$5}\",\"#{$6}\",\"#{$7}\",\"#{$8}\",\"#{$9}\""); print '-'; $stdout.flush
     TABLE2.puts("\"#{$10}\",\"#{$11}\",\"#{$12}\""); print '-'; $stdout.flush
     TABLE3.puts("\"#{$13}\",\"#{$14}\",\"#{$15}\""); print '-'; $stdout.flush
     TABLE4.puts("\"#{$16}\",\"#{$17}\""); print '-'; $stdout.flush
     TABLE5.puts("\"#{$18}\",\"#{$19}\""); print '-'; $stdout.flush
     TABLE6.puts("\"#{$20}\",\"#{$21}\""); print '-'; $stdout.flush
     TABLE7.print("\"#{$22}\",\"#{$23}\""); print'!'; $stdout.flush
     TABLE7.print("\"#{$24}\",\"#{$25}\""); print'!'; $stdout.flush
     TABLE7.print("\"#{$26}\",\"#{$27}\""); print'!'; $stdout.flush
     TABLE7.print("\"#{$28}\",\"#{$29}\""); print'!'; $stdout.flush
     TABLE7.print("\"#{$30}\",\"#{$31}\",\"#{$32}\""); print '-'; $stdout.flush
# TABLE8.puts("\"#{33}\",\"#{34}\""); print '-'; $stdout.flush
     puts; $stdout.flush
     print '.'; $stdout.flush
   else
     print '$'
   end
end

-------------------------------------------------------|
~ Ari
crap my sig won't fit

Michael_Glaesemann · 6 July 2007 18:11

You don't happen to be trying to parse a CSV file by any chance? If so, why not use FasterCSV?
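For anyone reading this later: FasterCSV became Ruby's standard csv library as of Ruby 1.9, so the sketch below assumes a modern Ruby and uses `require 'csv'`. The helper name `parse_csv_line` and the sample data are made up for illustration. Unlike a hand-written regex, the parser copes with commas and doubled quotes inside a field.

```ruby
require 'csv'

# Hypothetical helper: parse one line of quoted, comma-separated data.
# Returns an array of field strings.
def parse_csv_line(line)
  CSV.parse_line(line)
end

# Made-up sample line with an embedded comma and an escaped quote.
row = parse_csv_line('"alpha","beta, with comma","say ""hi"""')
puts row.inspect
```

From there each output file can be fed a slice of the row, for example `row[0, 9]` for the first nine fields.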

Michael Glaesemann
grzm seespotcode net

On Jul 6, 2007, at 13:04 , Ari Brown wrote:

/^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/

Jim_Clark · 6 July 2007 18:44

Ari Brown wrote:

when /^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/

There are several ways to optimize the regular expression, but the most important thing is to keep each group from grabbing too much. Using (.*) matches everything to the end of the line, and then the regex engine backtracks to find the next " character the pattern requires. It first picks the " closest to the end of the line, but that is not the one you want, so it backtracks again and again. With this many groups in one pattern, the number of combinations it tries becomes enormous, and that is where the CPU cycles go.

Instead of using "(.*)", your best bet would be to use "([^"]*)". This assumes that there are no " characters within each field. The negated character class is still greedy, but it cannot run past the closing " of its field, which eliminates nearly all of that backtracking.
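A minimal sketch of that suggestion, building the pattern from a repeated piece instead of typing it out 32 times. The names `FIELD`, `quoted_fields_pattern` and `parse_quoted_line` are made up for the example, and it assumes no field contains a " character.

```ruby
# One quoted field whose contents may not contain a double quote.
FIELD = '"([^"]*)"'

# Build a pattern for `count` quoted fields separated by commas.
def quoted_fields_pattern(count)
  Regexp.new('^' + ([FIELD] * count).join(',') + '$')
end

# Return the captured fields as an array, or nil if the line does not match.
def parse_quoted_line(line, count)
  match = quoted_fields_pattern(count).match(line)
  match && match.captures
end

fields = parse_quoted_line('"alpha","beta","","delta"', 4)
puts fields.inspect
```

Using `match.captures` also avoids juggling `$1` through `$32`; the fields for each table are just slices of the array.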

Alternatively, you could split the line on the comma (see String#split) and end up with a nice array to reference each field. You'll still have quotes to strip from each item, unless you split on the three-character separator "," and manually remove the leading " character from the first element and the trailing " character from the last element. This will likely be the fastest way, since no regex needs to be evaluated. However, you may need more logic if not every line in the text file should be split, such as comment lines.
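The split approach could look roughly like this. `split_quoted_line` is a hypothetical helper, and it assumes no field contains the three-character sequence "," itself. Lines that are not wrapped in quotes (comment lines, for instance) come back as nil so the caller can skip them.

```ruby
# Split one line of "a","b","c" style data without using a regex.
def split_quoted_line(line)
  body = line.chomp
  # Only handle lines that start and end with a double quote.
  return nil unless body.length >= 2 && body.start_with?('"') && body.end_with?('"')

  inner = body[1..-2] # drop the leading and trailing quote
  return [''] if inner.empty?

  # The -1 limit keeps trailing empty fields instead of discarding them.
  inner.split('","', -1)
end

puts split_quoted_line(%("alpha","beta",""\n)).inspect
puts split_quoted_line('# a comment line').inspect
```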

Regards,
Jim

James_Britt · 6 July 2007 20:29

Ari, also check out unit testing in any of the Ruby books. You can test your regex for failures as you go. Regexes are one of those cases where unit testing is immediately and obviously useful (though unit testing is truthfully useful all the time).
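As a rough illustration of testing a regex as you go, here is a sketch using minitest, which ships with current Rubies (Test::Unit would have been the usual choice in 2007). The two-field pattern `TWO_FIELDS` is a shortened, made-up stand-in for the real 32-field one, using the "([^"]*)" form suggested earlier in the thread.

```ruby
require 'minitest/autorun'

# Hypothetical two-field version of the pattern, kept short for the example.
TWO_FIELDS = /^"([^"]*)","([^"]*)"$/

class TwoFieldsPatternTest < Minitest::Test
  def test_matches_two_quoted_fields
    match = TWO_FIELDS.match('"alpha","beta"')
    assert match
    assert_equal %w[alpha beta], match.captures
  end

  def test_rejects_a_line_with_too_few_fields
    assert_nil TWO_FIELDS.match('"alpha"')
  end

  def test_allows_empty_fields
    assert_equal ['', ''], TWO_FIELDS.match('"",""').captures
  end
end
```

Growing the pattern one field at a time with a test for each step makes it obvious which addition breaks the match or makes it slow.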

Also, when reading giant files, even if you use readline or another way to break the input into smaller parts, you should consider reading x bytes of the file at a time. You can add a routine that finds the last \n within those x bytes, then rewinds the file to that position (in bytes) and reads another x bytes from there, and so on until end of file.
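One way to sketch the chunked reading idea follows. Note that it swaps the rewind step for a carry-over: the partial line after the last \n is kept and prepended to the next chunk, which gives the same result without seeking. The method name `each_line_in_chunks` and the chunk size are assumptions for the example, and StringIO stands in for a real file.

```ruby
require 'stringio'

# Yield complete lines from io, reading chunk_size bytes at a time.
# The partial line after the last "\n" is carried into the next chunk.
def each_line_in_chunks(io, chunk_size = 64 * 1024)
  leftover = ''
  while (chunk = io.read(chunk_size))
    buffer = leftover + chunk
    last_newline = buffer.rindex("\n")
    if last_newline.nil?
      # No complete line yet; keep accumulating.
      leftover = buffer
      next
    end
    buffer[0..last_newline].each_line { |line| yield line }
    leftover = buffer[(last_newline + 1)..-1]
  end
  # A final line without a trailing newline.
  yield leftover unless leftover.empty?
end

io = StringIO.new("one\ntwo\nthree")
each_line_in_chunks(io, 4) { |line| puts line }
```

With a real file, `File.open(path, 'rb') { |f| each_line_in_chunks(f) { |line| ... } }` would work the same way, and the loop is a natural place to print progress after each chunk.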

Regexes are great, but they become a big resource hog if you just let them loose on a big file. Chop the work up into smaller tasks, and you can report on the progress of the whole process.

John Joyce


Ari_Brown · 7 July 2007 02:12

FasterCSV.... I think I will! Thanks!

Ari
-------------------------------------------|
Nietzsche is my copilot

