Problem with ".scan"

Ruby is complaining about the following three lines of code. They're in a
new program, but I copied them directly from an older, working program.
Can someone help me understand what the problem is with the "scan" line
or, apparently, the "each" line?

Thanks,
Peter

10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})[ ]+\n/) do

Error message:

E:/PageCounts/test1.rb:12:in `scan': string modified (RuntimeError)
  from E:/PageCounts/test1.rb:12
  from E:/PageCounts/test1.rb:10:in `each'
  from E:/PageCounts/test1.rb:10

···

--
Posted via http://www.ruby-forum.com/.

the modification is probably here. can't you show us everything up through
the matching end?

-a

···

On Wed, 27 Sep 2006, Peter Bailey wrote:

10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})+\n/) do

--
in order to be effective truth must penetrate like an arrow - and that is
likely to hurt. -- wei wu wei

Peter Bailey wrote:

10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})+\n/) do

Whoa. The error message suggests that the source string is being modified
while being read, but your listing stops before the part where that might
happen.

Error message:

E:/PageCounts/test1.rb:12:in `scan': string modified (RuntimeError)

It's always a good idea to post a short working example in a case like this,
not just the part where you think the problem is. That would have provided
us with the true problem area.

···

--
Paul Lutus
http://www.arachnoid.com

unknown wrote:

11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})+\n/) do

the modification is probably here. can't you show us everything up through
the matching end?

-a

Sorry. It's a bit much. That's why I was holding back. Here's the whole
script.

require 'kirbybase'
require 'fileutils' # needed for FileUtils.touch below
Dir.chdir("E:/pagecounts")
#First, create the database table.
db = KirbyBase.new
# If table exists, delete it.
db.drop_table(:pageinfo) if db.table_exists?(:pageinfo)
pageinfo_tbl = db.create_table(:pageinfo,
                                :filename, {:DataType=>:String, :Index=>1},
                                :lconstant, :String,
                                :compcode, :String,
                                :primecode, :Integer,
                                :costcenter, :String,
                                :acctgroup, :Integer,
                                :blank, :String,
                                :description, :String,
                                :pagecount, :Float,
                                :sjccode, :String,
                                :fullname, {:DataType=>:String, :Index=>2}
                               )
# Import the csv file.
pageinfo_tbl.import_csv('McArdle_indexes.csv')

=begin
Parse each postscript print file in the polled directory. Create
variables for:
the number of pages in each file; the number of blank pages in each
file; and,
what exact pages are blank.
=end
Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/\%\%Pages: (\d{1,5})+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) !=0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "\%\%Blank page for Asura.\n\%\%Page: #{newtotalpages.to_i}\nshowpage\n"
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

=begin
Find blank pages in the postscript file. Look for the regular expression
that
sees a page callout followed by postscript data that does not include
data in parentheses. Any type on a postscript page is enclosed in
parentheses,
so, that's why this is a legitimate search. Blank pages have no
parenthesized
data.
=end
  blanks = []
  file_contents.scan(/\%\%Page: [()0-9{1,5}] ([0-9]{1,5})\n[^\(.*\)]\%\%Page/) do |match|
    blanks.push($1)
  end
  file_contents.scan(/\%\%Blank page for Asura.\n/) do |match|
  blanks.push(totalpages.to_i + 1)
  end

=begin
Open a "pageinfo" file. Put page information about the file into it.
Notice that the variable for the total number of pages differs depending
on whether a "newtotalpages" variable exists. And, that variable only
exists if the original page count was odd and a blank had to be added.
=end
  filename = File.basename("#{psfile}", '.ps')
  pageinfofile = File.basename("#{psfile}", '.ps') + ".pageinfo"
  File.open("E:/pagecounts/#{pageinfofile}", "a") do |fileinfo|
   if newtotalpages then
    fileinfo << "#{filename}\n" <<
    "Total number of pages in this PDF: #{newtotalpages}\n" <<
    "Number of blank pages in this PDF: #{blanks.size}\n" <<
    "Specific pages that are blank in this PDF: " <<
    "#{blanks.join(', ')}\n"
   else
    fileinfo << "#{filename}\n" <<
    "Total number of pages in this PDF: #{totalpages}\n" <<
    "Number of blank pages in this PDF: #{blanks.size}\n" <<
    "Specific pages that are blank in this PDF: " <<
    "#{blanks.join(', ')}\n"
   end
   end
   end
end

=begin
Back to the database table. . . .
Query against the table and match the filename in the directory with
whichever entry
in the "filename" column of the table matches. Then, if there's a match,
populate
the "pagecount" field in that row of the table with the variable for the
page count, as
found above. That variable name is "newtotalpages."
=end

Dir.glob("*.ps").each do |dirfile|
result = pageinfo_tbl.select(:filename) { |r| dirfile =~
  Regexp.new(r.filename) }
pageinfo_tbl.update { |r| r.name ==
{filename}.set(:pagecount=>#{newtotalpages}) } unless result.nil?
end

<snip>

Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/\%\%Pages: (\d{1,5})+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) !=0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "\%\%Blank page for Asura.\n\%\%Page: #{newtotalpages.to_i}\nshowpage\n"
                    ^^
                    the modification in question
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

so, ruby is correct, you are modifying a string while in an in-progress scan
block. easy-cheasy.

kind regards.

-a

···

On Thu, 28 Sep 2006, Peter Bailey wrote:

Thanks. I ended the scan block before doing any file writing. That
seemed to do the trick. It still confuses me, though, because this code
was borrowed from an existing script that I've been using for 6 months,
and that part of it is just as you see it above.
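
For illustration, here is a minimal sketch of the kind of restructuring described above: finish the pattern match first, and only then build the padded contents. The helper name `pad_to_even_pages` and the sample input are hypothetical, and the regex is a simplified assumption rather than the one from the script.

```ruby
# Hypothetical helper: returns new contents padded to an even page count,
# or nil when nothing needs to change.
def pad_to_even_pages(contents)
  match = contents.match(/^%%Pages: (\d{1,5})[ ]*\n/)
  return nil unless match

  total_pages = match[1].to_i
  return nil if total_pages.even?

  # The match is complete, so building a longer string is safe here.
  contents + "%%Blank page for Asura.\n%%Page: #{total_pages + 1}\nshowpage\n"
end

sample = "%!PS-Adobe-3.0\n%%Pages: 3\n" # made-up input
padded = pad_to_even_pages(sample)
puts padded if padded
```

Writing the result back to disk would then happen after the helper returns, outside any scan block.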

probably because in the old script the page count always happened to be even,
so the branch that appends to file_contents never ran. in your new script some
file has an odd page count, i'm guessing, and so the bug is triggered. if i
were you i'd update the other script - it's a bug in waiting.

regards.

-a

Well, I know that they're not always odd or even. They've been a mix of
both. But, I understand what you're saying. I will change my original
script. Basically, and, please tell me if I understand this correctly:
if I'm going to do a scan of a file, open the file, scan it, and then
close it. Right?

yup. just remember to avoid this

   string = 'foobar'

   string.scan(%r/foo/) do |word|
     string << 'foo' # can't modify while scanning
   end

regards.

-a
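
One way to stay clear of that pattern, sketched here on the assumption that the matches are all that is needed from the scan: call `scan` without a block so it returns an array, and modify the string only afterwards. The variable name `words` is just for illustration.

```ruby
string = 'foobar'

# Without a block, scan returns an array of matches and has finished
# before the next statement runs.
words = string.scan(/foo/)

# Safe: no scan is in progress while the string grows.
words.each { string << 'foo' }

puts string # prints "foobarfoo"
```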

Thanks a lot, -a! I've cleaned up my code. But, if you notice way above,
I've got a File.read in the line before the file scan. If I do an "end"
for the file scan, my "read" is still open, right? Meaning, I can still
do stuff to the open file.

If you're referring to your original code, then no. You use File.read(name), which returns the whole file in a single string. No open file handle is returned.
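
A small sketch of that point, using a temporary file with a made-up name: the value returned by `File.read` is an ordinary String, so changing it does not touch the file until the contents are written back explicitly.

```ruby
require 'tmpdir'
require 'fileutils'

dir  = Dir.mktmpdir
path = File.join(dir, 'sample.ps') # hypothetical file name
File.write(path, "%%Pages: 3\n")

contents = File.read(path) # opens, reads and closes the file internally
puts contents.class        # a String, not an open File

contents << "%%Trailer\n"  # changes only the in-memory string
unchanged_on_disk = File.read(path)

File.open(path, 'w') { |f| f.print contents } # explicit write-back
written_back = File.read(path)

FileUtils.remove_entry(dir) # clean up the temporary directory
```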

Btw, for efficiency reasons if your files are large you might consider using

File.foreach(file_name) do |line|
    ....
end

Or use File.readlines instead of File.read. That still loads the whole file, but you get an array of lines rather than one big string.
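
As a rough sketch of the line-by-line approach, assuming the page count sits on a line of its own: the helper name `page_count` and the sample file are made up for the demo.

```ruby
require 'tmpdir'

# Hypothetical helper: the first %%Pages count found in the file, or nil.
def page_count(path)
  File.foreach(path) do |line|
    # Each line is handled on its own, so the whole file is never in memory.
    return $1.to_i if line =~ /\A%%Pages: (\d{1,5})/
  end
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.ps') # made-up file for the demo
  File.write(path, "%!PS-Adobe-3.0\n%%Pages: 12\n%%EndComments\n")
  puts page_count(path) # prints 12
end
```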

Kind regards

    robert

···

Peter Bailey <pbailey@bna.com> wrote:

Robert Klemme wrote:

If you're referring to your original code, then no. You use File.read(name)
which returns the whole file in a single string. No open connection is
returned.

Btw, for efficiency reasons if your files are large you might consider using

File.foreach(file_name) do |line|
    ....
end

Or use File.readlines instead of File.read - that way you get an array with
lines and not the whole file in one piece.

Kind regards

    robert

Thanks, Robert. I'll look into that line-by-line technique. The reason I
probably haven't used it is that I often need to search for or
accommodate data that spans over multiple lines.

Yeah, in that case File.read is clearly superior (if the file fits into memory, that is). For me, line by line is the default because it scales better, and I only switch to slurping the whole file at once when I need to match across lines. But then again, my typical problem might be different from yours, so your different default might actually be the better solution for you.

Kind regards

    robert
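
To illustrate why a match that spans lines wants the whole file in one string, here is a simplified sketch. It assumes, purely for the example, that a blank page is a `%%Page` line followed directly by the next `%%Page` line. The sample data is made up, and the pattern is not the one from the script above.

```ruby
# Made-up sample: page 2 has no content before the next page marker.
contents = "%%Page: 1 1\n(Hello) show\n%%Page: 2 2\n%%Page: 3 3\n(Bye) show\n"

# The pattern crosses a line boundary. The lookahead leaves the second
# marker unconsumed so it can start the next match.
blank_pages = contents.scan(/^%%Page: \S+ (\d{1,5})\n(?=%%Page)/).flatten.map(&:to_i)

p blank_pages # prints [2]
```

With File.foreach the same check would need extra state carried from one line to the next.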
