Index of multiple similar strings

I'm trying to read through a file like this:
http://www.genomics.ceh.ac.uk/~milo/example.html
in order to count the number of N tracts and locate their
positions. My code goes like this:

dust_seq = # file in url above
nums = 0
dust_seq.scan(/[N]+/) do |blah|
  nums += 1
  puts "Index #{dust_seq.index(blah.to_s)}"
end
puts "Num of Ns: #{nums}"

In the example, the index for the third of the N groups
is reported as the same as the first, as it's small enough
to fit within it. Is there any way around this?
Thanks.
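
To make the symptom concrete, here is a minimal sketch on a short made-up sequence. The sample string and the names tracts and reported are assumptions for illustration, not the real file. String#index returns the position of the first occurrence of its argument, so a short tract is found inside an earlier tract that is at least as long:

```ruby
# Hypothetical sample; the real sequence lives at the URL above.
dust_seq = "ACGTNNNACGTNNNNNACGTNNACGT"

# The three N tracts really start at offsets 4, 11 and 20.
tracts = dust_seq.scan(/N+/)

# Looking each tract up by content finds its FIRST occurrence as a substring,
# so "NN" is located inside the first tract rather than at offset 20.
reported = tracts.map { |tract| dust_seq.index(tract) }

p tracts
p reported
```

In this sample the third tract is reported at the same offset as the first, which mirrors the behaviour described above.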

···

--
www.sirwilliamhope.org

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3c16$699$1@news.ox.ac.uk...

> In the example, the index for the third of the N groups
> is reported as the same as the first, as it's small enough
> to fit within it. Is there any way around this?
> Thanks.

dust_seq = # file in url above
nums = 0
dust_seq.scan(/N+/) do |blah|
  nums += 1
  puts "Index #{$`.length}"
end
puts "Num of Ns: #{nums}"

Kind regards

    robert
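
For readers who want to try this approach without the original file, a minimal runnable sketch follows. The sample string and the positions array are assumptions made for illustration. Inside the scan block, $` holds the text before the current match, so its length is the zero-based start offset of that match:

```ruby
# Hypothetical sample standing in for the sequence file at the URL above.
dust_seq = "ACGTNNNACGTNNNNNACGTNNACGT"

positions = []
dust_seq.scan(/N+/) do
  # $` is the pre-match string; its length equals the start of this match.
  positions << $`.length
end

positions.each { |pos| puts "Index #{pos}" }
puts "Num of Ns: #{positions.size}"
```

Note that every evaluation of $` builds a string containing everything before the match, which may matter for very long inputs.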

[Milo Thurston <nospam@linacreschoolofdefence.org>, 2004-10-07 14.19 CEST]

> In the example, the index for the third of the N groups
> is reported as the same as the first, as it's small enough
> to fit within it. Is there any way around this?

(not tested):

nums = 0
idx = 0

while idx = dust_seq.index /N+/, idx
  nums += 1
  puts "Index #{idx}"
  idx = Regexp.last_match.end(0)+1
end
puts "Num of Ns: #{nums}"

Excellent, thanks.
In which book/manual is $` described? I've not seen it before.

···

Robert Klemme <bob.news@gmx.net> wrote:

  puts "Index #{$`.length}"

Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
(I now regret not compiling in a kernel OOM killer...).

···

Carlos <angus@quovadis.com.ar> wrote:

while idx = dust_seq.index /N+/, idx

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3dg6$6sr$1@news.ox.ac.uk...

> > puts "Index #{$`.length}"
>
> Excellent, thanks.
> In which book/manual is $` described? I've not seen it before.

It's in the Pickaxe (both versions), although not in the online version of
the first edition AFAIK. You can find out about the other way in the Regexp
docs:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/ref_c_regexp.html
http://www.ruby-doc.org/core/classes/Regexp.html

Kind regards

    robert

···

Robert Klemme <bob.news@gmx.net> wrote:

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3el8$7dm$1@news.ox.ac.uk...

···

Carlos <angus@quovadis.com.ar> wrote:
> while idx = dust_seq.index /N+/, idx

> Thanks - the interpreter didn't like this line, though.
> However, I got it working and it seems better than the $`
> method, which caused some nasty memory hogging problems
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? Care to explain?

    robert

[Milo Thurston <nospam@linacreschoolofdefence.org>, 2004-10-07 15.04 CEST]

> while idx = dust_seq.index /N+/, idx

> Thanks - the interpreter didn't like this line, though.

You are right, it should have parens:

  while idx = dust_seq.index(/N+/, idx)

Strange...
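
Putting the parenthesised call back into the loop gives something like the sketch below. The sample string and the positions array are assumptions for illustration. This version resumes at end(0) rather than end(0)+1; either should work, because the character right after a full N+ match is never an N:

```ruby
# Hypothetical sample standing in for the real sequence.
dust_seq = "ACGTNNNACGTNNNNNACGTNNACGT"

positions = []
idx = 0
# String#index(pattern, offset) searches from offset onward and returns nil
# once nothing more matches, which ends the loop.
while (idx = dust_seq.index(/N+/, idx))
  positions << idx
  # index with a regexp sets the last match, so resume after this tract.
  idx = Regexp.last_match.end(0)
end

positions.each { |pos| puts "Index #{pos}" }
puts "Num of Ns: #{positions.size}"
```

The extra parentheses around the assignment in the while condition only make the intent explicit.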

>> method, which caused some nasty memory hogging problems
>                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ??? Care to explain?

You create a String object for each call, you don't have this problem with

$~.begin(0)

Guy Decoux

"ts" <decoux@moulon.inra.fr> schrieb im Newsbeitrag
news:200410071323.i97DNZR00289@moulon.inra.fr...

>> method, which caused some nasty memory hogging problems
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ??? Care to explain?

> You create a String object for each call, you don't have this problem with
>
> $~.begin(0)

True. I thought of $~ also, but overlooked this aspect - "$`.length" just
looked cuter. :-) Thx.

    robert
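
A minimal sketch of the scan loop using $~ instead of $` might look like this. The sample string and the tracts array are assumptions for illustration. $~ is the MatchData of the current match, and begin(0) and end(0) return offsets without building a copy of the preceding text:

```ruby
# Hypothetical sample standing in for the real sequence.
dust_seq = "ACGTNNNACGTNNNNNACGTNNACGT"

tracts = []
dust_seq.scan(/N+/) do
  m = $~  # MatchData for the match currently being yielded
  # Record the start offset and the length of the tract.
  tracts << [m.begin(0), m.end(0) - m.begin(0)]
end

tracts.each { |start, len| puts "Index #{start} (length #{len})" }
puts "Num of Ns: #{tracts.size}"
```

Regexp.last_match is an equivalent, more readable spelling of $~.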

That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing it to run
out rather rapidly.

···

ts <decoux@moulon.inra.fr> wrote:

> ??? Care to explain?
You create a String object for each call, you don't have this problem with
$~.begin(0)

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3grg$8bv$1@news.ox.ac.uk...

> > > ??? Care to explain?
> > You create a String object for each call, you don't have this problem
> > with $~.begin(0)
>
> That would explain it. Some of the strings I'm looking at are several MB
> in size. I've been writing out the data to disk and flushing stdout, but
> $` seemed to leave each complete sequence in memory, causing it to run
> out rather rapidly.

Yes, that's the reason. I hadn't thought about this, but as you can see
each reference to $` creates a new string instance:

15:55:29 [robert]: ruby -e '"f".scan(/./) { 5.times{ puts $`.id } }'
134690392
134690368
134690344
134690320
134690296
15:55:58 [robert]:

Kind regards

    robert
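
Object#id from the transcript above is spelled object_id in current Ruby. A hedged sketch of the same check as a script follows; the names pre_matches, contents and object_ids are assumptions for illustration. It keeps the strings in an array so they stay alive while their identities are compared:

```ruby
pre_matches = []
"abc".scan(/c/) do
  # Read $` several times during the same match and keep each result.
  3.times { pre_matches << $` }
end

contents   = pre_matches.uniq                       # distinct string values
object_ids = pre_matches.map(&:object_id).uniq      # distinct objects

puts "contents: #{contents.inspect}"
puts "distinct objects: #{object_ids.size} of #{pre_matches.size}"
```

On the standard interpreter one would expect identical contents but a separate object per read, which is why a multi-megabyte pre-match becomes expensive when $` is read for every match.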

···

ts <decoux@moulon.inra.fr> wrote: