Index of multiple similar strings

Milo_Thurston2 · 7 October 2004 12:19

I'm trying to read through a file like this:
http://www.genomics.ceh.ac.uk/~milo/example.html
in order to count the number of N tracts and locate their
positions. My code goes like this:

dust_seq = # file in url above
nums = 0
dust_seq.scan(/[N]+/) do |blah|
  nums += 1
  puts "Index #{dust_seq.index(blah.to_s)}"
end
puts "Num of Ns: #{nums}"

In the example, the index for the third of the N groups
is reported as the same as the first, as it's small enough
to fit within it. Is there any way around this?
Thanks.

--
www.sirwilliamhope.org
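
A minimal sketch of why this happens, using a made-up sequence (`seq` below is hypothetical, not the file from the URL). `String#index` returns the first place the given substring occurs, so any tract no longer than the first one is found inside the first one:

```ruby
# Hypothetical sequence with N tracts starting at offsets 2, 8 and 12.
seq = "ACNNNNGTNNACNNNN"

# Look each matched tract up again by its text, as in the code above.
reported = seq.scan(/N+/).map { |tract| seq.index(tract) }

p reported  # => [2, 2, 2]
```

All three lookups land on offset 2 because both "NN" and the second "NNNN" already occur inside the first tract.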

Robert · 7 October 2004 12:29

dust_seq = # file in url above
nums = 0
dust_seq.scan(/N+/) do |blah|
nums += 1
puts "Index #{$`.length}"
end
puts "Num of Ns: #{nums}"

Kind regards

robert

Carlos · 7 October 2004 12:44

(not tested):

nums = 0
idx = 0

while idx = dust_seq.index /N+/, idx
  nums += 1
  puts "Index #{idx}"
  idx = Regexp.last_match.end(0)+1
end
puts "Num of Ns: #{nums}"

Milo_Thurston2 · 7 October 2004 12:44

Robert Klemme <bob.news@gmx.net> wrote:

> puts "Index #{$`.length}"

Excellent, thanks.
In which book/manual is $` described? I've not seen it before.

Milo_Thurston2 · 7 October 2004 13:04

Carlos <angus@quovadis.com.ar> wrote:

> while idx = dust_seq.index /N+/, idx

Thanks - the interpreter didn't like this line, though.
However, I got it working and it seems better than the $`
method, which caused some nasty memory hogging problems
(I now regret not compiling in a kernel OOM killer...).

Robert · 7 October 2004 13:04

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3dg6$6sr$1@news.ox.ac.uk...

> > puts "Index #{$`.length}"
>
> Excellent, thanks.
> In which book/manual is $` described? I've not seen it before.

It's in the Pickaxe (both versions), although not in the online version of
the first edition AFAIK. You can find out about the other way in the Regexp
doc:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/ref_c_regexp.html
http://www.ruby-doc.org/core/classes/Regexp.html

Kind regards

robert


Robert · 7 October 2004 13:14

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3el8$7dm$1@news.ox.ac.uk...

> Carlos <angus@quovadis.com.ar> wrote:
> > while idx = dust_seq.index /N+/, idx
>
> Thanks - the interpreter didn't like this line, though.
> However, I got it working and it seems better than the $`
> method, which caused some nasty memory hogging problems

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
??? Care to explain?

robert

Carlos · 7 October 2004 13:20

[Milo Thurston <nospam@linacreschoolofdefence.org>, 2004-10-07 15.04 CEST]

> > while idx = dust_seq.index /N+/, idx
>
> Thanks - the interpreter didn't like this line, though.

You are right, it should have parens:

while idx = dust_seq.index(/N+/, idx)

Strange...
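
For reference, a complete version of the loop with the parentheses in place might look like the sketch below. The sample string and the `positions` array are made up for illustration. The search here resumes at the end of the previous match rather than one past it; either works, since the character after a greedy run of Ns is never an N.

```ruby
# Hypothetical sample sequence, not the poster's data.
dust_seq = "ACNNNNGTNNACNNNN"

nums = 0
idx = 0
positions = []  # collected only so the result can be inspected afterwards

# String#index(pattern, offset) starts searching at `offset` and
# returns nil once no further match exists.
while (idx = dust_seq.index(/N+/, idx))
  nums += 1
  positions << idx
  puts "Index #{idx}"
  # Resume right after the end of the current match.
  idx = Regexp.last_match.end(0)
end
puts "Num of Ns: #{nums}"
```

With this sample the loop visits offsets 2, 8 and 12 and counts three tracts.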

ts1 · 7 October 2004 13:23

> method, which caused some nasty memory hogging problems
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> ??? Care to explain?

Each reference to $` creates a new String object; you don't have this problem with

$~.begin(0)

Guy Decoux
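
Dropped into the scan loop from earlier in the thread, that suggestion could look like this sketch (the sample string and the `starts` array are assumptions for illustration):

```ruby
# Hypothetical sample sequence, not the poster's data.
dust_seq = "ACNNNNGTNNACNNNN"

starts = []
dust_seq.scan(/N+/) do
  # $~ is the MatchData of the current match; begin(0) is the offset
  # where the whole match starts, so no pre-match string is built.
  starts << $~.begin(0)
end

starts.each { |i| puts "Index #{i}" }
puts "Num of Ns: #{starts.size}"
```

`Regexp.last_match.begin(0)` is the same call spelled without the special variable.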

Robert · 7 October 2004 13:29

"ts" <decoux@moulon.inra.fr> wrote in message
news:200410071323.i97DNZR00289@moulon.inra.fr...

> >> method, which caused some nasty memory hogging problems
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > ??? Care to explain?
>
> You create a String object for each call, you don't have this problem with
>
> $~.begin(0)

True. I thought of $~ also, but overlooked this aspect - "$`.length" just
looked cuter. Thx.

robert

Milo_Thurston2 · 7 October 2004 13:39

ts <decoux@moulon.inra.fr> wrote:

> > ??? Care to explain?
> You create a String object for each call, you don't have this problem with
> $~.begin(0)

That would explain it. Some of the strings I'm looking at are several MB
in size. I've been writing out the data to disk and flushing stdout, but
$` seemed to leave each complete sequence in memory, causing memory to run
out rather rapidly.

Robert · 7 October 2004 13:59

"Milo Thurston" <nospam@linacreschoolofdefence.org> wrote in message
news:ck3grg$8bv$1@news.ox.ac.uk...

> > > ??? Care to explain?
> > You create a String object for each call, you don't have this problem with
> > $~.begin(0)
>
> That would explain it. Some of the strings I'm looking at are several MB
> in size. I've been writing out the data to disk and flushing stdout, but
> $` seemed to leave each complete sequence in memory, causing it to run
> out rather rapidly.

Yes, that's the reason. I hadn't thought about this, but as you can see,
each reference to $` creates a new string instance:

15:55:29 [robert]: ruby -e '"f".scan(/./) { 5.times{ puts $`.id } }'
134690392
134690368
134690344
134690320
134690296
15:55:58 [robert]:

Kind regards

robert
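
On later Ruby versions `Object#id` is gone, so the same check would use `object_id`. A rough sketch, assuming MRI and a made-up input string (`pre_matches` is a hypothetical helper array that keeps the strings alive so their ids stay comparable):

```ruby
pre_matches = []

"xxNN".scan(/N+/) do
  # Read $` three times for the same match and keep every result.
  3.times { pre_matches << $` }
end

# Same content each time, but count how many separate objects came back.
distinct = pre_matches.map(&:object_id).uniq.size
puts "#{pre_matches.size} reads of $`, #{distinct} distinct String objects"
```

With a pre-match of several MB, each of those reads would copy the whole prefix, which fits the memory growth described above.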
