I have a long string and I need to know the starting position of all
substrings matching a particular sequence. Most importantly, it needs
to be fast. Secondly, it would be nice if it was somewhat concise.
Here's the method:

def substring_positions(substring, string)
  ## fast, concise method??
end

my_substring = 'this'
my_string = 'this string this is a string this is it'
substring_positions(my_substring, my_string) # -> should return [0, 12, 29]
This seems trivial to do, but looking at StringScanner, String#match,
and String#scan, nothing simple comes to mind. There must be a one-
liner somewhere for this kind of thing. I checked through facets and
didn't see anything...
I'll get the ball rolling, though I doubt it's very efficient:
From: bwv549 [mailto:jtprince@gmail.com]
#
# This seems trivial to do, but looking at StringScanner, String#match,
# and String#scan, nothing simple comes to mind. There must be a one-
# liner somewhere for this kind of thing.
not a raw one-liner, but you can create one
root@pc4all:~# cat test.rb
class String
  def scan_p ss
    a = []
    self.scan(ss){a << Regexp.last_match.begin(0)}
    a
  end
end

p "this string this is a string this is it".scan_p("this")

root@pc4all:~# ruby test.rb
[0, 12, 29]
root@pc4all:~#
Another idea is to use String#index() with the optional second minimum index parameter which you keep incrementing.
James Edward Gray II
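A minimal sketch of that index() idea, using the method signature from the original question. The loop body is an assumption rather than something James posted: it resumes searching just past the end of each match, so overlapping occurrences are skipped (the same behaviour scan gives), and it treats an empty substring as "no matches".

```ruby
# Collect the start offset of every non-overlapping occurrence of
# substring in string, using String#index with a start offset.
def substring_positions(substring, string)
  # Assumption: an empty pattern yields no positions (this also avoids
  # a loop that would never advance).
  return [] if substring.empty?

  positions = []
  pos = 0
  while (i = string.index(substring, pos))
    positions << i
    pos = i + substring.length # resume just past this match
  end
  positions
end

p substring_positions('this', 'this string this is a string this is it')
# prints [0, 12, 29]
```

To pick up overlapping matches as well, advancing with pos = i + 1 instead should be enough.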
···
On Jul 23, 2007, at 3:54 PM, bwv549 wrote:
> I have a long string and I need to know the starting position of all
> substrings matching a particular sequence. Most importantly, it needs
> to be fast. Secondly, it would be nice if it was somewhat concise.
<snip solutions>
> Another idea is to use String#index() with the optional second
> minimum index parameter which you keep incrementing.
> James Edward Gray II
Indeed.
d-o-ibook-g4:~ d-o$ uname -a
Darwin d-o-ibook-g4.local 8.8.0 Darwin Kernel Version 8.8.0: Fri Sep 8 17:18:57 PDT 2006; root:xnu-792.12.6.obj~1/RELEASE_PPC Power Macintosh powerp
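require 'benchmark'

# COUNT is not shown in the original post; this value is an assumption.
COUNT = 100_000

class String
  # Start offsets of every match of ss, collected from inside scan's block.
  def scan_p(ss)
    a = []
    scan(ss) { a << Regexp.last_match.begin(0) }
    a
  end
end

# index() loop: search again from just past the previous match.
substr = Proc.new do |s, seq|
  pos = 0
  index = []
  while (i = s.index(seq, pos))
    index << i
    pos = i + seq.length
  end
  index
end

# scan exposed as an enumerator, mapped to each match's start offset.
substr_positions = Proc.new do |string, substring|
  string.enum_for(:scan, substring).map { $~.offset(0)[0] }
end

str = "assbasscassmassthatassbass"
seq = "ss"

Benchmark.bm do |x|
  # substr takes (string, sequence). The posted script called it as
  # substr.call(seq, str), which searches for the long string inside "ss"
  # and returns [] at once, so the index() timing shown below is for that
  # early exit rather than for a full search.
  x.report("with index(): ") { COUNT.times { substr.call(str, seq) } }
  x.report("with enum_for()/map(): ") { COUNT.times { substr_positions.call(str, seq) } }
  x.report("with scan(): ") { COUNT.times { str.scan_p(seq) } }
end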
d-o-ibook-g4:~ d-o$ ruby bs.rb
                              user     system      total        real
with index():            10.380000   0.060000  10.440000 ( 10.670129)
with enum_for()/map():   88.740000   0.910000  89.650000 ( 98.726685)
with scan():             57.620000   0.590000  58.210000 ( 65.996496)
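Timings like these only mean something if every approach returns the same positions, so a quick agreement check is worth running before the benchmark. The sketch below is an assumption-laden illustration, not part of the posted script: the helper names by_index, by_scan and by_enum are hypothetical, and each one mirrors one of the three approaches being compared.

```ruby
# Hypothetical helpers, one per approach in the benchmark.

# String#index with a moving start offset.
def by_index(str, seq)
  out = []
  pos = 0
  while (i = str.index(seq, pos))
    out << i
    pos = i + seq.length
  end
  out
end

# String#scan with a block, reading the offset from the last match.
def by_scan(str, seq)
  out = []
  str.scan(seq) { out << Regexp.last_match.begin(0) }
  out
end

# scan turned into an enumerator, then mapped to match offsets.
def by_enum(str, seq)
  str.enum_for(:scan, seq).map { Regexp.last_match.begin(0) }
end

str = "assbasscassmassthatassbass"
seq = "ss"

results = [by_index(str, seq), by_scan(str, seq), by_enum(str, seq)]
p results.first
raise "approaches disagree" unless results.uniq.size == 1
```

If the check raises, one of the calls is probably being handed its arguments in the wrong order or is otherwise doing less work than the others, and its timing should not be compared.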