The regular expression says: start at the beginning of the line and
skip at least 5 characters. We then have to use parens to capture
the part you're interested in, and pass 1 rather than 0 to begin, so
that it reports the location of the first capture group (0 would
report where the entire Regexp matched, and that would be the
beginning of the line).
An alternative would be to slice the first n characters off the front
of the string and then do the match.
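A minimal sketch of both approaches, using the string from the original question. The variable names are made up for illustration; only Regexp#match, MatchData#begin and String#[] are assumed.

```ruby
str = "foo bar baz"

# Approach 1: skip at least 5 characters, then capture the part of interest.
m = /^.{5,}(ba)/.match(str)
whole_start   = m.begin(0)  # where the entire Regexp matched (start of line)
capture_start = m.begin(1)  # where the captured "ba" starts

# Approach 2: slice the first n characters off, match, then add n back
# to translate the position into the original string.
offset = 5
m2 = /ba/.match(str[offset..-1])
sliced_start = m2.begin(0) + offset

puts whole_start, capture_start, sliced_start
```

Note that with the slicing approach the position reported by begin is relative to the slice, so the offset has to be added back by hand.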
Eric
···
On Nov 25, 10:18 am, makoto kuwata <k...@kuwata-lab.com> wrote:
Hi, all.
Is it possible to specify the start position for Regexp matching?
str = "foo bar baz"
m = /ba/.match(str)
p m.begin(0) #=> 4
m = /ba/.match(str, 5) # is it possible?
p m.begin(0) #=> 8 (if possible)
If it is possible, some kinds of parser or scanner could be
implemented easily.
# StringScanner is a little too big, I think.
Another alternative is to use String#scan - we would have to know what the OP really wants to parse though to decide whether it's a feasible solution.
Kind regards
robert
You could try something like this:
m = /^.{5,}(ba)/.match(str)
p m.begin(1)
In my program, the start position is variable, for example:
def f(n)
  m = /^.{#{n},}(ba)/.match(str)
  ...
end
In this case, a new Regexp is created from /^.{#{n},}(ba)/ on every
call, which is inefficient.
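One way to soften the cost of rebuilding the pattern is to cache one compiled Regexp per offset. This is only a sketch: SKIP_THEN_BA and ba_position_from are hypothetical names, and the quantifier is made lazy (`{n,}?`), which is a change from the pattern in the thread. With the greedy form, the capture lands on the last "ba" in the line rather than the first one at or after the offset.

```ruby
# Hypothetical cache: one compiled Regexp per start offset, built on demand.
SKIP_THEN_BA = Hash.new do |cache, n|
  cache[n] = /^.{#{n},}?(ba)/
end

# Hypothetical helper: position of the first "ba" at or after offset n,
# or nil when there is none.
def ba_position_from(str, n)
  m = SKIP_THEN_BA[n].match(str)
  m && m.begin(1)
end

p ba_position_from("foo bar baz", 0)
p ba_position_from("foo bar baz", 5)
p ba_position_from("foo bar baz", 9)
```

The cache grows with the number of distinct offsets, and `.` does not match a newline by default, so this only suits single-line input with a modest range of offsets.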
Robert Klemme wrote:
Another alternative is to use String#scan -
String#scan is useful only when the regexp pattern is fixed:
input.scan(/FIXED-REGEXP/) do ... end
With String#scan, the regexp pattern cannot be changed
inside the loop.
But in many situations it is possible to scan with a single unified regexp, that is, one regexp that comprises all the other patterns.
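As a rough illustration of the unified-regexp idea, the sketch below combines three token patterns into one alternation and lets String#scan walk the input. The TOKEN pattern, the token categories and the tokenize helper are all invented for this example and are not taken from the thread.

```ruby
# One regexp that comprises all token patterns; each alternative has its own
# capture group, so exactly one group is non-nil per match.
TOKEN = /(\d+)|([A-Za-z_]\w*)|(\S)/

# Hypothetical helper: turn the input into [type, text] pairs.
def tokenize(input)
  input.scan(TOKEN).map do |number, ident, other|
    if number
      [:number, number]
    elsif ident
      [:ident, ident]
    else
      [:other, other]
    end
  end
end

p tokenize("foo = 42")
```

Whitespace is skipped simply because no alternative matches it; scan moves on to the next position where the pattern does match.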
I found that the MatchData can be obtained with Regexp.last_match()
after calling String#index().
I still think Regexp#match(string, start=0) is the natural way,
but String#index(regexp, start) is a good solution.
Thank you, Matz.
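A short sketch of that combination: String#index accepts a start position and sets the last-match data, which Regexp.last_match then returns. The match_at helper name is made up for illustration. The last line uses the position argument of Regexp#match, which needs Ruby 1.9 or later.

```ruby
# Hypothetical helper: MatchData for the first match at or after pos, or nil.
def match_at(regexp, str, pos)
  str.index(regexp, pos) && Regexp.last_match
end

str = "foo bar baz"

m = match_at(/ba/, str, 5)
p m.begin(0)    # position is relative to the whole string
p m.pre_match   # everything before the match

p match_at(/ba/, str, 9)      # no match at or after position 9

p(/ba/.match(str, 5).begin(0))  # Ruby 1.9+ only
```

Unlike the slicing approach, positions reported this way refer to the original string, so no offset arithmetic is needed.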
···
makoto kuwata <k...@kuwata-lab.com> wrote:
> str.index(/ba/, 5) ?
No, String#index returns a Fixnum (the position), but I want a MatchData.
What's the difference between 1.9 Regexp#match(string, start=n) and
1.8 Regexp#match(string[n..-1])? You have to create a sub-string with
the 1.8 version, but according to Robert Klemme (above) it's just
creating a pointer into the original string if you're not changing the
substring or original string. Besides, even if you did get a copy,
it's anonymous and should be garbage collected soon. If I understand
everything correctly, the 1.9 version would just basically be a
convenience feature over the 1.8 way?
I'm not sure how to confirm it, other than just looking at the source,
and since I'm very poor at C programming, it probably wouldn't help
for me to try that. I'm sure Robert can demonstrate. But I will say
that I'm not surprised that they have different object_ids, because
they are different objects. Copy-on-write is just a back-end
optimization: two objects that point to the same data look like
unique copies in the front-end, but no data is actually moved in the
back-end until it has to be (i.e., when one of the objects is
changed).
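The front-end half of that description can be checked directly from Ruby. The sketch below only shows the observable behaviour (distinct objects, equal contents, independent after a write); whether the underlying buffer is actually shared is an interpreter detail that this code cannot confirm. The variable names are illustrative.

```ruby
original  = "a" * 1_048_576
substring = original[0..-1]

# Two different Ruby objects with the same contents.
distinct_objects = !substring.equal?(original)
same_contents    = (substring == original)

# Writing to one of them must not affect the other.
substring << "b"
original_size  = original.size
substring_size = substring.size

p distinct_objects, same_contents, original_size, substring_size
```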
Regards,
Jordan
···
On Nov 26, 12:07 am, 7stud -- <bbxx789_0...@yahoo.com> wrote:
Jordan Callicoat wrote:
> You have to create a sub-string with
> the 1.8 version, but according to Robert Klemme (above) it's just
> creating a pointer into the original string if you're not changing the
> substring or original string.
A new ruby object is created, but the string buffer that it points to
is only copied on write.
And I think the functions of interest are str_new3 and str_new4
(called from rb_str_substr). Specifically, the assignment of
RSTRING(str2)->aux.shared. But like I said, I'm not great with C, so I
could be mistaken.
Regards,
Jordan
Here's a test to show that my reading of the source, and Robert's
assertion, are correct (there is probably a better way to do this...):
#!/usr/bin/env ruby
# disable GC to get fair reading of actual allocation cost
GC.disable
def free_megs
(`free -o`.split("\n")[1].split(' ')[3].to_i/1024).to_s
end
puts "Free megabytes " + free_megs
# make a one megabyte string
s1 = "a" * 1048576
# make 100 substrings of it, keeping references so they all stay alive
subs = (1..100).map { s1[0..-1] }
puts "Free megabytes " + free_megs
Only one meg is used, which is the length of the original string. So,
by inductive inference, the substrings are only pointers back to the
original string rather than copies of the data.