Problems with oniguruma lookahead

Xiong_Chiamiov · 9 September 2008 22:45

Ruby 1.8.6 with Oniguruma installed and working (everywhere else; this
one case seems to be my problem).

Let me preface this by saying that I am new to Ruby (and kinda jumped
in, rather than learning it properly), and regexes are not my thing -
that's why I have nifty regex-checkers.

I am trying to extract some parts out of a string
("'Algebra' ") that I scraped from some html. I'm getting
nil returned from the expression:

Oniguruma::ORegexp.new("(?<=').*(?=' )").scan(scraped_html)

with scraped_html being the string mentioned above.

Doing some experimenting, I have found that the first part works just as
planned (i.e., everything except the lookahead). Using wildcards (. and
*) inside the lookahead works as well:

Oniguruma::ORegexp.new("(?<=').*(?=.)").scan(scraped_html)

returns [#<MatchData "Foo'<br">, #<MatchData "Bar'<br">], as
expected. However, anything else (<, b, \w, etc.) causes the regex to
not match.

I am quite befuddled about this, though I (almost certainly) know it is
my fault. Any help would be much appreciated.

Also, if I am violating any mailing-list netiquette, I would like to
know as well.

--
Posted via http://www.ruby-forum.com/.

Robert_K1 · 10 September 2008 09:05

With 1.9:

irb(main):001:0> s="'Algebra' "
=> "'Algebra' "
irb(main):002:0> s.scan %r{(?<=').*(?=' )}
=>
irb(main):003:0> s.scan %r{(?<=').*?(?=' )}
=> ["Algebra"]

Note the non-greedy match. In those cases I usually prefer to do this instead:

irb(main):005:0> s.scan %r{'(.*?)' }
=> [["Algebra"]]

I.e. use groups to extract the part that I am interested in.
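
To make the difference between the three patterns above concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where the built-in Regexp supports lookbehind, and it uses a hypothetical two-item input (`sample`) that does not appear in the thread. The variable names are illustrative only.

```ruby
# Hypothetical input: two quoted items, each followed by a space.
sample = "'Foo' 'Bar' "

# Greedy: .* runs as far as it can and backtracks to the LAST "' ",
# so both items end up in one match.
greedy = sample.scan(/(?<=').*(?=' )/)

# Lazy: .*? stops at the FIRST "' ". The closing quote of 'Foo' also
# satisfies the lookbehind, so the second match starts one item early.
lazy = sample.scan(/(?<=').*?(?=' )/)

# Capture group: the quotes are consumed by the match itself, so each
# item is extracted cleanly. flatten turns [["Foo"], ["Bar"]] into a flat list.
grouped = sample.scan(/'(.*?)' /).flatten

p greedy   # => ["Foo' 'Bar"]
p lazy     # => ["Foo", " 'Bar"]
p grouped  # => ["Foo", "Bar"]
```

With a single quoted item the lazy lookaround and the group give the same text; with several items only the group version stays clean, which is one reason to prefer it.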

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Xiong_Chiamiov · 10 September 2008 15:53

Ah, thank you very much. My regex learning was with PHP (PCRE, not
POSIX), which has some odd rules, especially regarding
greedy/non-greedy, so I'm still trying to recover from that.

Thanks again.
