Look-behind in oniguruma

Apparently oniguruma supports look-behind. Is there any documentation on how
to use this feature?

For example, if I had the string "~ABC~DE" and I wanted to return a list of
letters in the string which are preceded by '~' (['A','D'] in this case), how
might I use the look-behind feature in oniguruma to achieve this? Or, how
would I get a list of letters in the string which are not preceded by '~'
(['B','C','E'] in this example)?

(I know there are other ways of doing this, I'm just posing this as an example
of using look-behind).

Here's one that's a bit trickier: what if I had "~(ABC)DE" and I wanted the
tilde (a negation operator) to apply to each letter within the parens that it
precedes, so that I would get ['A','B','C'], but in the case where the input
string is "(ABC)DE" I would get an empty list? And then of course I would
want them to be nestable: "~(~ABC)" ('A' should not appear in the list in this
case since it's doubly negated). OK, that's probably going too far, and
maybe it's getting to the point where I should break out RACC :wink:

Phil
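
For the nested-negation case, a regexp alone gets awkward quickly. Below is a minimal sketch of a hand-written scanner, assuming that '~' applies to the next letter or the next parenthesized group and that only the letters A-Z matter. The method name negated_letters is hypothetical and not part of oniguruma or of anything posted in this thread.

```ruby
# Hypothetical helper (not from the thread): collect the letters that end up
# under an odd number of '~' negations.
def negated_letters(str)
  result  = []
  stack   = [false]   # negation state of each enclosing group
  pending = false     # true while a '~' is waiting for its operand
  str.each_char do |ch|
    case ch
    when '~'
      pending = !pending                # "~~" cancels itself out
    when '('
      stack.push(stack.last ^ pending)  # the group inherits the negation
      pending = false
    when ')'
      stack.pop if stack.size > 1
    when /[A-Z]/
      result << ch if stack.last ^ pending
      pending = false
    end
  end
  result
end

p negated_letters("~(ABC)DE")
p negated_letters("(ABC)DE")
p negated_letters("~(~ABC)")
```

Under those assumptions the three calls print ["A", "B", "C"], [] and ["B", "C"]. A real grammar with more operators would probably still be better served by RACC.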

> Apparently oniguruma supports look-behind. Is there any documentation on how
> to use this feature?

Essentially, they are the same as look-aheads (zero-width assertions),
except that the look-behind expression must be a fixed-width pattern (no
indeterminate quantifiers such as * or +), and no captures are allowed in a
negative look-behind.

> For example, if I had the string "~ABC~DE" and I wanted to return a list of
> letters in the string which are preceded by '~' (['A','D'] in this case), how
> might I use the look-behind feature in oniguruma to achieve this? Or, how
> would I get a list of letters in the string which are not preceded by '~'
> (['B','C','E'] in this example)?

  str = "~ABC~DE"
  p str.scan(/(?<=~)[A-Z]/)
  p str.scan(/(?<!~)[A-Z]/)

gives:

  ["A", "D"]
  ["B", "C", "E"]
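
To make the fixed-width restriction concrete, here is a small sketch that checks which look-behind patterns the engine will compile. It assumes a Ruby whose built-in regexp engine is Oniguruma or its descendant Onigmo; the helper name compiles? is made up for this illustration.

```ruby
# Hypothetical helper: true if the regexp source compiles, false if the
# engine rejects it with a RegexpError.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

p compiles?('(?<=~)[A-Z]')    # fixed width: one character
p compiles?('(?<=~~)[A-Z]')   # fixed width: two characters
p compiles?('(?<=~+)[A-Z]')   # variable width: '+' inside the look-behind
```

The first two should print true and the last one false, since '+' makes the width of the look-behind indeterminate. Regexp.new is used instead of a literal so that the rejected pattern shows up as a rescuable error at run time.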

regards,
andrew

On 12 Sep 2004 03:44:24 GMT, Phil Tomson <ptkwt@aracnet.com> wrote:

--
Andrew L. Johnson http://www.siaris.net/
      There are two types of programming languages; the ones that people bitch
      about and the ones that no one uses.
          -- Bjarne Stroustrup

In article <ABQ0d.397334$gE.56953@pd7tw3no>,
                          ^^^^^^^^
                          hmmm...

Andrew Johnson <ajohnson@cpan.org> wrote:

> On 12 Sep 2004 03:44:24 GMT, Phil Tomson <ptkwt@aracnet.com> wrote:
>
> > Apparently oniguruma supports look-behind. Is there any documentation on how
> > to use this feature?
>
> Essentially, they are the same as look-aheads (zero-width assertions),
> except that the look-behind expression must be a fixed-width pattern (no
> indeterminate quantifiers), and no captures are allowed in a negative
> look-behind.
>
> > For example, if I had the string "~ABC~DE" and I wanted to return a list of
> > letters in the string which are preceded by '~' (['A','D'] in this case), how
> > might I use the look-behind feature in oniguruma to achieve this? Or, how
> > would I get a list of letters in the string which are not preceded by '~'
> > (['B','C','E'] in this example)?
>
>   str = "~ABC~DE"
>   p str.scan(/(?<=~)[A-Z]/)
>   p str.scan(/(?<!~)[A-Z]/)
>
> gives:
>
>   ["A", "D"]
>   ["B", "C", "E"]

Thanks. That's what I was looking for. Is this essentially the same way that
it works in Perl?

Phil

Andrew Johnson wrote:

> Essentially, they are the same as look-aheads (zero-width assertions),
> except that the look-behind expression must be a fixed-width pattern (no
> indeterminate quantifiers), and no captures are allowed in a negative
> look-behind.

So it is implemented as zero-width look-ahead + eating as many characters as the content matches?

(I've thought about implementing /foo/.preceded_by('bar') as /(?=bar).{3}foo/.)

> regards,
> andrew

More regards,
Florian Gross
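
A quick sketch of how that emulation differs from a real look-behind, assuming the positive look-ahead form (?=bar) is what was meant for preceded_by. The variable names are only for illustration.

```ruby
str = "barfoo"

# Emulation: assert "bar" with a look-ahead, then consume its three characters.
emulated = /(?=bar).{3}foo/
# Real look-behind: "bar" is checked but never becomes part of the match.
lookbehind = /(?<=bar)foo/

p str[emulated]             # matched text includes the prefix
p str[lookbehind]           # matched text is only "foo"
p str.sub(emulated, "X")    # the prefix is replaced as well
p str.sub(lookbehind, "X")  # the prefix survives
```

This should print "barfoo", "foo", "X" and "barX". So the emulation finds the same places in the string, but the prefix ends up inside the match, which matters for sub, gsub and scan.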

> > Apparently oniguruma supports look-behind. Is there any documentation on
> > how to use this feature?
>
> Essentially, they are the same as look-aheads ... zero-width assertions,
> except that the look-behind expression must be a fixed width pattern (no
> indeterminate quantifiers), and no captures are allowed in a negative
> look-behind

Oniguruma supports alternation inside lookbehind (the top-level alternatives
may differ in length), so you can get behavior similar to quantifiers.
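
As a minimal sketch of that alternation trick, assuming a Ruby built on Oniguruma or Onigmo, the look-behind below has two top-level alternatives of different lengths. The sample string is invented for the illustration.

```ruby
str = "Price: USD 42 or $7, not EUR 9"

# "$" is one character wide and "USD " is four. Each alternative is fixed
# width on its own, and both sit at the top level of the look-behind.
p str.scan(/(?<=\$|USD )\d+/)
```

This should print ["42", "7"]. The same idea covers a small bounded repetition: something like a{1,2} inside a look-behind can be written out as the alternatives a|aa.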

AEditor's regexp engine supports variable-width lookbehind, where you
can use quantifiers inside lookbehind (with an inverted left-most-longest
rule).

It would be good if Oniguruma had support for quantifiers inside lookbehind.

irb(main):007:0> re = NewRegexp.new('(?=.z).(?<=(?:ab){2,3}x.)')
=> +-Sequence
  +-Lookahead positive
  > +-Sequence
  > +-Outside set=U-000A
  > +-Inside set="z"
  +-Outside set=U-000A
  +-Lookbehind positive
    +-Sequence
      +-Repeat greedy{2,3} # quantifier inside lookbehind!!
      > +-Group non-capturing
      > +-Sequence
      > +-Inside set="a"
      > +-Inside set="b"
      +-Inside set="x"
      +-Outside set=U-000A
irb(main):008:0> 'xyz'.gsub5(re, 'Y')
=> "xyz"
irb(main):009:0> 'abxyz'.gsub5(re, 'Y')
=> "abxyz"
irb(main):010:0> 'ababxyz'.gsub5(re, 'Y')
=> "ababxYz"
irb(main):011:0> 'abababxyz'.gsub5(re, 'Y')
=> "abababxYz"

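For comparison, the bounded {2,3} case above can be approximated in plain Ruby by writing out both repetitions as top-level alternatives. This is only a sketch under the assumption that Ruby's built-in engine is Oniguruma or Onigmo; it uses the standard gsub rather than AEditor's gsub5.

```ruby
# (?:ab){2,3}x. spelled out as two fixed-width alternatives.
re = /(?=.z).(?<=ababx.|abababx.)/

%w[xyz abxyz ababxyz abababxyz].each do |s|
  puts "#{s} -> #{s.gsub(re, 'Y')}"
end
```

The four results should agree with the irb session above: only the last two strings get their "y" replaced. This does not scale to unbounded quantifiers such as + or {2,}, which is where real variable-width lookbehind would be needed.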
On Sunday 12 September 2004 06:54, Andrew Johnson wrote:

On 12 Sep 2004 03:44:24 GMT, Phil Tomson <ptkwt@aracnet.com> wrote:

--
Simon Strandgaard

> > > Apparently oniguruma supports look-behind. Is there any documentation
> > > on how to use this feature?
> >
> > Essentially, they are the same as look-aheads ... zero-width assertions,
> > except that the look-behind expression must be a fixed width pattern (no
> > indeterminate quantifiers), and no captures are allowed in a negative
> > look-behind
>
> Oniguruma supports alternation inside lookbehind, so you can get behavior
> similar to quantifiers.
>
> AEditor's regexp engine supports variable-width lookbehind, where you
> can use quantifiers inside lookbehind (with an inverted left-most-longest
> rule).
>
> It would be good if Oniguruma had support for quantifiers inside
> lookbehind.

(Here is an example with unbounded quantifiers.)

irb(main):016:0> re = NewRegexp.new('(?<!(ab)+|(cd){2,}).')
=> +-Sequence
  +-Lookbehind negative
  > +-Alternation
  > +-Repeat greedy{1,-1}
  > > +-Group capture=1
  > > +-Sequence
  > > +-Inside set="a"
  > > +-Inside set="b"
  > +-Repeat greedy{2,-1}
  > +-Group capture=2
  > +-Sequence
  > +-Inside set="c"
  > +-Inside set="d"
  +-Outside set=U-000A
irb(main):017:0> 'qwerty'.gsub5(re, 'Z')
=> "ZZZZZZ"
irb(main):018:0> 'qweabrty'.gsub5(re, 'Z')
=> "ZZZZZrZZ"
irb(main):019:0> 'cdcdqwerty'.gsub5(re, 'Z')
=> "ZZZZqZZZZZ"
irb(main):020:0> 'cdqwerty'.gsub5(re, 'Z')
=> "ZZZZZZZZ"
irb(main):021:0>

On Sunday 12 September 2004 13:07, Simon Strandgaard wrote:

On Sunday 12 September 2004 06:54, Andrew Johnson wrote:
> On 12 Sep 2004 03:44:24 GMT, Phil Tomson <ptkwt@aracnet.com> wrote:

--
Simon Strandgaard