Ruby regex lookahead/behind

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I'm thinking hard about purpose/usage of these, and would
like some more examples...

If you consider it too far off-topic, you may email me.

Cheers,
Hal

My habits are better now, but if I'm lazy and know I only need to remove small amounts of HTML/XML tags in parsing docs, I will use them to grab what's inside the tags. However, that's really the only time I use them and really I shouldn't be quite so lazy and should parse them with nokogiri and xpath instead.
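
A minimal sketch of that "grab what's inside the tags" shortcut, assuming a tiny, well-behaved snippet (the `html` string and the `<b>` tag are made up for illustration). For anything larger, a real parser such as nokogiri remains the better tool:

```ruby
# Hypothetical input: a short fragment with two simple, non-nested tags.
html = "<p>Hello <b>world</b>, meet <b>Ruby</b></p>"

# The lookbehind requires "<b>" just before the match and the lookahead
# requires "</b>" just after it; neither tag becomes part of the match.
inside_b = /(?<=<b>).*?(?=<\/b>)/

first_bold = html[inside_b]      # first tag's content only
all_bold   = html.scan(inside_b) # content of every <b>...</b> pair

puts first_bold
p all_bold
```

The lazy `.*?` stops at the first closing tag, so this breaks down as soon as tags nest or carry attributes.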

-Wayne

I have to admit, I've only found the need for them exactly twice in the past couple decades, and I can't even remember where, exactly (but they were the perfect answer). I always forget about them and find something else to solve my immediate problem. Maybe I should use them more…

···

On Sep 27, 2013, at 3:02 PM, Hal Fulton <rubyhacker@gmail.com> wrote:

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I'm thinking hard about purpose/usage of these, and would
like some more examples...

IMHO a very strange question... The purpose of /look(ahead|behind)/ is to
match a regexp only when some other pattern comes before or after it,
without including that subpattern in the match. It is useful when you
have to manipulate strings.

For example, suppose you have some Ruby code as text:

txt = <<-RUBY
class Foo
  def bar
    encode_json("...")
  end

  # ...

  def encode_json obj
    JSON.generate obj, :quirks_mode => true
  end
RUBY

Now you want to replace `encode_json` with `JSON.dump`. If you gsub
with `/encode_json/` you'll get `def JSON.dump obj`, which is something
you don't want, so you gsub with a negative lookbehind instead:

txt.gsub(/(?<!def )encode_json/, "JSON.dump")
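
To see the difference side by side, here is a small self-contained sketch. The `source` text is a trimmed copy of the example above, and the variable names are only for illustration:

```ruby
# A trimmed version of the code-as-text example.
source = <<~RUBY
  class Foo
    def bar
      encode_json("...")
    end

    def encode_json(obj)
      JSON.generate(obj, quirks_mode: true)
    end
  end
RUBY

# Plain substitution also rewrites the method definition.
naive = source.gsub(/encode_json/, "JSON.dump")

# The negative lookbehind skips any occurrence preceded by "def ".
guarded = source.gsub(/(?<!def )encode_json/, "JSON.dump")

puts naive
puts guarded
```

Only the call site changes in `guarded`; the `def encode_json(obj)` line is left alone.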
···

--
Sincerely yours,
Aleksey V. Zapparov A.K.A. ixti
FSF Member #7118
Mobile Phone: +34 677 990 688
Homepage: http://ixti.net/
JID: zapparov@jabber.ru

*Origin: Happy Hacking!

I use them on a regular (sic!) basis. You can think of them as user
defined anchors, i.e. beyond ^, $, \A, \b and the like.
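
One way to picture the "user defined anchor" idea is a pattern made of nothing but lookarounds, which picks out positions rather than text. The helper name `with_thousands` is hypothetical and the sketch assumes a string of plain ASCII digits:

```ruby
# Insert a comma at every position that has a digit behind it and
# a whole number of three-digit groups ahead of it, up to the end.
def with_thousands(digits)
  digits.gsub(/(?<=\d)(?=(?:\d{3})+\z)/, ",")
end

puts with_thousands("1234567")
puts with_thousands("1000")
puts with_thousands("999")
```

Nothing is consumed here; the regexp works exactly like a custom `\b` that only fires between digit groups.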

Kind regards

robert

···

On Fri, Sep 27, 2013 at 10:02 PM, Hal Fulton <rubyhacker@gmail.com> wrote:

Who of you here frequently use lookahead/lookbehind
in Ruby regular expressions?

I'm thinking hard about purpose/usage of these, and would
like some more examples...

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I've used them for things like string matching:

/"(.*?)(?<!\\)"/

Note: that pattern doesn't allow you to escape backslashes, but it's a
quick example, and the sort of thing I've used in the past.
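
A short sketch of that pattern at work, including the limitation mentioned in the note. The sample strings are invented for illustration:

```ruby
# A double-quoted string whose closing quote must not follow a backslash.
quoted = /"(.*?)(?<!\\)"/

# Escaped quotes inside the string are skipped over as intended.
escaped_quotes = 'say "he said \"hi\"" ok'
inner = escaped_quotes[quoted, 1]
puts inner

# Limitation: an escaped backslash right before the real closing quote
# fools the lookbehind, so the match runs on to the next quote.
escaped_backslash = '"a\\\\" b "c"'
overrun = escaped_backslash[quoted, 1]
puts overrun
```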

···

On 28 September 2013 12:05, Tamara Temple <tamouse.lists@gmail.com> wrote:

I have to admit, I've only found the need for them exactly twice in the
past couple decades, and I can't even remember where, exactly (but they
were the perfect answer). I always forget about them and find something
else to solve my immediate problem. Maybe I should use them more…

--
  Matthew Kerwin, B.Sc (CompSci) (Hons)
  http://matthew.kerwin.net.au/

I guess one thing I'm wondering is:

The lookarounds are basically "don't-consume" matches
as I see it... Is there *always* a "do consume" match
associated with it?

To put it more clearly: Is it ever valid to use a lookaround
"by itself"?

I freely confess I am far from expert with regular expressions...
in the more complex cases, I usually write code rather than
use a regex. ("More complex" being a relative term, of course.)

Of course, there are times/places where a regex is *not* the
right tool. But it's my desire that, when they are the right tool,
I will use them more often.

Hal

···

On Sat, Sep 28, 2013 at 11:01 AM, Robert Klemme <shortcutter@googlemail.com> wrote:

On Fri, Sep 27, 2013 at 10:02 PM, Hal Fulton <rubyhacker@gmail.com> wrote:
> Who of you here frequently use lookahead/lookbehind
> in Ruby regular expressions?
>
> I'm thinking hard about purpose/usage of these, and would
> like some more examples...

I use them on a regular (sic!) basis. You can think of them as user
defined anchors, i.e. beyond ^, $, \A, \b and the like.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

+1 on robertk's "user-defined anchors". Use case: very handy for
validating passwords or emails, e.g.
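
A minimal sketch of the password case, assuming a made-up policy (at least 8 characters with one digit, one lowercase and one uppercase letter). The constant and method names are hypothetical:

```ruby
# Each lookahead is anchored at the start and checks one requirement
# without consuming anything; only the final .{8,} consumes the string.
STRONG_PASSWORD = /\A(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}\z/

def strong_password?(candidate)
  STRONG_PASSWORD.match?(candidate)
end

puts strong_password?("Passw0rdX")
puts strong_password?("password1")
puts strong_password?("Ab1")
```

Stacking lookaheads this way expresses "all of these conditions, in any order", which is awkward to write with consuming patterns alone.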

···

On Sun, Sep 29, 2013 at 2:38 AM, Hal Fulton <rubyhacker@gmail.com> wrote:

I guess one thing I'm wondering is:

The lookarounds are basically "don't-consume" matches
as I see it... Is there *always* a "do consume" match
associated with it?

To put it more clearly: Is it ever valid to use a lookaround
"by itself"?

It never occurred to me to try that. There is really no point in
doing it because if there is no consuming match you do not match
anything. Still, Oniguruma allows you to do it:

irb(main):001:0> "foo".scan /(?=\w)/
=> ["", "", ""]
irb(main):002:0> "foo".scan /(?=\w)/ do puts $` end

f
fo
=> "foo"
irb(main):003:0> "foo".scan /(?=\w)/ do puts $`.length end
0
1
2
=> "foo"

As you can see, with a trick you even get to know the matching positions. :-)
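
The same positions can be collected without printing, via `Regexp.last_match` inside the `scan` block. A bare lookahead with a capture group inside it is also one way to get overlapping matches. This is a sketch with made-up sample strings:

```ruby
# Collect the offset of every zero-width match.
positions = []
"foo".scan(/(?=\w)/) { positions << Regexp.last_match.begin(0) }
p positions

# A lookahead "by itself" consumes nothing, so scan moves on one
# character at a time and the captured pairs are allowed to overlap.
overlapping = "aaaa".scan(/(?=(aa))/)
p overlapping
```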

I freely confess I am far from expert with regular expressions...
in the more complex cases, I usually write code rather than
use a regex. ("More complex" being a relative term, of course.)

Of course, there are times/places where a regex is *not* the
right tool. But it's my desire that, when they are the right tool,
I will use them more often.

I think you are on the right track. :-) Not using regular expressions
when they are useful can make your code more complicated and even
slower. I personally like the power of regular expressions. But I do
admit that it took me a while to get there. :-)

I usually recommend "Mastering Regular Expressions" - even though it's
not an introductory book. If you want to see the matching process at
work you can use http://weitz.de/regex-coach/ (Windows program but
works with WINE on Linux). That helps understanding the matching
process - you can even single step.

Kind regards

robert

···

On Sat, Sep 28, 2013 at 8:38 PM, Hal Fulton <rubyhacker@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

f
fo
=> "foo"
irb(main):003:0> "foo".scan /(?=\w)/ do puts $`.length end
0
1
2
=> "foo"

As you can see, with a trick you even get to know the matching positions.
:-)

Ahh, I had been trying to think of a way to do that. :-) Thank you for that
trick.

I usually recommend "Mastering Regular Expressions" - even though it's
not an introductory book.

Yes, I also recommend that book. ;-) I have had it for years, but much of
it does not "stick to my brain."

If you want to see the matching process at
work you can use http://weitz.de/regex-coach/ (Windows program but
works with WINE on Linux). That helps understanding the matching
process - you can even single step.

That sounds cool. I've sometimes wished there was an Onigmo patch
to allow that sort of thing.

Imagine regular expressions debuggable at runtime... or maybe the real
experts would cringe at that thought. :-)

Anyway: About lookarounds. As I see it, there are four basic cases,
arising from two basic questions: 1) Is the nonconsuming match before or
after the consuming one? and 2) is the nonconsuming match positive or
negative?
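
The four combinations can be tried out in plain Ruby notation. This is only a sketch: the helper `match_positions` and the sample text are invented for illustration, and the lookaheads are written in the common `X(?=Y)` form:

```ruby
# Return the start offset of every match of pattern in text.
def match_positions(text, pattern)
  positions = []
  text.scan(pattern) { positions << Regexp.last_match.begin(0) }
  positions
end

text = "foobar foobaz bazfoo"

pos_ahead  = match_positions(text, /foo(?=bar)/)   # "foo" followed by "bar"
neg_ahead  = match_positions(text, /foo(?!bar)/)   # "foo" not followed by "bar"
pos_behind = match_positions(text, /(?<=baz)foo/)  # "foo" preceded by "baz"
neg_behind = match_positions(text, /(?<!baz)foo/)  # "foo" not preceded by "baz"

p pos_ahead, neg_ahead, pos_behind, neg_behind
```

In every case the reported match is just "foo"; the lookaround only decides which occurrences qualify.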

I read an article that (sort of) implied that there might be eight cases --
but I think I have convinced myself this was a notational issue.

As this relates to Regexador -- I am thinking of introducing three new
keywords (find, with, without) so that lookarounds would work this way:

    find X with Y # /(?=XY)X/ - pos lookahead
    find X without Y # /(?!XY)X/ - neg lookahead
    with X find Y # /(?<=X)Y/ - pos lookbehind
    without X find Y # /(?<!X)Y/ - neg lookbehind

But there are some slight subtleties I am working through here.

For example, I have read that most engines require a lookbehind to be
a fixed-length expression (with .NET and ABA being exceptions -- and I
don't even know what ABA is).

I've confirmed that Ruby 2.0 doesn't allow variable-length lookbehinds.
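
A quick way to check this on whatever Ruby is at hand, sketched with a hypothetical helper `compiles?`. As far as I know, an unbounded lookbehind is rejected with a RegexpError, and `\K` (available since Ruby 2.0) is a common workaround when a variable-length prefix has to be excluded from the match:

```ruby
# True if the pattern source compiles, false if the engine rejects it.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

fixed_ok    = compiles?('(?<=ab)c')  # fixed-length lookbehind
variable_ok = compiles?('(?<=a+)c')  # unbounded lookbehind

# \K drops everything matched so far from the reported match, which
# behaves much like a variable-length positive lookbehind.
kept = "aaab"[/a+\Kb/]

p fixed_ok, variable_ok, kept
```

Unlike a true lookbehind, the text before `\K` is still consumed, so matches found this way cannot overlap with it.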

Hal

···

On Sun, Sep 29, 2013 at 10:59 AM, Robert Klemme <shortcutter@googlemail.com> wrote: