[Regex] Alternative for look behind

Hi all,

I need to replace all '-' characters at the beginning of a sequence
(string) with 'X's (until a character appears that is not '-').

E.g.

-------abcde-gh----

should be changed into

XXXXXXXabcde-gh----

If Ruby supported lookbehind in regular expressions, I could
probably do something like this:

sequence.gsub(/(?=^-*)-/, "X")

But unfortunately Ruby does not support look behind.

Of course, I could count the number of '-' at the beginning of the
sequence:

x = sequence[/-*/].length

And then I could replace the first x characters with "X". But I don't
like that solution, it feels clumsy. Is there an elegant way of doing
this?
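
For reference, the counting approach described above could be sketched roughly like this. The method name replace_leading_by_count is made up for illustration, and \A is used here to pin the match to the start of the string:

```ruby
# Hypothetical helper: count the leading dashes, then rebuild the string.
def replace_leading_by_count(sequence)
  # \A-* always matches at the start of the string (possibly as an empty run).
  count = sequence[/\A-*/].length
  # Put that many X's in front of the untouched remainder.
  ("X" * count) + sequence[count..-1]
end

puts replace_leading_by_count("-------abcde-gh----")
```

It works, but it takes two steps where the block-form answers later in the thread need one.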

Thanks in advance & best regards!
Janus

···

--
Posted via http://www.ruby-forum.com/.

# I need to replace all '-' characters at the beginning of a sequence
# (string) with 'X's (until a character appears that is not '-').
# E.g.
# -------abcde-gh----
# should be changed into
# XXXXXXXabcde-gh----

my initial reaction was not a look behind,

irb(main):062:0> s.gsub(/(^-*)/){$1.tr("-","X")}
=> "XXXXXXXabcde-gh----"

the second was to use oniguruma,

re=Oniguruma::ORegexp.new( '(?<dashes>^-*)(?<after>.*)' )
#=> /(?<dashes>^-*)(?<after>.*)/
s
#=> "-------abcde-gh----"
m=re.match s
#=> #<MatchData:0x2c51b7c>
m[:dashes]
#=> "-------"
m[:after]
#=> "abcde-gh----"
...

nice naming feature, but overkill for this case. ymmv.
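
As a side note, Ruby 1.9 and later support named groups in the built-in Regexp class, so the same idea can be sketched without the Oniguruma gem. The constant LEADING and the method split_leading_dashes are illustrative names, not part of any library:

```ruby
# Built-in named groups (Ruby 1.9+). Both groups may be empty, so this always matches.
LEADING = /\A(?<dashes>-*)(?<after>.*)\z/m

# Hypothetical helper: split a string into its leading dashes and the rest.
def split_leading_dashes(s)
  m = LEADING.match(s)
  [m[:dashes], m[:after]]
end

dashes, after = split_leading_dashes("-------abcde-gh----")
puts dashes.tr("-", "X") + after
```

Still overkill for this case, but the named groups read nicely when the two parts are needed separately.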

kind regards -botp

···

From: Janus Bor [mailto:janus@urban-youth.com]

It does - in 1.9. But:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/(?=^-*)-/, "X")
=> "X------abcde-gh----"
irb(main):003:0>

I am not sure lookbehind is the proper means here. After all, you want to replace a sequence of dashes *before* a particular sequence (according to your description above). So you would rather use lookahead, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

Here's another solution

irb(main):008:0> s = sequence.dup
=> "-------abcde-gh----"
irb(main):009:0> while s.sub! /^(X*)-/, '\\1X'; end
=> nil
irb(main):010:0> s
=> "XXXXXXXabcde-gh----"

But I'd rather use Pena's solution or another block form

irb(main):011:0> sequence.sub(/^-*/) {|m| "X" * m.length}
=> "XXXXXXXabcde-gh----"
irb(main):012:0> sequence.sub(/^-*/) {|m| m.tr '-','X'}
=> "XXXXXXXabcde-gh----"
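
If this is needed in more than one place, the block form can be wrapped in a small method. This is only a sketch: the name mask_leading_dashes and the fill parameter are assumptions for illustration. It uses \A instead of ^ because in Ruby ^ matches at the start of every line, while \A matches only at the start of the string.

```ruby
# Hypothetical helper: replace the leading run of dashes with a fill character.
def mask_leading_dashes(sequence, fill = "X")
  # \A-+ can only match a run of dashes at the very start of the string,
  # so strings that do not begin with "-" are returned unchanged.
  sequence.sub(/\A-+/) { |dashes| fill * dashes.length }
end

puts mask_leading_dashes("-------abcde-gh----")
puts mask_leading_dashes("abcde-gh----")
```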

Kind regards

  robert

···

On 16.08.2008 03:56, Janus Bor wrote:

I need to replace all '-' characters at the beginning of a sequence
(string) with 'X's (until a character appears that is not '-').

E.g.

-------abcde-gh----

should be changed into

XXXXXXXabcde-gh----

If Ruby supported lookbehind in regular expressions, I could
probably do something like this:

sequence.gsub(/(?=^-*)-/, "X")

But unfortunately Ruby does not support look behind.

Thanks to both of you! I like your solutions. I'm a fan of one liners...

I ended up using this one, as it has the best readability for a novice
programmer like me imho:

Robert Klemme wrote:

irb(main):011:0> sequence.sub(/^-*/) {|m| "X" * m.length}
=> "XXXXXXXabcde-gh----"

Robert Klemme wrote:

It does - in 1.9. But:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/(?=^-*)-/, "X")
=> "X------abcde-gh----"

Thanks for the info. Just downloaded 1.9 to check it out. But I still
don't comprehend why look behind is not working like I expected. If I
write the equivalent replacing characters at the end of the string using
look ahead it works just fine:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/-(?=-*$)/, "X")
=> "-------abcde-ghXXXX"

Robert Klemme wrote:

I am not sure lookbehind is the proper means here. After all, you want
to replace a sequence of dashes *before* a particular sequence
(according to your description above). So you would rather use
lookahead, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

The problem with lookahead is that I have to make sure the
sequence starts with "-". Otherwise something like this could happen:

irb(main):003:0> sequence = 'abcde-gh----'
=> "abcde-gh----"
irb(main):004:0> sequence.gsub(/-(?=-*[abcdefgh])/, "X")
=> "abcdeXgh----"

Kind regards,
Janus

···


Thanks to both of you! I like your solutions. I'm a fan of one liners...

You're welcome!

Robert Klemme wrote:

It does - in 1.9. But:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/(?=^-*)-/, "X")
=> "X------abcde-gh----"

Thanks for the info. Just downloaded 1.9 to check it out. But I still don't comprehend why look behind is not working like I expected. If I write the equivalent replacing characters at the end of the string using look ahead it works just fine:

irb(main):001:0> sequence = '-------abcde-gh----'
=> "-------abcde-gh----"
irb(main):002:0> sequence.gsub(/-(?=-*$)/, "X")
=> "-------abcde-ghXXXX"

I believe the reason is that with lookbehind the regular expression needs to start matching at the *same* location (the beginning of the sequence) multiple times. Even though lookbehind does not consume characters, I believe regex implementations prohibit matching at the same location over and over again, partly for efficiency reasons but also to avoid endless loops. The lookahead solution with replacement at the end of the sequence moves the start position of the match one character forward for every match. That's how I explain to myself why the straightforward lookbehind does not work.
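
A small sketch to make this concrete; the helper name match_offsets is made up for illustration. It records the offsets at which each of the two patterns from this thread matches. Note that (?=...) is lookahead syntax, and the ^ inside it can only succeed at the start of a line, so the first pattern has exactly one possible match position, while the second can match at every trailing dash.

```ruby
# Hypothetical helper: collect the start offset of every match of pattern in string.
def match_offsets(string, pattern)
  offsets = []
  string.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

sequence = "-------abcde-gh----"

# Pattern from the original question: can only match at the start of the line.
p match_offsets(sequence, /(?=^-*)-/)

# Trailing variant: matches each dash that is followed only by dashes.
p match_offsets(sequence, /-(?=-*$)/)
```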

There's another reason, why lookbehind won't work: the docs state

"Subexp of look-behind must be fixed character length."

See the Oniguruma regular expression documentation (the original link is no longer available).
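
The restriction can be checked directly. The following is a sketch under the assumption that the Ruby in use rejects variable-length lookbehind with a RegexpError, which is the behaviour I am aware of; compiles? is a made-up helper name.

```ruby
# Hypothetical helper: report whether a pattern source compiles as a Regexp.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Variable-length lookbehind (-* can be any length): expected to be rejected.
p compiles?('(?<=^-*)-')

# Fixed-length lookbehind (exactly one dash): expected to compile.
p compiles?('(?<=-)-')
```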

At the moment I cannot think of a solution with lookbehind that would avoid these issues because all lookbehinds must start matching at the beginning of the sequence in order to fulfill your requirement that the initial sequence must be replaced.
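
One alternative that is not a lookbehind at all, and was not proposed in this thread, is the \G anchor, which matches only at the position where the current search starts. With gsub, each search starts where the previous match ended, so the pattern keeps matching only while the dashes are contiguous from the start of the string. A minimal sketch (the method name mask_leading_with_g is an assumption):

```ruby
# Hypothetical helper using the \G anchor instead of lookaround.
def mask_leading_with_g(sequence)
  # \G- matches a dash only at the current search start, so the chain of
  # replacements stops at the first character that is not a dash.
  sequence.gsub(/\G-/, "X")
end

puts mask_leading_with_g("-------abcde-gh----")
puts mask_leading_with_g("abcde-gh----")
```

The block-form sub is probably still easier to read, but this version stays a plain gsub one-liner.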

Robert Klemme wrote:

I am not sure lookbehind is the proper means here. After all, you want
to replace a sequence of dashes *before* a particular sequence
(according to your description above). So you would rather use
lookahead, wouldn't you?

irb(main):003:0> sequence.gsub(/-(?=-*abcde)/, "X")
=> "XXXXXXXabcde-gh----"

Granted, it does not anchor.

The problem with lookahead is that I have to make sure the sequence starts with "-". Otherwise something like this could happen:

irb(main):003:0> sequence = 'abcde-gh----'
=> "abcde-gh----"
irb(main):004:0> sequence.gsub(/-(?=-*[abcdefgh])/, "X")
=> "abcdeXgh----"

Well, this won't happen if the sequence "abcde" is known to appear only after an initial sequence of dashes. (Note the difference between your lookahead solution and mine: you created a character class, while I just matched the plain sequence.) But of course it's not the same as "replace the initial run of dashes".

I can recommend "Mastering Regular Expressions" if you want to dive deeper into this: it's pretty well written and contains only as much theory of regular languages as necessary, but it explains the differences between regular expression implementations very well and also considers efficiency aspects.

Kind regards

  robert

···

On 16.08.2008 18:46, Janus Bor wrote: