Subsequence regular expression

diz_rael · 25 January 2006 06:07

Hi,

I'm trying to extract a certain sequence from a
string. Best described by example:

s =
"5b300ba00260bababababababababababababababababa000bd1007bd10b810ba92"
slice^--------------------------------------------^

I want to extract a slice of the string starting from
the beginning and extending up to the end of the long
"ba" sequence. What I'm trying to do is post-process a
large dump of memory from an embedded system. The
stack region is initially cleared out to bababa... So
after the application finishes, the "dirty" portion
gives the depth of the stack. There may be stray "ba"
values at various places in memory written during the
normal course of the application's execution. In the
above example, the "dirty" portion is 5b300ba00260.

Anyway, I tried the following:

s=~/.+?(ba)+/
$& => "5b300ba"

It so happens that a "ba" was created on the stack, so
the regexp thinks it ends there, when in fact the
string I want it to return is:
5b300ba00260

Is there a regexp that can handle this, or is this a
fundamentally difficult algorithmic problem (kind of
related to longest common subsequence, I guess)?

Any help will be appreciated. Thanks a lot!

···


Andrew_Johnson · 25 January 2006 07:13

[snip]

Anyway, I tried the following:

s=~/.+?(ba)+/
$& => "5b300ba"

It so happens that a "ba" was created on the stack, so
the regexp thinks it ends there, when in fact the
string I want it to return is:
5b300ba00260

Well, you could just check for 2 or more 'ba' sequences:

s =~ /(.*?)(ba){2,}/
p $1

but of course, your "dirty portion" might just have two, or even more,
stray 'ba' in a row -- so I suspect you want to grab everything from the
beginning up to the longest subsequence of repeated 'ba's? One way:

s = "5bbaba300ba00260babababababababababababa000bd1007b810ba92"
puts s[0,s.index(s.scan(/(?:ba)+/).max)]

But I may well be missing an easier way
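
A slightly more explicit variant of the same idea, as a sketch only: it picks the longest run with max_by(&:length) instead of relying on string comparison, and it returns the whole string when no "ba" is present at all. The method name dirty_prefix is made up for illustration, and the sample string is built with "ba" * 12 so the run length is explicit.

```ruby
# Hypothetical helper: returns everything before the longest run of "ba".
def dirty_prefix(s)
  longest = s.scan(/(?:ba)+/).max_by(&:length)
  return s if longest.nil?   # assumption: no "ba" at all means the whole string is dirty
  s[0, s.index(longest)]     # index finds the first occurrence of the longest run
end

s = "5bbaba300ba00260" + "ba" * 12 + "000bd1007b810ba92"
puts dirty_prefix(s)   # prints 5bbaba300ba00260
```

If two runs tie for the longest, max_by returns the first one, so the slice stops at the earlier run.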

cheers,
andrew

···

On Wed, 25 Jan 2006 15:07:31 +0900, diz rael <dizraelus@yahoo.com> wrote:

--
Andrew L. Johnson http://www.siaris.net/
The generation of random numbers is too
important to be left to chance.

Simon_Strandgaard2 · 25 January 2006 07:30

[snip]

In the above example, the "dirty" portion is
5b300ba00260.

s = "5b300ba00260bababababababababababababababababa000bd1007bd10b810ba92"
p s.scan(/(?:ba){2,}|(?:[^b][^a])+/)
#["5b300ba00260", "bababababababababababababababababa", "000bd1007bd10b810ba9"]

···

On 1/25/06, diz rael <dizraelus@yahoo.com> wrote:

--
Simon Strandgaard

diz_rael · 25 January 2006 09:40

Thanks a lot for the suggestions. This is close to
what I'm looking for, but I don't quite get the effect
of "?:"

The docs say that it simply makes the regexp into a
group without generating backreferences. Doesn't this
mean that *not* using ?: will only have the
side-effect of setting $1, $2, etc.?

However:
p s.scan(/(ba){2,}|([^b][^a])+/)
=> [[nil, "60"], ["ba", nil], [nil, "a9"]]
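
For reference, the difference seems to come from String#scan itself rather than from the match: when the pattern contains capturing groups, scan returns an array of the group captures for each match instead of the matched text, and a repeated group keeps only its last repetition. A small sketch on a shortened string (the variable names whole and groups are just for illustration):

```ruby
s = "5b300ba00260" + "ba" * 4 + "0b"

# Non-capturing groups: scan returns the full text of each match.
whole = s.scan(/(?:ba){2,}|(?:[^b][^a])+/)
p whole    # ["5b300ba00260", "babababa", "0b"]

# Capturing groups: scan returns one [group1, group2] array per match,
# and a repeated group holds only its final repetition.
groups = s.scan(/(ba){2,}|([^b][^a])+/)
p groups   # [[nil, "60"], ["ba", nil], [nil, "0b"]]
```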

Another thing is that in the sample string in my
original post, the first "ba" happens to occur on an
odd location (string indexed from 0). If I shift it
forward by a character it doesn't work as well:

s =
"5b3000ba00260bababababababababababababababababa000bd1007bd10b810ba92"
p s.scan(/(?:ba){2,}|(?:[^b][^a])+/)
=> ["5b3000", "a00260",
"bababababababababababababababababa",
"000bd1007bd10b810ba9"]

ideally, it should be:
=> "5b3000ba00260"
"bababababababababababababababababa"
"000bd1007bd10b810ba92"

Thanks in advance..

···

--- Simon Strandgaard <neoneye@gmail.com> wrote:

s = "5b300ba00260bababababababababababababababababa000bd1007bd10b810ba92"
p s.scan(/(?:ba){2,}|(?:[^b][^a])+/)
#["5b300ba00260", "bababababababababababababababababa", "000bd1007bd10b810ba9"]

--
Simon Strandgaard


Simon_Strandgaard2 · 25 January 2006 10:16

[snip]

s =
"5b3000ba00260bababababababababababababababababa000bd1007bd10b810ba92"
p s.scan(/(?:ba){2,}|(?:[^b][^a])+/)
=> ["5b3000", "a00260",
"bababababababababababababababababa",
"000bd1007bd10b810ba9"]

ideally, it should be:
=> "5b3000ba00260"
"bababababababababababababababababa"
"000bd1007bd10b810ba92"

Hmm.. the long run of ba's is at an odd offset.. don't you want them only at
even offsets?

Maybe like this?

s = "5b3000ba00260bababababababababababababababababa000bd1007bd10b810ba92"
p s.scan(/\G(?: (?:ba)+ | (?:(?!ba)..)+ )/x)
# ["5b3000", "ba",
"00260bababababababababababababababababa000bd1007bd10b810", "ba",
"92"]
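
If the offset of the run should not matter, one alternative is to find the longest run of "ba" first and then split around it with String#partition. This is a different technique from the \G pattern above, offered only as a sketch; split_around_longest_run is a hypothetical name, and the sample string uses "ba" * 17 as an assumed run length.

```ruby
# Hypothetical helper: splits s into [before, longest "ba" run, after].
def split_around_longest_run(s)
  longest = s.scan(/(?:ba)+/).max_by(&:length)
  return [s, "", ""] if longest.nil?   # assumption: no run means nothing to split on
  s.partition(longest)                 # splits at the first occurrence of the run
end

s = "5b3000ba00260" + "ba" * 17 + "000bd1007bd10b810ba92"
dirty, run, rest = split_around_longest_run(s)
p dirty   # "5b3000ba00260"
p rest    # "000bd1007bd10b810ba92"
```

Because the run is located by content rather than by two-character steps, the stray "ba" at index 6 and the trailing "2" are both kept in their pieces.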

···

On 1/25/06, diz rael <dizraelus@yahoo.com> wrote:

--
Simon Strandgaard
