Regular Expression problems

Jonas_Bengtsson · 3 July 2002 17:25

Hello,

I have a problem with regular expressions in Ruby. The closest I have
come to a solution looks like this:

re = /<name: ([a-zA-Z_]+)>\n(.*)(<name:.*)?/m
unprocessed = <<HERE
<name: a_name>
a little content
containing all sorts of
characters, even < and >
<name: another_name>
also containing all sorts
of things
<name: third_name>
i don’t know in advance how
many of these there are but
i settle with three here
HERE

match = re.match unprocessed

while match
    name = match[1]
    content = match[2]
    unprocessed = match[3]

    puts "MATCH", "name:#{name}", "content:#{content}",
            "unproc:#{unprocessed}"

    match = re.match unprocessed
end

Here is the output:

MATCH
name:a_name
content:a little content
containing all sorts of
characters, even < and >
<name: another_name>
also containing all sorts
of things
<name: third_name>
i don’t know in advance how
many of these there are but
i settle with three here

But I want the output to look like this:

MATCH
name:a_name
content:a little content
containing all sorts of
characters, even < and >
name:another_name
content:also containing all sorts
of things
name:third_name
content:i don’t know in advance how
many of these there are but
i settle with three here

So the problem is that the second group, (.*), is too ‘hungry’ and
doesn’t stop at the first occurrence of the third group, (<name:.*).

Is it possible to change this behavior of the second group? Or are
there any better ways to solve this problem?

–
Best regards,
Jonas

I always thought that record would stand until it was broken.

Yogi Berra

Jonas_Bengtsson · 3 July 2002 17:47

Hello again,

Perhaps I should have mentioned that I’m a Ruby-newbie. So I would
appreciate any comments on my programming style. I have not had the
time to read Programming Ruby yet (besides as a reference) thus I’m
not too familiar with the Ruby-way.

By the way, I really like what I’ve seen about Ruby so far!

–
Best regards,
Jonas

We don’t know a millionth of one percent about anything.
Thomas A. Edison

Nobuyoshi_Nakada · 3 July 2002 18:56

Hi,

So the problem is that the second group, (.*), is too ‘hungry’ and
doesn’t stop at the first occurrence of the third group, (<name:.*).

Is it possible to change this behavior of the second group? Or are
there any better ways to solve this problem?

re = /<name: ([a-zA-Z_]+)>\n(.*?)(<name:.*|\z)/m

Or:
re = /<name: ([a-zA-Z_]+)>\n(.*?)(?=<name:|\z)/m
while match = re.match(unprocessed)
  name = match[1]
  content = match[2]
  unprocessed = match.post_match # ←
  puts "MATCH", "name:#{name}", "content:#{content}"
end
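
For readers following along, here is a minimal sketch of why the lookahead version works with post_match. The lookahead (?=<name:|\z) checks for the next header without consuming it, so post_match begins exactly at that header and the loop can pick it up on the next pass. The input string and the names records and rest are made up for illustration and are not from the original post.

```ruby
re = /<name: ([a-zA-Z_]+)>\n(.*?)(?=<name:|\z)/m

# Hypothetical two-record input, shorter than the one in the question.
input = "<name: one>\nalpha\n<name: two>\nbeta\n"

records = []
rest = input
while (m = re.match(rest))
  records << [m[1], m[2]]   # name, content
  rest = m.post_match       # starts at the next "<name:" header, or is empty
end

p records  # => [["one", "alpha\n"], ["two", "beta\n"]]
```

One caveat, as far as I can tell: the lookahead stops at any literal "<name:" in the content, even in the middle of a line. If that can occur in the data, anchoring it to the start of a line (for example (?=^<name:|\z)) may be worth trying.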

At Thu, 4 Jul 2002 02:25:01 +0900, Jonas Bengtsson wrote:

–
Nobu Nakada

Jonas_Bengtsson · 3 July 2002 21:32

Hello Nobu,

Thanks!
I didn’t see this in Programming Ruby before:
re ?
Matches zero or one occurrence of re. The *, +, and {m,n} modifiers
are greedy by default. Append a question mark to make them minimal.
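
A small sketch of the difference that passage describes, assuming a shortened input invented for illustration (the variable names are also hypothetical). The greedy (.*) runs to the end of the string under /m, while the minimal (.*?) stops as soon as the rest of the pattern can match.

```ruby
# Hypothetical two-record input, not the data from the original question.
text = "<name: a>\nfirst\n<name: b>\nsecond\n"

greedy = /<name: ([a-zA-Z_]+)>\n(.*)/m
lazy   = /<name: ([a-zA-Z_]+)>\n(.*?)(?=<name:|\z)/m

greedy_content = greedy.match(text)[2]
lazy_content   = lazy.match(text)[2]

p greedy_content  # => "first\n<name: b>\nsecond\n"
p lazy_content    # => "first\n"
```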

–
Best regards,
Jonas

Anyone can make the simple complicated. Creativity is making the complicated simple.

Charles Mingus

David_Alan_Black1 · 5 July 2002 22:39

Hi –

One more variant, using the mighty #scan:

re = /<name: ([a-zA-Z_]+)>\n(.*?)(?=<name:|\z)/m
unprocessed.scan(re) do |n, c|
  puts "MATCH:", "name:#{n}", "content:#{c}"
end
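
If the goal is to keep the pieces rather than print them, the same scan can fill a hash. This is only a sketch under assumptions: the sections hash and the chomp call are my additions, the heredoc is a shortened copy of the data from the question, and it relies on names being unique (a repeated name would overwrite the earlier entry).

```ruby
unprocessed = <<HERE
<name: a_name>
a little content
containing all sorts of
characters, even < and >
<name: another_name>
also containing all sorts
of things
HERE

re = /<name: ([a-zA-Z_]+)>\n(.*?)(?=<name:|\z)/m

# Map each name to its content, dropping the trailing newline.
sections = {}
unprocessed.scan(re) do |name, content|
  sections[name] = content.chomp
end

sections.each do |name, content|
  puts "MATCH", "name:#{name}", "content:#{content}"
end
```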

(Jonas: contrary to your hand-made output, you did want “MATCH”
three times, didn’t you?)

David

On Thu, 4 Jul 2002 nobu.nokada@softhome.net wrote:

At Thu, 4 Jul 2002 02:25:01 +0900, Jonas Bengtsson wrote:

–
David Alan Black
home: dblack@candle.superlink.net
work: blackdav@shu.edu
Web: http://pirate.shu.edu/~blackdav

Daniel10 · 3 July 2002 22:22

For a good book on regular expressions, check out Jeffrey Friedl’s book
from O’Reilly, Mastering Regular Expressions.

You can find this answer, and many more, there.
For Programming Ruby to cover everything about regexes would take another
book in itself.

Daniel

–
A consultant is a person who borrows your watch, tells you what time it
is, pockets the watch, and sends you a bill for it.
