Why, oh, why, little regexp?

Daniel_Waite · 30 October 2007 23:14

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

I'm reading it as... Take one or more characters between a and z, store
them into a back reference, then repeat the previous match zero or more
times.

Now, that regexp doesn't do what I want it to do, but what it IS doing
doesn't make sense to me.

What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

PS: match(...).captures always, always returns an empty array...

--
Posted via http://www.ruby-forum.com/.

Joel_VanderWerf1 · 30 October 2007 23:24

Daniel Waite wrote:

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

I'm reading it as... Take one or more characters between a and z, store
them into a back reference, then repeat the previous match zero or more
times.

Now, that regexp doesn't do what I want it to do, but what it IS doing
doesn't make sense to me.

What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

'cost * tax'.scan(/\w+/)
=> ["cost", "tax"]

PS: match(...).captures always, always returns an empty array...

How are you using it?

"foo".match(/(foo)/).captures
=> ["foo"]
'cost * tax'.match(/([a-z]+)*/).captures
=> ["cost"]
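For what it's worth, here is a small sketch of what match hands back for that pattern, which may explain the doubled "cost". The variable names are only for illustration. The idea is that to_a is the whole match followed by each group, captures is the groups alone, and match stops at the first place the pattern succeeds, so " * tax" is never examined.

```ruby
m = 'cost * tax'.match(/([a-z]+)*/)

whole    = m[0]          # text matched by the entire pattern
groups   = m.captures    # only the parenthesised groups
combined = m.to_a        # whole match first, then the groups
leftover = m.post_match  # the part of the string after the match

p whole      # "cost"
p groups     # ["cost"]
p combined   # ["cost", "cost"]
p leftover   # " * tax"
```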

--
vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407

Stanislav_Sedov · 30 October 2007 23:36

On Wed, Oct 31, 2007 at 08:14:01AM +0900 Daniel Waite mentioned:

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

Well, the regexp matches as much as it can at the first position where
it succeeds. What you wrote is effectively equivalent to ([a-z]*).
A single regexp can't match multiple strings; it always matches
one. It can't match the space after 'cost' either, since that
character isn't included in your regexp.

If you want to match two words, you should write e.g.
([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
This regexp will match two words separated by whitespace.
A regexp can't match an undefined number of words; you need to know
in advance how many words you want to match.

For more info on regexps see e.g. re_format(7).
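A quick sketch of the two-word pattern above, assuming Ruby's regexp engine (which accepts the POSIX bracket classes inside a character class). The sample strings and variable names are made up for illustration.

```ruby
two_words = /([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)/

hit  = 'cost tax'.match(two_words)    # two words with only whitespace between
miss = 'cost * tax'.match(two_words)  # the "*" is neither alpha nor space

p hit.captures   # ["cost", "tax"]
p miss           # nil
```

So the fixed-count pattern does capture both words, but only when nothing except whitespace separates them.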

--
Stanislav Sedov
ST4096-RIPE

7stud · 31 October 2007 01:23

Daniel Waite wrote:

What I'd like is to grab all the "words" in the string.
So how does that work if I wanted to match ALL occurrences
of \w+ WITHOUT scan?

You're using the wrong method. match() only returns the first match:

pattern = /x.x/
str = "xax hello xbx"

puts pattern.match(str)

--output:--
xax

So how does that work if I wanted to match ALL occurrences
of \w+ WITHOUT scan?

str = " cost * tax"
words = str.split("*").map {|elmt| elmt.strip()}
p words

--output:--
["cost", "tax"]

str = " cost * tax = 123"
words = []

str.split().each do |word|
  good_word = true

  # ?a and ?z are the integer character codes of "a" and "z" (Ruby 1.8)
  word.each_byte do |code|
    if code < ?a or code > ?z
      good_word = false
      break
    end
  end

  if good_word
    words << word
  end
end

p words

--output:--
["cost", "tax"]
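Another way to get every occurrence without scan is to call match repeatedly, each time starting where the previous match ended. This is only a sketch: all_matches is a hypothetical helper name, and it relies on the two-argument Regexp#match(str, pos), which as far as I know arrived in Ruby 1.9, so it would not run on 1.8.

```ruby
# Hypothetical helper: collect every match of pattern in str without scan.
def all_matches(str, pattern)
  results = []
  pos = 0
  while (m = pattern.match(str, pos))
    results << m[0]
    # Continue after this match; step one extra character on an empty
    # match so the loop cannot get stuck at the same position.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  results
end

p all_matches(" cost * tax = 123", /[a-z]+/)  # ["cost", "tax"]
p all_matches(" cost * tax = 123", /\w+/)     # ["cost", "tax", "123"]
```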


Daniel_Waite · 30 October 2007 23:33

Joel VanderWerf wrote:

What I'd like is to grab all the "words" in the string. So in the above
example I'd like two matches, cost and tax.

Any ideas?

'cost * tax'.scan(/\w+/)
=> ["cost", "tax"]

How do you people do that? The last time I had a regexp question someone
came down from the clouds and handed me something about that short. Why
do I think it's more difficult than it is?

After making the example a little more complex I had to change it
ever-so-slightly...

'cost * tax + 0.075'.scan(/[a-z]+/)
=> ["cost", "tax"]

But it's effectively the same. Thank you Joel, you rock!
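To make the difference between the two patterns concrete, here is a small sketch (variable names are illustrative only). \w also matches digits and underscores, so it picks up the pieces of 0.075, and scan returns nested arrays once the pattern contains a group.

```ruby
expr = 'cost * tax + 0.075'

word_chars = expr.scan(/\w+/)       # \w is letters, digits and underscore
letters    = expr.scan(/[a-z]+/)    # lowercase letters only
grouped    = expr.scan(/([a-z]+)/)  # with a group, one array per match

p word_chars  # ["cost", "tax", "0", "075"]
p letters     # ["cost", "tax"]
p grouped     # [["cost"], ["tax"]]
```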

Is there a book you recommend to learn more about regular expressions?
How did YOU learn them?

PS: match(...).captures always, always returns an empty array...

How are you using it?

"foo".match(/(foo)/).captures
=> ["foo"]
'cost * tax'.match(/([a-z]+)*/).captures
=> ["cost"]

LOL I'm an idiot -- *captures* -- back references, right. Gotcha...


Daniel_Waite · 30 October 2007 23:51

Stanislav Sedov wrote:

On Wed, Oct 31, 2007 at 08:14:01AM +0900 Daniel Waite mentioned:

'cost * tax'.match(/([a-z]+)*/).to_a
=> ["cost", "cost"]

Why?

Well, the regexp matches as much as it can at the first position where
it succeeds. What you wrote is effectively equivalent to ([a-z]*).
A single regexp can't match multiple strings; it always matches
one. It can't match the space after 'cost' either, since that
character isn't included in your regexp.

If you want to match two words, you should write e.g.
([[:alpha:]]+)[[:space:]]+([[:alpha:]]+)
This regexp will match two words separated by whitespace.
A regexp can't match an undefined number of words; you need to know
in advance how many words you want to match.

For more info on regexps see e.g. re_format(7).

Hmm... if what you say is true, why does the second poster's solution
capture multiple words? Wait, I know why. String#scan is different from
String#match. Interesting...

So how does that work if I wanted to match ALL occurrences of \w+
WITHOUT scan?


Daniel_Waite · 31 October 2007 03:30

7stud -- wrote:

str = " cost * tax = 123"
words = []

str.split().each do |word|
  good_word = true

  # ?a and ?z are the integer character codes of "a" and "z" (Ruby 1.8)
  word.each_byte do |code|
    if code < ?a or code > ?z
      good_word = false
      break
    end
  end

  if good_word
    words << word
  end
end

p words

--output:--
["cost", "tax"]

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

Jim Clark wrote:

"Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the
third edition, so I don't know whether it has any Ruby-specific examples,
but even with all the Perl examples in the first edition, I still use it
as a reference because of the similarities between Perl's and Ruby's
regular expressions.

I shall check that out Jim, thanks much.


Jim_Clark · 31 October 2007 00:13

Daniel Waite wrote:

Is there a book you recommend to learn more about regular expressions? How did YOU learn them?

"Mastering Regular Expressions" by Jeffrey Friedl. I haven't seen the third edition to see if there is any Ruby specific examples but even with all the Perl examples in the first edition, I still use it as a reference because of the similarities between Perl and Ruby's regular expressions.

-Jim

Gavin_Kistner3 · 31 October 2007 04:00

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

  # Jenny jenny, who can I turn to?
  irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
  8
  6
  7
  5
  3
  0
  9
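The same subtraction can be written without the ?x literal, which keeps it independent of what ?0 evaluates to. A sketch, assuming a Ruby where String#ord exists (1.9 or later, as far as I know); the variable names are illustrative.

```ruby
zero = "0".ord  # character code of "0", which is 48 in ASCII

# Subtract the code of "0" from each byte to get the digit's value.
digits = "8675309".each_byte.map { |byte| byte - zero }

p digits  # [8, 6, 7, 5, 3, 0, 9]
```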

On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

Brian_Adkins · 31 October 2007 04:30

Yeah, so you can squeeze Ruby code into small places

1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

On Oct 30, 11:58 pm, Phrogz <phr...@mac.com> wrote:

On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:

> That's clever use of ?a, which I recognize but have never seen anyone
> use before. Thanks for the example!

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

7stud · 31 October 2007 12:46

Gavin Kistner wrote:

On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

  # Jenny jenny, who can I turn to?
  irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
  8
  6
  7
  5
  3
  0
  9

Perhaps this is clearer:

"8675309".each_byte{|code| puts code.chr}

...although slightly slower.

Rick_DeNatale1 · 31 October 2007 11:17

Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
the R(uby)&A(ncient) has outlawed that usage, and instituted the
penalty that ?d will no longer be 100, but "d".
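For anyone trying this on 1.9 or later, a small sketch of what the change looks like in practice: ?d evaluates to a one-character String, and ord recovers the old integer. The loop is a plain counter rather than the golfed FizzBuzz, and the variable names are made up.

```ruby
char = ?d        # a one-character String on Ruby 1.9 and later
code = char.ord  # the integer that ?d used to evaluate to on 1.8

count = 0
1.upto(?d.ord) { count += 1 }  # upto needs the integer, hence the .ord

p char   # "d"
p code   # 100
p count  # 100
```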

On 10/31/07, Brian Adkins <lojicdotcom@gmail.com> wrote:

On Oct 30, 11:58 pm, Phrogz <phr...@mac.com> wrote:
> On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:
>
> > That's clever use of ?a, which I recognize but have never seen anyone
> > use before. Thanks for the example!
>
> My current favorite use for the ?x syntax is converting single-
> character strings representing digits into their integer form:

Yeah, so you can squeeze Ruby code into small places

1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

James_Edward_Gray_II · 31 October 2007 13:00

Printed content aside, it's not equivalent. The original code is making Integers, not Strings.
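A short sketch of the difference, assuming Ruby 2.4 or later (where the integer class prints as Integer rather than Fixnum). The literal 48 stands in for ?0 as it behaved on 1.8.

```ruby
source = "8675309"

as_integers = source.each_byte.map { |code| code - 48 }  # 48 is the code of "0"
as_strings  = source.each_byte.map { |code| code.chr }

p as_integers.first.class  # Integer
p as_strings.first.class   # String

# The two agree only after converting the strings back to numbers.
p as_strings.map(&:to_i) == as_integers  # true
```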

James Edward Gray II

On Oct 31, 2007, at 7:46 AM, 7stud -- wrote:

Gavin Kistner wrote:

On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:

That's clever use of ?a, which I recognize but have never seen anyone
use before. Thanks for the example!

My current favorite use for the ?x syntax is converting single-
character strings representing digits into their integer form:

  # Jenny jenny, who can I turn to?
  irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }
  8
  6
  7
  5
  3
  0
  9

Perhaps this is clearer:

"8675309".each_byte{|code| puts code.chr}

...although slightly slower.

Brian_Adkins · 31 October 2007 14:55

Well, then the least they can do is add Integer#to as an alias for
Integer#upto so we can have a net loss of 1 character in the above
code

On Oct 31, 7:17 am, "Rick DeNatale" <rick.denat...@gmail.com> wrote:

On 10/31/07, Brian Adkins <lojicdot...@gmail.com> wrote:

> On Oct 30, 11:58 pm, Phrogz <phr...@mac.com> wrote:
> > On Oct 30, 9:30 pm, Daniel Waite <rabbitb...@gmail.com> wrote:

> > > That's clever use of ?a, which I recognize but have never seen anyone
> > > use before. Thanks for the example!

> > My current favorite use for the ?x syntax is converting single-
> > character strings representing digits into their integer form:

> Yeah, so you can squeeze Ruby code into small places

> 1.upto(?d){|i|i%3<1&&x=:Fizz;puts i%5<1?"#{x}Buzz":x||i}

Except under the upcoming revision (1.9) of the (Ruby) Rules of Golf,
the R(uby)&A(ncient) has outlawed that usage, and instituted the
penalty that ?d will no longer be 100, but "d".

7stud · 31 October 2007 18:25

James Gray wrote:

On Oct 31, 2007, at 7:46 AM, 7stud -- wrote:

irb(main):006:0> "8675309".each_byte{ |x| p x - ?0 }

"8675309".each_byte{|code| puts code.chr}

...although slightly slower.

Printed content aside, it's not equivalent. The original code is
making Integers, not Strings.

Whoops.
