Specification of Ruby regex?

HAL_9000 · 26 August 2003 21:23

Ruby’s regular expressions are almost identical to Perl’s.

Except where they are different. The most glaring difference is that
^ and $ do not mean “match start of string” and “match end of string”:

a.untaint if /^[a-z]+$/ =~ a # WRONG and maybe dangerous
a.untaint if /\A[a-z]+\z/ =~ a # right

Regards,

Brian.

what do ^ and $ mean then? they do match start and end for me. what
else do they match? *shudders at thought of changing lots of code

Correct me if I’m wrong. (Famous last words on most newsgroups.)

Isn’t it an issue only in multiline mode? In that case, I think
^ and $ would match the start and end of the line rather than
the entire string.

Hal


Emmanuel_Touzery · 26 August 2003 13:03

Mark Slagell wrote:

does ruby support named matches (sorry i don’t know the proper
terminology)?
C# does it like this:
"(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"

Is this helpful at all?

year, month, day =
/(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).to_a

(where s is the string to be matched)

sure, i just asked for this particular feature “because it’s cool”, and
also i can probably think about some cases in which it’s actually useful
;O) (although probably not terribly useful)
but otherwise i’m a happy user of ruby regexps even without it.

emmanuel

Florian_Frank2 · 26 August 2003 14:10

You probably meant to write this:

year, month, day = /(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).captures

···

On 2003-08-26 21:56:39 +0900, Mark Slagell wrote:

year, month, day =
/(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).to_a
(where s is the string to be matched)

–
Claiming Java is easier than C++ is like saying that K2 is shorter than
Everest.
– Larry O’Brian

Wesley_J_Landaker · 26 August 2003 19:29

Apparently, Austin Ziegler recently wrote:

btw, since there is a thread about that, i wanted to ask: does ruby
support named matches (sorry i don’t know the proper terminology)? C#
does it like this: "(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})"
matches “2002-4-6”
and then in my match groups i have “year”, “month”, “day”.
(looked in pickaxe + google ruby “regexp match group”)
I’m 99.99% sure it doesn’t.

The latest Oniguruma supports it. I’m not sure how to use/enable that, but
it does support it.

Something like this?

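class NamedMatchRegex
  # Takes a regex whose groups are written as (?:<name>...) and turns
  # them into ordinary capturing groups, remembering the names in order.
  def initialize(regex)
    regex_string = regex.inspect

    # Element 0 of the match data is always the whole match.
    @match_names = ["MATCH"]
    regex_string.gsub!(/\?:<[^>]+>/) { |match|
      name = match[3..-2]   # strip the leading "?:<" and the trailing ">"
      @match_names << name
      ""
    }

    # Rebuild the regex from its edited literal form.
    @regex = eval regex_string
  end

  # Returns a hash of name => matched text, or nil when nothing matches.
  def match(string)
    match_data = @regex.match(string)
    return nil if match_data.nil?
    match_data = match_data.to_a
    named_match_data = {}
    @match_names.each { |name|
      named_match_data[name] = match_data.shift
    }
    named_match_data
  end
end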

irb(main):001:0> load 'test.rb'
=> true
irb(main):002:0> re = /^\s*(?:<number>\d+)\s*(?:<word>\w+).*$/
=> /^\s*(?:<number>\d+)\s*(?:<word>\w+).*$/
irb(main):003:0> string = " 12345678 we thing ruby is really great"
=> " 12345678 we thing ruby is really great"
irb(main):004:0> re.match(string)
=> nil
irb(main):005:0> nmre = NamedMatchRegex.new(re)
=> #<NamedMatchRegex:0x401f5194 @regex=/^\s*(\d+)\s*(\w+).*$/,
@match_names=["MATCH", "number", "word"]>
irb(main):006:0> matches = nmre.match(string)
=> {"number"=>"12345678", "word"=>"we", "MATCH"=>" 12345678 we thing ruby
is really great"}
irb(main):007:0> matches["MATCH"]
=> " 12345678 we thing ruby is really great"
irb(main):008:0> matches["number"]
=> "12345678"
irb(main):009:0> matches["word"]
=> "we"
irb(main):010:0>

Obviously this isn’t complete, I just whipped it up to respond to this
message. =)

Wes
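
For comparison, Ruby 1.9 and later ship the Oniguruma-derived engine mentioned earlier in the thread, and that engine accepts the `(?<name>...)` form directly. The snippet below is only a sketch of the asker's date example under that assumption; the variable names are made up for illustration.

```ruby
# Named groups with the built-in engine (assumes Ruby 1.9 or later).
s = "2002-4-6"
m = /(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})/.match(s)

if m
  # Groups can be looked up by symbol or by string.
  year  = m[:year]
  month = m["month"]
  day   = m[:day]
  p [year, month, day]

  # MatchData#names lists the group names in order of appearance.
  p m.names
end
```

Numbered access (`m[1]`, `m.captures`) still works on the same match object, so existing code does not have to change.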

···

On Tue, 26 Aug 2003 21:28:07 +0900, Gavin Sinclair wrote:

On Tuesday, August 26, 2003, 10:18:24 PM, Emmanuel wrote:

Iain_Spoon_Truskett2 · 27 August 2003 03:04

Hal Fulton (hal9000@hypermetrics.com) [27 Aug 2003 07:22]:

[…]

what do ^ and $ mean then? they do match start and end
for me. what else do they match? *shudders at thought
of changing lots of code

Correct me if I’m wrong. (Famous last words on most newsgroups.)

Isn’t it an issue only in multiline mode? In that case, I
think ^ and $ would match the start and end of the line
rather than the entire string.

In Perl’s multiline mode (/m), yes.

Perl’s multiline mode is on by default in Ruby. Thus this is
true in Ruby by default.

This means that ^ and $ match start and end of line not
string. Always.

Ruby’s multiline mode (/m) is Perl’s single line mode (/s).

^ and $ aren’t affected by Ruby’s multiline mode (/m).

It makes ‘.’ match newline (rather than every char except
newline).
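
A short sketch of the points above, using an invented two-line string. It only illustrates the behaviour described here: ^ and $ are line anchors regardless of /m, \A and \z are string anchors, and /m affects nothing but the dot.

```ruby
text = "first\nsecond"

# ^ and $ anchor at line boundaries, with or without /m.
line_hit   = text =~ /^second$/     # matches the second line
line_hit_m = text =~ /^second$/m    # /m makes no difference to ^ and $

# \A and \z anchor at the boundaries of the whole string.
string_hit = text =~ /\Asecond\z/   # no match: the string starts with "first"

# /m only changes whether . may match a newline.
dot_plain = text =~ /first.second/    # no match: . refuses the newline
dot_m     = text =~ /first.second/m   # match: . now accepts the newline

p line_hit, line_hit_m, string_hit, dot_plain, dot_m
```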

cheers,

···

–
Iain.

Sabbyxtabby · 27 August 2003 09:10

In Ruby, ^ and $ match the start and end of lines not strings.
Multiline mode only tweaks whether . matches newline or not. So
using Brian’s example:

a = "srand\nrm -rf /"
a.untaint if /^[a-z]+$/ =~ a # matches "srand"
eval a # BOOM!
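
The same check can be sketched without eval or untaint, so that it is safe to run. The helper name `lowercase_word?` is hypothetical, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: true only if the WHOLE string is lowercase letters.
def lowercase_word?(input)
  input.match?(/\A[a-z]+\z/)
end

hostile = "srand\nrm -rf /"

# The line-anchored pattern is satisfied by the first line alone.
line_check = (/^[a-z]+$/ =~ hostile)

# The string-anchored pattern looks at the entire input.
whole_check_hostile = lowercase_word?(hostile)
whole_check_clean   = lowercase_word?("srand")

p line_check, whole_check_hostile, whole_check_clean
```

The line-anchored test returns a match position for the hostile input, while the string-anchored test rejects it and still accepts the plain word.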

···

Hal Fulton hal9000@hypermetrics.com wrote:

Ruby’s regular expressions are almost identical to Perl’s.

Except where they are different. The biggest glaring difference is that
^ and $ do not mean “match start of string” and “match end of string”

a.untaint if /^[a-z]+$/ =~ a # WRONG and maybe dangerous
a.untaint if /\A[a-z]+\z/ =~ a # right

what do ^ and $ mean then? they do match start and end for me. what
else do they match? *shudders at thought of changing lots of code

Isn’t it an issue only in multiline mode? In that case, I think
^ and $ would match the start and end of the line rather than
the entire string.


Mark_Slagell · 26 August 2003 15:35

Florian Frank wrote:

···

On 2003-08-26 21:56:39 +0900, Mark Slagell wrote:

year, month, day =
/(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).to_a
(where s is the string to be matched)

You probably meant to write this:

year, month, day = /(\d{4})-(\d{1,2})-(\d{1,2})/.match(s).captures

um, no, I wrote what I meant, but is something wrong with to_a there?

Austin_Ziegler2 · 26 August 2003 19:37

No. See [ruby-talk:79047] and following.

···

On Wed, 27 Aug 2003 04:29:22 +0900, Wesley J. Landaker wrote:

Apparently, Austin Ziegler recently wrote:

On Tue, 26 Aug 2003 21:28:07 +0900, Gavin Sinclair wrote:

On Tuesday, August 26, 2003, 10:18:24 PM, Emmanuel wrote:

btw, since there is a thread about that, i wanted to ask: does ruby
support named matches (sorry i don’t know the proper terminology)?
I’m 99.99% sure it doesn’t.
The latest Oniguruma supports it. I’m not sure how to use/enable that,
but it does support it.

[ruby-dev:21147] [Oniguruma] list of all captures
[ruby-dev:21174] [Oniguruma] Version 1.9.2

TANAKA Akira suggested a new function, to capture all matchings for
the one expression. e.g.

m = /(?@<name>\/\w+)+/.match("/usr/local/bin/ruby")
p m['name']   #=> ["/usr", "/local", "/bin", "/ruby"]

-austin

austin ziegler * austin@halostatue.ca * Toronto, ON, Canada
software designer * pragmatic programmer * 2003.08.26
* 15.35.20

HAL_9000 · 27 August 2003 15:46

Sabby and Tabby wrote:

Ruby’s regular expressions are almost identical to Perl’s.

Except where they are different. The biggest glaring difference is that
^ and $ do not mean “match start of string” and “match end of string”

a.untaint if /^[a-z]+$/ =~ a # WRONG and maybe dangerous
a.untaint if /\A[a-z]+\z/ =~ a # right

what do ^ and $ mean then? they do match start and end for me. what
else do they match? *shudders at thought of changing lots of code

Isn’t it an issue only in multiline mode? In that case, I think
^ and $ would match the start and end of the line rather than
the entire string.

In Ruby, ^ and $ match the start and end of lines not strings.
Multiline mode only tweaks whether . matches newline or not. So
using Brian’s example:

a = "srand\nrm -rf /"
a.untaint if /^[a-z]+$/ =~ a # matches "srand"
eval a # BOOM!

Quite right, thank you.

But in nearly all cases, I have a string that has no newlines.
In that situation, as in classical uses of regexes such as vi,
there’s no problem:

 "abc" =~ /^abc$/    # 0 (true)

I grant you, strings containing newlines will be different.

Hal

···

Hal Fulton hal9000@hypermetrics.com wrote:


ts1 · 26 August 2003 15:40

um, no, I wrote what I meant, but is something wrong with to_a there?

it adds $& (the whole match) as the first element

svg% ruby -e 'p /.(.)/.match("ab").to_a'
["ab", "b"]
svg%

svg% ruby -e 'p /.(.)/.match("ab").captures'
["b"]
svg%

Guy Decoux
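
To make the consequence for the date example concrete, here is a small sketch with an invented input string. It shows how the extra leading element from to_a shifts a three-way assignment, and two ways to avoid that.

```ruby
s = "2003-08-26"
m = /(\d{4})-(\d{1,2})-(\d{1,2})/.match(s)

# to_a starts with the whole match, so the first variable receives it
# and the day group is silently dropped.
shifted_year, shifted_month, shifted_day = m.to_a

# captures returns only the groups.
year, month, day = m.captures

# Equivalent with to_a: discard the first element explicitly.
_, year2, month2, day2 = m.to_a

p [shifted_year, shifted_month, shifted_day]
p [year, month, day]
```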

Wesley_J_Landaker · 26 August 2003 19:55

Apparently, Austin Ziegler recently wrote:

···

On Wed, 27 Aug 2003 04:29:22 +0900, Wesley J. Landaker wrote:

Apparently, Austin Ziegler recently wrote:

On Tue, 26 Aug 2003 21:28:07 +0900, Gavin Sinclair wrote:

On Tuesday, August 26, 2003, 10:18:24 PM, Emmanuel wrote:

btw, since there is a thread about that, i wanted to ask: does ruby
support named matches (sorry i don’t know the proper terminology)?
I’m 99.99% sure it doesn’t.
The latest Oniguruma supports it. I’m not sure how to use/enable that,
but it does support it.

No. See [ruby-talk:79047] and following.

[ruby-dev:21147] [Oniguruma] list of all captures
[ruby-dev:21174] [Oniguruma] Version 1.9.2

TANAKA Akira suggested a new function, to capture all matchings for
the one expression. e.g.
m = /(?@<name>\/\w+)+/.match("/usr/local/bin/ruby")
p m['name']   #=> ["/usr", "/local", "/bin", "/ruby"]

Well, that is useful too, but quite different from what the OP asked for
(their example was a regex from C# with named grouping) – you can do that
easily in Ruby, as I just demonstrated. =)

If Oniguruma does that also, then that is great! (I haven’t looked at
Oniguruma and don’t really know what it is.)

Wes

Brian_Candler · 28 August 2003 20:49

In that case you are fine. But if a string is coming from an untrusted
source - and a HTML FORM is a classic example of that - you cannot always be
so sure.

The behaviour is also important for strings which have a newline at the end,
which is a common case in Ruby. I have just tried this again, and it appears
to have changed between ruby-1.6.8 and ruby-1.8.0:

a = "hello\n"
a.sub!(/[\r\n]+$/,'')

In Ruby 1.6.8, “a” contains “hello\n” after this. In Ruby 1.8.0, “a”
contains “hello”

[ruby-1.6.8]
irb(main):001:0> "abc\n" =~ /c$/
=> 2
irb(main):002:0> "abc\n" =~ /c\n$/
=> nil

[ruby-1.8.0]
irb(main):003:0> "abc\n" =~ /c$/
=> 2
irb(main):004:0> "abc\n" =~ /c\n$/
=> 2
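
For strings with a trailing newline, the three end anchors can be compared side by side. This is a sketch for a current Ruby interpreter only; it does not try to reproduce the 1.6.8 behaviour reported above.

```ruby
line = "abc\n"

at_dollar  = (line =~ /c$/)    # $ matches just before the newline
at_big_z   = (line =~ /c\Z/)   # \Z also tolerates one trailing newline
at_small_z = (line =~ /c\z/)   # \z demands the true end of the string

# Two ways to strip trailing line terminators without relying on $.
stripped = line.sub(/[\r\n]+\z/, "")
chomped  = line.chomp

p at_dollar, at_big_z, at_small_z, stripped, chomped
```

Anchoring the cleanup pattern with \z makes the intent explicit: only terminators at the very end of the string are removed.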

Regards,

Brian.

···

On Thu, Aug 28, 2003 at 12:46:33AM +0900, Hal Fulton wrote:

In Ruby, ^ and $ match the start and end of lines not strings.
Multiline mode only tweaks whether . matches newline or not. So
using Brian’s example:

a = "srand\nrm -rf /"
a.untaint if /^[a-z]+$/ =~ a # matches "srand"
eval a # BOOM!

Quite right, thank you.

But in nearly all cases, I have a string that has no newlines.
