Someone can correct me if I'm wrong about this, but since a regex matches
from left to right, your expression is complete at the end of the first match.
I don't think it parses the whole string into subsequently matching groupings.
message[/(#\w+).*(#\w+)/, 2] would give you "#Ram", since you'd be telling
it to expect a second identifier, but that may not be the
functionality you're looking for.
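To make that concrete, here is a small sketch using the string from the question. The variable names (single, one_group_second, two_group_second) are only for illustration and are not from the original posts.

```ruby
message = '#bat with some #Ram'

# One capturing group: the regex is satisfied by the first match,
# so there is only a group 1 and nothing to return for group 2.
single = message.match(/(#\w+)/)
single.captures     # => ["#bat"]
single.post_match   # => " with some #Ram" (never examined by this match)
one_group_second = message[/(#\w+)/, 2]          # => nil

# Two capturing groups in a single pattern: group 2 now exists.
two_group_second = message[/(#\w+).*(#\w+)/, 2]  # => "#Ram"
```

Note that the greedy .* makes the second group pick up the last identifier in the string, which may or may not be what you want if there are more than two.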
Andrew
···
On Mon, Jan 27, 2014 at 9:21 AM, Arup Rakshit <lists@ruby-forum.com> wrote:
Why am I not getting the second capture from the string?
irb(main):001:0> message = '#bat with some #Ram'
=> "#bat with some #Ram"
irb(main):004:0> message[/(#\w+)/,2]
=> nil
irb(main):005:0> message[/(#\w+)/,1]
=> "#bat"
irb(main):006:0>
Can the regular expression below be written in another way, to get
the same output?
You should start by matching only once to avoid unnecessary work.
(arup~>~)$ pry --simple-prompt
s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
s[/(\d+)[^0-9]*(\d+)/,2]
=> "422"
s[/(\d+)[^0-9]*(\d+)/,1]
=> "315"
irb(main):001:0> s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
irb(main):002:0> /(\d+)\D+(\d+)/ =~ s
=> 0
irb(main):003:0> kw = Integer($1)
=> 315
irb(main):004:0> hp = Integer($2)
=> 422
In this case you can also use String#scan
irb(main):005:0> kw, hp = s.scan(/\d+/).map {|m| Integer(m)}
=> [315, 422]
irb(main):006:0> kw
=> 315
irb(main):007:0> hp
=> 422
The downside is that you do not have good control over the match. It's
probably better to do something like
irb(main):008:0> /(\d+)\s*kw\s*\(\s*(\d+)/i =~ s
=> 0
irb(main):009:0> kw = Integer($1)
=> 315
irb(main):010:0> hp = Integer($2)
=> 422
That gives you a bit more confidence that the string looks the way you
expect. Of course you can extend that even more by adding anchors and
a pattern for the trailing portion.
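As a rough sketch of what that could look like: the constant POWER_PATTERN and the helper parse_power below are made-up names, and the trailing "Engine power (HP)" part is simply taken from the one sample string, so adjust it to whatever your real input looks like.

```ruby
# Hypothetical pattern: anchored at both ends and spelling out the trailing text.
POWER_PATTERN = /\A(\d+)\s*kw\s*\(\s*(\d+)\s+engine\s+power\s*\(hp\)\s*\)\z/i

# Hypothetical helper: returns [kw, hp] as Integers, or nil if the string
# does not have the expected shape.
def parse_power(text)
  m = POWER_PATTERN.match(text)
  return nil unless m
  # Base 10 is given explicitly so a value like "08" is not read as octal.
  [Integer(m[1], 10), Integer(m[2], 10)]
end

s = "315 Kw (422 Engine power (HP))"
kw, hp = parse_power(s)               # kw => 315, hp => 422
parse_power("#{s} and more")          # => nil, trailing junk is rejected
```

With \A and \z in place, a string that merely contains something that looks right no longer matches.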
You can also use named captures:
irb(main):011:0> kw = hp = nil
=> nil
irb(main):012:0> /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i =~ s
=> 0
irb(main):013:0> kw
=> "315"
irb(main):014:0> hp
=> "422"
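A small variation, in case it helps: as far as I know, the automatic assignment to local variables only happens when a regex literal is on the left-hand side of =~. If you would rather not depend on that, you can read the named groups from the MatchData object instead. The variable name m is just for illustration.

```ruby
s = "315 Kw (422 Engine power (HP))"

# Regexp#match returns a MatchData object (or nil when there is no match).
m = /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i.match(s)

if m
  m.names               # => ["kw", "hp"]
  kw = Integer(m[:kw])  # => 315
  hp = Integer(m[:hp])  # => 422
end
```

The captured values are still Strings, hence the Integer() conversion as in the earlier examples.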
Kind regards
robert
···
On Wed, Feb 5, 2014 at 9:59 PM, Arup Rakshit <lists@ruby-forum.com> wrote:
I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:
2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]
In your example, Rubular reports 2 matches, and within each match a
single group. If you check in Rubular what Andrew and Joel proposed,
you will see just one match with 2 captured groups:
Jesus.
···
On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:
Joel Pearson wrote in post #1134542:
You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:
message[/(#\w+).*(#\w+)/,2]
=> "#Ram"
But here http://rubular.com/r/rZgnEP3hSP I can see the matches as 1,2.
Why doesn't String#[] work that way, then? I am still confused.
The regex /[aeiou](.)\1/ matches the substring "ell". Specifically [aeiou]
matches the "e", the dot matches the "l" and \1 matches the second "l".
Using 0 as the second argument to String#[] selects the whole match -
just like when you don't supply a second argument at all. Using 1 selects
the contents matched by the first capturing group, which is "(.)". Since that
matched "l", that's what you get. Using 2 selects the second capturing group,
but the regex /[aeiou](.)\1/ only contains one capturing group, so you get nil.
Note that the concept of a capturing group has nothing to do with how often
the regex can be matched in a given string. It's solely a property of the regex.
Specifically a capturing group is a part of the regex that's enclosed in parentheses
and does not start with "?:", "?=" or similar modifiers that make a group non-
capturing.
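Here is a short sketch of the above. The sample string "hello there" is my assumption (the explanation only mentions the matched substring "ell"), and the variable names are illustrative.

```ruby
text = "hello there"  # assumed sample string

whole    = text[/[aeiou](.)\1/]      # => "ell"
group0   = text[/[aeiou](.)\1/, 0]   # => "ell" (same as no second argument)
group1   = text[/[aeiou](.)\1/, 1]   # => "l"
group2   = text[/[aeiou](.)\1/, 2]   # => nil (the regex has only one group)

# (?:...) groups without capturing, so (.) is still group 1 here.
noncap   = text[/(?:[aeiou])(.)\1/, 1]  # => "l"
```

The number of groups is fixed by the regex itself, no matter how many times the pattern could match in the string.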
When I wrote message[/(#\w+)/], I assumed that a match group would be
created for every match in the string, and that passing 1, 2, 3 as the
second argument would return the content of the corresponding group.
That is not the case, as I understood from
https://www.ruby-forum.com/topic/4422155?reply_to=1134864#1134556.
But yes, `String#scan` is enough for this purpose, as each match will be
a separate entry inside the array. So if I want the first match I can call,
say, ar[0], for the second ar[1], and so on.
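For example (a sketch; ar and nested are just illustrative names), note that the result only comes out as a flat array when the pattern has no capturing group:

```ruby
message = '#bat with some #Ram'

# With a capturing group, scan returns one sub-array per match.
nested = message.scan(/(#\w+)/)   # => [["#bat"], ["#Ram"]]

# Without a group, scan returns the matched strings directly.
ar = message.scan(/#\w+/)         # => ["#bat", "#Ram"]
first  = ar[0]                    # => "#bat"
second = ar[1]                    # => "#Ram"
```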
···
On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:
You are right, I just copy pasted the original Regexp.
Jesus.
···
On Thu, Jan 30, 2014 at 12:24 AM, tamouse pontiki <tamouse.lists@gmail.com> wrote:
On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:
On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:
> Joel Pearson wrote in post #1134542:
>> You're only looking for 1 match group. If you use 2 match groups, you
>> can look for the second one:
>>
>> message[/(#\w+).*(#\w+)/,2]
>> => "#Ram"
>
> But here Rubular: (#\w+) I can see the matches as 1,2.
> Why doesn't String#[] work that way, then? I am still confused.
>
> --
> Posted via http://www.ruby-forum.com/.
I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:
2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]
Not exactly sure why you'd want the subgrouping with scan as it's creating a
nested array here.