String#[] confusions

Why am I not getting the second capture from the string?

irb(main):001:0> message = '#bat with some #Ram'
=> "#bat with some #Ram"
irb(main):004:0> message[/(#\w+)/,2]
=> nil
irb(main):005:0> message[/(#\w+)/,1]
=> "#bat"
irb(main):006:0>

Why does `message[/(#\w+)/,2]` return nil?

Rubular - http://rubular.com/r/rZgnEP3hSP

···

--
Posted via http://www.ruby-forum.com/.

Someone can correct me if I'm wrong about this, but since a regex matches from
left to right, your expression is complete at the end of the first match. I
don't think it goes on to parse the rest of the string into further matching groups.

message[/(#\w+).*(#\w+)/, 2] would give you "#Ram", since you'd be telling
it to expect a second identifier after the first, but that may not be the
functionality you're looking for.
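To make Andrew's point concrete, here is a minimal Ruby sketch (not from the thread) of what the index argument to String#[] refers to, using the message from the original post. The variable names are illustrative.

```ruby
message = '#bat with some #Ram'

# String#[] with a Regexp performs ONE match: the leftmost one.
# The optional Integer picks a capture group of that single match;
# it does not pick the n-th match in the string.
whole  = message[/(#\w+)/]     # no index: the whole match
group1 = message[/(#\w+)/, 1]  # capture group 1 of that match
group2 = message[/(#\w+)/, 2]  # the pattern has no group 2, so nil

p [whole, group1, group2]      # => ["#bat", "#bat", nil]
```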

Andrew

···

On Mon, Jan 27, 2014 at 9:21 AM, Arup Rakshit <lists@ruby-forum.com> wrote:

Why am I not getting the second capture from the string?

irb(main):001:0> message = '#bat with some #Ram'
=> "#bat with some #Ram"
irb(main):004:0> message[/(#\w+)/,2]
=> nil
irb(main):005:0> message[/(#\w+)/,1]
=> "#bat"
irb(main):006:0>

Why does `message[/(#\w+)/,2]` return nil?

Rubular - http://rubular.com/r/rZgnEP3hSP


You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"
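A caveat on the two-group pattern, added here rather than taken from the thread: the .* between the groups is greedy, so with more than two tags group 2 is the last tag, not the second one. The three-tag string below is a made-up example; switching to the lazy .*? is one way to get the next tag instead.

```ruby
message = '#bat with some #Ram'
second = message[/(#\w+).*(#\w+)/, 2]
p second                                 # => "#Ram"

# Hypothetical input with three tags, to show what the .* in between does.
three  = '#a then #b then #c'
greedy = three[/(#\w+).*(#\w+)/, 2]      # greedy .* runs as far as it can: group 2 is the LAST tag
lazy   = three[/(#\w+).*?(#\w+)/, 2]     # lazy .*? stops as early as it can: group 2 is the NEXT tag
p [greedy, lazy]                         # => ["#c", "#b"]
```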

···


Can the regular expression below be written another way to get
the same output?

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
s[/(\d+)[^0-9]*(\d+)/,2]
=> "422"
s[/(\d+)[^0-9]*(\d+)/,1]
=> "315"

···


@Robert - Thanks for mentioning all these possibilities. A good lesson
for me.

···


Joel Pearson wrote in post #1134542:

You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"

But here, at http://rubular.com/r/rZgnEP3hSP, I can see the matches as 1, 2.
Why doesn't String#[] work that way, then? I'm still confused.

···


Can the regular expression below be written another way to get
the same output?

You should start by matching only once to avoid unnecessary work.

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
s[/(\d+)[^0-9]*(\d+)/,2]
=> "422"
s[/(\d+)[^0-9]*(\d+)/,1]
=> "315"

irb(main):001:0> s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
irb(main):002:0> /(\d+)\D+(\d+)/ =~ s
=> 0
irb(main):003:0> kw = Integer($1)
=> 315
irb(main):004:0> hp = Integer($2)
=> 422
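An alternative sketch that avoids the $1/$2 globals (not something Robert shows here): String#match returns a MatchData object, or nil when nothing matches, which makes the failure case explicit. The error message and block variable name are illustrative choices.

```ruby
s = "315 Kw (422 Engine power (HP))"

# String#match returns a MatchData, or nil when the pattern does not match,
# so the captures travel with the result instead of living in $1/$2.
m = s.match(/(\d+)\D+(\d+)/)
raise ArgumentError, "unexpected format: #{s.inspect}" if m.nil?

# Integer() raises on non-numeric text, unlike String#to_i which returns 0.
kw, hp = m.captures.map { |digits| Integer(digits) }
p [kw, hp]   # => [315, 422]
```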

In this case you can also use String#scan:

irb(main):005:0> kw, hp = s.scan(/\d+/).map {|m| Integer(m)}
=> [315, 422]
irb(main):006:0> kw
=> 315
irb(main):007:0> hp
=> 422

The downside is that you do not have good control over the match. It's
probably better to do something like

irb(main):008:0> /(\d+)\s*kw\s*\(\s*(\d+)/i =~ s
=> 0
irb(main):009:0> kw = Integer($1)
=> 315
irb(main):010:0> hp = Integer($2)
=> 422

That gives you a bit more confidence that the string looks the way you
expect. Of course, you can extend that even further by adding anchors and
a pattern for the trailing portion.
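Robert only describes this extension in words. A minimal sketch of what it could look like follows; the exact trailing text ("Engine power (HP))") is an assumption based on this one sample string, and STRICT is a hypothetical name.

```ruby
s = "315 Kw (422 Engine power (HP))"

# Hypothetical stricter pattern: \A and \z anchor both ends, and the trailing
# "Engine power (HP))" text is spelled out, so differently shaped input fails.
STRICT = /\A(\d+)\s*kw\s*\(\s*(\d+)\s*engine\s+power\s*\(hp\)\)\z/i

m = STRICT.match(s)
kw, hp = m.captures.map { |digits| Integer(digits) } if m
p [kw, hp]                                    # => [315, 422]

# A string with a different trailing portion is rejected outright.
p STRICT.match("315 Kw (422 Torque (Nm))")    # => nil
```

Anchoring trades flexibility for confidence: any change in the surrounding text makes the match fail, which is usually what you want when parsing a fixed format.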

You can also use named captures:

irb(main):011:0> kw = hp = nil
=> nil
irb(main):012:0> /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i =~ s
=> 0
irb(main):013:0> kw
=> "315"
irb(main):014:0> hp
=> "422"
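Named captures can also be read from a MatchData object. The sketch below assumes Ruby 2.4 or later for MatchData#named_captures. It also relies on a rule that the =~ form above depends on: local variables are only created when a regexp literal sits on the left-hand side, so holding the regexp in a variable (here the illustrative name re) turns that feature off.

```ruby
s  = "315 Kw (422 Engine power (HP))"
re = /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i

# Because re is a variable, `re =~ s` would set $~ but define no locals.
# Going through MatchData avoids depending on that rule.
m = re.match(s)
p m[:kw]            # => "315"
p m["hp"]           # => "422"
p re.names          # => ["kw", "hp"]
p m.named_captures  # a Hash mapping "kw" and "hp" to the captured strings
```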

Kind regards

robert

···

On Wed, Feb 5, 2014 at 9:59 PM, Arup Rakshit <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

In your example, Rubular reports 2 matches, each containing a
single group. If you check in Rubular what Andrew and Joel proposed,
you will see just one match with 2 captured groups.
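Rubular's internals are not confirmed in this thread, but the per-match view it shows can be approximated in Ruby. A minimal sketch, assuming the block form of String#scan, which sets Regexp.last_match for each match; per_match is an illustrative name.

```ruby
message = '#bat with some #Ram'

# scan applies the regexp repeatedly, left to right, without overlap.
# In the block form, Regexp.last_match is the MatchData of the current
# match, so each match (and its group 1) can be inspected on its own.
per_match = []
message.scan(/(#\w+)/) do
  md = Regexp.last_match
  per_match << [md[1], md.begin(0)]   # group 1 text and where the match starts
end
p per_match                          # => [["#bat", 0], ["#Ram", 15]]

# String#match (like String#[]) stops after the first, leftmost match.
p message.match(/(#\w+)/).captures   # => ["#bat"]
```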

Jesus.

···

On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:

Joel Pearson wrote in post #1134542:

You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"

But here, at http://rubular.com/r/rZgnEP3hSP, I can see the matches as 1, 2.
Why doesn't String#[] work that way, then? I'm still confused.


Jesús Gabriel y Galán wrote in post #1134549:


I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

Thank you...

Can you explain the three outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

I hope that will help me understand the actual use case for taking out
captures by number.

···

On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:


Not exactly sure why you'd want the subgrouping with scan as it's creating
a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]
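The difference tamouse points out can be seen side by side. This sketch (the two-group pattern /#(\w)(\w+)/ is a made-up example) shows how the shape of String#scan's result follows the number of groups in the regexp:

```ruby
message = '#bat with some #Ram'

no_groups  = message.scan(/#\w+/)        # no groups: one String per match
one_group  = message.scan(/(#\w+)/)      # groups: one Array of captures per match
two_groups = message.scan(/#(\w)(\w+)/)  # two groups: a pair per match

p no_groups    # => ["#bat", "#Ram"]
p one_group    # => [["#bat"], ["#Ram"]]
p two_groups   # => [["b", "at"], ["R", "am"]]
```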

···

On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:
> Joel Pearson wrote in post #1134542:
>> You're only looking for 1 match group. If you use 2 match groups, you
>> can look for the second one:
>>
>> message[/(#\w+).*(#\w+)/,2]
>> => "#Ram"
>
> But here, at http://rubular.com/r/rZgnEP3hSP, I can see the matches as 1, 2.
> Why doesn't String#[] work that way, then? I'm still confused.

I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

The regex /[aeiou](.)\1/ matches the substring "ell". Specifically, [aeiou]
matches the "e", the dot matches the first "l", and \1 matches the second "l".

Using 0 as the second argument to String#[] selects the whole match,
just like when you don't supply a second argument at all. Using 1 selects
the contents matched by the first capturing group, which is "(.)". Since that
matched "l", that's what you get. Using 2 selects the second capturing group,
but the regex /[aeiou](.)\1/ only contains one capturing group, so you get nil.

Note that the concept of a capturing group has nothing to do with how often
the regex can be matched in a given string. It's solely a property of the regex.
Specifically, a capturing group is a part of the regex that's enclosed in parentheses
and does not start with "?:", "?=" or similar modifiers that make a group
non-capturing. (Named groups such as "(?<name>...)" are the exception: they do capture.)
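A short Ruby sketch of the rules Sebastian describes. The extra patterns (?:ab)+(c) and (?<x>a)(b) are made-up examples; the last one shows a Ruby-specific rule: once a regexp contains a named group, plain parentheses no longer capture.

```ruby
a  = "hello there"
re = /[aeiou](.)\1/

# Index 0 is the whole match; 1..n are the capture groups of that one match.
p a[re, 0]   # => "ell"
p a[re, 1]   # => "l"
p a[re, 2]   # => nil, because re has only one capturing group

# MatchData#size counts the whole match plus every capturing group.
p re.match(a).size   # => 2

# (?:...) groups without capturing, so only (c) is numbered here.
nc = /(?:ab)+(c)/.match("ababc")
p nc.captures        # => ["c"]

# Once a regexp contains a named group, Ruby treats plain (...) as non-capturing.
mixed = /(?<x>a)(b)/.match("ab")
p mixed.captures     # => ["a"]
```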

···

On 27.01.2014 16:38, Arup Rakshit wrote:

Can you explain the three outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

tamouse m. wrote in post #1134864:

> Why doesn't String#[] work that way, then? I'm still confused.
> => [["#bat"], ["#Ram"]]

Not exactly sure why you'd want the subgrouping with scan as it's creating
a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]

I had a wrong perception of `String#[]`.

I thought that with message[/(#\w+)/] all the matches became groups,
and that by passing 1, 2, 3 and so on as the second argument I could access
each one's content. That is not the case, as I understood from
https://www.ruby-forum.com/topic/4422155?reply_to=1134864#1134556.

But yes, `String#scan` is enough for this purpose, since each match is
a separate entry in the resulting array. So if I want the first match I can use,
say, ar[0], for the second ar[1], and so on.

···

On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:


You are right, I just copy-pasted the original Regexp.

Jesus.

···

On Thu, Jan 30, 2014 at 12:24 AM, tamouse pontiki <tamouse.lists@gmail.com> wrote:

On Mon, Jan 27, 2014 at 9:06 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:

On Mon, Jan 27, 2014 at 3:47 PM, Arup Rakshit <lists@ruby-forum.com> wrote:
> Joel Pearson wrote in post #1134542:
>> You're only looking for 1 match group. If you use 2 match groups, you
>> can look for the second one:
>>
>> message[/(#\w+).*(#\w+)/,2]
>> => "#Ram"
>
> But here, at http://rubular.com/r/rZgnEP3hSP, I can see the matches as 1, 2.
> Why doesn't String#[] work that way, then? I'm still confused.

I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

Not exactly sure why you'd want the subgrouping with scan as it's creating a
nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]

Sebastian Hungerecker wrote in post #1134556:

···

On 27.01.2014 16:38, Arup Rakshit wrote:

Can you explain the three outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

The regex /[aeiou](.)\1/ matches the substring "ell". Specifically, [aeiou]
matches the "e", the dot matches the first "l", and \1 matches the second "l".

Thank you very much! I fully get it now.
