String#[] confusions

7stud2 · 27 January 2014 14:21

Why am I not getting the second capture from the string?

irb(main):001:0> message = '#bat with some #Ram'
=> "#bat with some #Ram"
irb(main):004:0> message[/(#\w+)/,2]
=> nil
irb(main):005:0> message[/(#\w+)/,1]
=> "#bat"
irb(main):006:0>

Why does `message[/(#\w+)/,2]` return nil?

Rubular - http://rubular.com/r/rZgnEP3hSP

···

--
Posted via http://www.ruby-forum.com/.

Andrew_Loucky · 27 January 2014 14:31

Someone can correct me if I'm wrong about this, but since a regex matches from
left to right, your expression is complete at the end of the first match. I
don't think it parses the whole string into subsequent matching groups.

message[/(#\w+).*(#\w+)/, 2] would give you "#Ram", since you'd be telling
it to expect a second identifier as well, but that may not be the
functionality you're looking for.
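
A minimal sketch of that behaviour, assuming the same `message` string as in the question: `String#[]` with a Regexp looks only at the first match, and the integer argument selects a group inside that single match.

```ruby
message = '#bat with some #Ram'

# One capture group: only group 0 (the whole match) and group 1 exist.
whole  = message[/(#\w+)/, 0]   # the first match as a whole
first  = message[/(#\w+)/, 1]   # contents of group 1
second = message[/(#\w+)/, 2]   # nil, because the pattern has no group 2

# Two capture groups in one pattern: group 2 now exists.
both_second = message[/(#\w+).*(#\w+)/, 2]

p whole, first, second, both_second
```

The second `#Ram` is never visited by the one-group pattern, which is why asking for group 2 cannot return it.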

Andrew


7stud2 · 27 January 2014 14:32

You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"
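
One caveat with the two-group pattern, shown as a small sketch: `.*` is greedy, so with more than two identifiers the second group ends up on the last one. The three-tag string `tags` below is a hypothetical input, not from the original question.

```ruby
tags = '#bat with some #Ram and #Sita'   # hypothetical three-tag string

# Greedy .* consumes as much as possible, then backtracks to the last '#'.
greedy = tags[/(#\w+).*(#\w+)/, 2]

# Lazy .*? consumes as little as possible, so group 2 is the next tag.
lazy = tags[/(#\w+).*?(#\w+)/, 2]

p greedy, lazy
```

Which of the two is wanted depends on whether "second" means the next identifier or the final one.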


7stud2 · 5 February 2014 20:59

Can the regular expression below be written in another way to get
the same output?

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"

=> "315 Kw (422 Engine power (HP))"

s[/(\d+)[^0-9]*(\d+)/,2]

=> "422"

s[/(\d+)[^0-9]*(\d+)/,1]

=> "315"


7stud2 · 5 February 2014 22:34

@Robert - Thanks for mentioning all these possibilities. This was a good
learning experience for me.


7stud2 · 27 January 2014 14:47

Joel Pearson wrote in post #1134542:

You're only looking for 1 match group. If you use 2 match groups, you
can look for the second one:

message[/(#\w+).*(#\w+)/,2]
=> "#Ram"

But on Rubular (http://rubular.com/r/rZgnEP3hSP) I can see matches 1 and 2.
Why doesn't String#[] work that way? I am still confused.


Robert_K1 · 5 February 2014 22:23

Can the regular expression below be written in another way to get
the same output?

You should start by matching only once to avoid unnecessary work.

(arup~>~)$ pry --simple-prompt

s = "315 Kw (422 Engine power (HP))"

=> "315 Kw (422 Engine power (HP))"

s[/(\d+)[^0-9]*(\d+)/,2]

=> "422"

s[/(\d+)[^0-9]*(\d+)/,1]

=> "315"

irb(main):001:0> s = "315 Kw (422 Engine power (HP))"
=> "315 Kw (422 Engine power (HP))"
irb(main):002:0> /(\d+)\D+(\d+)/ =~ s
=> 0
irb(main):003:0> kw = Integer($1)
=> 315
irb(main):004:0> hp = Integer($2)
=> 422

In this case you can also use String#scan

irb(main):005:0> kw, hp = s.scan(/\d+/).map {|m| Integer(m)}
=> [315, 422]
irb(main):006:0> kw
=> 315
irb(main):007:0> hp
=> 422

The downside is that you do not have good control over the match. It's
probably better to do something like

irb(main):008:0> /(\d+)\s*kw\s*\(\s*(\d+)/i =~ s
=> 0
irb(main):009:0> kw = Integer($1)
=> 315
irb(main):010:0> hp = Integer($2)
=> 422

That gives you a bit more confidence that the string looks the way you
expect. Of course you can extend that even more by adding anchors and
a pattern for the trailing portion.
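
As a rough sketch of that extension, the pattern below adds `\A` and `\z` anchors plus the trailing "Engine power (HP))" text. The exact layout of the trailing portion is an assumption based on the single sample string, and `parse_power` is a hypothetical helper name.

```ruby
s = "315 Kw (422 Engine power (HP))"

# Assumed full layout: "<kw> Kw (<hp> Engine power (HP))"
strict = /\A(\d+)\s*kw\s*\(\s*(\d+)\s*engine power\s*\(hp\)\)\z/i

# Hypothetical helper: returns [kw, hp] as Integers, or nil if the
# string does not have the assumed layout.
parse_power = lambda do |str|
  m = strict.match(str)
  m && [Integer(m[1]), Integer(m[2])]
end

good = parse_power.call(s)
bad  = parse_power.call(s + " extra")   # trailing text breaks the \z anchor

p good, bad
```

Returning nil for unexpected input makes a format change visible instead of silently picking up the wrong numbers.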

You can also use named captures:

irb(main):011:0> kw = hp = nil
=> nil
irb(main):012:0> /(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i =~ s
=> 0
irb(main):013:0> kw
=> "315"
irb(main):014:0> hp
=> "422"
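
Note that the captured values are Strings, and that `=~` assigns local variables from named groups only when a Regexp literal is on the left-hand side. A small alternative sketch that does not rely on that is to go through `String#match` and read the groups from the returned MatchData:

```ruby
s = "315 Kw (422 Engine power (HP))"

m = s.match(/(?<kw>\d+)\s*kw\s*\(\s*(?<hp>\d+)/i)

# Named groups are read by symbol (or string) and converted explicitly.
kw = Integer(m[:kw])
hp = Integer(m[:hp])
names = m.names   # group names in the order they appear in the pattern

p kw, hp, names
```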

Kind regards

robert

···

On Wed, Feb 5, 2014 at 9:59 PM, Arup Rakshit <lists@ruby-forum.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Jesus_Gabriel_y_Gala · 27 January 2014 15:06

I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

In your example, Rubular reports 2 matches, and within each match a
single group. If you check in Rubular what Andrew and Joel proposed,
you will see just one match with 2 captured groups:
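
The same difference can be seen with String#scan, as a quick sketch using the `message` string from above:

```ruby
message = '#bat with some #Ram'

# One group in the pattern: two matches, each carrying a single group.
per_match = message.scan(/(#\w+)/)

# Two groups in the pattern: a single match carrying both groups.
one_match = message.scan(/(#\w+).*(#\w+)/)

p per_match, one_match
```

The outer array has one entry per match, and each inner array has one entry per capture group.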

Jesus.


7stud2 · 27 January 2014 15:38

Jesús Gabriel y Galán wrote in post #1134549:


I'm guessing that Rubular is checking for all the matches across the
string, kind of like what String#scan does:

2.0.0p195 :001 > message = '#bat with some #Ram'
=> "#bat with some #Ram"
2.0.0p195 :002 > message.scan(/(#\w+)/)
=> [["#bat"], ["#Ram"]]

Thank you.

Can you explain the three outputs below?

a = "hello there"

a[/[aeiou](.)\1/, 0] #=> "ell"
a[/[aeiou](.)\1/, 1] #=> "l"
a[/[aeiou](.)\1/, 2] #=> nil

I hope that will help me understand the actual use case of taking out
captures by number.


Tamara_Temple1 · 29 January 2014 23:24

Not exactly sure why you'd want the subgrouping with scan as it's creating
a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]


Sebastian_Hungereck1 · 27 January 2014 15:48

The regex /[aeiou](.)\1/ matches the substring "ell". Specifically [aeiou]
matches the "e", the dot matches the "l" and \1 matches the second "l".

Using 0 as the second argument to String#[] selects the whole match -
just like when you don't supply a second argument at all. Using 1 selects
the contents matched by the first capturing group, which is "(.)". Since that
matched "l", that's what you get. Using 2 selects the second capturing group,
but the regex /[aeiou](.)\1/ only contains one capturing group, so you get nil.

Note that the concept of a capturing group has nothing to do with how often
the regex can be matched in a given string. It's solely a property of the regex.
Specifically a capturing group is a part of the regex that's enclosed in parentheses
and does not start with "?:", "?=" or similar modifiers that make a group non-
capturing.
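
A short sketch of the numbering, using MatchData on the same string. The two extra patterns with `(?:...)` and `(?=...)` are illustrative additions, not from the original question.

```ruby
a = "hello there"

m = a.match(/[aeiou](.)\1/)
whole       = m[0]             # the whole match
group_one   = m[1]             # what (.) matched
group_two   = m[2]             # nil: the regex has only one capturing group
group_count = m.captures.size

# (?:...) groups without capturing, so (.) is still group 1.
non_capturing = a.match(/(?:[aeiou])(.)\1/).captures

# (?=...) is a lookahead; it consumes nothing and captures nothing.
lookahead = a.match(/(?=[aeiou])(.)/).captures

p whole, group_one, group_two, group_count, non_capturing, lookahead
```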


7stud2 · 30 January 2014 13:26

tamouse m. wrote in post #1134864:

Not exactly sure why you'd want the subgrouping with scan as it's
creating a nested array here.

[6] pry(main)> message.scan(/#\w+/) # => ["#bat", "#Ram"]

I had a wrong idea about `String#[]`.

I thought that with message[/(#\w+)/] all the matches were collected
first, and that by passing 1, 2, 3 as the second argument I could access
the content of the respective match. That is not the case, as I
understood from
https://www.ruby-forum.com/topic/4422155?reply_to=1134864#1134556.

But yes, `String#scan` is enough for this purpose, as each match will be
a separate entry in the array. So if I want the first match I can call,
say, ar[0], for the second ar[1], and so on.
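
A minimal sketch of that usage with the same `message` string. The loop at the end is an optional extra for when the position of each match is needed as well; it assumes the pattern never matches an empty string.

```ruby
message = '#bat with some #Ram'
re = /#\w+/

ar = message.scan(re)
first, second, third = ar[0], ar[1], ar[2]   # third is nil: only two matches

# Walk the string match by match to collect the start offsets too.
offsets = []
pos = 0
while (m = re.match(message, pos))
  offsets << m.begin(0)
  pos = m.end(0)
end

p first, second, third, offsets
```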


Jesus_Gabriel_y_Gala · 30 January 2014 15:05

You are right, I just copy-pasted the original Regexp.

Jesus.


7stud2 · 27 January 2014 17:28

Sebastian Hungerecker wrote in post #1134556:


The regex /[aeiou](.)\1/ matches the substring "ell". Specifically
[aeiou]
matches the "e", the dot matches the "l" and \1 matches the second "l".

Thank you very much! I fully get it now.
