Ruby Regex - Unable to capture specific data

Hi all,

I have several strings of data that are all very similar; however, I
only want to look at the strings that match specific criteria and
ignore the rest. Some samples are below - I want the first and the last
string and to ignore the middle one.

/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion

/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/User Settings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion

/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion

I have constructed a regex that should capture any string starting with
/software/ and ending with /Microsoft/Windows NT/CurrentVersion, with
exactly one slash-delimited segment in between:

\/software\/(.*?)\/Microsoft\/Windows NT\/CurrentVersion$

This regex matches all three strings because (.*?) can also match across
slashes. Any ideas how I can achieve this? My brain is frying.

Many thanks
S

···

--
Posted via http://www.ruby-forum.com/.

Try this:

strings = ["/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/User Settings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion"]

re = /\/software\/([^\/]*)\/Microsoft\/Windows NT\/CurrentVersion$/

2.0.0p195 :018 > strings.each do |s|
2.0.0p195 :019 > m = re.match(s)
2.0.0p195 :020?> puts m.captures if m
2.0.0p195 :021?> end
$$$PROTO.HIV
CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}

[^\/] is a character class that matches any single character that is not
a forward slash. The * repeats it zero or more times, so the capture
group can never span more than one path segment.

Jesus.
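
To see the difference between the two patterns, here is a minimal sketch that runs both against the three sample strings from the original post. The names SAMPLES, LAZY_RE, SEGMENT_RE and matching_captures are made up for this illustration; only the patterns and the strings come from the thread.

```ruby
# Sample strings from the original post.
SAMPLES = [
  '/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion',
  '/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/User Settings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion',
  '/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion'
].freeze

# Original attempt: the lazy .*? can still expand across slashes
# when that is the only way to reach the end of the pattern.
LAZY_RE = %r{/software/(.*?)/Microsoft/Windows NT/CurrentVersion$}

# Suggested fix: [^/]* cannot cross a slash, so only one segment fits.
SEGMENT_RE = %r{/software/([^/]*)/Microsoft/Windows NT/CurrentVersion$}

# Hypothetical helper: first capture of every string the pattern matches.
def matching_captures(re, strings)
  strings.map { |s| re.match(s) }.compact.map { |m| m[1] }
end

puts matching_captures(LAZY_RE, SAMPLES).length      # 3 (all strings match)
puts matching_captures(SEGMENT_RE, SAMPLES).length   # 2 (middle string rejected)
```

Note that because * also allows zero characters, SEGMENT_RE would accept an empty middle segment; using + instead of * would rule that out, if that matters for your data.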

···

On Wed, Dec 4, 2013 at 5:36 PM, Stuart Clarke <lists@ruby-forum.com> wrote:


Jesus - this still returns all strings for me in Ruby 1.9.

Ammar - I will try your suggestion. I was keen to keep a regex.

···


The scan method might be a better tool for this job. Then all you have to do is specify the elements you want out of the resulting array.

  text.scan(/[^\/]+/)

HTH,
Ammar
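
One way the scan suggestion could be applied is sketched below. The method name middle_segment and the exact checks are assumptions made for this example, not something posted in the thread: scan splits the path into its slash-delimited segments, and the string is accepted only if it has exactly the expected five.

```ruby
# Hypothetical helper: return the segment between "software" and
# "Microsoft/Windows NT/CurrentVersion", or nil if the path has any
# other shape (for example extra segments in the middle).
def middle_segment(path)
  parts = path.scan(%r{[^/]+})   # every run of non-slash characters
  return nil unless parts.length == 5
  return nil unless parts.first == 'software'
  return nil unless parts[2..4] == ['Microsoft', 'Windows NT', 'CurrentVersion']
  parts[1]
end

p middle_segment('/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion')
p middle_segment('/software/X/Wow6432Node/Microsoft/Windows NT/CurrentVersion')
```

The first call returns the middle segment and the second returns nil. Keep in mind that scan skips leading and doubled slashes, so this check is slightly more lenient than an anchored regex.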

···

On Dec 4, 2013, at 6:36 PM, Stuart Clarke <lists@ruby-forum.com> wrote:

/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion

Jesus,

Apologies - on reflection your sample is exactly what I need. I was a
little too hasty in my reply.

Thanks for the help, and thanks also to Mike for the tidy-up tips.

S

···


Remember that you can use %r to quote regular expressions if the presence of the forward slash causes a lot of escaping. For example, in

ratdog:mcqd mike$ pry
[1] pry(main)> re = %r(/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z)
=> /\/software\/([^\/]*)\/Microsoft\/Windows NT\/CurrentVersion\z/

%r(/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z)

seems more reasonable than

/\/software\/([^\/]*)\/Microsoft\/Windows NT\/CurrentVersion$/

I also changed $ to \z for matching the end of the string, because $ matches at the end of any line (so it also matches just before a newline), whereas \z matches only at the real end of the string:

[3] pry(main)> /Hello$/.match "Hello\n"
=> #<MatchData "Hello">
[4] pry(main)> /Hello\z/.match "Hello\n"
=> nil

Hope this helps,

Mike
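
Applied to the pattern from this thread, the difference might look like the sketch below. The constant names and the two-line input are invented for the illustration.

```ruby
DOLLAR_END = %r{/software/([^/]*)/Microsoft/Windows NT/CurrentVersion$}
Z_END      = %r{/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z}

# Hypothetical input: a matching path followed by a second line.
TWO_LINES = "/software/X/Microsoft/Windows NT/CurrentVersion\nmore text"

# $ is satisfied at the end of the first line, so this still matches.
puts((DOLLAR_END =~ TWO_LINES) ? 'match' : 'no match')

# \z insists on the end of the whole string, so this does not match.
puts((Z_END =~ TWO_LINES) ? 'match' : 'no match')

# Lines read from a file usually end in "\n"; chomp them before using \z.
LINE = "/software/X/Microsoft/Windows NT/CurrentVersion\n"
puts((Z_END =~ LINE.chomp) ? 'match' : 'no match')
```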

···

On Dec 4, 2013, at 11:50 AM, Jesús Gabriel y Galán <jgabrielygalan@gmail.com> wrote:


--

Mike Stok <mike@stok.ca>
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.

> Jesus - this still returns all strings for me in Ruby 1.9.

Works for me in 1.9 too:

1.9.3p448 :027 > re = /\/software\/([^\/]*)\/Microsoft\/Windows NT\/CurrentVersion$/
=> /\/software\/([^\/]*)\/Microsoft\/Windows NT\/CurrentVersion$/
1.9.3p448 :028 > strings = ["/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/UserSettings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion"]
=> ["/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/UserSettings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion"]
1.9.3p448 :029 > strings.each {|s| (m = re.match(s)) && puts(m.captures)}
$$$PROTO.HIV
CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}
=> ["/software/$$$PROTO.HIV/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Wow6432Node/Microsoft/Office/12.0/UserSettings/Word_Core/Delete/Software/Microsoft/Windows NT/CurrentVersion",
"/software/CMI-CreateHive{199ADFC2-6E16-4946-BE90-5A3EC3A60902}/Microsoft/Windows NT/CurrentVersion"]

> Ammar - I will try your suggestion. Was keen to keep a regex.

His solution is using a regex too :).

Jesus.

···

On Wed, Dec 4, 2013 at 6:08 PM, Stuart Clarke <lists@ruby-forum.com> wrote:

That's almost exactly what I'd do. It's just lacking the anchor at
the beginning:

%r(\A/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z)

Kind regards

robert
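
A short sketch of what the leading \A changes, assuming a made-up path with an extra segment in front (the constant names are also invented for this example):

```ruby
WITHOUT_START = %r{/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z}
WITH_START    = %r{\A/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z}

# Hypothetical path where "/software/..." is not at the start of the string.
NESTED = '/backup/software/X/Microsoft/Windows NT/CurrentVersion'

# Without \A the regex engine may start matching in the middle of the string.
puts((WITHOUT_START =~ NESTED) ? 'match' : 'no match')

# With \A the pattern must begin at the very first character.
puts((WITH_START =~ NESTED) ? 'match' : 'no match')
```

With both \A and \z in place, the pattern describes the whole string rather than a substring of it.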

···

On Wed, Dec 4, 2013 at 6:10 PM, Mike Stok <mike@stok.ca> wrote:

%r(/software/([^/]*)/Microsoft/Windows NT/CurrentVersion\z)

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

···

On Wed, Dec 4, 2013 at 11:59 AM, Jesús Gabriel y Galán < jgabrielygalan@gmail.com> wrote:
