Help: Efficient regular expression

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

i need to fetch 14051 and /bin/bash from the string

can someone help me to write an efficient regular expression for that.

i am a beginner, i wrote
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/

i know this is not the efficient way of doing it.

Please help.

--
Posted via http://www.ruby-forum.com/.

Divya Badrinath wrote:

> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>
> i need to fetch 14051 and /bin/bash from the string

i mean i need the 2nd column and the last column.

> can someone help me to write an efficient regular expression for that.
>
> i am a beginner, i wrote
> string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/
>
> i know this is not the efficient way of doing it.
>
> Please help.


Divya Badrinath wrote:

> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>
> i need to fetch 14051 and /bin/bash from the string
>
> can someone help me to write an efficient regular expression for that.
>
> i am a beginner, i wrote
> string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/
>
> i know this is not the efficient way of doing it.
>
> Please help.

Talking about efficiency, I was just curious...

#!/usr/bin/env ruby -w

#
# Created by Florian Aßmann on 2007-07-10.
# Copyright (c) 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-'EOS'

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1]

EOS
Profiler__::start_profile

10000.times do
  pid = string[/\s(\d+)/, 1]
  cmd = string[/\s(\S+)$/, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

cols = string.split
sec, last = cols.values_at(1, -1)

EOS
Profiler__::start_profile

10000.times do
  cols = string.split
  sec, last = cols.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

number = string.split[1]
program = string.split.last

EOS
Profiler__::start_profile

10000.times do
  number = string.split[1]
  program = string.split.last
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

*grin*

Florian

Sorry for being OT, since I'm not going to talk about Ruby or regexps.

If the string you're parsing is output from the ps command, you can
simplify your life by using the -o option, which prints only the fields you
need.

E.g. on GNU/Linux

ps -ao pid,command

outputs just the pid and command columns.
Be careful, since the command column can contain spaces.

Paolo

On 10/07/07, Divya Badrinath <dbadrinath@dash.net> wrote:

> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>
> i need to fetch 14051 and /bin/bash from the string
>
> can someone help me to write an efficient regular expression for that.
>
> i am a beginner, i wrote
> string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/
>
> i know this is not the efficient way of doing it.
>
> Please help.


   cols = string.split
   sec, last = cols.values_at(1, -1)

Hope that helps.

James Edward Gray II

On Jul 10, 2007, at 3:25 PM, Divya Badrinath wrote:

Divya Badrinath wrote:

> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>
> i need to fetch 14051 and /bin/bash from the string

i mean i need the 2nd column and the last column.

Hi Divya, use

string[/\s(\d+)/, 1]

See String#[].
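
For anyone who has not used this form of String#[] before, here is a minimal sketch of what it returns on the sample line from the thread. The variable names are only illustrative.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# String#[] with a regexp returns the matched text, or nil if nothing matches.
whole = string[/\s\d+/]        # the whole match, including the leading space

# A second argument selects a capture group instead of the whole match.
pid = string[/\s(\d+)/, 1]     # first run of digits preceded by whitespace
cmd = string[/\s(\S+)$/, 1]    # last whitespace-free token on the line

p whole  # " 14051"
p pid    # "14051"
p cmd    # "/bin/bash"
```

Note that the second pattern only captures the last token, so it is a fit only when the command has no arguments.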

Regards
Florian

I love regexes, so it hurts me to say it, but there are other ways of solving this :wink:

for instance:

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string.split[1]
program = string.split.last

now regexes!

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string[/[0-9]+/]
program = string[/[a-z\/]+$/]

You know you can get values out of an array with the [] operator.
Well, you can get substrings out of strings the same way, and it works
with regexes!

string[/[0-9]+/] will return the first match of one or more digits.

Here's the magic: use [ ] inside a regular expression to create your
own character classes. Individual characters in there are included in the
class, and ranges may be included using the -, so a-z is
abcdefghijklmnopqrstuvwxyz.
The + afterwards means one or more times.
What if you want exactly 5 consecutive digits? Use {}:
string[/[0-9]{5}/]
Ranges also work here, written with a comma:
string[/[0-9]{3,5}/] would match a run of 3, 4 or 5 digits

and
string[/[a-z\/]+$/] will match a run of lowercase letters and forward
slashes at the end of the string. The $ is a special char that represents
the end of a line, and since / delimits the regex literal, it needs to be
escaped with a \.

BUT it could be even easier.
The [] classes can be negated!
/[^a]*/ would match any run of characters that does not contain an a
/[^ ]*/ would match any run of characters that does not contain a space... so
string[/[^ ]+$/] would be a good way to get the last bit.
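
To see those pieces side by side, here is a small sketch run against the sample line. The second string, `other`, is a made-up example (not from the thread) to show where the a-z class falls short and the negated class still works.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

first_number  = string[/[0-9]+/]      # first run of digits
five_digits   = string[/[0-9]{5}/]    # exactly five consecutive digits
three_to_five = string[/[0-9]{3,5}/]  # a run of three to five digits
letters_only  = string[/[a-z\/]+$/]   # trailing lowercase letters and slashes
last_field    = string[/[^ ]+$/]      # trailing run of non-space characters

p first_number   # "14051"
p five_digits    # "14051"
p three_to_five  # "14051"
p letters_only   # "/bin/bash"
p last_field     # "/bin/bash"

# Hypothetical line whose last field contains a digit.
other = "root 14052 14033 0 08:40 pts/2 00:00:00 /usr/bin/python2"
p other[/[a-z\/]+$/]  # nil, because the trailing "2" is not in the class
p other[/[^ ]+$/]     # "/usr/bin/python2"
```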

Ooooh fun.
so are you going to announce the winner :wink:

Paolo Negri wrote:

> sorry for being OT since I'm not going to talk about ruby or regexp
>
> If the string you're parsing is an output from the ps command you can
> simplify your life using the -o option that prints only the fields you
> need.
>
> I.E. in gnu Linux
>
> ps -ao pid,command
>
> just outputs pid and command columns.
> Be careful since the command column can contain spaces.
>
> Paolo

I saw that too, but I cannot use all the ps options in the environment where
I am running it.
I am limited to using ps -aef.

I need to fetch what I need from that output.


Your regexp isn't too bad. With only a little tweaking I get this
which is not too inefficient IMHO:

irb(main):001:0> s = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
=> "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
irb(main):004:0> s =~ /^\S+\s+(\d+)\s+\d+\s+\d+\s+\d+:\d+\s+\S+\s+\d+:\d+:\d+\s+(\S+)/
=> 0
irb(main):005:0> [$1, $2]
=> ["14051", "/bin/bash"]

Or you can do

irb(main):007:0> s =~ /^\S+\s+(\d+)\s+(?:\S+\s+){5}(\S+)/
=> 0
irb(main):008:0> [$1, $2]
=> ["14051", "/bin/bash"]
irb(main):011:0> /^\S+\s+(\d+)\s+(?:\S+\s+){5}(\S+)/.match(s)[1..-1]
=> ["14051", "/bin/bash"]

Kind regards

robert

2007/7/10, Divya Badrinath <dbadrinath@dash.net>:

Divya Badrinath wrote:
> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>
> i need to fetch 14051 and /bin/bash from the string

i mean i need the 2nd column and the last column.
>
> can someone help me to write an efficient regular expression for that.
>
> i am a beginner, i wrote
> string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/
>
> i know this is not the efficient way of doing it.

Florian Aßmann wrote:

> Hi Divya, use
>
> string[/\s(\d+)/, 1]
>
> see String.[]
>
> Regards
> Florian

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1] # this one was missing

Florian Aßmann wrote:

> Hi Divya, use
>
> string[/\s(\d+)/, 1]
>
> see String.[]
>
> Regards
> Florian

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
with this,
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/

$1 gives me 14051
and
$3 gives me /bin/bash

What I am trying to do is get $1 and $3 into a hash.
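
One possible way to collect those two values into a hash, as a rough sketch only. The pattern follows the `(?:\S+\s+){5}` idea posted earlier in the thread but captures the rest of the line; `PS_LINE`, `processes` and the second sample line are names and data made up for illustration, and the code assumes every line has the same eight columns as the sample.

```ruby
# Sample lines; the second one is hypothetical, to show a command with arguments.
lines = [
  "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash\n",
  "root 14060 14051 0 08:40 pts/2 00:00:00 /bin/bash -x -s\n"
]

# Group 1: the PID (2nd column). Group 2: everything from the 8th column on.
PS_LINE = /^\S+\s+(\d+)\s+(?:\S+\s+){5}(.*\S)/

processes = {}
lines.each do |line|
  m = PS_LINE.match(line)
  next unless m               # skip lines that do not look like ps output
  processes[m[1]] = m[2]      # pid => command
end

p processes
```

Using MatchData (`m[1]`, `m[2]`) rather than `$1`/`$3` keeps the captures tied to the match that produced them.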


> Divya Badrinath wrote:
>> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
>>
>> i need to fetch 14051 and /bin/bash from the string
>
> i mean i need the 2nd column and the last column.

   cols = string.split
   sec, last = cols.values_at(1, -1)

Very interesting James, I seem to be rather extreme and

  sec, last = string.split.values_at(1, -1)
might be a tad too long for one line in your style; however, Ruby
supports this marvelous syntax :slight_smile:

   sec, last = string.split.
                       values_at(1, -1)

Robert

On 7/10/07, James Edward Gray II <james@grayproductions.net> wrote:

On Jul 10, 2007, at 3:25 PM, Divya Badrinath wrote:

Hope that helps.

James Edward Gray II

--
I always knew that one day Smalltalk would replace Java.
I just didn't know it would be called Ruby
-- Kent Beck

OK, it was hard to beat Edward, but at least building the simplest
regular expression to do something like String#split seems to be faster:

#!/usr/bin/env ruby -w

#
# Created by Florian Aßmann on 2007-07-10.
# Copyright (c) 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-'EOS'

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]

EOS
Profiler__::start_profile

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
100000.times do
  pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

pid, cmd = string.split.values_at(1, -1)

EOS
Profiler__::start_profile

100000.times do
  pid, cmd = string.split.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-'EOS'

rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-'EOS'

rx = Regexp.new('(\S+)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

rx = Regexp.new('(\S+)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

Sincerely
Florian

It's just profiler code, you can run it yourself... But on my machine:

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1]

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 58.26     0.67      0.67        1   670.00  1150.00  Integer#times
 41.74     1.15      0.48    20000     0.02     0.02  String#[]
  0.00     1.15      0.00        1     0.00  1150.00  #toplevel

cols = string.split
sec, last = cols.values_at(1, -1)

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 66.67     0.70      0.70        1   700.00  1050.00  Integer#times
 18.10     0.89      0.19    10000     0.02     0.02  String#split
 15.24     1.05      0.16    10000     0.02     0.02  Array#values_at
  0.00     1.05      0.00        1     0.00  1050.00  #toplevel

number = string.split[1]
program = string.split.last

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 61.70     1.16      1.16        1  1160.00  1880.00  Integer#times
 23.94     1.61      0.45    20000     0.02     0.02  String#split
  8.51     1.77      0.16    10000     0.02     0.02  Array#last
  5.85     1.88      0.11    10000     0.01     0.01  Array#[]
  0.00     1.88      0.00        1     0.00  1880.00  #toplevel
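
If the `profiler` library is not available on your Ruby (as far as I know it is no longer bundled with recent releases), the same three variants can be compared with plain wall-clock timing. This is only a rough sketch, not a replacement for the profile above: `time_it` is a hypothetical helper, it measures total elapsed time rather than per-method cost, and the numbers will differ from machine to machine.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# Hypothetical helper: wall-clock seconds taken by n runs of the block.
def time_it(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

n = 10_000
timings = {
  "String#[] twice"   => time_it(n) { [string[/\s(\d+)/, 1], string[/\s(\S+)$/, 1]] },
  "split + values_at" => time_it(n) { string.split.values_at(1, -1) },
  "split twice"       => time_it(n) { [string.split[1], string.split.last] }
}

# Print one line per variant with the elapsed time in seconds.
timings.each { |label, secs| printf("%-18s %.4f s\n", label, secs) }
```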

On 7/10/07, Kyle Schmitt <kyleaschmitt@gmail.com> wrote:

Ooooh fun.
so are you going to announce the winner :wink:

You can process the ps output using other commands.

ps -aef|tr -s ' '|cut -d ' ' -f2,8-

This will print just your second column, plus the 8th column and anything
that comes after it. At that point you only need to split the string
at the first space.
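
On the Ruby side, splitting at the first space could look like the sketch below. The `line` value is a made-up example of what one line of the pipeline's output might look like; it is not taken from a real run.

```ruby
# Hypothetical line as it might come out of the tr/cut pipeline.
line = "14051 /bin/bash -x -s\n"

# A limit of 2 splits only at the first run of whitespace,
# so the command keeps its arguments.
pid, cmd = line.chomp.split(' ', 2)

p pid  # "14051"
p cmd  # "/bin/bash -x -s"
```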

Paolo

On 11/07/07, Divya Badrinath <dbadrinath@dash.net> wrote:

Paolo Negri wrote:
> sorry for being OT since I'm not going to talk about ruby or regexp
>
> If the string you're parsing is an output from the ps command you can
> simplify your life using the -o option that prints only the fields you
> need.
>
> I.E. in gnu Linux
>
> ps -ao pid,command
>
> just outputs pid and command columns.
> Be careful since the command column can contain spaces.
>
> Paolo

i saw that too. But i can not use all the options in a ps command where
i am using.
i am limited to using ps -aef

i need to take care of fetching the stuff i need using from this result.


What is your terminal width, 30?

On 7/10/07, Robert Dober <robert.dober@gmail.com> wrote:

On 7/10/07, James Edward Gray II <james@grayproductions.net> wrote:
> On Jul 10, 2007, at 3:25 PM, Divya Badrinath wrote:
>
> > Divya Badrinath wrote:
> >> string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
> >>
> >> i need to fetch 14051 and /bin/bash from the string
> >
> > i mean i need the 2nd column and the last column.
>
> cols = string.split
> sec, last = cols.values_at(1, -1)
Very interesting James, I seem to be rather extreme and

  sec, last = string.split.values_at(1, -1)
might be a tad too long for one line in your style; however, Ruby
supports this marvelous syntax :slight_smile:

   sec, last = string.split.
                       values_at(1, -1)

except that the last regexp match sh**... lol

Robert Dober wrote:

>    sec, last = cols.values_at(1, -1)
>
> Very interesting James, I seem to be rather extreme and
>
>   sec, last = string.split.values_at(1, -1)
> might be a tad too long for one line in your style; however, Ruby
> supports this marvelous syntax :slight_smile:
>
>    sec, last = string.split.
>                        values_at(1, -1)
>
> Robert

cmd = string[/\s(\S+)$/, 1]
doesn't fetch me anything :)

program=string.split.last
What if
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"
It fetches only -s for me.
sec, last = string.split.values_at(1, -1)
doesn't work for the same reason.
I need everything after 00:00:00 till the end,
i.e., /bin/bash -x -s

program=string[/[a-z\/]+$/]
The command column may start with any character. I don't want to limit it in
my regexp; it has to be generic.

With all your comments, I tried
      pid = run_process[/\s(\d+)/, 1]
      cmd = run_process[/:\d+:\d+\s(\S.*)\s$/, 1]

Is there any other way?
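
One other way, sketched here under the assumption that every ps -aef line has the same eight columns as the sample (command last): String#split takes an optional limit, and with a limit of 8 everything after the seventh column stays together in the last element. The names `fields`, `pid` and `cmd` are just for illustration.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"

# strip removes a trailing newline or spaces; the limit of 8 keeps the
# whole command, arguments included, in the last field.
fields = string.strip.split(' ', 8)
pid = fields[1]
cmd = fields[7]

p pid  # "14051"
p cmd  # "/bin/bash -x -s"
```

This avoids depending on the shape of the time column, but it would break if the number of columns before the command ever changed.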

On 7/10/07, James Edward Gray II <james@grayproductions.net> wrote:


You're in trouble if any of your fields have spaces, though the
specific case of ps -aef looks safe enough.

martin

On 7/11/07, Paolo Negri <hungrylist@gmail.com> wrote:

> You can process the ps output using other commands.
>
> ps -aef|tr -s ' '|cut -d ' ' -f2,8-
>
> This will just print your second column the 8th and anything that
> comes after. At this point your regexp only need to split the string
> at the first space.