Rubish Way of extracting elements

Daniel_Volkerts2 · 16 August 2004 15:06

I started written a little script to analyse my syslogs. The development went very fast, but today I'm searching for the rubish way to dissect a string into its parts. For example, my syslog contains a line like this (valid as described in RFC 3164):

<165> Aug 16 17:01:35 localhost Just a test

I was trying to reach this form

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

But how do I accomplish this? I read the pickaxe book, but the example I found was about repeated values with a separator such as |. Is a suitable regexp the way to go, or should I use another technique, e.g. String#index?

Thanks for taking the time to help me. I'll pay it back once I become a little more rubish.

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek

Daniel_Volkerts2 · 16 August 2004 15:11

Daniel Völkerts wrote:

I started written a little script to analyse my syslogs.

I feel sorry, 'I started writting..' is the correct way.

David_A_Black3 · 16 August 2004 15:48

Hi --

You could match it to a regular expression, and grab the results in
()-expressions:

str = "<165> Aug 16 17:01:35 localhost Just a test"

pri, timestamp, device, msg =
/<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

require 'scanf'
pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")

(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)
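
As a rough sketch of the regex variant with a guard for lines that do not match (calling captures on a failed match raises an error, because match returns nil): the helper name parse_syslog_line and the constant SYSLOG_LINE are made up for illustration, and the pattern is the one shown above. As far as I know, scanf has not been part of the default standard library since Ruby 2.7 and is installed as a separate gem, so the sketch sticks to the regex and converts pri with to_i instead.

```ruby
# Hypothetical names; only the pattern itself comes from the post above.
SYSLOG_LINE = /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/

def parse_syslog_line(str)
  md = SYSLOG_LINE.match(str)
  return nil unless md # no match: do not call captures on nil

  pri, timestamp, device, msg = md.captures
  # to_i gives the integer that scanf's %d would have produced
  { pri: pri.to_i, timestamp: timestamp, device: device, msg: msg }
end

p parse_syslog_line("<165> Aug 16 17:01:35 localhost Just a test")
p parse_syslog_line("not a syslog line")
```

Returning nil for unparseable lines is just one choice; raising an exception or collecting the bad lines would work as well.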

David

--
David A. Black
dblack@wobblini.net

Charles_Mills1 · 16 August 2004 15:49

Probably use regular expressions. You could have one big regexp or one for each field, like so:

var =~ /<([0-9]+)>/
pri = $1
$' =~ /some regexp/ # I'm lazy
timestamp = $1
# etc

You could also use \A along with the post match ($') to make sure the fields come in the order you expect.
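
A minimal sketch of that field-by-field idea, assuming the sample line from the question. It uses MatchData#post_match, the method form of $', and anchors every pattern with \A so each field has to start exactly where the previous one ended. The helper take_field and the concrete patterns are hypothetical fill-ins for the "some regexp" placeholder above.

```ruby
line = "<165> Aug 16 17:01:35 localhost Just a test"

# Hypothetical helper: match one field at the start of `rest` and
# return the captured text together with the unconsumed remainder.
def take_field(rest, pattern)
  md = pattern.match(rest)
  raise ArgumentError, "unexpected input: #{rest.inspect}" unless md

  [md[1], md.post_match]
end

pri,       rest = take_field(line, /\A<(\d+)>\s*/)
timestamp, rest = take_field(rest, /\A(\w{3}\s+\d+\s+\d+:\d+:\d+)\s+/)
device,    rest = take_field(rest, /\A(\S+)\s+/)
msg = rest # whatever is left over is the message

p [pri, timestamp, device, msg]
```
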
-Charlie

Florian_Gross · 16 August 2004 15:56

Daniel Völkerts wrote:

<165> Aug 16 17:01:35 localhost Just a test
I was trying to reach this form

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
  pri, timestamp, device, msg = *md.captures
  # Do something with the captures
end

Regards,
Florian Gross

Daniel_Volkerts2 · 16 August 2004 16:01

Daniel Völkerts wrote:

I feel sorry, 'I started writting..' is the correct way.

What the hell, writting is also wrong, tzzz. Too much caffeine in my head!

After posting the above, I wrote this line:

pri,timestamp,device,msg = aMsg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)

Is this the right way? Please feel free to post comments. I'm looking forward to them to improve my Ruby skills.
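
A quick check of what that call returns may help here (a sketch only; a_msg is a stand-in for aMsg and holds the sample line from the first post). Because the pattern has no groups, scan returns every matched substring, so the message is split into single words and the priority keeps its angle brackets.

```ruby
a_msg = "<165> Aug 16 17:01:35 localhost Just a test"

parts = a_msg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)
p parts
# → ["<165>", "Aug 16 17:01:35", "localhost", "Just", "a", "test"]

pri, timestamp, device, msg = parts
p pri # → "<165>" (brackets still attached)
p msg # → "Just"  (the remaining words are dropped by the assignment)
```

A single regexp with one group per field, as suggested in the replies, avoids both effects.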

Daniel_Volkerts2 · 16 August 2004 16:06

David A. Black wrote:

You could match it to a regular expression, and grab the results in
()-expressions:

  str = "<165> Aug 16 17:01:35 localhost Just a test"

  pri, timestamp, device, msg =
  /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

  require 'scanf'
  pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")

(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)

Thanks a lot. That's the way I would expect it: simple and easy to understand. I'll try it.

Many regards.

Robert · 16 August 2004 18:31

"Florian Gross" <flgr@ccan.de> wrote in message
news:2oc3f4F88pl1U1@uni-berlin.de...

Daniel Völkerts wrote:

> <165> Aug 16 17:01:35 localhost Just a test
> I was trying to reach this form
>
> var = content
>
> pri = 165
> timestamp = Aug 16 17:01:35
> device = localhost
> msg = Just a test

This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
  pri, timestamp, device, msg = *md.captures
  # Do something with the captures
end

Some more admittedly ugly constructions:

val = "<165> Aug 16 17:01:35 localhost Just a test"

# MatchData#to_a puts the whole match at index 0, so drop it before assigning.
unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a.drop(1) ).empty? )
  puts "matched"
end

pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a.drop(1)
if pri
  puts "matched"
end

LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a.drop(1) ).empty? )
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
  puts "matched"
end

robert

Daniel_Volkerts2 · 16 August 2004 20:20

*boom* That blew my mind! No no, thanks a lot for that piece of code.

But I prefer the scanf and the one-line regexp.

I'll test which kind performs better for my needs. As I said, I'm a Ruby newbie and my personal programming rule is: keep it simple! I have to understand the things I write.

If my little script ever becomes interesting to others besides me, I'll post an [Ann] thread.

Bye,

Robert · 17 August 2004 07:55

"Daniel Völkerts" <dvoelkerts@web.de> wrote in message
news:cfr4ml$d13$00$1@news.t-online.com...

Robert Klemme wrote:

> Some more admittedly ugly constructions:
>
> val = "<165> Aug 16 17:01:35 localhost Just a test"

(1) This one converts the MatchData into an array, drops the whole-match
element at index 0, and tests for emptiness to determine whether it matched.
Along the way, values are assigned to local vars.

> unless ( ( pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a.drop(1) ).empty? )
> puts "matched"
> end

(2) Similar, but now just one local var is used as the match check: if "pri"
is not nil, the regexp matched.

> pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a.drop(1)
> if pri
> puts "matched"
> end

(3) Same approach as (1), but the regexp is defined as a constant to make
things more readable.

> LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
>
> unless ( ( pri, timestamp, device, msg = * LOG_RX.match(val).to_a.drop(1) ).empty? )
> puts "matched"
> end

(4) Similar approach to (2), but the test is included ("&& line"). Note that
this time the whole-match element is not dropped, so we need the additional
local "line" to receive the complete match.

> if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
> puts "matched"
> end

(5) Same as (4), but with the regexp in a constant as in (3).

> if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
> puts "matched"
> end
>
>
>
> robert
>

*boom* That blew my mind! No no, thanks a lot for that piece of code.

I *should've* put some comments in... Ok, inserting them above now.

But I prefer the scanf and one-line-regexp.

Basically, I used extended regular expressions (switched on by the "/x" flag).
In that mode literal whitespace in the pattern is ignored, so every space to be
matched has to be written explicitly as "\s+". That's why the regexp is longer.
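
To illustrate the "/x" point, here is a small sketch that is not from the original posts. The constant name LOG_RX_VERBOSE and the named groups (pri, timestamp, device, msg) are assumptions added for illustration, and named groups need Ruby 1.9 or later. Extended mode also allows "#" comments inside the pattern, which is the main payoff for the extra "\s+" noise.

```ruby
# Whitespace inside a /x pattern is ignored:
p(/a b/x =~ "ab")  # → 0   (pattern is effectively /ab/)
p(/a b/x =~ "a b") # → nil (the literal space is not part of the pattern)

# Hypothetical verbose version of the log pattern, one field per line.
LOG_RX_VERBOSE = /
  ^<(?<pri>\d+)>                             \s+  # priority in angle brackets
  (?<timestamp>\S+ \s+ \d+ \s+ \d+:\d+:\d+)  \s+  # e.g. Aug 16 17:01:35
  (?<device>\S+)                             \s+  # host or device name
  (?<msg>.*)$                                     # rest of the line
/x

md = LOG_RX_VERBOSE.match("<165> Aug 16 17:01:35 localhost Just a test")
puts md[:pri], md[:timestamp], md[:device], md[:msg] if md
```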

I'll test which kind performs better for my needs. As I said, I'm a Ruby
newbie and my personal programming rule is: keep it simple! I have to
understand the things I write.

That's an excellent road to walk down! Handcrafted, simple code is better
than a mindless copy of something found somewhere.

Kind regards

robert

Daniel_Volkerts2 · 17 August 2004 14:40

Hi Robert,

thank you very, very much for your short lesson. It's very interesting, and I'll see how I can profit from this information.

Ruby is becoming more and more useful to me (normally my language of choice is Java, but for such little scripts Ruby is great fun!).

Bye,

Dany_Cayouette · 18 August 2004 15:35

Thanks for taking the time to put in explanations. I'm also a Ruby newbie who hasn't written anything useful yet, but I've started to follow this list a bit. I always look forward to your postings since I'm sure you'll include some line of code I won't understand. Part of my learning is trying to understand them. Thanks for the extra hand on this one!

Dany

Robert · 18 August 2004 10:20

"Daniel Völkerts" <dvoelkerts@web.de> wrote in message
news:cft54c$872$02$1@news.t-online.com...

Hi Robert,

thank you very, very much for your short lesson. It's very interesting, and
I'll see how I can profit from this information.

I'm glad I could be of any help.

Ruby is becoming more and more useful to me (normally my language of choice
is Java, but for such little scripts Ruby is great fun!).

Same here. I even use Ruby sometimes to manipulate Java code or search
through piles of Java code...

Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek

Ohooo... *shake in fear*

Kind regards

robert
