Ruby-ish way of extracting elements

I started written a little script to analyse my syslogs. Development went quickly, but today I'm looking for the Ruby-ish way to dissect a string into its parts. For example, my syslog contains a line like this (valid as described in RFC 3164):

<165> Aug 16 17:01:35 localhost Just a test

I'm trying to get it into this form:

var = content

pri = 165
timestamp = Aug 16 17:01:35
device = localhost
msg = Just a test

But how do I accomplish this? I read the Pickaxe book, but the example I found was about repeating values with a separator such as |. Is a suitable regexp the way to go, or should I use another technique, e.g. String#index?

Thanks for taking the time to help me. I'll pay it back once I've become a little more Ruby-ish :wink:

--
Daniel Völkerts ::
"I have a dragon, and I WILL use it!" - Donkey in Shrek

Daniel Völkerts wrote:

> I started written a little script to analyse my syslogs.

I'm sorry, 'I started writting..' is the correct form.

Hi --

You could match it to a regular expression, and grab the results in
()-expressions:

  str = "<165> Aug 16 17:01:35 localhost Just a test"

  pri, timestamp, device, msg =
  /<(\d+)>\s+(\w+\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)/.match(str).captures

Another way would be to use scanf. This has the advantage that you
get your 165 as an integer (if that's important):

  require 'scanf'
  pri, timestamp, device, msg = str.scanf("<%d> %15c %s%*c %[\\S\\s]")

(You might have to adjust either the regex or the format string
depending on how consistent and predictable the lines are.)
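As a rough sketch of the regular-expression route, the same pattern can be written with named groups, so the fields come back by name and the priority is converted to an integer explicitly. This assumes Ruby 1.9 or later (named groups did not exist when this thread was written), and the names SYSLOG_LINE and parse_syslog are made up for the illustration. It also assumes the trailing newline has already been stripped from the line.

```ruby
# Hypothetical helper, not from the thread: parse one syslog-style line.
# Assumes the line looks exactly like the example and has no trailing newline.
SYSLOG_LINE = /\A<(?<pri>\d+)>\s+(?<timestamp>\w+\s+\d+\s+[\d:]+)\s+(?<device>\S+)\s+(?<msg>.*)\z/

def parse_syslog(line)
  md = SYSLOG_LINE.match(line)
  return nil unless md            # no match: let the caller decide what to do

  {
    pri:       md[:pri].to_i,     # convert explicitly, much as scanf's %d does
    timestamp: md[:timestamp],
    device:    md[:device],
    msg:       md[:msg]
  }
end

fields = parse_syslog("<165> Aug 16 17:01:35 localhost Just a test")
puts fields.inspect
```

Returning nil for a line that does not match keeps the caller in charge of how strict to be with odd log lines.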

David

On Tue, 17 Aug 2004, Daniel Völkerts wrote:

--
David A. Black
dblack@wobblini.net

Probably use regular expressions. You could have one big regexp or one for each field like so:

var =~ /<([0-9]+)>/
pri = $1
$' =~ /some regexp/ # I'm lazy
timestamp = $1
# etc

You could also use \A along with the post match ($') to make sure the fields come in the order you expect.
-Charlie
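
The field-by-field idea with \A and the post match could be filled in roughly as follows. This is only a sketch under the assumption that every line has the same four fields as the example. The per-field patterns and the names fields and rest are invented here; they are not Charlie's code.

```ruby
line = "<165> Aug 16 17:01:35 localhost Just a test"

rest = line
fields = {}

# Each pattern is anchored with \A, so it must match at the very start of
# whatever is left over from the previous field. That enforces the order.
[
  [:pri,       /\A<(\d+)>\s+/],
  [:timestamp, /\A(\w+\s+\d+\s+\d+:\d+:\d+)\s+/],
  [:device,    /\A(\S+)\s+/],
  [:msg,       /\A(.*)\z/]
].each do |name, rx|
  md = rx.match(rest) or raise ArgumentError, "no #{name} in #{line.inspect}"
  fields[name] = md[1]
  rest = md.post_match   # same value as $' right after the match
end

puts fields.inspect
```

Compared with one big regexp, this version can say which field failed to match, at the cost of a few more lines.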

On Aug 16, 2004, at 8:06 AM, Daniel Völkerts wrote:

Daniel Völkerts wrote:

> <165> Aug 16 17:01:35 localhost Just a test
>
> I was trying to reach this form
>
> var = content
>
> pri = 165
> timestamp = Aug 16 17:01:35
> device = localhost
> msg = Just a test

This ought to work, but there might be other ways to do this:

if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
   pri, timestamp, device, msg = *md.captures
   # Do something with the captures
end

Regards,
Florian Gross

Daniel Völkerts wrote:

> I'm sorry, 'I started writting..' is the correct form.

What the hell, "writting" is also wrong, tzzz. Too much caffeine in my head!

After posting the above, I wrote this line:

pri,timestamp,device,msg = aMsg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)

Is this the right way? Please feel free to post comments. I'm looking forward to them so I can improve my Ruby skills.
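
A quick way to check a line like this is to look at what scan actually returns before assigning it. The sketch below uses the sample line from the first post, with a_msg standing in for aMsg.

```ruby
a_msg = "<165> Aug 16 17:01:35 localhost Just a test"

# Without groups, scan returns every piece of the string that matches one of
# the alternatives, in order of appearance.
parts = a_msg.scan(/<\d{1,5}>|\w{3,} \d\d \d\d:\d\d:\d\d|\w+/)
p parts
# => ["<165>", "Aug 16 17:01:35", "localhost", "Just", "a", "test"]

# Only the first four elements are assigned; the remaining ones are dropped.
pri, timestamp, device, msg = parts
```

With this pattern, pri keeps its angle brackets and msg receives only the first word of the message, because the trailing \w+ alternative matches each word separately. A single regexp with one group per field avoids both effects.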

David A. Black wrote:

> You could match it to a regular expression, and grab the results in
> ()-expressions:
> [...]
> Another way would be to use scanf.
> [...]

Thanks a lot. That's the way I would expect it: simple and easy to understand. I'll try it.

Many regards.

"Florian Gross" <flgr@ccan.de> wrote in message
news:2oc3f4F88pl1U1@uni-berlin.de...

> This ought to work, but there might be other ways to do this:
>
> if md = /^<(\d+)> (\S+ \d+ \d+:\d+:\d+) (\S+) (.*?)$/.match(text)
>    pri, timestamp, device, msg = *md.captures
>    # Do something with the captures
> end

Some more admittedly ugly constructions:

val = "<165> Aug 16 17:01:35 localhost Just a test"
# MatchData#to_a starts with the complete match, so "line" receives that.
unless ( ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
  puts "matched"
end

line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a
if pri
  puts "matched"
end

LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x

unless ( ( line, pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+
\d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
  puts "matched"
end

if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
  puts "matched"
end

:slight_smile:

    robert

Robert Klemme wrote:

> Some more admittedly ugly constructions:
> [...]

*boom* That blew my mind! No no, thanks a lot for that piece of code.

But I prefer the scanf and one-line-regexp.

I'll test which kind performs better for my needs. As I said, I'm a Ruby newbie and my personal programming rule is: keep it simple! :wink: I have to understand the things I write.

If my little script ever becomes interesting to anyone other than me, I'll post an [Ann] thread.

Bye,

"Daniel Völkerts" <dvoelkerts@web.de> wrote in message
news:cfr4ml$d13$00$1@news.t-online.com...

> Robert Klemme wrote:
>
> > Some more admittedly ugly constructions:
> >
> > val = "<165> Aug 16 17:01:35 localhost Just a test"

(1) This one converts the RX MatchData into an array and tests for emptiness
to determine whether it matched. Along the way, values are assigned to
local vars. The array starts with the complete match, which is what the
first local, "line", receives.

> > unless ( ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a ).empty? )
> >   puts "matched"
> > end

(2) Similar, but now just one local var is used as match check: if "pri" is
not nil, the RX matched.

> > line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val).to_a
> > if pri
> >   puts "matched"
> > end

(3) Same approach as (1) but the regexp is defined as a constant to make
stuff more readable.

> > LOG_RX = /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x
> >
> > unless ( ( line, pri, timestamp, device, msg = * LOG_RX.match(val).to_a ).empty? )
> >   puts "matched"
> > end

(4) Similar approach to (2) but the test is included ("&& line"). Note that
this time there is no explicit conversion to an array; the splat does it. As
before, the local "line" receives the complete match.

> > if ( line, pri, timestamp, device, msg = * /^<(\d+)> \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+) \s+ (\S+) \s+ (.*)$/x.match(val) ) && line
> >   puts "matched"
> > end

(5) Same as (4) but with regexp in constant as in (3).

> > if ( line, pri, timestamp, device, msg = * LOG_RX.match(val) ) && line
> >   puts "matched"
> > end
> >
> > :slight_smile:
> >
> > robert
> *boom* That blew my mind! No no, thanks a lot for that piece of code.

:slight_smile: I *should've* put some comments in... OK, inserting them above now.

> But I prefer the scanf and one-line-regexp.

Basically I used extended regular expressions (switched on by the "/x" flag).
Literal whitespace in the pattern is ignored, so every gap that should match
has to be written explicitly as "\s+". And that's
why the regexp is longer.
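
Because "/x" ignores literal whitespace, it also allows the pattern to be spread over several lines with a comment per field. A minimal sketch, using the same pattern as LOG_RX above; the constant name COMMENTED_LOG_RX is made up for this example:

```ruby
# In /x mode, whitespace and "#" comments inside the pattern are ignored,
# so real whitespace in the input must be matched with \s+.
COMMENTED_LOG_RX = /
  ^ <(\d+)>                            # priority, e.g. 165
  \s+ (\S+ \s+ \d+ \s+ \d+:\d+:\d+)    # timestamp, e.g. Aug 16 17:01:35
  \s+ (\S+)                            # device (host name)
  \s+ (.*) $                           # message: the rest of the line
/x

val = "<165> Aug 16 17:01:35 localhost Just a test"

md = COMMENTED_LOG_RX.match(val)
pri, timestamp, device, msg = md.captures if md
puts "matched" if md
```

MatchData#captures returns only the groups, without the complete match, so no extra "line" variable is needed here.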

> I'll test which kind performs better for my needs. As I said, I'm a Ruby
> newbie and my personal programming rule is: keep it simple! :wink: I have to
> understand the things I write.

That's an excellent road to walk down! Handcrafted, simple code is better
than a mindless copy of something found somewhere.

Kind regards

    robert

Hi Robert,

thank you very, very much for your short lesson. It's very interesting and I'll see how I can profit from this information.

Ruby is becoming more and more useful to me (normally my language of choice is Java, but for little scripts like this Ruby is great fun!).

Bye,

Thanks for taking the time to put in explanations. I'm also a Ruby newbie who hasn't written anything useful yet, but I've started to follow this list a bit. I always look forward to your postings, since I'm sure you'll include some line of code I won't understand... :wink: Part of my learning is trying to understand them. Thanks for the extra hand on this one!

Dany

On Tue, 17 Aug 2004 09:52:32 +0200 "Robert Klemme" <bob.news@gmx.net> wrote:

"Daniel Völkerts" <dvoelkerts@web.de> wrote in message
news:cft54c$872$02$1@news.t-online.com...

> Hi Robert,
>
> thank you very, very much for your short lesson. It's very interesting and
> I'll see how I can profit from this information.

I'm glad I could be of help.

> Ruby is becoming more and more useful to me (normally my language of choice
> is Java, but for little scripts like this Ruby is great fun!).

Same here. I even use Ruby sometimes to manipulate Java code or search
through piles of Java code... :slight_smile:

> Daniel Völkerts ::
> "I have a dragon, and I WILL use it!" - Donkey in Shrek

Ohooo... *shake in fear*
:slight_smile:

Kind regards

    robert