Parsing an apache access log line

Joe_Nciri · 16 July 2007 11:59

I have a line to parse:

10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334

Does anyone have a good idea? I got stuck trying to come up with one simple regex
(the double quotes are the problem). I need to tokenize the line:

token 1 = 10.88.90.75
token 2 = -
token 3 = -
token 4 = [16/Jul/2007:07:46:09 -0400]
token 5 = "GET /star/images/main.gif HTTP/1.1"
token 6 = 200
token 7 = 334

Can someone help, please?

Joe.

--
Posted via http://www.ruby-forum.com/.

Robert_K1 · 16 July 2007 12:05

Try this as a starting point:

line.scan %r{
  \S+
  | \[[^\]]*\]
  | "[^"]*"
}x

(untested)

Kind regards

robert


Jens_Wille1 · 16 July 2007 12:08

hi joe!

maybe you want to have a look at log_parser:
<http://topfunky.net/svn/plugins/mint/lib/log_parser.rb>

cheers
jens

--
Jens Wille, Dipl.-Bibl. (FH)
prometheus - Das verteilte digitale Bildarchiv für Forschung & Lehre
Kunsthistorisches Institut der Universität zu Köln
Albertus-Magnus-Platz, D-50923 Köln
Tel.: +49 (0)221 470-6668, E-Mail: jens.wille@uni-koeln.de
http://www.prometheus-bildarchiv.de/

Aur_Saraf · 16 July 2007 12:10

/([0-9.]*) (-) (-) (\[.*\]) (\".*\") ([0-9]*) ([0-9]*)/ comes to mind,
although I have probably got the backslashes wrong: some of the characters
I escaped probably aren't special, and some of the ones I didn't escape
probably are.

Could you provide a test suite with more lines?

Hey, wouldn't /(?:^| )(\S*|\".*\")(?:$| )/ work to find each of the
tokens (that is, iterate it to find ALL matches)?

Aur
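The first pattern above can be checked against the sample line directly. This is only a quick sketch: the variable names are illustrative, and the expected tokens are the seven listed in the original post. In a Ruby regex literal the `\"` escapes are harmless (they match a plain double quote), while `\[` and `\]` are needed.

```ruby
line = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'

# Pattern copied from the post above, unchanged.
pattern = /([0-9.]*) (-) (-) (\[.*\]) (\".*\") ([0-9]*) ([0-9]*)/

# The seven tokens asked for in the original question.
expected = [
  '10.88.90.75', '-', '-',
  '[16/Jul/2007:07:46:09 -0400]',
  '"GET /star/images/main.gif HTTP/1.1"',
  '200', '334'
]

m = pattern.match(line)
p m.captures              # the captured tokens, in order
p m.captures == expected  # whether they agree with the expected list
```

One sample line is not much of a test suite, so the same comparison would need to be repeated on more lines before trusting the pattern.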


Phil4 · 16 July 2007 14:38

line = "10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] \"GET /star/images/main.gif HTTP/1.1\" 200 334"
token = /^(.*?)\s+(.*?)\s+(.*?)\s+(\[.*?\])\s+(\".*?\")\s+(\d+)\s+(\d+)$/.match(line)

=> token[1] = 10.88.90.75
    token[2] = -
    token[3] = -
    etc.

BR Phil
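A hedged sketch of how the match-based approach above might be wrapped for a real log file, where some lines will not match. The constant `LOG_LINE`, the method name `parse_line` and the hash keys are assumptions made for illustration and are not part of the original suggestion. The one change to the pattern is that the last field also accepts `-`, which Apache's common log format writes when no body bytes were sent.

```ruby
# Pattern from the post above; the only change (an assumption of this sketch)
# is that the final field may be "-" instead of digits.
LOG_LINE = /^(.*?)\s+(.*?)\s+(.*?)\s+(\[.*?\])\s+(".*?")\s+(\d+)\s+(\d+|-)$/

# parse_line is a hypothetical helper: it returns a hash of fields,
# or nil when the line does not look like an access log entry.
def parse_line(line)
  m = LOG_LINE.match(line.chomp)
  return nil unless m

  host, ident, user, time, request, status, bytes = m.captures
  {
    host: host, ident: ident, user: user, time: time, request: request,
    status: status.to_i,
    bytes: (bytes == '-' ? 0 : bytes.to_i)
  }
end

line = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'
entry = parse_line(line)
puts entry[:request]
puts entry[:bytes]
puts parse_line('not a log line').inspect
```

Returning nil for unparseable lines lets the caller decide whether to skip or report them.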

Robert_K1 · 16 July 2007 12:49

> Try this as a starting point:
>
> line.scan %r{
>   \S+
>   | \[[^\]]*\]
>   | "[^"]*"
> }x
>
> (untested)

I think I got the order wrong. Do this instead:

line.scan %r{
  \[[^\]]*\]
  | "[^"]*"
  | \S+
}x

Or do an explicit parse like the one Aur suggested.

Kind regards

robert
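A minimal sketch of why the order of the alternatives matters, using the sample line from the original post. The variable names are only illustrative. With `\S+` listed first, the bracketed and quoted fields are split at their inner spaces, because the first alternative that matches wins.

```ruby
line = '10.88.90.75 - - [16/Jul/2007:07:46:09 -0400] "GET /star/images/main.gif HTTP/1.1" 200 334'

# Specific alternatives first: bracketed field, quoted field, then any bare word.
bracket_first = %r{
  \[[^\]]*\]
  | "[^"]*"
  | \S+
}x

# Same alternatives with \S+ first, as in the earlier attempt.
word_first = %r{
  \S+
  | \[[^\]]*\]
  | "[^"]*"
}x

tokens = line.scan(bracket_first)
broken = line.scan(word_first)

puts tokens.length  # the seven fields from the question
puts broken.length  # more than seven: "[16/Jul/2007:07:46:09" and "-0400]" are separate
```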

