Regular Expression question

#1

Hi,

How do I extract CYZ-EdEi8QBzG2f3m7prHw99 from the sessionId text
below? The regular expression I have is only partially working: it returns
CYZ, but I want the whole string, CYZ-EdEi8QBzG2f3m7prHw99.
Your help is appreciated. I've used the RegEx Coach tool but didn't come
up with a good expression. Thanks.

irb(main):100:0> puts sessionId
<FRAMESET border=0 frameSpacing=0 rows=0,* frameBorder=0>
<FRAME border=0 name=relay marginWidth=0 marginHeight=0 src="/include/blank.html" frameBorder="" noResize scrolling=no>
<FRAME border=0 name=main marginWidth=0 marginHeight=0 src="/araneae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99" frameBorder=0 scrolling=no>
</FRAMESET>
=> nil
irb(main):101:0> sessionId2 = sessionId[/sessionId=\w+/]
=> "sessionId=CYZ"

(David A. Black) #2

Hi --

On Fri, 12 Aug 2005 tuyet.ctn@mscibarra.com wrote:

irb(main):101:0> sessionId2 = sessionId[/sessionId=\w+/]
=> "sessionId=CYZ"

The hyphen character isn't part of the \w character class. Try this:

    sessionId2 = /sessionId=([-\w]+)/.match(str).captures[0]

(although you should probably have a test in there to make sure the
match worked). This will give you just the CYZ... part, but you can
adjust that easily.
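
A minimal sketch of that kind of test, assuming the HTML is already in a string. The `extract_session_id` helper and the shortened `html` sample are made-up names for illustration, not part of the original post:

```ruby
# Hypothetical sample input, shortened from the frameset in the question.
html = '<FRAME name=main src="/araneae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99" frameBorder=0>'

# Hypothetical helper: returns the captured id, or nil when nothing matches,
# instead of raising NoMethodError by calling .captures on nil.
def extract_session_id(text)
  match = /sessionId=([-\w]+)/.match(text)
  match && match.captures[0]
end

puts extract_session_id(html)               # prints CYZ-EdEi8QBzG2f3m7prHw99
p extract_session_id("no id in this text")  # prints nil

# For comparison, \w+ alone stops at the hyphen:
p html[/sessionId=\w+/]                     # prints "sessionId=CYZ"
```

Putting the hyphen first (or last) inside the brackets keeps it a literal character rather than a range operator.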

David

--
David A. Black
dblack@wobblini.net

(W. James) #3

puts /sessionId=(.*?)"/.match(sessionId)[1]

tuyet.ctn@mscibarra.com wrote:

Hi,

How do I extract CYZ-EdEi8QBzG2f3m7prHw99 from the sessionId text
below.

<FRAMESET border=0 frameSpacing=0 rows=0,* frameBorder=0>
<FRAME border=0 name=relay marginWidth=0 marginHeight=0 src="/include/blank.html" frameBorder="" noResize scrolling=no>
<FRAME border=0 name=main marginWidth=0 marginHeight=0 src="/araneae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99" frameBorder=0 scrolling=no>
</FRAMESET>

(Devin Mullins) #4

Some ideas:

- session_id2 = session_id.match(/sessionId=(.*)"/)[1]
- session_id2 = session_id[/sessionId=.*(?=")/].gsub(/sessionId=/,'')
- session_id2 = session_id.match(/sessionId=([\w-]*)/)[1]
(Now go find out what these things do. :)
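
For anyone comparing the three, here is a rough sketch of how they differ once a second quoted attribute follows the session id. The `line` value and the `session_id_variants` helper are made up for illustration, and the non-greedy `(.*?)` variant is added only for contrast. On the frameset from the question all of them return the same id, because the quote closing the `src` attribute is the last quote in the text.

```ruby
# Hypothetical helper: runs each pattern against the same text.
def session_id_variants(text)
  {
    greedy:     text.match(/sessionId=(.*)"/)[1],
    lookahead:  text[/sessionId=.*(?=")/].gsub(/sessionId=/, ''),
    char_class: text.match(/sessionId=([\w-]*)/)[1],
    non_greedy: text.match(/sessionId=(.*?)"/)[1]
  }
end

# Hypothetical input with another quoted attribute after the session id.
line = '<FRAME src="/home?common.sessionId=CYZ-abc123" title="main">'

session_id_variants(line).each { |name, value| puts "#{name}: #{value}" }
# greedy: CYZ-abc123" title="main
# lookahead: CYZ-abc123" title="main
# char_class: CYZ-abc123
# non_greedy: CYZ-abc123
```

The greedy `.*` runs to the last double quote on the line, while the character class and the non-greedy form stop at the end of the id.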

- google for ruby html parser
- go to http://bike-nomad.com/ruby/
- go to http://raa.ruby-lang.org/search.rhtml?search=html+pars

But I'm nowhere near an expert in regexes or HTML parsing in ruby.
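
Short of a real HTML parser, one middle ground is to let the standard `uri` library handle the query string. This is only a sketch: it still uses a regex to find quoted `src` attributes, so it is not a substitute for proper HTML parsing, and `src_query_params` is a hypothetical helper name.

```ruby
require "uri"

# Hypothetical helper: collects query parameters from every quoted src attribute.
def src_query_params(html)
  html.scan(/src="([^"]*)"/).flatten.each_with_object({}) do |src, params|
    query = URI.parse(src).query          # nil when the URL has no "?..." part
    params.update(URI.decode_www_form(query).to_h) if query
  end
end

# Shortened, hypothetical version of the frameset from the question.
html = '<FRAME src="/include/blank.html">' \
       '<FRAME src="/araneae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99">'

puts src_query_params(html)["common.sessionId"]  # prints CYZ-EdEi8QBzG2f3m7prHw99
```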

Devin
And yeah, I corrected your variable names for you. :P

tuyet.ctn@mscibarra.com wrote:

irb(main):101:0> sessionId2 = sessionId[/sessionId=\w+/]
=> "sessionId=CYZ"

Top post! Gawrsh, how awful! What's wrong with me? I must be a total newb l0zerf@ce fux0r.

(Florian Gross) #5

Others suggested more complex ways, but String#[] can take a capture counter with Regexps:

sid = input[/sessionId=([-\w]+)/, 1]
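
A small sketch of how the second argument changes the result, assuming `input` holds the HTML (the sample value here is a shortened, made-up stand-in):

```ruby
# Hypothetical, shortened input.
input = 'src="/araneae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99"'

p input[/sessionId=([-\w]+)/]       # whole match: "sessionId=CYZ-EdEi8QBzG2f3m7prHw99"
p input[/sessionId=([-\w]+)/, 1]    # first capture only: "CYZ-EdEi8QBzG2f3m7prHw99"

# No match returns nil rather than raising, so no separate test is needed.
p "no match here"[/sessionId=([-\w]+)/, 1]   # nil

# A named group can be selected by name as well.
p input[/sessionId=(?<id>[-\w]+)/, "id"]     # "CYZ-EdEi8QBzG2f3m7prHw99"
```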

tuyet.ctn@mscibarra.com wrote:

How do I extract CYZ-EdEi8QBzG2f3m7prHw99 from the sessionId text
below. [...]

neae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99"

(W. James) #6

Florian Groß wrote:

tuyet.ctn@mscibarra.com wrote:

> How do I extract CYZ-EdEi8QBzG2f3m7prHw99 from the sessionId text
> below. [...]
>
> neae/Analysis/home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99"

Others suggested more complex ways, but String#[] can take a capture
counter with Regexps:

sid = input[/sessionId=([-\w]+)/, 1]

Some may find this surprising:

puts input.split(/sessionId=(.*?)"/)[1]
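
The reason this works: when the pattern given to String#split contains a capture group, the captured text is included in the returned array alongside the pieces around it. A short sketch on a made-up, shortened `input`:

```ruby
# Hypothetical, shortened input.
input = 'home?common.sessionId=CYZ-EdEi8QBzG2f3m7prHw99" frameBorder=0'

# With a group, the capture sits between the text before and after the match.
p input.split(/sessionId=(.*?)"/)
# ["home?common.", "CYZ-EdEi8QBzG2f3m7prHw99", " frameBorder=0"]

# Without a group, the matched text is simply dropped.
p input.split(/sessionId=.*?"/)
# ["home?common.", " frameBorder=0"]

# If nothing matches, the array has one element and index 1 is nil.
p "no match here".split(/sessionId=(.*?)"/)[1]   # nil
```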