Parsing query parameters from hyperlink

lrlebron · 1 September 2007 17:35

I am trying to parse strings like this
<a href='showmono.asp?cpnum=1022&ampmonotype=comb' target='main'>

I need to get the cpnum value (1022 in this example)

I am using the following function

def get_drugId(link)
  arrParts = link.html.split('?')
  cpnum = arrParts[1].split('&amp')
  cpnumparts = cpnum[0].split("=")
  drugId = cpnumparts[1]
end

but I imagine there is a simpler way to do this. Also, I would like
something more flexible that would return all the query parameters (if
there is more than one) in an array or a hash.

Any ideas?

thanks,

Luis

Robert_K1 · 1 September 2007 19:00

The std lib:

require 'uri'

irb(main):006:0> u=URI.parse("http://foo/bar?dodo=1&dada=2")
=> #<URI::HTTP:0x3ff9814a URL:http://foo/bar?dodo=1&dada=2>
irb(main):007:0> u.query
=> "dodo=1&dada=2"
irb(main):008:0> u.query.split('&')
=> ["dodo=1", "dada=2"]
...
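
A minimal sketch of one way to carry on from the split above and collect the pairs into a hash. The helper name `query_to_hash` is hypothetical, not something from the standard library, and the sketch assumes each key appears only once and that the values need no percent-decoding:

```ruby
require 'uri'

# Hypothetical helper: turn the query string of a URL into a flat hash.
# Assumes unique keys and no percent-encoded characters.
def query_to_hash(url)
  query = URI.parse(url).query
  return {} if query.nil? # URL without a query string

  query.split('&').each_with_object({}) do |pair, params|
    key, value = pair.split('=', 2) # split on the first '=' only
    params[key] = value
  end
end

params = query_to_hash('http://foo/bar?dodo=1&dada=2')
puts params['dodo']
```

If a key is repeated, the last value wins here, which is one reason to prefer a library parser for anything beyond simple cases.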

robert

Aaron_Patterson1 · 1 September 2007 19:15

Query strings are allowed to use semicolons as delimiters, not to
mention you must handle multiple values per key. I recommend using the
CGI library with the URI library:

  irb(main):001:0> require 'uri'
  => true
  irb(main):002:0> require 'cgi'
  => true
  irb(main):003:0> CGI.parse(URI.parse('http://foo/?a=b&b=c').query)
  => {"a"=>["b"], "b"=>["c"]}
  irb(main):004:0> CGI.parse(URI.parse('http://foo/?a=b;b=c').query)
  => {"a"=>["b"], "b"=>["c"]}
  irb(main):005:0> CGI.parse(URI.parse('http://foo/?b=a;b=c').query)
  => {"b"=>["a", "c"]}
  irb(main):006:0>
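
For comparison, a sketch of the same "one key, many values" shape built with `URI.decode_www_form` from the `uri` library instead of `CGI.parse`. This swaps in a different technique from the one shown above; the helper name `query_values` is hypothetical. Note that `URI.decode_www_form` splits only on `&` by default, so this sketch does not cover the semicolon case:

```ruby
require 'uri'

# Hypothetical helper: group query parameters into a hash of arrays,
# so repeated keys keep all of their values.
def query_values(url)
  query = URI.parse(url).query.to_s # nil.to_s => "" when there is no query

  URI.decode_www_form(query).each_with_object({}) do |(key, value), params|
    (params[key] ||= []) << value
  end
end

p query_values('http://foo/?a=b&b=c')
p query_values('http://foo/?b=a&b=c')
```

`URI.decode_www_form` also percent-decodes keys and values, which the plain `split('&')` approach does not.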

--
Aaron Patterson
http://tenderlovemaking.com/

lrlebron · 1 September 2007 19:30

This would work if the string were a proper URL. But it is a
hyperlink, i.e. the whole <a> tag.

Aaron_Patterson1 · 1 September 2007 19:47

Use Hpricot to extract the href, then feed it through URI and CGI.
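
A rough sketch of that two-step idea using only the standard library. The regular expression is a stand-in for a real HTML parser such as Hpricot and only copes with simple, well-formed anchor tags. The names `HREF_PATTERN`, `href_from_tag` and `href_params` are hypothetical, and `URI.decode_www_form` is used here in place of `CGI.parse`:

```ruby
require 'uri'

# Assumption: the input is a single, simple <a ...> tag with a quoted href.
# A real HTML parser is the safer choice for arbitrary markup.
HREF_PATTERN = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/im

# Step 1: pull the href attribute out of the tag (nil if there is none).
def href_from_tag(html)
  match = HREF_PATTERN.match(html)
  # Minimal entity handling: only &amp; is unescaped.
  match && match[2].gsub('&amp;', '&')
end

# Step 2: parse the href as a (possibly relative) URI and decode its query.
def href_params(html)
  href = href_from_tag(html)
  return [] if href.nil?

  URI.decode_www_form(URI.parse(href).query.to_s)
end

tag = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
p href_params(tag)
```

The result is an array of [key, value] pairs, which can be turned into a hash if keys are known to be unique.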

lrlebron · 1 September 2007 19:55

Sorry for the second reply. I took your suggestions and came up with
the following

require 'uri'
require 'cgi'

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

def get_cpnum(link)
  arrParts = link.split(' ')
  CGI.parse(URI.parse(arrParts[1]).query)['cpnum']
end

puts get_cpnum(str)

Phil · 1 September 2007 20:50

lrlebron@gmail.com wrote:

This would work if the string were a proper URL. But it is a
hyperlink.

Your point? A hyperlink *is* a URL in the WWW context.

--
Phillip Gawlowski

lrlebron · 1 September 2007 23:05

If you try to parse it, URI throws an error.

lrlebron · 1 September 2007 23:25

Here's what I ended up with

require 'uri'
require 'cgi'
require 'hpricot'

def get_query_value(link, key = '')
  doc = Hpricot(link)
  # Parse the href's query string once, then return all or part of it
  params = CGI.parse(URI.parse(doc.at("a")['href']).query)

  key.empty? ? params : params[key]
end

str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"

p get_query_value(str)
puts get_query_value(str,'cpnum')
puts get_query_value(str,'monotype')

It allows me to ask for the complete hash or for a particular key.

Thanks,

Luis

Robert_K1 · 2 September 2007 12:00

Does it? This works for me:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u=URI.parse('foo.bar/baz?x=2')
=> #<URI::Generic:0x3ffa0eda URL:foo.bar/baz?x=2>
irb(main):003:0> u.query
=> "x=2"
irb(main):004:0> u=URI.parse('baz?x=2')
=> #<URI::Generic:0x3ff9f15c URL:baz?x=2>
irb(main):005:0> u.query
=> "x=2"

Cheers

robert

lrlebron · 2 September 2007 13:35

I meant if you try to parse the string
str = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
it throws an error.

c:/ruby/lib/ruby/1.8/uri/common.rb:432:in `split': bad URI(is not
URI?): <a href='showmono.asp?cpnum=555&monotype=full' target='main'>
(URI::InvalidURIError)
from c:/ruby/lib/ruby/1.8/uri/common.rb:481:in `parse'
from uritest.rb:8
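
The difference between the two cases can be sketched as follows: the whole `<a ...>` tag is not a URI and makes `URI.parse` raise `URI::InvalidURIError`, while the bare href value parses fine as a relative reference. The helper name `safe_query` is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: return the query string, or nil if the text
# cannot be parsed as a URI at all.
def safe_query(text)
  URI.parse(text).query
rescue URI::InvalidURIError
  nil
end

tag  = "<a href='showmono.asp?cpnum=555&monotype=full' target='main'>"
href = 'showmono.asp?cpnum=555&monotype=full'

p safe_query(tag)  # the full tag is rejected
p safe_query(href) # the href alone is a valid relative URI
```

So the href has to be extracted from the tag first, and only that value handed to `URI.parse`.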
