Html parser with regex, how to solve?

Grabber · 6 January 2008 00:07

Hi,

I'm trying to develop a simple application in Ruby (once it works I will
move it to Rails). I need to fetch the source of a URL and find this string
in it:

<h3 class="zmp">$299.99</h3>

But I need to match not only a single price such as 149.00 but every
possible number, so a friend suggested this pattern:

<h3 class="zmp">*$\d+\.\d{2}.*</h3>

I think the pattern works, but I need one more thing. Here is my code:

#!/usr/bin/ruby

require 'hpricot'
require 'open-uri'

@content = Hpricot(open("http://www.newegg.com/Product/Product.aspx?Item=N82E16855101066"))

Now, how can I search for <h3 class="zmp">*$\d+\.\d{2}.*</h3> ?

@content.search("<h3 class="zmp">*$\d+\.\d{2}.*</h3>") is broken ;(

How can I solve this?

Thanks for your attention,
Luiz Vitor Martinez Cardoso.

--
Regards,
Luiz Vitor Martinez Cardoso [Grabber].
(11) 8187-8662

rubz.org - engineer student at maua.br

Steve_Ross · 6 January 2008 01:10

Don't use the regex. Let hpricot do what it's good at:

$ irb
>> require 'rubygems'
>> require 'hpricot'
>> html = '<h3 class="zmp">149.00</h3>'
>> doc = Hpricot.parse(html)
>> ele = doc.search('h3.zmp')
>> puts ele.text
149.00
=> nil

In your code, your @content will be searchable the same way. Hpricot will give you a collection of all h3's with class 'zmp'.

http://code.whytheluckystiff.net/doc/hpricot/

Hope this helps.
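
If you still want to check that each extracted text really looks like a price, the regex can be applied to the plain strings after Hpricot has found the elements. The sketch below is only an illustration: the `texts` array is made-up sample data standing in for the text of each h3.zmp element, and `price_re` is a hypothetical name. Note that the dollar sign has to be escaped, because an unescaped `$` in a Ruby regex is an end-of-line anchor rather than a literal character.

```ruby
# Made-up sample data: stands in for the text of each h3.zmp element.
texts = ['$299.99', ' $149.00 ', 'Out of stock']

# \$ matches a literal dollar sign; \A and \z anchor the whole string.
# The capture group keeps only the numeric part.
price_re = /\A\$(\d+\.\d{2})\z/

# String#[] with a regex and a group index returns the capture or nil,
# so compact drops every string that is not a price.
prices = texts.map { |t| t.strip[price_re, 1] }.compact

puts prices.inspect
```

Strings that do not match the pattern are simply skipped, so unrelated h3 content does not break the script.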


Grabber · 6 January 2008 01:34

Thanks a lot! This really works.

Now I have a new (very simple) problem: the output is $1999,00. How can I
remove the $? I need to convert this to a float.

Regards,
Luiz Vitor Martinez Cardoso.


Joe8 · 6 January 2008 02:15

try this:

ele.text.sub('$', '')

Joe
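
Since the goal is a float, here is a minimal sketch of the whole conversion. The helper name `price_to_float` is hypothetical, and it assumes the string has at most one separator, which may be a comma (as in the $1999,00 output above) or a dot. A value with a thousands separator such as $1,999.00 would need extra handling.

```ruby
# Hypothetical helper: turn a scraped price string into a Float.
# Assumes a single decimal separator, either "," or ".".
def price_to_float(text)
  cleaned = text.strip.sub('$', '')  # a String pattern is taken literally
  cleaned = cleaned.tr(',', '.')     # "1999,00" becomes "1999.00"
  Float(cleaned)                     # raises ArgumentError on bad input
end

puts price_to_float('$1999,00')
puts price_to_float('$299.99')
```

`Float()` is used here instead of `to_f` because `to_f` quietly returns 0.0 for text it cannot parse, which is easy to miss when scraping.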


Grabber · 6 January 2008 02:32

Thanks, I got it working!

Regards,
Luiz Vitor Martinez Cardoso.

