I am new to Ruby scripting, so I am not sure whether this is possible.
I want to make a script that will load a web page and then search
through the HTML of that page until it hits a certain tag. Once it hits
that tag, it needs to grab all of the text between the tag and the
matching end tag. Is something like this possible?
Example:
<html>
<body>
<h3>test</h3>
</body>
</html>
I want the script to return "test".
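You want the Hpricot gem.

require 'rubygems'
require 'hpricot'

html = <<EOF
<html>
<body>
<h3>test</h3>
</body>
</html>
EOF

# Parse the document and print the text of the first <h3> element.
doc = Hpricot(html)
puts((doc/'h3').first.inner_text)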
--Greg
-----Original Message-----
From: William James <w_a_x_man@yahoo.com>
Date: Sat, 1 Mar 2008 13:49:59
To:ruby-talk@ruby-lang.org (ruby-talk ML)
Subject: Re: Scan HTML
On Feb 29, 9:34 pm, Gregory Seidman <gsslist+r...@anthropohedron.net> wrote:
On Sat, Mar 01, 2008 at 12:22:12PM +0900, Tom Arra wrote:
> So I am new to Ruby scripting so I am not sure if this is possible or
> not. I want to make a script that will load a webpage and then search
> through the HTML of that page until it hits a certain tag. Once it hits
> that tag it need to grab all of the text between the tag and the
> appropriate end tag. Is something like this possible?
So far I think this is closest to what I am looking for. I need to go to
a website that has server information and pull that out of the HTML,
then take that info and spit it back out to the user. If I am
understanding the code above correctly, it at least does the first part,
which I had no clue how to do.
--
Posted via http://www.ruby-forum.com/.
Personally I agree with that, insofar as I think the simplest,
"default" Ruby solution is better than a specialized one. In this case I
think the better solution is Net::HTTP.
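
For what it's worth, here is a rough sketch of that standard-library
route: fetch the page with Net::HTTP and pull out the text of the first
matching tag with a simple regular expression. The helper names
(fetch_page, first_tag_text) and the example.com URL are made up for
illustration and are not from this thread, and the regex only copes with
simple tags that are not nested inside themselves.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: download the body of a page as a string.
def fetch_page(url)
  Net::HTTP.get(URI.parse(url))
end

# Hypothetical helper: text between the first <tag> and its end tag.
# Assumes the tag is not nested inside itself.
def first_tag_text(html, tag)
  name = Regexp.escape(tag)
  match = html.match(/<#{name}\b[^>]*>(.*?)<\/#{name}>/mi)
  match && match[1].strip
end

sample = "<html>\n<body>\n<h3>test</h3>\n</body>\n</html>"
puts first_tag_text(sample, 'h3')

# With network access, the two helpers combine like this:
#   puts first_tag_text(fetch_page('http://example.com/'), 'h1')
```

The fetch is left commented out so the snippet runs without a network
connection; only the extraction step is exercised on the sample HTML.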
Same question, different people, same strict requirements. It sounds
a little like homework. In that case, I suppose some of the regexp
solutions provided will work (for this small use case).
I still think Florian said it best, though. Unless you can "stack",
you won't be able to correctly reveal the components inside a nested
language structure. I haven't looked into the theory, but I can
attest to the pain in the arse I've had trying to scrape with regular
expressions.
Todd
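
To make the "stack" point concrete, here is a small sketch (my own
illustration, not code from anyone in this thread) that compares a
non-greedy regex with a scanner that counts nesting depth. The helper
name inner_html_balanced is hypothetical, and it ignores complications
such as self-closing tags and comments.

```ruby
require 'strscan'

# Hypothetical helper: inner HTML of the first <tag>, found by counting
# how many <tag> elements are currently open (a depth counter standing in
# for a stack). Returns nil if the tags never balance.
def inner_html_balanced(html, tag)
  name = Regexp.escape(tag)
  scanner = StringScanner.new(html)
  return nil unless scanner.scan_until(/<#{name}\b[^>]*>/i)

  start = scanner.pos
  depth = 1
  while depth > 0
    # Jump to the next opening or closing tag of the same name.
    return nil unless scanner.scan_until(/<\/?#{name}\b[^>]*>/i)
    depth += scanner.matched.start_with?('</') ? -1 : 1
  end

  # Everything between the first open tag and its matching close tag.
  finish = scanner.pos - scanner.matched.bytesize
  html.byteslice(start, finish - start)
end

nested = '<div>a<div>b</div>c</div>'

# The non-greedy regex stops at the first </div> it sees.
regex_result = nested[/<div>(.*?)<\/div>/m, 1]

puts regex_result
puts inner_html_balanced(nested, 'div')
```

The regex version cuts the content off at the inner end tag, while the
depth-counting version keeps going until the outer tag is closed.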
···
On Fri, Feb 29, 2008 at 10:55 PM, William James <w_a_x_man@yahoo.com> wrote:
On Feb 29, 9:34 pm, Gregory Seidman <gsslist+r...@anthropohedron.net> wrote:
> On Sat, Mar 01, 2008 at 12:22:12PM +0900, Tom Arra wrote:
> > So I am new to Ruby scripting so I am not sure if this is possible or
> > not. I want to make a script that will load a webpage and then search
> > through the HTML of that page until it hits a certain tag. Once it hits
> > that tag it need to grab all of the text between the tag and the
> > appropriate end tag. Is something like this possible?
>
> > Example
> > <html>
> > <body>
> > <h3>test</h3>
> > </body>
> > </html>
>
> > I want the script to return "test"
>
> You want the Hpricot gem.
Well, I just tried it and it worked like a charm. My next step is to
limit what it brings back.
Example:
<h3>blah blah blah 7.0.0.3.4 blah blah blah</h3>
I want to pull just the 7.0.0.3.4 and none of the words. I am sure this
is going to involve more regular expressions, but I never really
understood how to use them well.
E:\>irb --prompt xmp
s = " <h3>blah blah 7.0.0.3.4 blah</h3>"
==>" <h3>blah blah 7.0.0.3.4 blah</h3>"
# Find a substring composed of numerals and dots that is
# at least 3 characters long.
s[ /[\d.]{3,}/ ]
==>"7.0.0.3.4"
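
If the character-class pattern turns out to be too loose (it would also
accept a run of dots with no digits at all), a slightly stricter variant
is sketched below. The constant name VERSION_RE is just a name picked for
this example, and the pattern assumes the version is always digits
separated by single dots.

```ruby
s = " <h3>blah blah 7.0.0.3.4 blah</h3>"

# One or more digits, followed by at least one ".digits" group.
VERSION_RE = /\d+(?:\.\d+)+/

puts s[VERSION_RE]

# The looser pattern accepts a string of dots; the stricter one does not.
puts "....".match?(/[\d.]{3,}/)
puts "....".match?(VERSION_RE)
```

The lone "3" in the h3 tag is skipped by both patterns, since it is
neither three characters long nor followed by a dot and more digits.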
You're really good at this stuff! One thing I noticed is that it works
perfectly for the bare domain, but as soon as I put a full URL into
the Net::HTTP.new call it starts to throw errors. Any ideas?
And here is my output:

TomArra.com Title Tag: Welcome To TomArra.com
7.0.0.4.3
SocketError: getaddrinfo: nodename nor servname provided, or not known
method initialize in http.rb at line 564
method open in http.rb at line 564
method connect in http.rb at line 564
method timeout in timeout.rb at line 48
method timeout in timeout.rb at line 76
method connect in http.rb at line 564
method do_start in http.rb at line 557
method start in http.rb at line 546
method request in http.rb at line 1044
method get in http.rb at line 781
at top level in simple.rb at line 11
Program exited.
Use the rest of the URL as the argument for ".get()":
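
A minimal sketch of that split, using URI from the standard library. The
URL below is a placeholder, not the one from the failing script. The
SocketError most likely comes from Net::HTTP.new trying to resolve the
whole URL as if it were a host name.

```ruby
require 'net/http'
require 'uri'

# Hypothetical URL, used only for illustration.
url = 'http://www.example.com/status/info.html'
uri = URI.parse(url)

# Net::HTTP.new wants only the host name (and optionally a port);
# creating the object does not open a connection yet.
http = Net::HTTP.new(uri.host, uri.port)

puts uri.host
puts uri.path

# The path is what goes to get (needs network access, so left commented):
#   response = http.get(uri.path)
#   puts response.body
```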
One more little problem. I noticed that this net/http method automatically
uses port 80. The problem is that I need to reach a different port.
There has to be a way around this, right?
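
For reference, Net::HTTP.new takes the port as an optional second
argument, and port 80 is only the default. A short sketch, with a made-up
host name and port 8080 standing in for the real values:

```ruby
require 'net/http'
require 'uri'

# Host name and port are placeholders, not values from this thread.
http = Net::HTTP.new('www.example.com', 8080)
puts http.port

# URI.parse also picks up an explicit port from a full URL.
uri = URI.parse('http://www.example.com:8080/status')
puts uri.port
```

Nothing is sent over the network here; the connection is only opened
once a request such as get is made.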