(Almost) Everything in Ruby is an Object, so what you're asking for is another String object. "treeframe112..." is just a human-readable representation of that object, and a variable is just a pointer to that object.
Like Assaph said, you can use regexes to get such a String. See ri String#match, ri String#scan, or ri StringScanner, for instance.
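As a rough sketch of the regex route: the original post did not show the full string, so the frame tag below is a made-up stand-in, and frame_name and all_frame_names are hypothetical helper names used only for illustration.

```ruby
# Hypothetical input: the thread never shows the full string, so this
# frame tag is an assumption made for the sake of the example.
html = '<frame name="treeframe1120266500902" src="tree.html">'

# Return the first treeframe name found in the text, or nil if none.
def frame_name(text)
  # String#[] with a regex and a group index returns that capture group.
  text[/name="(treeframe\d+)"/, 1]
end

# Return every treeframe name found in the text.
def all_frame_names(text)
  # String#scan yields one array per match when the regex has a group,
  # so flatten turns [["a"], ["b"]] into ["a", "b"].
  text.scan(/name="(treeframe\d+)"/).flatten
end

name = frame_name(html)   # the variable now refers to a new String object
puts name
```

This is only reasonable for a one-off string with a known shape; the regex knows nothing about HTML structure.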
If you plan on parsing a lot of HTML, there are some Ruby HTML parsers. Michael Neumann's Mechanize has been recommended on this list before, but that's as much as I know about it.
Devin
tuyet.ctn@mscibarra.com wrote:
How do I extract "treeframe1120266500902" from this String
and store it in a variable to be used later?
Although that'll work for this particular string, I'd rather think this is a case for an HTML parser. Apparently the name of a frame is wanted, and an HTML parser is the safest way to get that info.
Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.
Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?
REXML can, but then again, it's "just" an XML parser.
REXML, part of the standard library, does XPath. If the source HTML is not also XML, then you'll need to coerce it so REXML can load it.
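A minimal sketch of the REXML route, assuming the input is already well-formed XHTML. The frameset markup and the frame_names helper are invented for illustration; the XPath expression is the one quoted earlier in the thread. (On recent Ruby versions REXML ships as a bundled gem rather than inside the standard library proper, but it is loaded the same way.)

```ruby
require 'rexml/document'

# Return the name attribute of every <frame> element in the document.
# Assumes the input is well-formed XML; REXML raises a parse error otherwise.
def frame_names(xhtml)
  doc = REXML::Document.new(xhtml)
  # XPath.match returns REXML::Attribute objects for an attribute path,
  # so map them to their string values.
  REXML::XPath.match(doc, '//frame/@name').map { |attr| attr.value }
end

# Hypothetical sample document, not taken from the original post.
xhtml = <<~XML
  <html>
    <frameset cols="20%,80%">
      <frame name="treeframe1120266500902" src="tree.html"/>
      <frame name="content" src="main.html"/>
    </frameset>
  </html>
XML

puts frame_names(xhtml)
```

Note the self-closing frame tags: plain HTML with unclosed tags would need to be cleaned up before REXML will accept it.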
Michael Neumann's Mechanize lib bundles up this behavior so that you can grab an HTML doc and operate on select sections; you can also grab the resulting REXML document and run arbitrary XPath calls on it too. Search the ruby-talk archives as this was discussed not too long ago.
I used tidy to turn HTML into XHTML, and then REXML to navigate and
modify it. I could've turned it back into HTML with tidy again, but
leaving it as XHTML was acceptable for me (parsing HTML elements from
RSS and modifying them for import into a new blog engine).
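Roughly what the REXML half of that looks like, as a sketch only: the tidy step is assumed to have already produced XHTML, and the rewrite_links helper, the URLs, and the choice of rewriting link prefixes are all made up for illustration rather than taken from the actual import script.

```ruby
require 'rexml/document'

# Rewrite the href of every <a> element that starts with old_prefix so that
# it starts with new_prefix instead, and return the document as a string.
# Assumes xhtml is well-formed (for example, the output of tidy).
def rewrite_links(xhtml, old_prefix, new_prefix)
  doc = REXML::Document.new(xhtml)
  doc.elements.each('//a') do |a|
    href = a.attributes['href']
    next unless href && href.start_with?(old_prefix)
    # Swap only the leading prefix; the rest of the URL is left alone.
    a.attributes['href'] = new_prefix + href[old_prefix.length..-1]
  end
  doc.to_s
end

# Hypothetical fragment standing in for an HTML snippet pulled from RSS.
fragment = '<div><a href="http://old.example/post/1">first post</a></div>'
puts rewrite_links(fragment, 'http://old.example/', 'http://new.example/')
```

REXML may serialize attributes with different quoting than the input used, so compare results by re-parsing rather than by string equality.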
···
On 7/7/05, Mark Thomas <mrt@thomaszone.com> wrote:
Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath?
I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?
- Google searches bring up nothing
- RAA search doesn't find Mechanize
- RubyForge search brings up project Wee; the docs tab is empty, the wiki is
blank, and the homepage has Wee docs but no Mechanize docs.
Sigh... http://search.cpan.org/ makes finding documentation for Perl
modules very easy. Is there an equivalent for Ruby Gems?
Nowhere, as none exists. I do not plan to document it, but I've been told that the www.ruby-web.org project will adopt Mechanize, and maybe they'll document and improve it.
Unfortunately, the documentation for Mechanize is a bit slim at the moment.
When I wanted to use it, I ended up just reading the code, and I learned quite
a bit because it is pretty well written. I plan to write an article or two
about Mechanize and the cool things you can do with it, which I'll publish
on my website. But that doesn't help you much at the moment, especially
since I'm still constructing my site.
To see the code I wrote to automate library book renewal, check this out: