Extract/Parse String?

Tuyet_Ctn · 6 July 2005 02:05

How do I extract "treeframe1120266500902" from this String class
and store it in a variable to be used later?

irb(main):205:0> puts c
<FRAMESET border=0 frameSpacing=0 rows=26,* frameBorder=0 onload=onLoad(); cols=* onunload=onUnload()>
<FRAME border=0 name=sidebar_header marginWidth=0 marginHeight=0 src="/araneae/PortfolioAdmin/Sidebar/showSidebarFiltersB?&filterId=0&amp;showHelp=true&common.sessionId=sGCq3td6d5iQGx94yZ9DxA99" frameBorder=0 noResize scrolling=no>
<FRAME border=0 name=treeframe1120266500902 marginWidth=4 marginHeight=0 src="/include/frameReady.html" frameBorder=0
</FRAMESET>
irb(main):206:0> puts c.class
String
=> nil

Assaph_Mehr1 · 6 July 2005 03:55

Use regular expressions
(http://www.ruby-doc.org/docs/ProgrammingRuby/html/intro.html#S5), then
#scan the string for something that matches. Eg. assuming the format is
always 'treeframe' followed by digits:

irb(main):038:0> c.scan /treeframe\d+/
=> ["treeframe1120266500902"]

You'll get an array with all the results. If you know you have only one
occurrence you can use String#slice (or String#[]) to get the first
value:

irb(main):037:0> c[/treeframe\d+/]
=> "treeframe1120266500902"

HTH,
Assaph
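
To tie this back to the original question (storing the result in a variable), here is a minimal sketch. The string below is a shortened stand-in for `c`, and the variable names `all_names`, `frame_name` and `frame_id` are only illustrative; it still assumes the name is always 'treeframe' followed by digits.

```ruby
# Shortened stand-in for the string `c` from the question.
c = '<FRAME border=0 name=sidebar_header marginWidth=0>' \
    '<FRAME border=0 name=treeframe1120266500902 marginWidth=4>'

# String#scan returns every match as an array.
all_names = c.scan(/treeframe\d+/)

# String#[] with a regex returns the first match, or nil if nothing matches.
frame_name = c[/treeframe\d+/]

# With a capture group and an index, String#[] returns just that group.
frame_id = c[/treeframe(\d+)/, 1]

puts frame_name
```

Because String#[] returns nil when nothing matches, it is worth checking `frame_name` before using it later.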

Devin_Mullins · 6 July 2005 04:07

(Almost) Everything in Ruby is an Object, so what you're asking for is another String object. "treeframe112..." is just a human-readable representation of that object, and a variable is just a pointer to that object.

Like Assaph said, you can use regexes to get such a String. See ri String#match, ri String#scan, or ri StringScanner, for instance.

If you plan on parsing a lot of HTML, there are some Ruby HTML parsers. Michael Neumann's Mechanize has been recommended on this list before, but that's as much as I know about it.

Devin
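
For comparison, a rough sketch of the String#match and StringScanner routes mentioned above. The input is a shortened stand-in for the original string, and the variable names are made up for illustration.

```ruby
require 'strscan'

# Shortened stand-in for the string from the question.
c = '<FRAME name=sidebar_header><FRAME name=treeframe1120266500902>'

# String#match returns a MatchData object, or nil when there is no match.
# Index 0 of the MatchData is the whole matched text.
m = c.match(/treeframe\d+/)
frame_name = m && m[0]

# StringScanner keeps a position in the string. scan_until advances past the
# next match, and `matched` then holds the text that matched the pattern.
scanner = StringScanner.new(c)
scanner.scan_until(/treeframe\d+/)
scanned_name = scanner.matched

puts frame_name
puts scanned_name
```

Either way the result is an ordinary String object that the variable refers to.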

Robert · 6 July 2005 06:40

Although that'll work for this particular string, I'd rather think this is a case for a HTML parser. Apparently the name of a frame is wanted and a HTML parser is the safest way to get that info.

Kind regards

robert

Tuyet_Ctn · 6 July 2005 22:30

Thank you Assaph!

c[/treeframe\d+/] works beautifully!

I also appreciate your link to the intro.html although I couldn't find
examples of regular expressions.

Thanks everyone else for your suggestions. I appreciate it.

Mark_Thomas1 · 7 July 2005 12:30

Although that'll work for this particular string, I'd rather think this is a
case for a HTML parser. Apparently the name of a frame is wanted and a HTML
parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

- Mark.

Robert · 7 July 2005 13:25

Mark Thomas wrote:

Although that'll work for this particular string, I'd rather think
this is a case for a HTML parser. Apparently the name of a frame is
wanted and a HTML parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name
of the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML can, but then again, it's "just" an XML parser.

Kind regards

robert

James_Britt4 · 7 July 2005 13:42

Mark Thomas wrote:

Although that'll work for this particular string, I'd rather think this is a
case for a HTML parser. Apparently the name of a frame is wanted and a HTML
parser is the safest way to get that info.

Agree completely. Regular expressions should not be used to parse HTML
or XML. However, XPath is an excellent alternative to regular
expressions in these cases. In XPath, the expression to get the name of
the frame would be '//frame/@name'.

Since I'm new to Ruby, I have to ask: is there an HTML parser that
supports XPath? I know that LibXML does a great job parsing HTML and I
find XPath to be a terrific way to do it--just about anything you want
to extract becomes a one-liner. Do the Ruby bindings expose this
functionality? If not, is there another library that can do this?

REXML, part of the standard library, does XPath. If the source HTML is not also XML, then you'll need to coerce it so REXML can load it.

Michael Neumann's Mechanize lib bundles up this behavior so that you can grab an HTML doc and operate on select sections; you can also grab the resulting REXML document and run arbitrary XPath calls on it too. Search the ruby-talk archives as this was discussed not too long ago.

James
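
A minimal sketch of the REXML plus XPath approach, assuming the markup has already been coerced into well-formed XML (quoted attributes, closed tags, lowercase element names). The sample document below is hand-written for illustration and is not the original poster's HTML. On recent Ruby versions REXML ships as a bundled gem rather than in the standard library proper.

```ruby
require 'rexml/document'

# Hand-written, well-formed stand-in for the frameset from the question.
xhtml = <<~XML
  <frameset rows="26,*">
    <frame name="sidebar_header" src="/sidebar.html" />
    <frame name="treeframe1120266500902" src="/include/frameReady.html" />
  </frameset>
XML

doc = REXML::Document.new(xhtml)

# '//frame/@name' selects the name attribute of every frame element.
names = REXML::XPath.match(doc, '//frame/@name').map(&:value)

# Pick the frame whose name starts with 'treeframe'.
tree_name = names.find { |n| n.start_with?('treeframe') }

puts tree_name
```

Note that XML element names are case-sensitive, so '//frame' would not match the uppercase FRAME tags in the original output unless the coercion step lowercases them.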

Brad_Wilson · 7 July 2005 13:44

I used tidy to turn HTML into XHTML, and then REXML to navigate and
modify it. I could've turned it back into HTML with tidy again, but
leaving it as XHTML was acceptable for me (parsing HTML elements from
RSS and modifying them for import into a new blog engine).

Mark_Thomas1 · 7 July 2005 16:25

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?
- Google searches bring up nothing
- RAA search doesn't find Mechanize
- Rubyforge search brings up project Wee, docs tab is empty, wiki is
blank, homepage has Wee docs but no Mechanize docs.

Sigh... http://search.cpan.org/ makes finding documentation for Perl
modules very easy. Is there an equivalent for Ruby Gems?

- Mark.

George5 · 7 July 2005 18:48

Mark Thomas wrote:

Michael Neumann's Mechanize lib bundles up this behavior so that you can
grab an HTML doc and operate on select sections; you can also grab the
resulting REXML document and run arbitrary XPath calls on it too.
Search the ruby-talk archives as this was discussed not too long ago.

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?

Nowhere, as none exists. I do not plan to document it, but I've been told that the www.ruby-web.org project will adopt Mechanize, and maybe they'll document and improve it.

Take a look at the examples.

Regards,

Michael

Ryan_Leavengood2 · 7 July 2005 18:52

Mark Thomas said:

I saw that comment, but wasn't able to find any documentation for
Mechanize. Sorry if I'm being stupid, but where can I find the
documentation?

Unfortunately the documentation for Mechanize is a bit slim at the moment.
When I wanted to use it I ended up just reading the code and learned quite
a bit because it is pretty well written. I plan to write an article or two
about Mechanize and the cool things you can do with it, which I'll publish
on my web-site. But that doesn't help you much at the moment, especially
since I'm still constructing my site.

To see the code I wrote to automate library book renewal, check this out:

http://ruby-talk.org/cgi-bin/scat.rb/ruby/ruby-talk/146181

Though if you are mostly interested in the HTML parsing aspects of
Mechanize, this may not help you too much. What exactly are you wanting to
do?

Ryan
