Get content in an XML element using Hpricot

Bonita · 13 April 2007 07:48

Hi

I'm using hpricot to parse the following file.

<item
rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
<title>[from morwyn] * HTML for the Conceptually Challenged</title>
<link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn</link>
<description>HTML for the Conceptually Challenged. Very basic tutorial,
plainly worded for people who hate to read instructions.</description>
<dc:creator>morwyn</dc:creator>
<dc:date>2006-10-10T07:28:28Z</dc:date>
<dc:subject>html imported webpagedesign</dc:subject>
<taxo:topics>
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/imported" />
    <rdf:li resource="http://del.icio.us/tag/html" />
    <rdf:li resource="http://del.icio.us/tag/webpagedesign" />
  </rdf:Bag>
</taxo:topics>
</item>

I'm trying to get the content of <dc:subject> like this:

doc = Hpricot.parse(File.read("965.xhtml"))

(doc/"item").each do |t|
  puts (t/"dc:subject").innerTEXT
end

but I got

<dc:subject>html internet tutorial web</dc:subject>

while I only need the text "html internet tutorial web".

Does anyone know the right function to call?

Thanks

--
Posted via http://www.ruby-forum.com/.

Kikijump · 13 April 2007 10:45

puts (t/'dc:subject').text

Billy_Hsu1 · 13 April 2007 10:50

Sorry for deleting your text.

Maybe you can try:

puts (t/"dc:subject").text

Kikijump · 13 April 2007 10:50

puts (t/'dc:subject').text

Sorry for the double post, but I shouldn't have copied and pasted the result
directly from irb.
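
For anyone reading this later, here is a minimal sketch of the difference between an element's text and its full markup. It does not use Hpricot: to keep it runnable without extra gems it swaps in REXML, so the calls differ from the `.text` suggestion above. The wrapping <rdf:RDF> root and its namespace URIs are assumptions (the original post does not show them), and `subject_text_and_markup` is a hypothetical helper name.

```ruby
require "rexml/document"

# Trimmed copy of the <item> from the question. The <rdf:RDF> wrapper and the
# namespace URIs are assumptions added so the prefixes are declared.
FEED = <<~'XML'
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
      <dc:creator>morwyn</dc:creator>
      <dc:subject>html imported webpagedesign</dc:subject>
    </item>
  </rdf:RDF>
XML

# Hypothetical helper: for each <item>, return a pair of
# [text inside <dc:subject>, the <dc:subject> element serialized with its tags].
def subject_text_and_markup(xml)
  doc = REXML::Document.new(xml)
  doc.root.get_elements("item").map do |item|
    subject = item.elements["dc:subject"]
    [subject.text, subject.to_s]
  end
end

subject_text_and_markup(FEED).each do |text, markup|
  puts "text only: #{text}"
  puts "markup:    #{markup}"
end
```

The first value in each pair is what the question asks for (the tags alone), while the second still carries the surrounding <dc:subject> tags, which is the kind of output the original code was printing.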
