Get the content of an XML element using Hpricot

Hi

I'm using hpricot to parse the following file.

<item
rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
<title>[from morwyn] * HTML for the Conceptually Challenged</title>
<link>http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn</link>
<description>HTML for the Conceptually Challenged. Very basic tutorial,
plainly worded for people who hate to read instructions.</description>
<dc:creator>morwyn</dc:creator>
<dc:date>2006-10-10T07:28:28Z</dc:date>
<dc:subject>html imported webpagedesign</dc:subject>
<taxo:topics>
  <rdf:Bag>
    <rdf:li resource="http://del.icio.us/tag/imported" />
    <rdf:li resource="http://del.icio.us/tag/html" />
    <rdf:li resource="http://del.icio.us/tag/webpagedesign" />
  </rdf:Bag>
</taxo:topics>
</item>

I'm trying to get the content from <dc:subject> like this

doc = Hpricot.parse(File.read("965.xhtml"))

(doc/"item").each do |t|

  puts (t/"dc:subject").innerTEXT

end

but I got

<dc:subject>html internet tutorial web</dc:subject>

while I only need "html internet tutorial web"

Does anyone know the right method to call?

Thanks

···

--
Posted via http://www.ruby-forum.com/.

puts (t/'dc:subject').text

···

On Apr 13, 9:48 am, Bonita <abbo...@yahoo.com.tw> wrote:

Sorry for deleting your text :(

Maybe you can try:

  puts (t/"dc:subject").text

Bonita wrote:

···

puts (t/'dc:subject').text

Sorry for the double post, but I shouldn't have copied and pasted the result
directly from irb :(
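
For anyone who wants to see the difference between an element's markup and its text content without installing Hpricot, here is a minimal sketch using REXML, which ships with Ruby. This is not Hpricot's API, just the same idea in a different library. The rdf:RDF wrapper with its namespace declarations, the FEED constant, and the variable names are assumptions made for the example; only the item itself comes from the question.

```ruby
require "rexml/document"

# Sample item modelled on the one in the question. The rdf:RDF wrapper and
# its xmlns declarations are assumed here so that the prefixes resolve.
FEED = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <item rdf:about="http://del.icio.us/url/50666d1a3fe2b942b20819ec2919d2b7#morwyn">
      <title>[from morwyn] * HTML for the Conceptually Challenged</title>
      <dc:subject>html imported webpagedesign</dc:subject>
    </item>
  </rdf:RDF>
XML

doc = REXML::Document.new(FEED)

# Map the dc prefix used in the XPath expressions to its namespace URI.
ns = { "dc" => "http://purl.org/dc/elements/1.1/" }

markup   = []  # serialized elements, tags included
subjects = []  # text content only

REXML::XPath.each(doc, "//item", ns) do |item|
  subject = REXML::XPath.first(item, "dc:subject", ns)
  markup   << subject.to_s   # the whole element as a string
  subjects << subject.text   # just the text between the tags
end

puts subjects
```

The point is the same as in the replies above: ask the node for its text rather than printing the node itself.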

···

On Apr 13, 12:40 pm, kikij...@gmail.com wrote:
