(RE)XML question

Question for you all. I want to treat HTML like XML
(which is no big deal).

But I want to find certain "special" tags (not real
HTML) and replace them with my own text.

It's macro-type stuff. Basically I want to output
the *same* HTML except for the text that replaced
the special tags.

I can't find any examples of generating XML with
REXML. It should be easy, I don't want it to be
too hard.

Contrived example below in case it helps.

How would you do this?

Thanks,
Hal

Input:

<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>

Output:
<html>
<body>
<p>Hi, there.</p>
<p>I found a foo tag enclosing 'some more text' with
bar and bam values of 'this' and 'that'...</p>
<p>That's all.</p>
</body>
</html>

So, in Hpricot:

  require 'hpricot'

  doc = Hpricot("<html>...</html>")
  # Replace each <foo> element with a <p> describing it.
  doc.search("foo").each do |ele|
    new_ele = Hpricot \
      '<p>I found a ' + ele.name + " tag enclosing '" +
      ele.inner_html + "' with " + ele.attributes.keys.join(' and ') +
      " values of " + ele.attributes.values.map { |x| "'#{x}'" }.join(' and ') +
      "...</p>"
    # new_ele is a whole parsed document; its first child is the new <p>.
    ele.parent.replace_child(ele, new_ele.children.first)
  end
  puts doc

REXML has a replace_child as well. But now you've motivated me to add Element#replace.

_why

On Wed, Aug 23, 2006 at 07:15:09AM +0900, rubyhacker@gmail.com wrote:

Input:

<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>

Output:
<html>
<body>
<p>Hi, there.</p>
<p>I found a foo tag enclosing 'some more text' with
bar and bam values of 'this' and 'that'...</p>
<p>That's all.</p>
</body>
</html>

unknown wrote:

It's macro-type stuff. Basically I want to output
the *same* HTML except for the text that replaced
the special tags.

This is what XSLT was designed for, and it may provide another option for
you.

ilan
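
For anyone curious what the XSLT route might look like for the contrived example, here is a rough sketch: an identity template that copies everything through, plus one template that overrides the foo tag. The stylesheet is held in a Ruby string; the constant FOO_STYLESHEET and the helper template_patterns are names made up for this sketch. Ruby's standard library has no XSLT processor, so the Ruby code only checks that the stylesheet is well-formed. Actually applying it would need an external processor (xsltproc, for instance, assuming it is installed).

```ruby
require 'rexml/document'

# XSLT 1.0 stylesheet kept as a Ruby string.
# FOO_STYLESHEET is a hypothetical name used only in this sketch.
FOO_STYLESHEET = <<~'XSLT'
  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>

    <!-- Identity template: copy every node and attribute unchanged. -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>

    <!-- Override for the special tag: emit a paragraph instead. -->
    <xsl:template match="foo">
      <p>I found a foo tag enclosing '<xsl:value-of select="."/>' with bar and bam values of '<xsl:value-of select="@bar"/>' and '<xsl:value-of select="@bam"/>'...</p>
    </xsl:template>
  </xsl:stylesheet>
XSLT

# Parse the stylesheet with REXML (which raises if it is not well-formed)
# and return the match pattern of each top-level xsl:template.
def template_patterns(stylesheet)
  root = REXML::Document.new(stylesheet).root
  root.children.grep(REXML::Element)
      .select { |e| e.expanded_name == 'xsl:template' }
      .map { |e| e.attributes['match'] }
end

p template_patterns(FOO_STYLESHEET)
```

Each further macro tag would get its own template next to the foo one, which is where the verbosity mentioned elsewhere in the thread starts to show.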

--
Posted via http://www.ruby-forum.com/.

rubyhacker@gmail.com wrote:

Contrived example below in case it helps.

input = <<ENDHTML
<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>
ENDHTML

require 'rexml/document'
doc = REXML::Document.new( input )
doc.root.each_element( '//foo' ){ |e|
  new_para = REXML::Element.new( 'p' )
  new_para.text = "I found a foo tag enclosing '#{e.text}' with bar and " \
    "bam values of '#{e.attributes['bar']}' and '#{e.attributes['bam']}'..."
  e.parent.replace_child( e, new_para )
}
puts doc

#=> <html>
#=> <body>
#=> <p>Hi, there.</p>
#=> <p>I found a foo tag enclosing &apos;some more text&apos; with bar and bam values of &apos;this&apos; and &apos;that&apos;...</p>
#=> <p>That's all.</p>
#=> </body>
#=> </html>
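
The REXML version above hard-codes the bar and bam attribute names. A possible variation, sketched here, builds the message from whatever attributes the tag carries, the way the Hpricot answer does. The method name describe_tags and its signature are invented for this sketch and are not part of REXML. Attribute names appear in whatever order REXML yields them.

```ruby
require 'rexml/document'

# Replace every <tag_name> element in source with a <p> describing it.
# describe_tags is a hypothetical helper, not a REXML method.
def describe_tags(source, tag_name)
  doc = REXML::Document.new(source)
  # XPath.match returns an Array, so the tree can be modified while looping.
  REXML::XPath.match(doc, "//#{tag_name}").each do |e|
    names = []
    values = []
    e.attributes.each_attribute do |a|
      names << a.name
      values << "'#{a.value}'"
    end
    para = REXML::Element.new('p')
    para.text = "I found a #{e.name} tag enclosing '#{e.text}' with " \
                "#{names.join(' and ')} values of #{values.join(' and ')}..."
    e.parent.replace_child(e, para)
  end
  doc
end

html = '<html><body><p>Hi, there.</p>' \
       '<foo bar="this" bam="that">some more text</foo>' \
       "<p>That's all.</p></body></html>"
puts describe_tags(html, 'foo')
```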

rubyhacker@gmail.com wrote:

How would you do this?

require 'xml-split.rb'

tag = 'foo'
DATA.read.xml_split(tag).each {|stuff|
  if stuff.class == String
    print stuff
  else
    attr = stuff[0].xml_parse
    puts "<p>I found a #{tag} tag enclosing '#{stuff[1]}' with"
    print "#{attr.keys.join(' and ')} values of "
    print "'#{attr.values.join("' and '")}'...</p>"
  end
}

__END__
<html>
<body>
<p>Hi, there.</p>
<foo bar="this" bam="that">some more text</foo>
<p>That's all.</p>
</body>
</html>

---- output ----

<html>
<body>
<p>Hi, there.</p>
<p>I found a foo tag enclosing 'some more text' with
bam and bar values of 'that' and 'this'...</p>
<p>That's all.</p>
</body>
</html>

why the lucky stiff wrote:

REXML has a replace_child as well. But now you've motivated me to add Element#replace.

Hmm, the right thing to do and a tasty way to do it.

This motivates me to download Hpricot for the first time
and try it. Probably tomorrow as my brane is fride.

Thanks,
Hal

Ilan Berci wrote:

>
> It's macro-type stuff. Basically I want to output
> the *same* HTML except for the text that replaced
> the special tags.

This is what XSLT was designed for, and it may provide another option for
you.

That makes sense. I've never used XSLT, but I'm sure that's
a viable solution.

_Why's Hpricot example worked perfectly for me, BTW.

So, a related question.

Suppose I wanted to "nest" macros of this kind. Something like:

  <mac1 foo="1" bar="2">My name is
        <mac2 baz="3" bam="4">seed-value</mac2>
  today.</mac1>

Forgive the nonsense example.

Could XSLT handle this easily? Could Hpricot (_why)?

Thanks,
Hal

unknown wrote:

Could XSLT handle this easily? Could Hpricot (_why)?

Thanks,
Hal

Yes, both techniques can handle nested elements. I don't know what XML
tools you are using, but many come with XSLT support built in. XSLT
allows any XML (XHTML) document to be transformed into any other. At one
time it was slated to replace CSS, but that never seemed to materialize.
Nowadays it's mostly used in report generation and XML-RPC filtering,
but of course it has many other uses. The disadvantages of XSLT are that
it can be rather challenging to debug and that it can grow very verbose
in non-trivial transformations. The advantage is that it is a W3C
standard and practically every platform/language has support for it in
one form or another.

I have no experience with Hpricot, but if you are already using Ruby as
your main processor then I would probably stick with Hpricot, as the
solutions above look much cleaner than an XSLT solution :) Oh, and
lastly, if you don't use XSLT/XPath on a regular basis, you can easily
forget its semantics and have to keep referring back to the docs. Or at
least I have to.

ilan
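
To make the nesting question concrete, here is a minimal sketch using REXML rather than Hpricot or XSLT. It expands macros depth-first, so an outer macro sees the already-expanded text of any inner one. The NESTED_MACROS table, the two lambdas, and the method names are all made up for illustration. It also assumes each macro body ends up as plain text: any non-macro child elements inside a macro would be dropped by this version.

```ruby
require 'rexml/document'

# Hypothetical macro table: tag name => lambda(attribute hash, expanded body).
NESTED_MACROS = {
  'mac1' => ->(attrs, body) { "#{body} (foo=#{attrs['foo']}, bar=#{attrs['bar']})" },
  'mac2' => ->(attrs, body) { "#{body}-#{attrs['baz']}-#{attrs['bam']}" }
}

# Expand macros depth-first: children are handled before their parent.
def expand_macros(element)
  # children returns an Array, so replacing nodes during the walk is safe.
  element.children.grep(REXML::Element).each { |child| expand_macros(child) }

  handler = NESTED_MACROS[element.name]
  return unless handler && element.parent

  attrs = {}
  element.attributes.each_attribute { |a| attrs[a.name] = a.value }
  # Join all text children; Element#text alone returns only the first one.
  body = element.texts.map { |t| t.value }.join
  replacement = REXML::Text.new(handler.call(attrs, body))
  element.parent.replace_child(element, replacement)
end

# Parse source, expand every macro under the root, and return the document.
def expand_nested(source)
  doc = REXML::Document.new(source)
  expand_macros(doc.root)
  doc
end

nested = '<p><mac1 foo="1" bar="2">My name is ' \
         '<mac2 baz="3" bam="4">seed-value</mac2> today.</mac1></p>'
puts expand_nested(nested)
```

Going bottom-up keeps each handler simple, since it only ever sees text. A top-down order would be needed instead if an outer macro had to decide whether its inner macros get expanded at all.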
