Rexml newbie confused

I bit the bullet and went to ruby cvs last week from 1.6.8.
The new libraries are great! Today I am playing with rexml and
open-uri. Something is b0rked, though.

I tried parsing my homepage since it’s xhtml, and I don’t have
loads of xml files knocking about.

open-uri loads the doc fine (nice lib, incidentally), but rexml has
some problems with it. If I remove this line:

from , it all works ok. Anything I’m missing?
Here’s the program, both inputs and the output (I cut out all the
bits that didn’t seem relevant):

1rasputin@lb:xml$ cat wtf.xml

oh dear hmm

1rasputin@lb:xml$ cat poc.rb
#!/data/ruby/bin/ruby -w
require "rexml/document"
require "open-uri"
xml = open(ARGV[0])
doc = REXML::Document.new xml

1rasputin@lb:xml$ ./poc.rb wtf.xml
/data/ruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:291:in `pull': Missing end tag for 'head' (got "html") (REXML::ParseException)
Line: 9
Position: 280
Last 80 unconsumed characters:
from /data/ruby/lib/ruby/1.9/rexml/document.rb:180:in `build'
from /data/ruby/lib/ruby/1.9/rexml/document.rb:44:in `initialize'
from ./poc.rb:6:in `new'
from ./poc.rb:6
1rasputin@lb:xml$ ./poc.rb ok.xml

1rasputin@lb:xml$ diff ok.xml wtf.xml
5a6

1rasputin@lb:xml$

Serenity through viciousness.
Rasputin :: Jack of All Trades - Master of Nuns

Hi,

Dick Davies rasputnik@hellooperator.net wrote:

open-uri loads the doc fine (nice lib, incidentally), but rexml has
some problems with it. If I remove this line:

from , it all works ok. Anything I’m missing?

I think you should enclose "stylesheet" in quotation
marks, i.e. 'rel=stylesheet' should be 'rel="stylesheet"'.

XML does not allow you to skip quotation marks in attribute
values.
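
A minimal sketch of that rule in action, assuming a reasonably current REXML. The helper name `well_formed?` and the `style.css` href are made up for illustration; the only difference between the two inputs is the quoting of the rel value.

```ruby
require "rexml/document"

# Hypothetical helper: true if REXML parses the string,
# false if it raises a parse error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

# Illustrative inputs (not the original poster's files).
quoted   = '<head><link rel="stylesheet" href="style.css"/></head>'
unquoted = '<head><link rel=stylesheet href="style.css"/></head>'

puts well_formed?(quoted)
puts well_formed?(unquoted)
```

The exact error text differs between REXML versions, so the sketch only checks whether a ParseException is raised rather than matching on the message.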

Hope this helps,

Takashi Sano

Dick Davies wrote:

I bit the bullet and went to ruby cvs last week from 1.6.8.
The new libraries are great! Today I am playing with rexml and
open-uri. Something is b0rked, though.

I tried parsing my homepage since it’s xhtml, and I don’t have
loads of xml files knocking about.

open-uri loads the doc fine (nice lib, incidentally), but rexml has
some problems with it. If I remove this line:

------------^

The value of the rel attribute must be in quotes.

James

Hi,

open-uri loads the doc fine (nice lib, incidentally), but rexml has
some problems with it. If I remove this line:

from , it all works ok. Anything I’m missing?

I think you should enclose "stylesheet" in quotation
marks, i.e. 'rel=stylesheet' should be 'rel="stylesheet"'.

XML does not allow you to skip quotation marks in attribute
values.

Thanks, I thought I’d run tidy on it, but obviously without the
option to enforce xml compliance.
Sorry for the noise, I’ll get back to playing…

Dick Davies rasputnik@hellooperator.net wrote:

Waste not, get your budget cut next year.
Rasputin :: Jack of All Trades - Master of Nuns

Dick Davies rasputnik@hellooperator.net wrote in message news:20040428124512.GA26840@lb.tenfour

I think you should enclose "stylesheet" in quotation
marks, i.e. 'rel=stylesheet' should be 'rel="stylesheet"'.

XML does not allow you to skip quotation marks in attribute
values.

Thanks, I thought I’d run tidy on it, but obviously without the
option to enforce xml compliance.
Sorry for the noise, I’ll get back to playing…

Grrr. I’ve got to sit down one of these days and improve REXML’s
error reporting. It isn’t quite a bug, but it is certainly frustrating
that REXML didn’t tell you that the problem was that quotes were
missing.

It is on my list of things to do.
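
Until then, a caller can bolt a rough hint onto the exception. The following is only a sketch, not REXML's own behaviour: `explain_parse_failure` and the `UNQUOTED_ATTR` regex are hypothetical, and the regex is a crude heuristic that can misfire (for example on `=` signs inside quoted values).

```ruby
require "rexml/document"

# Heuristic (an assumption, not part of REXML): a tag containing
# name=value where the value does not start with a quote character.
UNQUOTED_ATTR = /<[A-Za-z][^<>]*?\s([\w:.-]+)=([^\s"'<>=]+)/

# Hypothetical helper: returns nil if the string parses, otherwise the
# first line of REXML's message, plus a hint when the heuristic fires.
def explain_parse_failure(xml)
  REXML::Document.new(xml)
  nil
rescue REXML::ParseException => e
  summary = e.message.lines.first.to_s.strip
  if (m = UNQUOTED_ATTR.match(xml))
    "#{summary} (hint: attribute #{m[1]}=#{m[2]} has no quotes)"
  else
    summary
  end
end

# Illustrative input, not the original wtf.xml.
puts explain_parse_failure('<head><link rel=stylesheet href="style.css"/></head>')
```

For well-formed input the helper returns nil, so the hint only shows up when parsing has already failed.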

— SER