This works fine, but not if the document contains a doctype
declaration with a system identifier. For some reason, libxml tries to
resolve it, leading to significant performance issues.
Is there a way to tell the Document-object that it should ignore the
doctype declaration if present? Or should I first remove the
declaration from the document before calling new?
If the doctype is HTML, open the document like this:
regards, Ruud
-------------------------------------
This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren
tommy.nordgren@comhem.se
Thanks for the suggestion. The document is not an HTML document; it is
an XML document, something like this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL"
"http://some.site.nl/dtd/test.dtd">
<test>
<p>this is a test</p>
</test>
I don't want XML::Document to resolve the URL and wait for a
timeout. I couldn't find anything in the documentation on this.
regards, Ruud
On 29/07/2008, Phlip <phlip2005@gmail.com> wrote:
ruud grosmann wrote:
This works fine, but not if the document contains a doctype
declaration with a system identifier. For some reason, libxml tries to
resolve it, leading to significant performance issues.
If the doctype is HTML, open the document like this:
Thanks for this hint. I had decided libxslt was not for me because of
a problem with garbage collection after starting to use it (see other
post).
So a good alternative is welcome. I'll check it out later this week.
regards, Ruud
On 31/07/2008, Mark Guzman <segfault@hasno.info> wrote:
This works fine, but not if the document contains a doctype
declaration with a system identifier. For some reason, libxml tries to
resolve it, leading to significant performance issues.
Is there a way to tell the Document-object that it should ignore the
doctype declaration if present? Or should I first remove the
declaration from the document before calling new?
Thank you for the hint. I did it already, but I was wondering if there
is some hidden option that would do it for me.
Is my assumption correct that the class is not documented very well?
After googling for some time I only found something that appeared to
be outdated. That's why I eventually posted my question here.
Is using libxml the right thing to do, or are there smarter alternatives?
thanks, Ruud
Libxml-ruby is the most complete & accurate parser of the big three (REXML, Libxml-ruby, and Hpricot), and its documentation can be very challenging. How much of the original C Libxml documentation have you been able to read?
I tried to reply to this via the ruby-talk mailing list and it didn't
work. Not sure why not, maybe someone can fill me in on that. Anyway,
here's my take:
The doctype in your example doesn't look like a real doctype definition,
so if you can pull it out of your XML (by hand, not programmatically)
before trying to parse it, I'd say that would be a good idea. That being
said, there are two attributes of the XML::Parser class that look like
they may be of interest: default_load_external_dtd and
default_validity_checking. Try setting both of those to false, unless you
have a real DTD to validate against and the example above was fake. Of
course, since this uses XML::Parser instead of XML::Document, I think you
would need to do something like:
# These are class-level (global) settings, so set them on XML::Parser itself
XML::Parser.default_load_external_dtd = false
XML::Parser.default_validity_checking = false
parser = XML::Parser.file(<file>)
doc = parser.parse
... and then go from there.
Phill D.
When I was researching the difference between the normal XML parser and the HTML parser, I also observed those variables not working. That's why I didn't bring them up.
Hey Ruud,
Nope, I can't see that you're doing anything wrong. All I can suggest is that you send the actual XML so I can give it a try (when I use your original example it seems to work fine as long as I set those class variables). Also, the error message you sent was broken up; if you could please try to send that again, it would probably help. Here's what I'm using:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE test PUBLIC "-//FARAWAY//DTD-verweg//NL" "http://some.site.nl/dtd/test.dtd">
<test>
<p>this is a test</p>
</test>
And here's the error I get when I don't set those class variables:
Hm, the Java XML parsers I know have a special callback that you can set
to deal with resolving external entities. I could not find
anything similar in the libxml documentation, but maybe I just looked in
the wrong places. With that you could load the file just once (or
even fetch it from some internal memory or file system). Also, I find
it a bit strange that those flags are global: this can introduce
weird bugs in an application which parses XML concurrently and
needs different flags for each process...
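To make the "load the file just once" idea concrete, here is a rough sketch in plain Ruby. It is not a libxml-ruby API: caching_resolver, the loader block, and the placeholder DTD text are all invented for illustration, and how such a resolver would be hooked into the parser is left open.

```ruby
# Hypothetical sketch (not a libxml-ruby API): wrap a slow loader so that each
# system identifier is fetched at most once and then served from memory.
def caching_resolver(&loader)
  cache = {}          # system identifier => DTD text
  lock  = Mutex.new   # guards the cache when several threads resolve at once
  lambda do |system_id|
    lock.synchronize do
      cache.fetch(system_id) { cache[system_id] = loader.call(system_id) }
    end
  end
end

fetches = 0
resolver = caching_resolver do |_system_id|
  fetches += 1             # stands in for the slow network or disk read
  "<!ELEMENT test (p)>"    # placeholder DTD text, made up for this example
end

3.times { resolver.call("http://some.site.nl/dtd/test.dtd") }
puts fetches  # prints 1: the loader ran only for the first lookup
```

Because the cache lives inside the resolver object rather than in a global flag, two parts of an application could each hold their own resolver with different behaviour.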
Kind regards
robert
--
use.inject do |as, often| as.you_can - without end