Escaping single quotes in XPath query with REXML

Has anybody tried to use XPath in REXML with a single quote, only to run into the fact that quote escaping in XPath is apparently not accounted for? If this were in the context of XSLT, I'd be able to assign some annoying temp variable like $apos, but it's not, so I can't.

irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> include REXML
=> Object
irb(main):003:0> xml = "<rss version='2.0'><channel><item><title>John's Doe</title></item></channel></rss>"
=> "<rss version='2.0'><channel><item><title>John's Doe</title></item></channel></rss>"
irb(main):004:0> xmldoc = Document.new xml
=> <UNDEFINED> ... </>
irb(main):005:0> XPath.first( xmldoc, "/rss/channel/item/title" ).to_s
=> "<title>John's Doe</title>"
irb(main):006:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John's Doe']" ).to_s
NoMethodError: undefined method `node_type' for "John":String
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:124:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `each'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:402:in `Predicate'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:346:in `Predicate'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:204:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in `times'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:34:in `parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath.rb:28:in `first'
         from (irb):6
irb(main):007:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John\'s Doe']" ).to_s
NoMethodError: undefined method `node_type' for "John":String
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:124:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `each'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:402:in `Predicate'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:346:in `Predicate'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:204:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in `times'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in `internal_parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
         from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:34:in `parse'
         from /usr/local/lib/ruby/1.8/rexml/xpath.rb:28:in `first'
         from (irb):7
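For what it's worth, REXML's XPath does accept variable bindings, which is essentially the XSLT $apos trick: REXML::XPath.first and REXML::XPath.match take an optional hash of variables as their fourth argument (after the namespace hash), and $name in the expression is looked up in it. This is a minimal sketch against a reasonably current REXML. It has not been checked against the Ruby 1.8 REXML shown in the traceback above.

```ruby
require 'rexml/document'

xml = "<rss version='2.0'><channel><item><title>John's Doe</title></item></channel></rss>"
doc = REXML::Document.new(xml)

# Bind the value to an XPath variable instead of splicing it into the
# expression, so the apostrophe never has to be quoted at all.
# Arguments: context node, expression, namespaces (none), variables.
title = REXML::XPath.first(doc, "/rss/channel/item/title[text()=$name]",
                           nil, { "name" => "John's Doe" })

puts title.to_s   # prints the matching <title> element
```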

irb(main):006:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John's Doe']" ).to_s

I'm no expert in XPath, but that looks like a broken XPath query because of
the three single quotes.

irb(main):007:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John\'s Doe']" ).to_s

That's identical to the previous query, as you'll see if you try this:

irb(main):001:0> a="text()='John\'s Doe'"
=> "text()='John's Doe'"

You haven't inserted a backslash into the string; you've just escaped the
quote, and Ruby removed the escape. You need two backslashes to insert a
single backslash into the string:

irb(main):002:0> a="text()='John\\'s Doe'"
=> "text()='John\\'s Doe'"

(Despite how it looks, there is only a single backslash in there; irb shows
it as two because it displays the value as a double-quoted Ruby literal, in
which a backslash has to be escaped.)

irb(main):003:0> a.each_byte { |c| print c.chr," " }
t e x t ( ) = ' J o h n \ ' s D o e ' => "text()='John\\'s Doe'"

However, I've just had a quick scan through the XPath 1.0 spec, and I don't
think that's how you do it: XPath string literals have no backslash escapes.
You can include single quotes inside a double-quoted string, and vice versa.
But probably what you want for the general case is XML character entities
(&#39; or &apos;).
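The XPath 1.0 grammar makes this concrete: the Literal production is '"' [^"]* '"' | "'" [^']* "'", so a literal ends at the next quote of the same kind and there is no escape sequence at all. The small Ruby sketch below transcribes that production as a regex. It is for illustration only and is not REXML's own tokenizer. It shows what happens to each form of the query:

```ruby
# The XPath 1.0 Literal production, transcribed as an anchored regex:
# either "..." containing no double quote, or '...' containing no single quote.
XPATH_LITERAL = /\A(?:"([^"]*)"|'([^']*)')/

# Return the contents of the literal at the start of expr, or nil.
def first_literal(expr)
  m = XPATH_LITERAL.match(expr)
  m && (m[1] || m[2])
end

p first_literal(%q{'John's Doe'})   # => "John"  (stops at the apostrophe)
p first_literal(%q{"John's Doe"})   # => "John's Doe"
p first_literal("'John\\'s Doe'")   # => "John\\" (the backslash is just a character)
```

The first case is consistent with the "John":String in the traceback above, and the third shows why an extra backslash can't help.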

Try passing your string through this before constructing your XPath query:

  require 'rexml/text'
  a = "John's Doe"
  b = REXML::Text::normalize(a)
  #=> "John&apos;s Doe"

HTH,

Brian.

Hmm, that doesn't work.

irb(main):007:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John&apos;s Doe']" ).to_s
=> ""
irb(main):008:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John&#39;s Doe']" ).to_s
=> ""
irb(main):009:0> XPath.first( xmldoc, "/rss/channel/item/title[text()=\"John's Doe\"]" ).to_s
=> "<title>John's Doe</title>"

You might want to raise that with the REXML author. In the meantime, if you
know the string contains single quotes but no double quotes, you can surround
it with double quotes in the XPath query, as in the third line above.
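The double-quote workaround generalises to arbitrary strings. Below is a sketch of a hypothetical helper, xpath_literal, which is not part of REXML. It quotes a value with whichever delimiter the value doesn't contain. When the value contains both kinds of quote, it falls back to the usual XPath 1.0 idiom of building the string with concat(). It assumes the REXML in use implements concat(), as current versions do.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): build an XPath 1.0 expression
# that evaluates to exactly the string s.
def xpath_literal(s)
  return "'#{s}'" unless s.include?("'")    # no apostrophes: use '...'
  return "\"#{s}\"" unless s.include?('"')  # no double quotes: use "..."
  # Both kinds present: split on the apostrophes, wrap each piece in '...',
  # and rejoin with concat(), supplying each apostrophe as "'".
  pieces = s.split("'", -1).map { |piece| "'#{piece}'" }
  glue = %q{, "'", }
  "concat(" + pieces.join(glue) + ")"
end

doc = REXML::Document.new(%q{<r><title>John's Doe</title><title>He said "it's"</title></r>})

["John's Doe", %q{He said "it's"}].each do |name|
  query = "/r/title[text()=#{xpath_literal(name)}]"
  puts query
  puts REXML::XPath.first(doc, query).text
end
```

Each piece produced by split contains no apostrophe, so wrapping it in single quotes is always a valid XPath literal.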

Regards,

Brian.


On Thu, Oct 21, 2004 at 09:28:51AM +0100, Brian Candler wrote:

Try passing your string through this before constructing your XPath query:

  require 'rexml/text'
  a = "John's Doe"
  b = REXML::Text::normalize(a)
  #=> "John&apos;s Doe"