Nokogiri XPath

I have a long XML document like the one below. I want to select the
data ("CDEF" in this case) when key = "English".

What is the easiest way to do this? The XML below is part of a 100-page document.

<topic>
  <key>Spanish</key>
  <topic>
    <key>description</key>
    <data>
      ABCDEF
    </data>
    <key>server</key>
    <string>systems</string>
    <key>title</key>
    <string>Directir</string>
  </topic>
  <key>English</key>
  <topic>
    <key>description</key>
    <data>
      CDEF
    </data>
    <key>server</key>
    <string>producer</string>
    <key>title</key>
    <string>Update 66</string>
  </topic>
</topic>

···

--
Posted via http://www.ruby-forum.com/.

Try this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")

=> [#<Nokogiri::XML::Element:0x4918efe name="data"
children=[#<Nokogiri::XML::Text:0x4918dfa "\n CDEF\n ">]>]

Jesus.

···

On Thu, Nov 24, 2011 at 9:43 AM, Ruby Mania <prateek123@gmail.com> wrote:

I have a long XML like below .. I wish to select DATA ("cdef" in this
case) when key="English"


Thanks a lot for the help. But it matched CDEF and all the nodes after
it, even where key != English.

Thanks again.


One way I can think of is to loop and break after getting the first value.
That is OK if there is only one English tag,
but what if there are multiple such tags in the long file?

Ruby Mania wrote in post #1033514:

···

Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

thanks again


I'm not sure why this is. I'm still trying to come up with a good
XPath that will return just that node,
but in the meantime you can do this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")[0]

Jesus.

···

On Thu, Nov 24, 2011 at 12:30 PM, Ruby Mania <prateek123@gmail.com> wrote:


Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why is this. I'm still trying to come up with a good
XPath that will return just that node,

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

but in the meantime you can do this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")[0]

Or better

doc.at_xpath("//key[. = 'English']/following-sibling::topic[1]/data")

I would probably do

doc.xpath('//topic[preceding-sibling::key[text()="English"]]//data')

or, for one hit only

doc.at_xpath('//topic[preceding-sibling::key[text()="English"]][1]//data')

I'm not sure about efficiency, but visually I prefer to have the path to
the selected node as the basis and to use the criteria for filtering.

If we want to be even more robust we could require that the English key
is the element immediately before the topic

doc.xpath('//topic[preceding-sibling::*[1][self::key and text()="English"]]//data')

This will avoid matching the topic in

<key>English</key>
...
<key>Foo</key>
<topic>...

or

<key>English</key>
<another>...</another>
<topic>

Kind regards

robert

PS: My favorite XPath help
http://www.w3schools.com/xpath/default.asp
http://www.zvon.org/xxl/XPathTutorial/General/examples.html

···

2011/11/24 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

On Thu, Nov 24, 2011 at 12:30 PM, Ruby Mania <prateek123@gmail.com> wrote:

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

PPS: You can append /text() to directly get the text:

//topic[preceding-sibling::*[1][self::key and text()="English"]]//data/text()

e.g.

doc.xpath('//topic[preceding-sibling::*[1][self::key and text()="English"]]//data/text()').map {|x| x.text.strip}


Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why is this. I'm still trying to come up with a good
XPath that will return just that node,

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

What I don't understand is why that XPath returns nodes whose
preceding key sibling doesn't have 'English' as its value.
I mean:

<topics>
  <topic>
    <key>English</key>
    <topic><data>CDEF</data></topic>
  </topic>
  <topic>
    <key>Spanish</key>
    <topic><data>ABC</data></topic>
  </topic>
</topics>

Why does that XPath return ABC as well? I would have thought that the
following-sibling for <key>English</key> would only be the
<topic><data>CDEF</data></topic>, from which we are selecting the data
node.

but in the meantime you can do this:

doc.xpath("//key[. = 'English']/following-sibling::topic/data")[0]

Or better

doc.at_xpath("//key[. = 'English']/following-sibling::topic[1]/data")

I would probably do

doc.xpath('//topic[preceding-sibling::key[text()="English"]]//data')

or, for one hit only

doc.at_xpath('//topic[preceding-sibling::key[text()="English"]][1]//data')

Not sure about efficiency but I prefer it visually to have the path to
the selected node as basis and use criteria in for filtering.

I agree with you, and I would guess this is more efficient, since
Nokogiri doesn't have to return as many nodes.

Jesus.

···

On Thu, Nov 24, 2011 at 1:49 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

2011/11/24 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:

On Thu, Nov 24, 2011 at 12:30 PM, Ruby Mania <prateek123@gmail.com> wrote:

Thanks a lot for help. But it matched CDEF and all nodes after that even
if key != english

I'm not sure why is this. I'm still trying to come up with a good
XPath that will return just that node,

Well, there could be many matches and from the original posting I
cannot see that only the first is needed.

What I don't understand is why that xpath returns nodes whose
preceding key sibling doesn't have 'English' as value.

With the statement above I was referring to the case where there are
multiple pairs of key "English" and topic.

I mean:

<topics>
<topic>
<key>English</key>
<topic><data>CDEF</data></topic>
</topic>
<topic>
<key>Spanish</key>
<topic><data>ABC</data></topic>
</topic>
</topics>

Why that xpath returns the ABC also. I would have thought that

Which XPath expression are you referring to here with "that xpath"?
If you mean this

irb(main):020:0> doc = Nokogiri.XML("<r><a><k/><b>1</b><b>2</b></a><a><k/><b>3</b></a></r>")
=> ...
irb(main):022:0> doc.xpath('//k/following-sibling::b').size
=> 3
irb(main):023:0> puts doc.xpath('//k/following-sibling::b')
<b>1</b>
<b>2</b>
<b>3</b>
=> nil

Then you get three matches but from different parents - even though
you cannot distinguish them immediately. If you want to only match
exactly one entry you need to add more criteria:

irb(main):024:0> doc.xpath('//k/following-sibling::b[1]').size
=> 2
irb(main):025:0> puts doc.xpath('//k/following-sibling::b[1]')
<b>1</b>
<b>3</b>
=> nil

following-sibling for <key>English</key> would only be the
<topic><data>CDEF</data></topic>, from which we are selecting the data
node.

Generally the *-sibling axes cover all siblings in the given direction, i.e. child nodes of the same parent node:

irb(main):016:0> doc = Nokogiri.XML("<a><k/><b>1</b><b>2</b></a>")
=> #<Nokogiri::XML::Document:0x832daa4 name="document"
children=[#<Nokogiri::XML::Element:0x832d810 name="a"
children=[#<Nokogiri::XML::Element:0x832d68a name="k">,
#<Nokogiri::XML::Element:0x832d568 name="b"
children=[#<Nokogiri::XML::Text:0x832d450 "1">]>,
#<Nokogiri::XML::Element:0x831c02e name="b"
children=[#<Nokogiri::XML::Text:0x831bf02 "2">]>]>]>

irb(main):017:0> doc.xpath('//k/following-sibling::b').size
=> 2

irb(main):019:0> puts doc.xpath('//k/following-sibling::b')
<b>1</b>
<b>2</b>
=> nil

See also the XPath resources I mentioned earlier.

Kind regards

robert


Hi,

Now I see what was wrong with my reasoning. I was misunderstanding the
XML structure. Somehow, I thought that the only topic at the same
level as the key was the one we wanted to search. Looking more closely
I realized that key is at the same level as all other topic nodes in
the document.

Thanks,

Jesus.

···

On Thu, Nov 24, 2011 at 5:02 PM, Robert Klemme <shortcutter@googlemail.com> wrote:

Generally *-sibling refers to all siblings, i.e. sub nodes of the same node
