REXML too slow

Bu_Mihai · 29 March 2008 07:46

I have an XML file and sometimes I call the find_first_recursive method.
When the XML file is small it works fine, but when it has ~900 lines I
wait ~15 seconds for it to return the wanted node, and I want something
faster. How can I get a better time?

I would have tried libxml, but I had some problems installing it under
Windows.

···

--
Posted via http://www.ruby-forum.com/.

Mark_Ryall · 29 March 2008 08:04

have you tried hpricot?

Robert_K1 · 30 March 2008 11:00

What's the find criteria you use? Maybe you can use XPath. 900 lines does not really sound large so I suspect there might be an algorithmic or design error.

Kind regards

robert

Bu_Mihai · 29 March 2008 08:08

Mark Ryall wrote:

have you tried hpricot?

Not yet; is it faster?

Bu_Mihai · 30 March 2008 18:23

Robert Klemme wrote:

What's the find criteria you use? Maybe you can use XPath. 900 lines
does not really sound large so I suspect there might be an algorithmic
or design error.

Kind regards

robert

This is the criterion:

node = rexml_element.find_first_recursive { |n| n.attributes["again"] == "yes" }

Robert_K1 · 30 March 2008 18:50

That's easy

doc.elements.each('//*[@again="yes"]') do |node|
  # any node that has attribute again with value yes
end

And I am pretty sure that this is faster than your approach. What does your program do? With more context we can come up with further suggestions.

Kind regards

robert

Bu_Mihai · 30 March 2008 20:21

I'm not sure that will work. I have an XML file with this structure
(it must be like this; the following is a simplified sample of the
original):
<root>
   <new_section>
      <pages>
           <page again="yes">page1</page>
           <page again="no">page2</page>
           <page again="yes">page3
                 <pages>
                    <page again="no">page4</page>
                    <page again="yes">
                       <pages>....and so on
                 </pages>
           </page>

      </pages>
   </new_section>
   <new_section>
</root>

I have a recursive function to find all 'page' nodes with attribute
'again' set to 'yes', but I need to start the search from the beginning
of the file or from the current node and then display all subnodes with
'yes'. After all the nodes have been found, I need to search for them
again from the beginning of the file. It's something like this:

def find(xml_file)
  node = xml_file.find_first_recursive { |n| n.attributes["again"] == "yes" }
  if node
    puts node.text
    find(xml_file.elements[node])
  else
    find(xml_file.elements["//"])
  end
end

In this example the find function is an endless loop; somewhere I must
add a return. But I need something like it, and when my file is big
(~900 lines) I wait ~10 seconds for the following command (not always,
only when I start searching from the beginning of the file):
node = xml_file.find_first_recursive { |n| n.attributes["again"] == "yes" }

Many thanks for your help Robert.

Robert_K1 · 30 March 2008 21:53

> I have a recursive function to find all 'page' nodes with attribute 'again' set to 'yes', but I need to start the search from the beginning of the file or from the current node and then display all subnodes with 'yes';

You can use the XPath from the root and, I believe, also from a particular node.
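A minimal sketch of searching from the root versus from a particular node, assuming REXML's XPath.first and the corrected expression with `*`. The sample document and its id attributes are made up for illustration and are not from the poster's file.

```ruby
require "rexml/document"

# Hypothetical sample; the id attributes exist only to tell the nodes apart.
doc = REXML::Document.new(<<~XML)
  <root>
    <page again="yes" id="a">
      <page again="yes" id="b"/>
    </page>
    <page again="yes" id="c"/>
  </root>
XML

# From the document root: the first match anywhere in the file.
first_overall = REXML::XPath.first(doc, '//*[@again="yes"]')

# From a particular node: the leading './/' limits the search to that
# node's descendants instead of restarting at the top of the document.
first_below = REXML::XPath.first(first_overall, './/*[@again="yes"]')

puts first_overall.attributes["id"]
puts first_below.attributes["id"]
```

The same relative expression can be passed to node.elements.each if all matches below a node are wanted rather than only the first one.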

> After all the nodes have been found, I need to search for them again from the beginning of the file;

When I asked what your program does, I really meant: can you explain in non-technical words what this program is supposed to do? Since you seem to traverse the same nodes over and over again, I have the strong feeling that there is a better alternative, but for that we need to know the purpose of the program.

> Many thanks for your help Robert.

You're welcome.

Kind regards

robert

Bu_Mihai · 31 March 2008 08:40

I'm trying to build a map and to record all routes. I have a root node
which generates some roads, and each road generates further roads, and
I have to follow all roads until no road is left unchecked.
If I'm on a road and that road generates new roads, then to travel each
generated road I must begin my route from the beginning, not from the
road that generated the child roads.

I have the root node, which generates two roads, road1 and road2. I must
verify these roads and check whether each one generates new roads; if
so, I must set "again=yes" because that road has children that must be
checked. So, for example, road1 generates road3, but to get to road3 I
must go root->road1->road3 and so on... (if road3 generates 3 more
roads, then to travel one of them I must have
root->road1->road3->road3_1 or road3_2 or road3_3).

I must also have an attribute duplicate_road; for example, if road2
also generates road3, then I compare it against all roads checked up to
that moment, and if it is found it is a duplicate road, so I mustn't
check it again (again=no).

And so I can generate a map with roads in XML (for the moment I don't
care which path is shorter, only about finding a path from the root to
road_x based on the XML map).

Tnx.

Robert_K1 · 31 March 2008 09:21

Ok, a pretty straightforward graph problem. It is a bad idea to do
that on the raw XML data. You should create a representation of the
road data that suits your algorithm better. Then read the whole XML
only once, create that representation and implement your algorithm on
your internal representation. Doing it on the XML is certainly the
worst option.
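A minimal sketch of that advice, assuming the XML resembles the sample posted earlier in the thread. The names Road, build_road and each_again are hypothetical helpers, not part of REXML; only get_elements, attributes and text are REXML calls.

```ruby
require "rexml/document"

# Hypothetical plain-Ruby representation of one road/page.
Road = Struct.new(:name, :again, :children)

# Copy one <page> element and its nested <pages>/<page> elements into Road objects.
def build_road(element)
  name = element.text.to_s.strip
  kids = element.get_elements("pages/page").map { |child| build_road(child) }
  Road.new(name, element.attributes["again"] == "yes", kids)
end

xml = <<~XML
  <root>
    <new_section>
      <pages>
        <page again="yes">page1</page>
        <page again="no">page2</page>
        <page again="yes">page3
          <pages>
            <page again="no">page4</page>
          </pages>
        </page>
      </pages>
    </new_section>
  </root>
XML

doc = REXML::Document.new(xml) # the XML is parsed exactly once
roads = doc.get_elements("/root/new_section/pages/page").map { |e| build_road(e) }

# Later searches walk plain Ruby objects instead of the DOM.
def each_again(roads, &block)
  roads.each do |road|
    block.call(road) if road.again
    each_again(road.children, &block)
  end
end

found = []
each_again(roads) { |road| found << road.name }
puts found.inspect
```

After the initial parse, repeated searches never touch REXML again, so their cost depends only on the size of the in-memory structure.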

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end

Bu_Mihai · 31 March 2008 11:36

And what do you recommend?

Robert_K1 · 31 March 2008 12:42

? I gave my recommendations already. You sure do not expect me to
code that up for you, do you?

Kind regards

robert

Bu_Mihai · 31 March 2008 13:34

No, of course not. I meant which algorithm you would recommend and what
would be best to implement it with (any Ruby gem?)...

Thanks, I'll do a search to find out.

Marc_Heiler · 31 March 2008 13:53

I believe the core problem is that XML itself is pretty sub-optimal for
almost everything.

Is anyone updating the REXML website, by the way? I believe it would be
interesting to see exactly these kinds of speed issues handled on the
website because, if I am not mistaken, these questions and problems
continually pop up with *XML.

Robert_K1 · 31 March 2008 13:59

Ah, OK, I misunderstood you. Backtracking comes to mind. Before you
change the algorithm you could start by creating a few classes based on
the info you have in the XML file and use those. I would have to
think longer about this to come up with more profound suggestions.
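One way to read the road problem described earlier in the thread is as a depth-first search with a visited set, which is a simpler relative of backtracking and not necessarily what was meant here. The sketch below assumes the roads have already been loaded into a plain Hash. The graph contents and the name explore are made up for illustration.

```ruby
require "set"

# Depth-first walk that records the route to each road the first time it is
# reached. A road that was already visited is treated as a duplicate and skipped.
def explore(graph, road, path, visited, routes)
  return if visited.include?(road)

  visited << road
  route = path + [road]
  routes[road] = route
  graph.fetch(road, []).each { |nxt| explore(graph, nxt, route, visited, routes) }
end

# Hypothetical road graph: road3 is generated by both road1 and road2.
graph = {
  "root"  => ["road1", "road2"],
  "road1" => ["road3"],
  "road2" => ["road3"],
  "road3" => ["road3_1", "road3_2"]
}

routes = {}
explore(graph, "root", [], Set.new, routes)
routes.each { |road, route| puts "#{road}: #{route.join('->')}" }
```

Each road carries its full route from the root, so there is no need to restart the search from the beginning of the file to find out how a road was reached.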

Cheers

robert

Robert_K1 · 31 March 2008 14:01

> I believe the core problem is that XML itself is pretty sub-optimal for
> almost everything

As always, there are problems for which this tool (XML) is well suited,
less well suited, or not suited at all.

> Is anyone updating the REXML website by the way? I believe it would be
> interesting to see exactly these kind of speed issues handled on the
> website because if i am not mistaken, these questions and problems
> continually pop-up with *XML

Not sure whether I agree: IMHO in this case the problem is a
misapplication of XML. XML is good for persisting structured data but
not as an in-memory model for calculations.

Kind regards

robert

Bu_Mihai · 31 March 2008 16:36

What is IMHO? I'm a newbie in Ruby, and at first I was looking for
something similar to a C++ tree structure (because that is what I would
use if it were C), but I want to do it in Ruby, and I was looking for a
gem to help me, because I would have to learn more about the Ruby
language to build my own class to work with.
I tried XML because it was the best (!?) I found in Ruby for
implementing a tree structure (not only a binary tree), but it is slow
when I want to read a big structure.

Robert_K1 · 1 April 2008 06:48

> What is IMHO?

http://www.google.com/search?q=imho

> I'm a newbie in Ruby, and at first I was looking for something
> similar to a C++ tree structure (because that is what I would use if
> it were C), but I want to do it in Ruby, and I was looking for a gem
> to help me, because I would have to learn more about the Ruby language
> to build my own class to work with.

I believe it works better the other way round: understand a concept
(such as "tree", which is not too difficult) and implement it in Ruby.

http://raa.ruby-lang.org/search.rhtml?search=tree

Apart from that, it's easy to roll your own:

TreeNode = Struct.new :data, :parent, :children do
  def initialize(data = nil, parent = nil)
    self.data = data
    self.parent = parent
    self.children = []   # start with no children
  end
end
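A possible way to use such a node for the road map, sketched under the assumption that each node keeps an Array of children. The add and route helpers are hypothetical additions, not part of Struct or of the snippet above.

```ruby
TreeNode = Struct.new(:data, :parent, :children) do
  def initialize(data = nil, parent = nil)
    super(data, parent, [])
  end

  # Hypothetical helper: create a child node and link it in both directions.
  def add(data)
    child = self.class.new(data, self)
    children << child
    child
  end

  # Hypothetical helper: the data of every node from the root down to this one.
  def route
    parent ? parent.route + [data] : [data]
  end
end

root  = TreeNode.new("root")
road1 = root.add("road1")
root.add("road2")
road3 = road1.add("road3")

puts road3.route.join("->")
```

Because every node knows its parent, the route from the root can be rebuilt by walking upwards, without searching the tree from the top again.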

> I tried XML because it was the best (!?) I found in Ruby for
> implementing a tree structure (not only a binary tree), but it is slow
> when I want to read a big structure.

900 lines of XML is far from a "big structure". And XML is a format for
/persistently/ storing structured data, mostly in files. An XML DOM
is nothing you want to do complex non-XML operations on, because of
the overhead.

Kind regards

robert

Bu_Mihai · 1 April 2008 06:53

Thanks a lot for helping me. I've also found this:
http://rubytree.rubyforge.org/
which I think is what I wanted from the beginning; I can build my map
with rubytree and then save it in an XML file. I will check your links
too.

Thanks again Robert.
