REXML Speed Question

Hello, I have been using REXML to extract information from an XML file,
and I am having an issue with the amount of time it takes. If I
point directly to what I want, it is pretty fast. The issue arises when
I have to grab a reference id, then search again for that id to get another
id, and so on until I finally reach the piece of information I want. This is
what a snippet of my code currently looks like:

---------------------------------------------
result = []
wall_refs1 = XPath.match( $doc,
"doc:iso_10303_28/uos/IfcWallStandardCase//*[@pos='1']" )

wall_refs1 = grab_id(wall_refs1,'ref')
#grab_id simply collects the 'ref' attribute values into an array
#output from this would be [["i1741"]]

wall_ref2 = []
wall_refs1.each do |ref|
  x =
REXML::XPath.first($doc,"//*[@id='#{ref}']//IfcExtrudedAreaSolid").attribute("ref").value
  wall_ref2 << x
end
#Output [["i1738"]]

wall_depth = []
wall_ref2.each do |ref|
  x = REXML::XPath.match($doc,"//*[@id='#{ref}']//Depth").map {|element|
element.text}
  wall_depth << x
end
#Output [["120."]]

wall_depth_final = wall_depth.map do |arr|
  arr.map do |arr2|
    #this simply converts to float and rounds to 2 decimals
    #(round_to is not core Ruby; it is assumed to be a helper defined elsewhere)
    arr2.to_f.round_to(2)
  end
end

wall_depth_final
#Output [["120.00"]]
-----------------------------------------

The problem is that this takes a substantial amount of time to run:
doing it for, say, 200 elements can take 25 minutes. (I am guessing it
takes so long because some of the XML files are 10,000+ lines, and I
imagine it takes a while to comb through that.) I have to start from the
first location and work my way to the final one; unfortunately I cannot
simply run a search to grab //Depth.

Is there a quicker way of accomplishing the same thing, or is time
always going to be a burden?

Thank you for your time.

This would be the xml I am reading:

<IfcWallStandardCase id="i1677">
  <Representation>
    <IfcProductDefinitionShape id="i1747">
      <Representations id="i1750" exp:cType="list">
        <IfcShapeRepresentation exp:pos="0" xsi:nil="true" ref="i1708"/>
        <IfcShapeRepresentation exp:pos="1" xsi:nil="true" ref="i1741"/>
      </Representations>
   </IfcProductDefinitionShape>
  </Representation>
</IfcWallStandardCase>
<IfcShapeRepresentation id="i1741">
  <Items id="i1746" exp:cType="set">
    <IfcExtrudedAreaSolid exp:pos="0" xsi:nil="true" ref="i1738"/>
  </Items>
</IfcShapeRepresentation>
<IfcExtrudedAreaSolid id="i1738">
  <Depth>120.</Depth>
</IfcExtrudedAreaSolid>

--
Posted via http://www.ruby-forum.com/.

Switch to nokogiri and you'll be much much happier.

On Apr 7, 2011, at 21:00, Kyle X. wrote:

> Hello, I have been using REXML to extract information from an XML file
> and I am having an issue with the amount of time it is taking. If I
> point directly to what I want it is pretty fast. The issue arises when
> I have to grab a reference id, then research for that id to get another
> id, until I finally get to the piece of information I want. This is
> what a snippet my code currently looks like:

For larger XML documents, SAX parsing can really improve performance, specifically because a SAX parser doesn't build an entire DOM structure; it only extracts the bits you are interested in. Programming with a SAX parser is very different, though :)
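
To give a flavour of what that looks like, here is a rough sketch using REXML's own stream parser, so no extra gem is needed. It is only an illustration under assumptions: the `RefCollector` class, the `<doc>` wrapper element and the `exp` namespace URI are made up so that the sample parses on its own, and the element names are taken from the XML in the original post. The listener records refs and depths in one pass, and the chain is resolved afterwards with plain Hash lookups.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: gathers everything needed in a single pass.
class RefCollector
  include REXML::StreamListener
  attr_reader :wall_refs, :solid_refs, :depths

  def initialize
    @names = []       # names of the currently open elements
    @open_ids = []    # id (or nil) of each currently open element
    @wall_refs = []   # refs at pos 1 inside an IfcWallStandardCase
    @solid_refs = {}  # id => ref of the first IfcExtrudedAreaSolid inside it
    @depths = {}      # id => text of the first <Depth> inside it
  end

  def tag_start(name, attrs)
    attrs = attrs.to_h # works whether attrs arrives as a Hash or as pairs
    ref = attrs['ref']
    if ref
      in_wall = @names.include?('IfcWallStandardCase')
      @wall_refs << ref if in_wall && attrs['exp:pos'] == '1'
      if name == 'IfcExtrudedAreaSolid'
        @open_ids.compact.each { |id| @solid_refs[id] ||= ref }
      end
    end
    @names << name
    @open_ids << attrs['id']
  end

  def text(data)
    return unless @names.last == 'Depth'
    @open_ids.compact.each { |id| @depths[id] ||= data.strip }
  end

  def tag_end(_name)
    @names.pop
    @open_ids.pop
  end
end

# Sample modelled on the original post; wrapper and exp URI are placeholders.
xml = <<~XML
  <doc xmlns:exp="urn:example:exp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <IfcWallStandardCase id="i1677">
      <Representation>
        <IfcProductDefinitionShape id="i1747">
          <Representations id="i1750" exp:cType="list">
            <IfcShapeRepresentation exp:pos="0" xsi:nil="true" ref="i1708"/>
            <IfcShapeRepresentation exp:pos="1" xsi:nil="true" ref="i1741"/>
          </Representations>
        </IfcProductDefinitionShape>
      </Representation>
    </IfcWallStandardCase>
    <IfcShapeRepresentation id="i1741">
      <Items id="i1746" exp:cType="set">
        <IfcExtrudedAreaSolid exp:pos="0" xsi:nil="true" ref="i1738"/>
      </Items>
    </IfcShapeRepresentation>
    <IfcExtrudedAreaSolid id="i1738">
      <Depth>120.</Depth>
    </IfcExtrudedAreaSolid>
  </doc>
XML

collector = RefCollector.new
REXML::Document.parse_stream(xml, collector)

# Follow wall ref -> solid ref -> depth using Hash lookups only.
depths = collector.wall_refs.map do |ref|
  text = collector.depths[collector.solid_refs[ref]]
  text && text.to_f.round(2)
end.compact
```

Because nothing is looked up until the whole file has been read, it does not matter whether the referenced elements come before or after the walls that point to them.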

You can also switch to another library for handling your XML. The most popular one (at least to my knowledge) is Nokogiri (http://nokogiri.org/), and it is a great deal faster than REXML.
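
Whichever library you end up with, a good part of the time is probably going into the repeated `//*[@id='...']` searches, since each one has to walk the whole document again. A minimal sketch of one way around that, staying with REXML: walk the tree once, keep a Hash of id => element, and follow the refs through that Hash. The `<doc>` wrapper, the `exp` namespace URI and the `first_descendant` helper are placeholders of mine, not something from your file.

```ruby
require 'rexml/document'

# Sample modelled on the original post; wrapper and exp URI are placeholders.
xml = <<~XML
  <doc xmlns:exp="urn:example:exp" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <IfcWallStandardCase id="i1677">
      <Representation>
        <IfcProductDefinitionShape id="i1747">
          <Representations id="i1750" exp:cType="list">
            <IfcShapeRepresentation exp:pos="0" xsi:nil="true" ref="i1708"/>
            <IfcShapeRepresentation exp:pos="1" xsi:nil="true" ref="i1741"/>
          </Representations>
        </IfcProductDefinitionShape>
      </Representation>
    </IfcWallStandardCase>
    <IfcShapeRepresentation id="i1741">
      <Items id="i1746" exp:cType="set">
        <IfcExtrudedAreaSolid exp:pos="0" xsi:nil="true" ref="i1738"/>
      </Items>
    </IfcShapeRepresentation>
    <IfcExtrudedAreaSolid id="i1738">
      <Depth>120.</Depth>
    </IfcExtrudedAreaSolid>
  </doc>
XML

doc = REXML::Document.new(xml)

# One pass over the tree: remember every element that carries an id.
by_id = {}
doc.root.each_recursive do |el|
  id = el.attributes['id']
  by_id[id] = el if id
end

# Hypothetical helper: first descendant of el with the given tag name.
def first_descendant(el, name)
  el.each_recursive { |child| return child if child.name == name }
  nil
end

depths = []
doc.root.elements.each('IfcWallStandardCase') do |wall|
  wall.each_recursive do |rep|
    next unless rep.attributes['exp:pos'] == '1' && rep.attributes['ref']

    # Each hop is a Hash lookup plus a search inside one small subtree.
    shape     = by_id[rep.attributes['ref']]
    solid_ref = shape && first_descendant(shape, 'IfcExtrudedAreaSolid')
    solid     = solid_ref && by_id[solid_ref.attributes['ref']]
    depth     = solid && first_descendant(solid, 'Depth')
    depths << depth.text.to_f.round(2) if depth
  end
end
```

The index costs one full walk up front, but after that the work per wall no longer depends on how large the file is.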

Thanks for the info. I am going to try Nokogiri, if I can only figure
out how to get it to work in SketchUp... There is a surprising dearth
of information on the topic; I found nothing after a few hours of
searching online. Any chance anyone knows how?
