Without knowing the whole problem it is difficult to say what the best solution is, but for the string you posted above, I would clean it up and parse it with something like Hpricot:
require 'rubygems'
require 'hpricot'
string = DATA.read #read in string
string.gsub!(/&lt;/,'<') #Convert &lt; and &gt; entities to real <>
string.gsub!(/&gt;/,'>')
string.gsub!(/&quot;/,'"') #Convert &quot; entities to real quotes
doc = Hpricot(string) #Parse with Hpricot
fields = ['audience','creator'] #Create array of 'fields' to extract
fields.each do |f| #For each field...
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']") #...find appropriate divs
  el.each do |e| # for each field div...
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>" #print data
  end
end
__END__
<div class="field field-type-text field-field-audience">
<h3 class="field-label">audience</h3>
<div class="field-items">
<div class="field-item">Public</div>
</div>
</div>
<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
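
If installing Hpricot is not an option, something similar can probably be done with core Ruby alone. The following is only a rough sketch, not a tested solution for your real file: it assumes every field has exactly the markup shape shown above, and the names ENTITIES, FIELD_PATTERN, decode_entities and extract_fields are made up for illustration. It uses String#scan with two capture groups, so the field names do not need to be listed in advance. Regexes are brittle against HTML, so a real parser is still the safer choice if the markup varies.

```ruby
# Hypothetical helper names; assumes the markup always has the shape shown in this thread.
ENTITIES = { '&lt;' => '<', '&gt;' => '>', '&quot;' => '"' }.freeze

# Captures the field name from the outer div's class and the text of its field-item div.
FIELD_PATTERN = /<div class="field field-type-text field-field-([\w-]+)">.*?<div class="field-item">(.*?)<\/div>/m

# Replace the three entities with their literal characters (gsub accepts a Hash).
def decode_entities(text)
  text.gsub(/&(?:lt|gt|quot);/, ENTITIES)
end

# Return one "<name>value</name>" string per field found in the text.
def extract_fields(text)
  decode_entities(text).scan(FIELD_PATTERN).map do |name, value|
    "<#{name}>#{value.strip}</#{name}>"
  end
end

sample = <<~HTML
  &lt;div class=&quot;field field-type-text field-field-audience&quot;&gt;
  &lt;h3 class=&quot;field-label&quot;&gt;audience&lt;/h3&gt;
  &lt;div class=&quot;field-items&quot;&gt;
  &lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;
  &lt;/div&gt;
  &lt;/div&gt;
  <div class="field field-type-text field-field-creator">
  <h3 class="field-label">creator</h3>
  <div class="field-items">
  <div class="field-item">Tom Jones</div>
  </div>
  </div>
HTML

puts extract_fields(sample)
```

The sample mixes entity-encoded and plain markup on purpose, to show that decoding first lets one pattern handle both.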
Alex Gutteridge
Bioinformatics Center
Kyoto University
On 6 Aug 2007, at 10:12, Jan Ask wrote:
Alex & Sebastian,
Thanks for taking the time to reply. The string.gsub(/start.*end/m,
'some_value') did indeed help, but I am afraid my problem is a bit more
complicated.
I am basically trying to clean up a long XML file. A typical part of the
string looks like this:
<div class="field field-type-text field-field-audience">
<h3 class="field-label">audience</h3>
<div class="field-items">
<div class="field-item">Public</div>
</div>
</div>
<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>
So the problem is that the values in the xml change throughout the
string, so I cannot do a pattern match for them directly. Any ideas
would be hugely appreciated!
Jan
--
Posted via http://www.ruby-forum.com/.