For the life of me, I can't figure out a Ruby equivalent to Perl's /g modifier.
Basically, I want to do the following:
while htmlSource=~m/<table>(.*?)<\/table>/g do
  tableSource=$1
  tableSource=~m/Index (\d+)/
  indexNumber=$1
  while tableSource=~m/<tr>(.*?)<\/tr>/g do
    tableRowSource=$1
    doSomethingWith(tableRowSource, indexNumber)
  end#while tableSource
end#while htmlSource
I will actually need to pull multiple vars, not just a single one,
from the regex.
I will need to run the outer loop an unknown number of times per
document (0-20) and the inner loop an unknown number of
times (0-29).
htmlSource.gsub(/<table>(.*?)<\/table>/) do |t|
  tableSource = $1 # contents of the current <table>
  tableSource.gsub(/<tr>(.*?)<\/tr>/) do |r|
    doSomethingWith $1 # contents of the current <tr>
  end
end
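For comparison, here is a minimal sketch of the same nested loop written with String#scan, which plays the role of Perl's /g here. The sample HTML, the rows array standing in for doSomethingWith, and the two-column row pattern are assumptions made for illustration, since the real page is not shown. The /m flag is added so that . also matches newlines.

```ruby
# Hypothetical sample input; the real page layout is not shown in the thread.
html_source = <<~HTML
  <table>Index 7
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
  </table>
  <table>Index 12
  <tr><td>c</td><td>3</td></tr>
  </table>
HTML

rows = [] # stands in for doSomethingWith

# scan calls the block once per match. With one capture group the block
# receives a one-element array, destructured here as |(table_source)|.
html_source.scan(/<table>(.*?)<\/table>/m) do |(table_source)|
  index_number = table_source[/Index (\d+)/, 1]

  # With several capture groups, each group becomes a block argument.
  table_source.scan(/<tr><td>(.*?)<\/td><td>(.*?)<\/td><\/tr>/m) do |name, value|
    rows << [index_number, name, value]
  end
end

rows.each { |row| p row }
```

Because the captures arrive as block arguments, nothing here depends on $1, and pulling several values out of each row is just a matter of adding capture groups.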
···
While I can't answer your original question, I could possibly help you with the scraping if you are willing to reveal the page you are trying to scrape and the data bits on it which should be scraped.
Cheers,
Peter
···
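require 'nokogiri'
doc = Nokogiri::HTML(htmlSource)
doc.search('//tr').each do |row|
  # Elements directly under the enclosing table whose text contains "Index"
  index = row.xpath('ancestor::table/*[contains(., "Index")]')
  # Pull the number that follows "Index" out of that element's text
  doSomethingWith(row.text, index.text[/Index (\d+)/, 1])
end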
The location of the element containing the index may have to be
modified.
-- Mark.
···
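class Regexp
  # Calls the block with a MatchData object for every match of this regexp in str.
  # Returns true if there was at least one match, nil otherwise.
  def global_match(str, &proc)
    retval = nil
    loop do
      # Remove the first remaining match so the next pass finds the one after it.
      res = str.sub(self) do |m|
        proc.call($~) # pass MatchData obj
        ''
      end
      break retval if res == str # nothing was replaced: no more matches
      str = res
      retval ||= true
    end
  end
end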
re = /.../
re.global_match(...) do |m|
...
end
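A possible alternative that leaves the string untouched: a minimal sketch that walks the string with String#match and a start offset, yielding a MatchData object for every match. The method name each_match is made up for this example and is not part of Ruby's core library. It assumes Ruby 1.9 or later, where match accepts a position argument.

```ruby
# Hypothetical helper, not part of Ruby itself: yields a MatchData object
# for each non-overlapping match of re in str, from left to right.
def each_match(str, re)
  pos = 0
  while pos <= str.length && (m = str.match(re, pos))
    yield m
    # Continue after this match; step one extra character after an
    # empty match so the loop always makes progress.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
end

found = []
each_match("Index 3, Index 14", /Index (\d+)/) do |m|
  found << [m[1], m.begin(0)]
end
p found
```

Since nothing is cut out of the string between passes, the offsets reported by MatchData refer to the original string, and removing one match can never create a new one.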
···
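That is pretty much how it's done. I was worried that globals like $1 would not be thread safe, but Ruby's regex variables ($~, $1 and friends) are local to the current thread and method, so that part is fine. Still, scan is a better fit here than gsub.
Here's something I wrote to extract information from data structured like this: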
- tablename
  + field1
  + field2:string
- table2name
  + field1 : string
  + field2
Table = Struct.new(:name, :fields)
Field = Struct.new(:name, :type)
def extract_db_spec(file)
  tables = []
  doc = File.read(file)
  table_name = /\- (\w*)\s*?\n/
  field_name = /(\s+\+ (\w+)\s*(\:\s*(\w*))?\n)/
  # Each match is one table header followed by all of its field lines.
  doc.scan(/#{table_name}(#{field_name}+)/) do |tablename, fields|
    t = Table.new(tablename, [])
    # junk and junk2 absorb the groups that exist only for grouping.
    fields.scan(field_name) do |junk, fieldname, junk2, type|
      if type.nil? || type == ""
        # Untyped fields default to int for *_id names, string otherwise.
        if /\w+_id/ === fieldname
          type = "int"
        else
          type = "string"
        end
      end
      t.fields << Field.new(fieldname, type)
    end
    tables << t
  end
  tables
end
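The junk and junk2 block parameters are needed because scan yields one value per capture group, including groups that are there only for grouping. Here is a minimal sketch of the same field pattern using non-capturing groups, (?: ), so that only the name and the type are yielded. The sample text and variable names are assumptions for illustration.

```ruby
# Sample input in the format described above (made up for this example).
spec = "- users\n  + name\n  + group_id\n  + age : int\n"

# (?: ... ) groups without capturing, so scan yields only name and type.
field_name = /\s+\+ (\w+)\s*(?::\s*(\w*))?\n/

fields = spec.scan(field_name).map do |name, type|
  # Same defaulting rule as above: *_id fields are ints, the rest strings.
  type = (name =~ /\w+_id/ ? "int" : "string") if type.nil? || type.empty?
  [name, type]
end

p fields
```

Without a block, scan returns an array with one entry per match, which is handy when the results are collected with map as above.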
einarmagnus
···
On 19.11.2008, at 00:37 , Alan Johnson wrote:
On Tue, Nov 18, 2008 at 4:06 PM, knohr <just_a_techie200x@yahoo.com> wrote:
Thread safety would be a plus.
Any suggestions?
I think this does what you want, although I don't think gsub was really made
for this purpose.