Hi,
I've just started finding my way around web scraping with Ruby, and I've
run into some problems with Time.parse.
I'm scraping a date from a website:
collect_last_bid =
last_bid_cell.scan(/\d{1,2}\.\d{1,2}\.\d{1,4}\b.\b\d{1,2}\:\d{1,2}:\d{1,2}/)
...which comes out nicely as "26.01.2011 21:00:08"
However, when I Time.parse the variable:
right_now = Time.parse(collect_last_bid)
...I get "Error in time.rb, Line 240 in 'parse'"
Parsing the string with Time.parse("26.01.2011 21:00:08") works fine.
Any thoughts?
--
Posted via http://www.ruby-forum.com/.
String#scan returns an array:
irb(main):001:0> s = "<td>26.01.2011 21:00:08</td>"
=> "<td>26.01.2011 21:00:08</td>"
irb(main):003:0> time =
s.scan(/\d{1,2}\.\d{1,2}\.\d{1,4}\b.\b\d{1,2}\:\d{1,2}:\d{1,2}/)
=> ["26.01.2011 21:00:08"]
irb(main):005:0> require 'time'
=> true
irb(main):006:0> Time.parse(time)
NoMethodError: private method `gsub!' called for ["26.01.2011 21:00:08"]:Array
from /usr/lib/ruby/1.8/date/format.rb:1061:in `_parse'
from /usr/lib/ruby/1.8/time.rb:240:in `parse'
from (irb):6
from :0
irb(main):007:0> Time.parse(time[0])
=> Wed Jan 26 21:00:08 +0100 2011
Jesus.
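A minimal sketch of the fix as a plain script, assuming the cell text is already available as a string (`cell` is a hypothetical stand-in for `last_bid_cell`). Because `String#scan` always returns an array, you can either take its first element or use `String#[]` with the same regex, which returns the first match as a String (or nil when nothing matches).

```ruby
require 'time'

# Hypothetical stand-in for the scraped last_bid_cell
cell = "<td>26.01.2011 21:00:08</td>"
pattern = /\d{1,2}\.\d{1,2}\.\d{1,4}\b.\b\d{1,2}\:\d{1,2}:\d{1,2}/

# scan returns an Array of every match, even when there is only one
matches = cell.scan(pattern)  # => ["26.01.2011 21:00:08"]

# Option 1: take the first element before parsing
from_scan = Time.parse(matches.first)

# Option 2: String#[] with a regex returns the first match as a String (or nil)
stamp = cell[pattern]
raise ArgumentError, "no timestamp found in #{cell.inspect}" if stamp.nil?
from_index = Time.parse(stamp)

puts from_scan == from_index
```

The explicit nil check matters when scraping: if the page layout changes, `Time.parse(nil)` fails with a far less helpful error than the one raised here.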
On Wed, Jan 26, 2011 at 4:56 PM, Jens Finnäs <jens.finnas@gmail.com> wrote:
Cheers guys. Runs nicely now.
Great discussion board by the way.
Robert K. (Robert_K1):
When you are parsing the input manually anyway, Time.local or Time.utc is a better fit IMHO.
irb(main):011:0> s = "<td>26.01.2011 21:00:08</td>"
=> "<td>26.01.2011 21:00:08</td>"
irb(main):012:0> d,m,y,hr,mi,se = s.scan /\d+/
=> ["26", "01", "2011", "21", "00", "08"]
irb(main):013:0> Time.local y,m,d,hr,mi,se
=> 2011-01-26 21:00:08 +0100
irb(main):014:0> Time.utc y,m,d,hr,mi,se
=> 2011-01-26 21:00:08 UTC
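A standalone version of the same idea, assuming the cell always contains exactly six runs of digits in day, month, year, hour, minute, second order (the helper name `parse_bid_time` and its `utc:` keyword are hypothetical). Converting the groups to Integer makes the intent explicit, and defaulting to Time.utc keeps the result independent of the machine's time zone.

```ruby
# Hypothetical helper: builds a Time from "dd.mm.yyyy hh:mm:ss" text.
# Assumes the first six runs of digits are day, month, year, hour, min, sec.
def parse_bid_time(text, utc: true)
  parts = text.scan(/\d+/).first(6).map(&:to_i)
  raise ArgumentError, "expected 6 numbers in #{text.inspect}" unless parts.size == 6

  d, m, y, hr, mi, se = parts
  utc ? Time.utc(y, m, d, hr, mi, se) : Time.local(y, m, d, hr, mi, se)
end

t = parse_bid_time("<td>26.01.2011 21:00:08</td>")
puts t  # prints 2011-01-26 21:00:08 UTC
```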
Alternative: after require 'time' we can use Time.strptime:
irb(main):016:0> Time.strptime s, '<td>%d.%m.%Y %H:%M:%S</td>'
=> 2011-01-26 21:00:08 +0100
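A minimal sketch of the strptime route applied to the cell text alone (tags already stripped), assuming the site always writes day.month.year with a 24-hour clock; `FORMAT` and `strict_parse` are hypothetical names. Unlike Time.parse, Time.strptime does not guess the field order, and it raises ArgumentError when the text does not match the format, which this helper turns into nil.

```ruby
require 'time'

FORMAT = '%d.%m.%Y %H:%M:%S'  # day.month.year hour:minute:second

# Returns a Time, or nil when the text doesn't match FORMAT.
def strict_parse(text)
  Time.strptime(text.strip, FORMAT)
rescue ArgumentError
  nil
end

t = strict_parse("26.01.2011 21:00:08")
puts t.strftime('%Y-%m-%d %H:%M:%S')  # prints 2011-01-26 21:00:08
```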
Side note: I'd rather use a proper HTML parser (e.g. Nokogiri) and
work only on the text inside <td>.
Kind regards
robert
2011/1/26 Jesús Gabriel y Galán <jgabrielygalan@gmail.com>:
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/