I've got a set of scripts that collect URLs from certain web pages and
I'm trying to extract some content from each of those pages (translation
stage). I keep seeing the error below.
Can someone help me understand what's happening here? I'm not
expecting a fix; I just want some insight into the nature of
this issue. We've been seeing several similar issues on this project
we're working on. Thanks in advance.
# Load all files and cache them for further processing.
input_filenames.each do |filename|
  host = @host_cache[filename] =
    URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
  (@document_cache[host] ||= {})[filename] =
    Nokogiri::HTML(file_contents(filename))
end
Without looking too much into it, I would say that @state[filename]['link'] is nil. You are passing that nil to
URI.escape, which raises an error. Can you print @state[filename]['link'] before calling URI.escape?
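
Something along these lines might help narrow it down. It is only a rough sketch: host_for and the sample state hash are made-up stand-ins for your real @state, and it leaves out URI.escape to stay focused on the nil check. It also covers a second possibility I have not verified against your data: if a link is relative, .host itself is nil and the .downcase call fails in much the same way.

```ruby
require 'uri'

# Hypothetical helper: returns the lowercased host of +link+, or nil when
# the link is missing, unparsable, or has no host part (a relative path).
def host_for(link)
  return nil if link.nil?
  host = URI.parse(link).host
  host ? host.downcase : nil
rescue URI::InvalidURIError
  nil
end

# Made-up stand-in for @state: the second entry has no 'link' key.
state = {
  'a.html' => { 'link' => 'http://Example.com/page' },
  'b.html' => {}
}

state.each do |filename, entry|
  host = host_for(entry['link'])
  if host
    puts "#{filename}: #{host}"
  else
    # Report the offending entry instead of raising in the middle of the loop.
    warn "#{filename}: no usable link (#{entry['link'].inspect})"
  end
end
```

Any filename that ends up in the warning branch is one whose state entry you would want to inspect.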
Jesus.
On Wed, Dec 29, 2010 at 8:21 PM, Mr. Bill <mrbillhaxor@yahoo.com> wrote:
> URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
Update: we found a solution that involves simply not using the
PageContentExtractor and using a different Ruby plugin instead.
Thank you for your time and attention to this.