Yet another private method `gsub' called for nil:NilClass error

I've got a set of scripts that collect URLs from certain web pages and
I'm trying to extract some content from each of those pages (translation
stage). I keep seeing the error below.
Can someone help me understand what's happening here? I'm certainly not
expecting a fix, I just want to get some insights into the nature of
this issue. We've been seeing several similar issues on this project
we're working on. Thanks in advance.

-----------------------------------------------------
(projectx) Running translation stage
DEBUG [2010-12-28 12:57:01 EST] (PageContentExtractor#559021) Executing plugin input_files=48 (0477475be2aa9f8b79013eaf8e410f8d, etc)
ERROR [2010-12-28 12:57:01 EST] (projectx) Unexepected fatal error while processing page_1: private method `gsub' called for nil:NilClass
/usr/lib/ruby/1.8/uri/common.rb:289:in `escape'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:37:in `execute'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `each'
~/sandbox/projectx/core_plugins/plugins/PageContentExtractor.rb:36:in `execute'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:191:in `call'
~/sandbox/projectx/lib/filesystem_lock_provider.rb:66:in `lock'
~/sandbox/projectx/core_plugins/plugins/BasePlugin.rb:190:in `call'
~/sandbox/projectx/lib/feed.rb:216:in `run'
~/sandbox/projectx/lib/feed.rb:212:in `each'
~/sandbox/projectx/lib/feed.rb:212:in `run'
~/sandbox/projectx/lib/feed.rb:207:in `each'
~/sandbox/projectx/lib/feed.rb:207:in `run'
bin/_run_feeds:77
bin/_run_feeds:74:in `each'
bin/_run_feeds:74
-----------------------------------------------------

Going through each step with rdebug, we can get a view of what is
happening when it trips up:

(rdb:1) step
projectx/core_plugins/plugins/PageContentExtractor.rb:37
host = @host_cache[filename] = URI(URI.escape(@state[filename]['link'])).host.downcase
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:285 unless unsafe.kind_of?(Regexp)
(rdb:1) step
/usr/lib/ruby/1.8/uri/common.rb:289 str.gsub(unsafe) do |us|
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:197
@logger.context=prev_context
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:198 @basedir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:199 @lockdir = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:200 @state = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:201 @input_files = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:202 @permstate = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:203 @context_counters = nil
(rdb:1) step
projectx/core_plugins/plugins/BasePlugin.rb:204 @on_error = nil
(rdb:1) step
projectx/lib/feed.rb:221
msg = "Unexepected fatal error while running translation: #{@name}: #{e.message}"
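
For reference, the jump from common.rb:289 straight to BasePlugin.rb:197 looks like an exception unwinding through an ensure block before being rescued in feed.rb. Below is a minimal sketch of that control flow. It assumes (hypothetically) that BasePlugin resets its state in an ensure clause and that feed.rb rescues the error; the method names `escape_like` and `call_plugin` are made up for illustration and none of the bodies come from the real project.

```ruby
# Hypothetical reduction of the call chain; all names are illustrative only.
events = []

# Stands in for URI.escape: it calls gsub on whatever it is given.
def escape_like(str)
  str.gsub(/ /, '%20')
end

# Stands in for the plugin call wrapped by BasePlugin (assumed structure).
def call_plugin(link, events)
  events << :execute
  escape_like(link)     # raises NoMethodError when link is nil
  events << :finished   # skipped when the line above raises
ensure
  events << :cleanup    # runs either way, like BasePlugin.rb:197-204
end

begin
  call_plugin(nil, events)
rescue NoMethodError => e
  events << :rescued    # like the handler around feed.rb:221
  message = e.message
end
```

With a nil link the recorded order is execute, cleanup, rescued, which matches the order of locations the debugger stepped through.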

...code snippet from PageContentExtractor.rb:

    # Load all files and cache them for further processing.
    input_filenames.each do |filename|
      host = @host_cache[filename] =
        URI(URI.escape(@state[filename]['link'])).host.downcase # line 37
      (@document_cache[host] ||= {})[filename] =
        Nokogiri::HTML(file_contents(filename))
    end

--
Posted via http://www.ruby-forum.com/.

Without looking too much into it, I would say that
@state[filename]['link'] is nil. You are passing that nil to
URI.escape, which raises an error. Can you print
@state[filename]['link'] before calling URI.escape?
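
As background on the wording of the message: in Ruby 1.8, `gsub` also exists as a private Kernel method, so calling it on nil is reported as a private method call rather than an undefined method. A rough sketch of the check suggested above might look like the following. The `state` hash and the `host_for` helper are hypothetical stand-ins, not the plugin's real data or API. The sketch calls `URI.parse` directly because `URI.escape` was removed in Ruby 3.0, so a link that still needs escaping is treated here as unusable.

```ruby
require 'uri'

# Hypothetical stand-in for the plugin's @state hash; the second entry
# has no 'link', which is the suspected cause of the error.
state = {
  'page_1' => { 'link' => 'http://Example.com/a' },
  'page_2' => {}
}

# Return the lower-cased host for a state entry, or nil when the entry
# is missing, has no link, or the link cannot be parsed as a URI.
def host_for(entry)
  link = entry && entry['link']
  return nil if link.nil? || link.empty?
  host = URI.parse(link).host
  host && host.downcase
rescue URI::InvalidURIError
  nil
end

hosts = {}
state.each do |filename, entry|
  host = host_for(entry)
  if host.nil?
    # Report the offending entry instead of crashing inside the URI code.
    warn "no usable link for #{filename}: #{entry.inspect}"
    next
  end
  hosts[filename] = host
end
```

Logging the entry with `inspect` shows whether the whole entry is missing or only its 'link' key, which should narrow down where the URL collection stage went wrong.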

Jesus.

On Wed, Dec 29, 2010 at 8:21 PM, Mr. Bill <mrbillhaxor@yahoo.com> wrote:

I've got a set of scripts that collect URLs from certain web pages and
I'm trying to extract some content from each of those pages (translation
stage). I keep seeing the error below.
Can someone help me understand what's happening here? I'm certainly not
expecting a fix, I just want to get some insights into the nature of
this issue. We've been seeing several similar issues on this project
we're working on. Thanks in advance.


Update: we solved this by switching from PageContentExtractor to
another Ruby plugin.
Thank you for your time and attention to this.
