Mechanize MySQL and threads - deadlock?

First of all: I'm still new to Ruby.

So pointing me to documentation or books is fine.

Use case:

Use Mechanize to gather information. Because there are many pages, I'd
like to run multiple threads, each fetching pages. The fetched data
should be written to a MySQL database.

Can you point me to information telling me how to do this?

The failure looks like this now:

  /pr/tasks/get_data_ruby/tasks.rb:364:in `join': deadlock detected (fatal)
          from /pr/tasks/get_data_ruby/tasks.rb:364:in `block in run_tasks_wait'
          from /pr/tasks/get_data_ruby/tasks.rb:364:in `each'
          from /pr/tasks/get_data_ruby/tasks.rb:364:in `run_tasks_wait'
          from get-data.rb:37:in `<mai

What causes such deadlocks in the first place?

Details about my implementation:

···

=================================
Ruby version: ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux]
sequel-3.8.0
mysqlplus-0.1.1

Because things always go wrong, I'd like to store state in the database
so the script can resume work where it failed.

To keep things simple I tried giving each thread its own agent and DB
connection:

  def newDBConnection
    Sequel.connect(
      :adapter => 'mysql',
      :user => 'root',
      :host => 'localhost',
      :database => 'get_data',
      :password=>'XXX')
  end

  # share one agent and db connection per thread
  class MyThread < Thread
    def agent
      if !@agent
        @agent = Mechanize.new
        @agent.max_history = 1
      end
      @agent
    end

    def db
      @dbCache ||= newDBConnection
    end
  end
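
An alternative to subclassing Thread, sketched here under assumptions, is
Ruby's built-in thread-local storage (Thread.current[:key], which is
strictly speaking fiber-local). The helper name thread_local is
hypothetical, and Object.new stands in for Mechanize.new or a Sequel
connection so the sketch runs on its own:

```ruby
# Hypothetical helper: build a resource once per thread and cache it in
# Thread.current, so no Thread subclass is needed.
def thread_local(key)
  Thread.current[key] ||= yield
end

# Object.new is a stand-in for Mechanize.new or a Sequel connection.
def agent
  thread_local(:agent) { Object.new }
end

first  = agent
second = agent                        # same thread: cached object is reused
other  = Thread.new { agent }.value   # new thread: builds its own object

puts first.equal?(second)
puts first.equal?(other)
```

Within one thread the same object comes back every time, while each new
thread builds its own, which is the property the MyThread class above is
after.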

Next I defined a task which reuses the DB connection and Mechanize agent
of the thread that is running the task:

class Task
  def run
    # remember the worker thread so agent and db can be taken from it
    @thread = Thread.current
    task  # subclasses override #task
  end

  def agent
    @agent ||= @thread.agent
  end

  def db
    @dbCache ||= @thread.db
  end
end

Next I wrote a simple function taking a queue of tasks and a thread class
(MyThread). It spawns parallel threads, each taking tasks from the task
list (a Queue). Any task may add more tasks to the queue.
The script should run until all tasks are done.

# t: class extending Thread
# tasks: type Queue.new
# parallel: num of threads used to run those tasks
def run_tasks_wait(t, tasks, parallel)
  working = 0
  threads = []
  # start `parallel` worker threads
  (1..parallel).each {|i|
    threads << t.new {
      firstTime = true
      while working > 0 || firstTime
        firstTime = false
        while task = tasks.pop
          working += 1
          $log.debug("starting task #{task.to_s}")
          $log.catchAndLog "caught exception in main worker thread" do
            task.run if !task.nil?
          end
          $log.debug("finished task #{task.to_s} threads-working: #{working}")
          working -= 1
        end
        # even if there is nothing left in queue keep thread running if there is one thread running
        # this thread may push additional tasks to the queue
        sleep 1
      end
    }
  }
  # wait for threads
  threads.each {|thr| thr.join }
end
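
One possible reading of the deadlock, offered as a sketch rather than a
confirmed diagnosis: Queue#pop with no argument blocks while the queue is
empty, so once every worker is waiting in pop and the main thread is
waiting in join, no thread can ever wake up and Ruby aborts with
"deadlock detected". The variant below uses a non-blocking pop and
guards the working counter with a Mutex. The name run_tasks_nonblocking
is hypothetical, tasks are assumed to respond to run, and plain Thread
replaces the thread class parameter to keep the example short:

```ruby
require 'thread'

# Hypothetical variant of run_tasks_wait; not the original implementation.
def run_tasks_nonblocking(tasks, parallel)
  working = 0
  lock = Mutex.new
  threads = (1..parallel).map do
    Thread.new do
      loop do
        task = begin
          # Claim the task and bump the counter under one lock, so no other
          # worker can observe "queue empty and nobody working" in between.
          lock.synchronize do
            claimed = tasks.pop(true) # non-blocking: raises ThreadError when empty
            working += 1
            claimed
          end
        rescue ThreadError
          nil
        end

        if task
          begin
            task.run
          ensure
            lock.synchronize { working -= 1 }
          end
        else
          # Queue is empty: stop only when no running task could still push work.
          break if lock.synchronize { working.zero? && tasks.empty? }
          sleep 0.01
        end
      end
    end
  end
  threads.each(&:join)
end
```

A worker only exits when the queue is empty and no task is running, so
tasks pushed by a running task are still picked up.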

Thanks for any pointers
Marc Weber

Replacing the Queue with an Array seems to fix the issue.
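
A likely reason the Array behaves differently, shown as a small sketch:
Array#pop returns nil on an empty array, which ends a
`while task = tasks.pop` loop, whereas Queue#pop waits for an item unless
it is called with true. The helper pop_or_nil is hypothetical. Note that
Array makes no thread-safety guarantees of its own, so this is an
illustration rather than a recommendation:

```ruby
require 'thread'

# Hypothetical helper: pop without blocking and return nil when the queue
# is empty, which mimics what Array#pop does.
def pop_or_nil(queue)
  queue.pop(true) # true = non-blocking; raises ThreadError when empty
rescue ThreadError
  nil
end

queue = Queue.new
queue << :page_1

p pop_or_nil(queue) # returns the queued item
p pop_or_nil(queue) # queue is now empty
p [].pop            # Array#pop on an empty array
```

The last two calls both return nil instead of blocking.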

Marc