I have a long-running batch job that I would like to speed up.
Currently it uses only one CPU and the server I have in mind for this
has 16 cores, and I want to take advantage of them.
I'm thinking of one of three possibilities:
1. JRuby, where the threads are native OS threads
2. A message queue (e.g. ActiveMQ + Stomp), where workers run
as separate processes, thus using all cores.
3. A MapReduce implementation (e.g. Hadoop)
I would like to see if anyone has gone down this road and can weigh in
on these options.
-- Mark.
How difficult is your task? If you were able to reduce (heh) it to a MapReduce problem, you could use something like Skynet or Starfish. For even simpler forking, check out Ara Howard's threadify or my forkify for simple parallel processing.
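The fork-per-slice idea those libraries wrap can be sketched in plain Ruby (Unix only); `parallel_map`, the worker count, and the pipe-based result passing here are my own choices for the sketch, not any library's real API:

```ruby
# Plain-Ruby sketch of fork-per-slice parallelism: split the input,
# fork a child per slice, and marshal each child's results back to
# the parent over a pipe.
def parallel_map(items, workers: 4, &work)
  slice_size = (items.size / workers.to_f).ceil
  items.each_slice(slice_size).map { |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      # Child: do the work, send the results to the parent, exit.
      writer.write(Marshal.dump(slice.map(&work)))
      writer.close
    end
    writer.close
    [pid, reader]
  }.flat_map { |pid, reader|
    results = Marshal.load(reader.read)  # reads until the child closes the pipe
    reader.close
    Process.wait(pid)
    results
  }
end

squares = parallel_map((1..8).to_a) { |n| n * n }
# squares == [1, 4, 9, 16, 25, 36, 49, 64], computed in 4 child processes
```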
- Lee
···
On Jun 29, 2009, at 2:00 PM, Mark Thomas wrote:
> I have a long-running batch job that I would like to speed up.
> ...
> I would like to see if anyone has gone down this road and can weigh in
> on these options.
Mark Thomas wrote:
> ...
> 2. A message queue (e.g. ActiveMQ + Stomp), where workers run
> as separate processes, thus using all cores.
> ...
> I would like to see if anyone has gone down this road and can weigh in
> on these options.
I have gone down option 2, and it works well.
Depending on your application, you may not need the sophistication of a
"real" queue manager. You could just create a Queue object (from
thread.rb), running in its own process, and share it using DRb. Multiple
reader processes can pop messages from the queue, and will block until a
message is available. Writers can push messages into the queue as
required. There is also SizedQueue which will block the writers if the
queue gets too full.
A "real" queue manager like RabbitMQ may make sense if you need your
subtasks to persist in the queue in the event of a system crash. But for
a simple worker-farm type of application, this usually isn't necessary.
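A minimal sketch of that pattern; to keep it self-contained the worker is a thread in the same process, but in a real deployment the queue server and each worker would be separate processes connecting to the same DRb URI (the port choice and the :shutdown sentinel are mine):

```ruby
require 'drb/drb'

# Server side: share a plain Queue (from thread.rb) over DRb.
queue = Queue.new
DRb.start_service('druby://localhost:0', queue)  # port 0: pick any free port

# Worker side: pop blocks until a message is available. Swap the Queue
# for SizedQueue.new(max) if writers should block when it fills up.
remote  = DRbObject.new_with_uri(DRb.uri)
results = Queue.new
worker  = Thread.new do
  while (job = remote.pop) != :shutdown   # :shutdown is a stop sentinel
    results << job * 2                    # stand-in for the real subtask
  end
end

# Writer side: push messages as required, then tell the worker to stop.
5.times { |i| queue << i }
queue << :shutdown
worker.join
DRb.stop_service
```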
Regards,
Brian.
···
--
Posted via http://www.ruby-forum.com/.
> > I would like to see if anyone has gone down this road and can weigh in
> > on these options.
>
> How difficult is your task? If you were able to reduce (heh) it to a
> MapReduce problem, you could use something like Skynet or Starfish.
> For even simpler forking, check out Ara Howard's threadify or my
> forkify for simple parallel processing.
Yes, it fits a MapReduce problem, but most MapReduce implementations I
came across seemed like overkill. I wasn't aware of Skynet or
Starfish--they look promising, thanks. The file interface of Starfish
may in fact be just what I'm looking for.
I'll check out threadify and forkify too.
Thanks again.
-- Mark.
Just throwing my 2 cents out here:
What if you just created a daemon controller that ran each job as a
separate process on its own core o_O?
That would speed things up greatly whilst keeping control over each process.
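A bare-bones version of that controller in plain Ruby; note that pinning a process to a specific core isn't portable from Ruby itself, so this sketch leaves core placement to the OS scheduler and keeps the "control" part: a pid per child, waited on individually.

```ruby
# Bare-bones controller sketch: fork one worker per job and keep the
# pids so each child can be waited on or signalled individually.
jobs = [10, 20, 30, 40]
pids = jobs.map do |job|
  fork do
    # Per-process work goes here; the scheduler spreads children
    # across the available cores on its own.
    result = job * 2
    exit!(result > 0 ? 0 : 1)   # exit status reports success/failure
  end
end

# "Control over each process": collect each child's exit status by pid.
statuses = pids.map { |pid| Process.wait2(pid).last }
statuses.all?(&:success?)   # => true when every worker exited cleanly
```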
- Mac
···
Yep, multiple processes are your friend - especially on Unix.
Ellie
Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
···
On 30 Jun 2009, at 00:17, Michael Linfield wrote:
> Just throwing my 2 cents out here:
> What if you just created a daemon controller that ran each job as a
> separate process on its own core o_O?
> That would speed things up greatly whilst keeping control over each process.
----
raise ArgumentError unless @reality.respond_to? :reason
That's the idea behind using a message queue -- it does that kind of
thing for you. Workers are processes that will be distributed among
cores. The only thing I'm unsure about in an MQ architecture is the
collating of answers from all the workers, i.e. the Reduce part
of MapReduce.
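One simple way to handle that collation is a second queue flowing the other way: workers push their partial answers onto it, and a single collator pops a known number of answers and folds them together. A sketch (threads stand in for worker processes to keep it short; this isn't any particular MQ's API):

```ruby
# "Reduce" via a results queue: work flows out on one queue, partial
# answers flow back on another, and one collator folds them together.
work    = Queue.new
answers = Queue.new

workers = 4.times.map do
  Thread.new do
    while (chunk = work.pop) != :done
      answers << chunk.sum        # "Map": each worker's partial result
    end
  end
end

(1..20).each_slice(5) { |chunk| work << chunk }  # 4 chunks of work
4.times { work << :done }                        # one sentinel per worker
workers.each(&:join)

# "Reduce": pop exactly as many answers as chunks were queued and fold.
total = 4.times.sum { answers.pop }
# total == 210, i.e. (1..20).sum
```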
···
On Jun 29, 7:17 pm, Michael Linfield <globyy3...@hotmail.com> wrote:
> Just throwing my 2 cents out here:
> What if you just created a daemon controller that ran each job as a
> separate process on its own core o_O?
> That would speed things up greatly whilst keeping control over each process.