I apologize if this has been answered somewhere obvious. I did a fair bit of Googling prior to piping up. If this has been addressed, please feel free to simply point and grunt!
In a nutshell, I have concerns surrounding the Ruby threads implementation as it appears to be a home-grown system. How would performance stack up against FreeBSD 5.x KSE threads? I've run into problems with the Python dummy_thread module before with respect to performance, especially when dealing with intensive IO. I have a bit of a fear that I'll have the same issue here.
I've been tasked with writing a dynamic HTTPS/HTTP gateway of sorts, and as such, it ought to get quite busy in the network IO department. I'd like to take advantage of thread pooling and whatnot.
I'd really love to use Ruby as I'm quickly falling in love with it - what a nice language.
I wrote a data collection system that gathers statistics from over
2000 servers every five minutes, 24/7. It makes heavy use of both
the network and the disk.
I also had a 4 processor box (a Sun E4500). Ruby threads cannot
take advantage of a multiprocessor server.
In order to have each collection cycle finish within the allotted
5-minute window, all 4 processors must be fully utilized.
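The arithmetic behind that, assuming the work is spread evenly over the 2000 servers and that a single Ruby process (and therefore all of its threads) can only keep one CPU busy:

```latex
\frac{300\,\mathrm{s}}{2000\ \text{servers}} = 0.15\,\mathrm{s}\ \text{of CPU time per server (one process)},
\qquad
\frac{4 \times 300\,\mathrm{s}}{2000\ \text{servers}} = 0.6\,\mathrm{s}\ \text{of CPU time per server (all four processors)}
```

A single-process, threaded design therefore gets only a quarter of the per-server CPU budget that the whole machine offers.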
The architecture that I settled on was similar to a threaded worker
pool, except instead of threads I used processes with the main
process acting as the scheduler and the child processes reading
work tasks from a distributed queue (Ruby makes this easy with
Rinda). This allows scaling either by adding more processors
(and processes) to the server or, because of Rinda (and DRb),
simply by adding more collection servers.
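A minimal sketch of that layout, assuming a single Unix machine (for `fork`) and Ruby's bundled `drb` and `rinda` libraries. The tuple shapes `[:task, host]` and `[:result, host, value]`, the `:stop` sentinel, and `collect_stats` are hypothetical stand-ins, not the actual collection system:

```ruby
require 'drb/drb'
require 'rinda/tuplespace'

# Hypothetical stand-in for polling one server (the real work would be
# network- and disk-bound); here it just returns the host name's length.
def collect_stats(host)
  host.length
end

# Main process = scheduler; forked children = workers that pull tasks from a
# Rinda tuple space served over DRb and write their results back into it.
def run_collection(hosts, workers: 4)
  # Fork the workers before starting DRb so the children start without any
  # inherited DRb server or tuple-space state; each child waits on a pipe
  # for the tuple space's URI.
  uri_pipes = []
  pids = Array.new(workers) do
    reader, writer = IO.pipe
    pid = fork do
      writer.close
      uri = reader.gets.chomp
      reader.close
      space = DRbObject.new_with_uri(uri)
      loop do
        _, host = space.take([:task, nil])   # blocks until a task is available
        break if host == :stop
        space.write([:result, host, collect_stats(host)])
      end
      exit!(0)                               # skip inherited at_exit handlers
    end
    reader.close
    uri_pipes << writer
    pid
  end

  # The scheduler: serve the tuple space, publish its URI, queue the work.
  space = Rinda::TupleSpace.new
  DRb.start_service('druby://127.0.0.1:0', space)   # port 0: let the OS pick a port
  uri_pipes.each { |w| w.puts(DRb.uri); w.close }

  hosts.each { |host| space.write([:task, host]) }
  results = hosts.map { space.take([:result, nil, nil]) }

  workers.times { space.write([:task, :stop]) }     # one stop sentinel per worker
  pids.each { |pid| Process.wait(pid) }
  DRb.stop_service

  results.to_h { |_, host, value| [host, value] }
end
```

Because the workers only know the tuple space by its druby:// URI, workers on other machines could in principle take tasks from the same space if it were started on a reachable address instead of 127.0.0.1, which is the scale-out path described above.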
So far, this solution has been running for over a year and a half
on the single 4-processor server with no unscheduled downtime.
My take: don't use the Ruby threading model for
network-intensive tasks. Instead, think about using DRb
and process-level parallelism. You might be surprised
at how well it works and how scalable it makes your
system.
Rick
On Fri, Sep 30, 2005 at 12:20:07PM +0900, Jeff McNeil wrote:
Greetings.
I apologize if this has been answered somewhere obvious. I did a fair
bit of Googling prior to piping up. If this has been addressed,
please feel free to simply point and grunt!
In a nutshell, I have concerns surrounding the Ruby threads
implementation as it appears to be a home-grown system. How
would performance stack up against FreeBSD 5.x KSE threads? I've run
into problems with the Python dummy_thread module before with respect
to performance, especially when dealing with intensive IO. I have a
bit of a fear that I'll have the same issue here.
I've been tasked with writing a dynamic HTTPS/HTTP gateway of sorts,
and as such, it ought to get quite busy in the network IO
department. I'd like to take advantage of thread pooling and whatnot.
I'd really love to use Ruby as I'm quickly falling in love with it -
what a nice language.
Thoughts?
Jeff
Interesting. I'll do a bit of comparison work between that approach and native Python "threading" support. I'd assume the DRb approach to be slower. Just out of curiosity, has any work been done towards improving the threading system? The current system is great for systems that might not fully support threading out of the box, but I'd really like to see support for POSIX threads at the system level.
-Jeff
On Sep 30, 2005, at 12:38 AM, Rick Nooner wrote:
I wrote a data collection system that gathers statistics from over
2000 servers every five minutes, 24/7. It makes heavy use of both
the network and the disk.
I also had a 4 processor box (a Sun E4500). Ruby threads cannot
take advantage of a multiprocessor server.
In order to have each collection cycle finish within the allotted
5-minute window, all 4 processors must be fully utilized.
The architecture that I settled on was similar to a threaded worker
pool, except instead of threads I used processes with the main
process acting as the scheduler and the child processes reading
work tasks from a distributed queue (Ruby makes this easy with
Rinda). This allows scaling either by adding more processors
(and processes) to the server or, because of Rinda (and DRb),
simply by adding more collection servers.
So far, this solution has been running for over a year and a half
on the single 4-processor server with no unscheduled downtime.
My take: don't use the Ruby threading model for
network-intensive tasks. Instead, think about using DRb
and process-level parallelism. You might be surprised
at how well it works and how scalable it makes your
system.
I've had similar experiences. As far as I remember from when we had a
peek under the hood to check the thread implementation, it's based on a
select() model. Ruby threads are not really threads in the sense of
scheduled time-sliced processes.
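To illustrate the select() idea (one thread waiting on many file descriptors and servicing whichever becomes readable), here is a small sketch using Ruby's `IO.select`. It only illustrates the model; it is not Ruby's actual thread scheduler, and `multiplex` is a hypothetical helper:

```ruby
# Illustration of the select() model: a single thread watches several pipes
# and only reads from the ones IO.select reports as ready.
def multiplex(messages)
  pipes = messages.map { IO.pipe }
  pipes.zip(messages).each do |(_reader, writer), msg|
    writer.write(msg)            # small messages fit in the pipe buffer
    writer.close                 # closing the write end gives the reader an EOF
  end

  pending  = pipes.map(&:first)  # read ends still open
  received = Hash.new { |hash, io| hash[io] = +"" }
  until pending.empty?
    ready, = IO.select(pending)  # blocks until at least one pipe is readable
    ready.each do |io|
      chunk = io.read_nonblock(4096, exception: false)
      case chunk
      when nil                   # EOF: this pipe is finished
        pending.delete(io)
        io.close
      when :wait_readable        # nothing there after all; check again later
        next
      else
        received[io] << chunk
      end
    end
  end
  pipes.map { |reader, _| received[reader] }
end
```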
In my case, we were writing something similar (HTTP decoding engine)
and handled the network IO with Ruby threads (because that's what select()
is really good for), but then handled decoding in a forked child
(because it's all CPU, it would have blocked the other Ruby 'threads').
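A sketch of that split, assuming a Unix `fork`: threads keep the (omitted) socket handling, while a hypothetical `decode` step, standing in for the real CPU-heavy decoding, runs in a forked child whose result comes back over a pipe via `Marshal`. Both `in_child` and `decode` are illustrative names:

```ruby
# Hypothetical CPU-heavy step standing in for the real decoding work.
def decode(payload)
  payload.bytes.sum
end

# Run the given block in a forked child and return its (Marshal-able) result,
# so CPU-bound work happens in another process instead of the one whose
# threads are busy with network IO.
def in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0)
  end
  writer.close
  result = Marshal.load(reader.read)  # read returns once every copy of the write end is closed
  reader.close
  Process.wait(pid)
  result
end

# Each thread would handle one connection's IO (omitted here) and hand the
# decoding off to a child process.
requests = ["GET / HTTP/1.0", "HEAD /index.html HTTP/1.1"]
threads = requests.map do |request|
  Thread.new { in_child { decode(request) } }
end
decoded = threads.map(&:value)
```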
I hope we get native threads at some point. It would be nice for
things like you're talking about. I also hope that the threading
won't be like Python's, with its single global lock.
I've done quite a bit of work with Python and Python threading.
While it should be better than Ruby at using a multiprocessor
system and I/O of all forms, its global interpreter
lock really reduces performance when running threaded code. You
either have to use a process methodology like I described earlier
or write the threaded code in C/C++.
I have written several large systems in Python over the past 10
years, using Python threads, process-level parallelism,
and threaded C/C++ libraries with Python for the higher-level
logic. The C/C++ libraries proved to be by far the fastest. The
same ideas could be used with Ruby.
Caveat: I haven't really used Python for about 2 years, so I don't
know what the current state of the art is there.
A good point about Ruby threads is that they are rock solid
and can be used in much greater numbers than native threads
in many circumstances.
Rick
On Fri, Sep 30, 2005 at 02:00:57PM +0900, Jeff McNeil wrote:
Interesting. I'll do a bit of comparison work between that approach
and native Python "threading" support. I'd assume the DRb approach to
be slower. Just out of curiosity, has any work been done towards
improving the threading system? The current system is great for
systems that might not fully support threading out of the box, but
I'd really like to see support for POSIX threads at the system
level.
Yeah, a lot like pre-KSE FreeBSD threads, implemented via poll(). I'm not an expert in the area, but I believe a lot of user-space threading libraries are implemented this way (people used to run some apps under a linked-in LinuxThreads as an alternative).
I can work around it using a different design approach, more in tune with what Rick suggested.
-Jeff
On Sep 30, 2005, at 12:11 PM, Paul wrote:
I've had similar experiences. As far as I remember from when we had a
peek under the hood to check the thread implementation, it's based on a
select() model. Ruby threads are not really threads in the sense of
scheduled time-sliced processes.
In my case, we were writing something similar (HTTP decoding engine)
and handled the network IO with Ruby threads (because that's what select()
is really good for), but then handled decoding in a forked child
(because it's all CPU, it would have blocked the other Ruby 'threads').