Andrew Cowan icculus@gmdstudios.com writes:
I am somewhat new to threaded applications so am hoping someone can help
give me a little advice.
The world of concurrent programming is vast. This little post is far
from describing that world.
I am working on a server-oriented application which is threaded, each
connection to the server creates a new thread and many various custom class
objects are loaded per thread, some of them containing a very large amount
of data.
This is a naive and bad approach. See below.
objects, is there only one copy in memory that is shared among the various
threads that are running?
All variables except thread-specific ones are shared between
threads. Thread-specific variables are accessed with Thread#[] (see
http://www.rubycentral.com/book/ref_c_thread.html#Thread._ob_cb ).
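To make the sharing concrete, here is a minimal sketch. The names run_workers, big_table, :worker_id and :table_id are made up for illustration. Every thread receives a reference to the same object, while values stored with Thread#[]= belong to one thread only.

```ruby
# Hypothetical helper: starts `count` threads that all read the same table.
def run_workers(table, count)
  threads = Array.new(count) do |i|
    Thread.new do
      # Thread#[]= stores a value that belongs to this thread only.
      Thread.current[:worker_id] = i
      # Reading the shared object does not copy it; every thread
      # sees the one instance that was passed in.
      Thread.current[:table_id] = table.object_id
    end
  end
  threads.each(&:join)
  # Thread#[] reads the per-thread values back after the threads finish.
  threads.map { |t| [t[:worker_id], t[:table_id]] }
end

big_table = { "greeting" => "hello" }   # stand-in for a large object
run_workers(big_table, 3).each do |id, table_id|
  puts "worker #{id} saw object #{table_id}"
end
```

All workers report the same object id, so there is one copy in memory. If the threads also write to a shared object, access needs to be guarded, for example with a Mutex.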
Are there other issues to consider regarding scalability?
Consider the following real story:
A couple of years back, our team had to build a web robot, a program
that recursively fetches web pages.
One of my (dumb and stubborn) team members insisted on using one thread
for each connection. So, if there are 700 simultaneous connections,
there are at least 700 threads. Each thread performs the following
tasks in order: get next url, resolve dns, connect, get page, store
page in dbase so that it can be parsed later on.
Such a large number of threads is guaranteed to kill the computer's
performance, as more time is spent switching between threads than
doing the actual work.
My solution was to use 5 threads no matter how many connections there
are. Each thread does one specific job: get next url, resolve dns,
connect (non-blocking), get page (using select()), store page in
dbase. This works only on UNIX, but at that time Windows didn’t
inspire much confidence for working with a large number of
connections. I can’t say much about Windows as I’ve never seen its
source code, nor ever bothered to read its documentation (infamous for
its inaccuracy). At least in Linux, if you have 700 simultaneous
connections, there is still only one kernel thread that moves packets
from the ethernet device to main RAM. <gossip, unproven assertion>
In Windows, I was told, async I/O uses a kernel
thread. Whether or not this is true, it is obviously an inefficient
solution for handling many connections. </gossip, /unproven assertion>
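The fixed-number-of-threads layout can be sketched roughly as follows. This is not the robot's actual code. start_stage and run_pipeline are hypothetical names, and the stage bodies are stand-ins that only tag each item instead of doing DNS lookups, non-blocking connects or select() calls. The point is the shape: one thread per job, joined by queues, so the thread count stays the same whether there are 7 URLs or 700.

```ruby
require 'thread' # Queue; only needed on old Ruby versions, harmless otherwise

# One stage = one thread that takes items from its inbox, does its single
# job, and passes the result to the next stage's inbox.
def start_stage(inbox, outbox, &job)
  Thread.new do
    while (item = inbox.pop) != :done
      outbox << job.call(item)
    end
    outbox << :done # pass the shutdown marker down the line
  end
end

def run_pipeline(urls)
  queues = Array.new(5) { Queue.new }
  jobs = [
    ->(url)  { { url: url, host: url.split("/")[2] } }, # stand-in: resolve dns
    ->(item) { item.merge(connected: true) },           # stand-in: connect
    ->(item) { item.merge(body: "page from #{item[:host]}") }, # stand-in: get page
    ->(item) { item.merge(stored: true) }               # stand-in: store page
  ]
  threads = jobs.each_with_index.map do |job, i|
    start_stage(queues[i], queues[i + 1], &job)
  end

  # "get next url" is done here by the calling thread.
  urls.each { |u| queues.first << u }
  queues.first << :done
  threads.each(&:join)

  results = []
  while (item = queues.last.pop) != :done
    results << item
  end
  results
end

pages = run_pipeline(["http://example.com/a", "http://example.org/b"])
pages.each { |p| puts "#{p[:url]} stored=#{p[:stored]}" }
```

In a real robot, the connect and get-page stages would each watch many sockets at once (for example with IO.select) rather than handle one item at a time. That is what lets a single thread serve hundreds of connections.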
Think of threads as assembly lines. Had Ford required 700 assembly
lines to make 700 T-models simultaneously, the automobile world would
never have seen even a single T-model.
So, in the end, if you see that your number of threads is growing in
step with the number of connections, perhaps it is time for you to
re-evaluate the design.
YS.