Please help, question regarding threads

I am somewhat new to threaded applications so am hoping someone can help
give me a little advice.

I am working on a threaded, server-oriented application: each
connection to the server creates a new thread, and many custom class
objects are loaded per thread, some of them containing a very large amount
of data.

I am worried about scalability. Does storing the data in class variables
prevent a separate copy from being loaded in each thread? That is, if the
data is stored in class variables within their respective objects, is
there only one copy in memory, shared among all the threads that are
running?

Are there other issues to consider regarding scalability?

Thank you for whatever help and advice you may offer!
-Andy


Hello Andrew,

Thursday, October 03, 2002, 11:25:07 AM, you wrote:

I am worried about scalability, does using class variables for storing the
data prevent a copy from being produced (loaded) once in each thread? That
is if the data is stored in class variables within their respective
objects, is there only one copy in memory that is shared among the various
threads that are running?

Yes. Ruby threads are very simple and light: switching threads is close to
just moving the “program counter” between several locations and nothing more.
Data can be made local to a thread only by calling a method or creating
a closure (the standard Ruby ways to make local data), while class data is common to all threads:

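class A
  # @@x is a class variable: it belongs to the class, not to any thread
  def A.set(x)
    @@x = x
  end

  def A.get
    @@x
  end
end

A.set(1)
p A.get                        # prints 1
Thread.new { A.set(2) }.join   # change the value from another thread
p A.get                        # prints 2: the change is visible here too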
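
To speak to the memory question directly, here is a minimal sketch that checks whether every thread sees the very same object through a class variable. The class name Catalog and its contents are made up for illustration; nothing here comes from Andy's application.

```ruby
# Hypothetical class holding a large data set in a class variable.
class Catalog
  # Built once, when the class body is evaluated.
  @@data = Array.new(1000) { |i| "record #{i}" }

  def self.data
    @@data
  end
end

# Ask five threads for the object_id of the data they see.
ids = Array.new(5) { Thread.new { Catalog.data.object_id } }.map(&:value)

# A single distinct id means a single shared object, not five copies.
distinct_ids = ids.uniq.size
puts "distinct objects seen by threads: #{distinct_ids}"
```

Sharing cuts both ways: because there is only one object, a change made by one thread is seen by all the others.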



Best regards,
Bulat mailto:bulatz@integ.ru

Andrew Cowan icculus@gmdstudios.com writes:

I am somewhat new to threaded applications so am hoping someone can help
give me a little advice.

The world of concurrent programming is vast. This little post is far
from describing that world.

I am working on a server-oriented application which is threaded, each
connection to the server creates a new thread and many various custom class
objects are loaded per thread, some of them containing a very large amount
of data.

This is a naive and bad approach. See below.

objects, is there only one copy in memory that is shared among the various
threads that are running?

All kinds of variables except thread-specific variables are shared between
threads. Thread-specific variables are accessed with Thread#[] (see
http://www.rubycentral.com/book/ref_c_thread.html#Thread._ob_cb )
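
A minimal sketch of thread-specific variables using Thread#[] and Thread#[]=, as referenced above. The key :request_id is an arbitrary name chosen for this example.

```ruby
# Each thread stores a value under the same key on its own Thread object.
threads = Array.new(3) do |i|
  Thread.new do
    Thread.current[:request_id] = i    # set on this thread only
    Thread.current[:request_id] * 10   # read it back within the same thread
  end
end

results = threads.map(&:value)

# The main thread never set the key, so it reads nil.
main_value = Thread.current[:request_id]

p results
p main_value
```

In current Ruby versions these accessors are scoped to the running fiber; Thread#thread_variable_get and Thread#thread_variable_set are the strictly per-thread equivalents.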

Are there other issues to consider regarding scalability?

Consider the following real story:

A couple of years back, our team had to build a web robot. It is a program
that recursively fetches web pages.

One of my (dumb and stubborn) team members insisted on using one thread
for each connection. So, if there are 700 simultaneous connections,
there are at least 700 threads. Each thread performs the following
tasks in order: get next url, resolve dns, connect, get page, store
page in dbase so that it can be parsed later on.

Such a large number of threads is guaranteed to kill the computer's
performance, as more time is spent switching threads than
doing the actual work.

My solution was to use 5 threads no matter how many connections there
are. Each thread does one specific job: get next url, resolve dns,
connect (non-blocking), get page (using select()), store page in
dbase. This works only on UNIX, but at that time Windows didn’t
inspire much confidence for working with a large number of
connections. I can’t say much about Windows as I’ve never seen its
source code, nor ever bothered to read its documentation (infamous for
its inaccuracy). At least in Linux, if you have 700 simultaneous
connections, there is still only one kernel thread that moves packets
from the ethernet device to main RAM.

<gossip, unproven assertion>
In Windows, I was told, async I/O uses a kernel
thread. Whether or not this is true, it is obviously an inefficient
solution for handling many connections. </gossip, /unproven assertion>

Think of threads as assembly lines. Had Ford required 700 assembly
lines to simultaneously make 700 T-models, the automobile world would
never have seen even a single T-model.

So, in the end, if you see that your number of threads is growing in
step with the number of connections, perhaps it is time for you to
re-evaluate the design.
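
The fixed-number-of-threads design can be sketched roughly as a pipeline: one thread per stage, connected by queues, so the thread count depends on the number of stages rather than the number of items. This is only an illustration under stated assumptions, not the robot described above. The method run_pipeline, the :done marker, and the three stand-in stages are all invented for the example, and no real DNS lookup or network I/O happens.

```ruby
# Ruby's Queue is thread-safe. Very old Rubies need: require 'thread'
def run_pipeline(items, stages)
  # One queue in front of each stage, plus one for the final output.
  queues = Array.new(stages.size + 1) { Queue.new }

  threads = stages.each_with_index.map do |stage, i|
    Thread.new do
      # Pull work until the shutdown marker arrives.
      while (item = queues[i].pop) != :done
        queues[i + 1] << stage.call(item)
      end
      queues[i + 1] << :done   # pass the marker downstream
    end
  end

  items.each { |item| queues.first << item }
  queues.first << :done
  threads.each(&:join)

  # Drain the output queue up to the marker.
  results = []
  while (item = queues.last.pop) != :done
    results << item
  end
  results
end

# Hypothetical stand-ins for "get next url", "resolve dns", "store page".
stages = [
  ->(url)  { url.strip },
  ->(url)  { { url: url, host: url.split("/")[2] } },
  ->(page) { page.merge(stored: true) }
]

pages = run_pipeline([" http://example.com/a ", "http://example.org/b"], stages)
p pages
```

Feeding 2 items or 700 items through this uses the same three stage threads. A stage that blocks on slow I/O would still stall its queue, which is where non-blocking connect and select() come in.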

YS.

Hello Yohanes,

Thursday, October 03, 2002, 12:09:01 PM, you wrote:

I am working on a server-oriented application which is threaded, each
connection to the server creates a new thread

So, in the end, if you see that your number of threads are growing in
step with the number of connection, perhaps it is time for you to
re-evaluate the design.

If he is writing a server, not a client, it is better to use one thread per
connection. Also, Ruby threads are very light, and IMHO you can’t save
time by simulating thread switching in your own code.

Also, about thread-specific data: local variables created in the block
that a new thread executes are independent of other threads.
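
A small sketch of that point, with invented variable names: a local variable first assigned inside the thread's block is private to that thread, while anything captured from the enclosing scope is shared.

```ruby
# Defined outside the block, so every thread sees the same queue.
shared_log = Queue.new

threads = Array.new(3) do |i|
  # Passing i as an argument gives each thread its own copy in n.
  Thread.new(i) do |n|
    label = "thread #{n}"   # local to this block invocation, hence private
    shared_log << label     # the outer queue is reached through the closure
    label
  end
end

labels = threads.map(&:value)
logged = shared_log.size

p labels
p logged
```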



Best regards,
Bulat mailto:bulatz@integ.ru

data can be made local to the thread only by calling method or creating
closure (standard ruby ways to make local data)

Ruby has thread-local variables.

Guy Decoux

Bulat Ziganshin wrote:

Hello Yohanes,

Thursday, October 03, 2002, 12:09:01 PM, you wrote:

I am working on a server-oriented application which is threaded, each
connection to the server creates a new thread

So, in the end, if you see that your number of threads are growing in
step with the number of connection, perhaps it is time for you to
re-evaluate the design.

if he writes server, not client, it is better to use one thread per
connection. also, ruby threads are very light and IMHO you can’t save
time by simulating thread switching in your own code

Would you care to explain why? I can give you a great example of at
least one protocol where a thread per connection is a terrible
idea: XMPP (Jabber). Since the protocol expects a long-running socket
to be set up per user, even though the traffic across this socket
is relatively low, a thread per connection would kill your scalability.
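
As a rough illustration of the alternative, the sketch below serves several mostly idle connections from a single thread with IO.select. Pipes stand in for sockets so that it runs without a network, and the message text is invented.

```ruby
# Three pipes play the role of three client connections.
pipes = Array.new(3) { IO.pipe }
pipes.each_with_index do |(_reader, writer), i|
  writer.write("hello from client #{i}")
  writer.close   # closing signals end-of-stream to the reader
end

readers = pipes.map(&:first)
index_of = {}
readers.each_with_index { |io, i| index_of[io] = i }
received = Hash.new { |hash, key| hash[key] = String.new }

# One loop, one thread: wait until any connection has data, then read it.
until readers.empty?
  ready, = IO.select(readers)
  ready.each do |io|
    chunk = io.read_nonblock(1024, exception: false)
    case chunk
    when nil              # end of stream: stop watching this connection
      readers.delete(io)
      io.close
    when :wait_readable   # nothing to read after all, try again later
      next
    else
      received[index_of[io]] << chunk
    end
  end
end

messages = received.sort.map(&:last)
p messages
```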

If the usage of these sockets is short-lived, and there is a limited
number of connections established, I can understand a thread per
connection, as it makes the design significantly easier. However, this
imposes scalability restrictions on your software at design time, and
should you decide later that you need more scalability in your software,
you’ll have to spend more, either on bigger hardware
or on time spent refactoring.

Admittedly, I don’t know all the specifics around the design of ruby’s
thread implementation, but I would venture a guess that a thread per
socket in any software where you need a highly scalable network server
is a bad idea. Keep in mind, however, that the thread implementation in
ruby (the C interpreter) is likely to be radically different from an
implementation in JRuby (the Java interpreter).

bs.

Hello Ben,

Thursday, October 03, 2002, 11:17:41 PM, you wrote:

if he writes server, not client, it is better to use one thread per
connection. also, ruby threads are very light and IMHO you can’t save
time by simulating thread switching in your own code

Would you care to explain why?

Because you would need to handle several connections in one thread,
emulating Ruby’s own threads. On my 1200 MHz box, Ruby primitives run in
about 1 microsecond, a thread switch takes about 20 microseconds, and
creating and destroying a thread takes about 50 microseconds.
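
Figures like these depend heavily on the machine and the Ruby version, so it is worth measuring on your own setup. A minimal sketch using the standard Benchmark module; the iteration count is arbitrary, and the result will differ from the 2002 numbers quoted above.

```ruby
require 'benchmark'

iterations = 1_000

# Time creating a thread that does nothing and waiting for it to finish.
elapsed = Benchmark.realtime do
  iterations.times { Thread.new { nil }.join }
end

per_thread_us = elapsed / iterations * 1_000_000
puts format("create + join: %.1f microseconds per thread", per_thread_us)
```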

is a bad idea. Keep in mind, however, that the thread implementation in
ruby (the C interpreter) is likely to be radically different from an
implementation in JRuby (the Java interpreter).

Maybe, maybe not. Who knows :)



Best regards,
Bulat mailto:bulatz@integ.ru

Bulat Ziganshin wrote:

is a bad idea. Keep in mind, however, that the thread implementation in
ruby (the C interpreter) is likely to be radically different from an
implementation in JRuby (the Java interpreter).

Maybe, maybe not. Who knows :)

I know! :)
JRuby uses Java’s threads, so they are indeed radically different.

/Anders



Anders Bengtsson ndrsbngtssn@yahoo.se



Hello Anders,

Friday, October 04, 2002, 2:25:31 PM, you wrote:

Bulat Ziganshin wrote:

is a bad idea. Keep in mind, however, that the thread implementation in
ruby (the C interpreter) is likely to be radically different from an
implementation in JRuby (the Java interpreter).

Maybe, maybe not. Who knows :)

I know! :)
JRuby uses Java’s threads, so they are indeed radically different.

Does JRuby have compatibility problems with Ruby in this area? For
example, are class variables global?



Best regards,
Bulat mailto:bulatz@integ.ru

Bulat Ziganshin wrote:

I know! :)
JRuby uses Java’s threads, so they are indeed radically different.

jruby have compatibility problems with ruby in this area? f.e. class
variables are global?

No, the language itself behaves the same.

The major difference is that Ruby’s threading API is based on having
full control over thread scheduling, with methods like Thread.critical,
which stops all other threads. When you use an external implementation of
threads, like Java’s threads or POSIX threads or whatever, you don’t
have that level of control.
There are many ways to get around this, so it will be interesting in the
future to watch how Rite, Cardinal and other Ruby implementations will
approach this.
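
One way around it, shown here only as a sketch, is to protect shared data with a Mutex rather than stopping every other thread; Thread.critical itself was later removed in Ruby 1.9. The class HitCounter is hypothetical.

```ruby
# Hypothetical counter kept in a class variable and guarded by a Mutex.
class HitCounter
  @@count = 0
  @@lock = Mutex.new

  def self.increment
    # Only one thread at a time may run this block.
    @@lock.synchronize { @@count += 1 }
  end

  def self.count
    @@count
  end
end

threads = Array.new(10) do
  Thread.new { 1_000.times { HitCounter.increment } }
end
threads.each(&:join)

total = HitCounter.count
puts total
```

Unlike Thread.critical, a Mutex does not rely on the interpreter controlling the scheduler, so the same code can run on implementations built on native threads.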

/Anders



Anders Bengtsson ndrsbngtssn@yahoo.se

