DRb experts

drb’ing ruby’ists-

what is the preferred method of serving a ‘live’ ruby object, eg one holding
resources on the local machine, like database connections?

a) distribute the connections directly, perhaps starting a new DRbServer
(once only) for each connection - returning this object to the clients

silly (pseudo-code) :

class PGconn
  include DRbUndumped
end

class PgSqlConnPool
  def initialize(*databases)
    @pgconns = {}
    databases.each { |db| @pgconns[db] = PGconn.new db }
  end

  # start a dedicated DRbServer for this connection and return a reference to it
  def pgconn db
    DRbObject.new(nil, DRb::DRbServer.new(nil, @pgconns[db]).uri)
  end
end

b) return a proxy object knowing its DRb parent, where methods called on the
proxy relay into calls on the main DRb server

silly (pseudo-code) :

class PGconnProxy
  include DRbUndumped

  def initialize drb, db
    @drb = drb
    @db = db
  end

  # relay the query to the pool on the main DRb server
  def query sql
    @drb.query @db, sql
  end
end

class PgSqlConnPool
  def initialize(*databases)
    @pgconns = {}
    databases.each { |db| @pgconns[db] = PGconn.new db }
  end

  # DRb.uri assumes this pool is the front object of the primary DRb server
  def pgconn db
    PGconnProxy.new DRbObject.new(nil, DRb.uri), db
  end

  def query db, sql
    @pgconns[db].query sql
  end
end

obviously i know very little about drb - but hopefully the overall design will
come through in my bad code. i’m hopeful that i’ll be able to work out the
details myself. i’m really interested in the best overall design strategy.

-a


====================================

Ara Howard
NOAA Forecast Systems Laboratory
Information and Technology Services
Data Systems Group
R/FST 325 Broadway
Boulder, CO 80305-3328
Email: ahoward@fsl.noaa.gov
Phone: 303-497-7238
Fax: 303-497-7259
====================================

ahoward wrote:

drb’ing ruby’ists-

what is the preferred method of serving a ‘live’ ruby object, eg one holding
resources on the local machine, like database connections?

I don’t have a direct answer for you. However, you can probably find some good
ideas from the dbi-users mailing list archive at
http://www.rosat.mpe-garching.mpg.de/mailing-lists/dbi/. A search on “connection
pool” yields plenty of results (some more interesting than others).

Hope that helps.

Regards,

Dan

drb’ing ruby’ists-

what is the preferred method of serving a ‘live’ ruby object, eg one holding
resources on the local machine, like database connections?

a) distribute the connections directly, perhaps starting a new DRbServer
(once only) for each connection - returning this object to the clients

This will work well if you can load all the database drivers on the
client end (once per client), and will perform the best because DRb
is only dispatching database connections.

b) return a proxy object knowing its DRb parent, where methods called on the
proxy relay into calls on the main DRb server

This will result in the least memory overhead, but DRb may become a
bottleneck if you have to do a lot of database activity. Remember DRb
spawns a thread per request.

obviously i know very little about drb - but hopefully the overall design will
come through in my bad code. i’m hopeful that i’ll be able to work out the
details myself. i’m really interested in the best overall design strategy.

Well, if you have only a few database requests, or have low memory, go
with option b. If you have a ton of memory, or are doing many database
queries, go with option a.

If you’re going to be using this with Apache, it’ll be a tough call.
Is the overhead of having the PG drivers in each Apache process worth
it? Probably yes if you have a ton of memory or don’t have many Apache
processes hanging around. If you are low on memory, or only do a few DB
queries/page, then you may be better off having DRb handle everything.


Eric Hodel - drbrain@segment7.net - http://segment7.net
All messages signed with fingerprint:
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04

This will work well if you can load all the database drivers on the client
end (once per client), and will perform the best because DRb is only
dispatching database connections.

well, the client and server will be the same in my case - i plan to use this
object from cgi programs.

This will result in the least memory overhead, but DRb may become a
bottleneck if you have to do a lot of database activity. Remember DRb spawns
a thread per request.

yes, this does seem like a bottleneck. in general, what types of data need
thread protection when programming drb?

Well, if you have only a few database requests, or have low memory, go
with option b. If you have a ton of memory, or are doing many database
queries, go with option a.

2GB or 512GB of RAM - primary/secondary ha-linux machines.

If you’re going to be using this with Apache, it’ll be a tough call. Is
the overhead of having the PG drivers in each Apache process worth it?
Probably yes if you have a ton of memory or don’t have many Apache processes
hanging around. If you are low on memory, or only do a few DB queries/page,
then you may be better off having DRb handle everything.

you lost me.

why would the pg drivers be loaded into apache in either case? i’m
envisioning a pool which fires up independently from cgi programs or apache.
cgi programs will either get a proxy object delegating calls to the drb object
which does have the drivers loaded, or a handle (guess this is a proxy too
;0) to the actual connection object. in either case it seems like only the
server(s) would have the postgresql.so in memory? i’m not using mod_ANYTHING.

-a

On Fri, 31 Jan 2003, Eric Hodel wrote:


This will work well if you can load all the database drivers on the client
end (once per client), and will perform the best because DRb is only
dispatching database connections.

well, the client and server will be the same in my case - i plan to use this
object from cgi programs.

Here I meant client and server processes.

This will result in the least memory overhead, but DRb may become a
bottleneck if you have to do a lot of database activity. Remember DRb spawns
a thread per request.

yes, this does seem like a bottleneck. in general, what types of data need
thread protection when programming drb?

Anything you write to should have a mutex around it, unless you don’t
care about lost updates.

For a DB driver, that would probably be the open DB socket count.
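A minimal sketch of the mutex advice above, assuming a hypothetical SocketCounter class (not part of DRb or of any database driver) that tracks how many connections are checked out. Plain Ruby threads stand in here for the thread DRb spawns per request; the instance variable is the shared state, so every read and write goes through the same Mutex.

```ruby
# Hypothetical bookkeeping object a pool might expose through DRb.
class SocketCounter
  def initialize
    @lock = Mutex.new
    @open = 0
  end

  # Each method touches @open only while holding the lock.
  def checkout
    @lock.synchronize { @open += 1 }
  end

  def checkin
    @lock.synchronize { @open -= 1 }
  end

  def open_count
    @lock.synchronize { @open }
  end
end

# Simulate concurrent requests: several threads all updating one counter.
def hammer(counter, threads: 8, rounds: 1_000)
  workers = Array.new(threads) do
    Thread.new { rounds.times { counter.checkout } }
  end
  workers.each(&:join)
  counter.open_count
end

puts hammer(SocketCounter.new)   # 8000
```

Local variables inside a method are private to each request thread, so it is the instance, class, and global variables that would need this kind of guard.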

Well, if you have only a few database requests, or have low memory, go
with option b. If you have a ton of memory, or are doing many database
queries, go with option a.

2GB or 512GB of RAM - primary/secondary ha-linux machines.

If you’re going to be using this with Apache, it’ll be a tough call. Is
the overhead of having the PG drivers in each Apache process worth it?
Probably yes if you have a ton of memory or don’t have many Apache processes
hanging around. If you are low on memory, or only do a few DB queries/page,
then you may be better off having DRb handle everything.

you lost me.

why would the pg drivers be loaded into apache in either case? i’m
envisioning a pool which fires up independently from cgi programs or apache.
cgi programs will either get a proxy object delegating calls to the drb object
which does have the drivers loaded, or a handle (guess this is a proxy too
;0) to the actual connection object. in either case it seems like only the
server(s) would have the postgresql.so in memory? i’m not using mod_ANYTHING.

Let me clear this up with a little code:

When you use Undumped, DRb passes a proxy of the object across to
the client, so everything done through the object takes place on
the server side, not the client side. The client only needs to load
DRb.

If you don’t use Undumped, you need to load all the libraries used by
the class on the client side. For a DB driver, this would be the
postgres drivers, and you’d get the whole driver on each Apache process.
The more Apache processes you have, the more overhead this takes up.

Here is an example:

server.rb:

require 'drb'

class Server
  # returned by value: DRb marshals a copy and sends it to the client
  def custom_obj
    return CustomObj.new
  end

  # returned by reference: the client only receives a DRbObject proxy
  def custom_undumped_obj
    return CustomUnDumpedObj.new
  end
end

class CustomObj
  def do_stuff
    puts "I'm doing stuff!"
  end
end

class CustomUnDumpedObj
  include DRbUndumped
  def do_stuff
    puts "I'm doing stuff, and I'm not dumped!"
  end
end

DRb.start_service('druby://localhost:5000', Server.new)
DRb.thread.join


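client.rb:

require 'drb'

# must match the server's definition so the marshalled copy can be loaded
class CustomObj
  def do_stuff
    puts "I'm doing stuff!"
  end
end

DRb.start_service
obj = DRbObject.new(nil, 'druby://localhost:5000')
co = obj.custom_obj              # local copy of a CustomObj
udo = obj.custom_undumped_obj    # DRbObject proxy for the server-side object
p co
p udo
co.do_stuff                      # prints on the client
udo.do_stuff                     # prints on the server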


If you start up the server, then run the client you will see:

** client side:
$ ruby client.rb
#<CustomObj:0x8056f28>
#<DRb::DRbObject:0x8056b18 @ref=67286072, @uri="druby://localhost:5000">
I'm doing stuff!

** server side:
$ ruby server.rb
I'm doing stuff, and I'm not dumped!
^C
$

If you remove the class CustomObj from the client.rb you get this:

** client side:
$ ruby client.rb
/usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:123:in `load': undefined class/module CustomObj (ArgumentError)
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:123:in `load'
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:166:in `recv_reply'
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:300:in `send_message'
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:227:in `method_missing'
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:226:in `open'
        from /usr/local/lib/ruby/site_ruby/1.6/drb/drb.rb:226:in `method_missing'
        from client.rb:15

So the advantage of using include DRbUndumped is that everything gets
proxied, there’s less per-process overhead because you don’t have a
driver sitting in each process, and you have perfect control over the
number of DB sockets you have open.

The disadvantage is that you’re shoving everything into a bitty socket
before sending it out to the DB.
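Applied to the connection pool from the start of the thread, the same idea might look like the sketch below. It is only an illustration under stated assumptions: FakeConn is a hypothetical stand-in for PGconn so the code runs without PostgreSQL, ConnPool and run_demo are made-up names, and fork plus a pipe (Unix only) are used purely to get a separate server process and pass its URI to the client.

```ruby
require 'drb'

# Hypothetical stand-in for PGconn; not a real database driver.
class FakeConn
  include DRb::DRbUndumped   # clients get a proxy, never a marshalled copy

  def initialize(db)
    @db = db
  end

  # Runs in the process that owns the object, so the pid shows where.
  def query(sql)
    "#{@db}: #{sql} (handled by pid #{Process.pid})"
  end
end

class ConnPool
  def initialize(*databases)
    # One connection per database, created once and kept alive by this hash.
    @conns = {}
    databases.each { |db| @conns[db] = FakeConn.new(db) }
  end

  def conn(db)
    @conns.fetch(db)
  end
end

# Start the pool in a child process, then use it from the parent.
def run_demo
  reader, writer = IO.pipe
  server = fork do
    reader.close
    trap('TERM') { exit!(0) }
    DRb.start_service('druby://127.0.0.1:0', ConnPool.new('db1', 'db2'))
    writer.puts DRb.uri        # tell the client which port was picked
    writer.close
    DRb.thread.join
  end
  writer.close
  pool = DRbObject.new_with_uri(reader.gets.chomp)
  conn = pool.conn('db1')
  [conn.class, conn.query('select 1'), server]
ensure
  reader.close if reader
  if server
    Process.kill('TERM', server)
    Process.wait(server)
  end
end

klass, reply, server_pid = run_demo
puts klass
puts reply
puts "client pid #{Process.pid}, server pid #{server_pid}"
```

The client should end up holding a DRb::DRbObject, and the pid in the reply should be the server's, which is the sense in which the number of real connections stays under the pool's control.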


Eric Hodel - drbrain@segment7.net - http://segment7.net

Here I meant client and server processes.

me too.

'well, the client and server will be the same in my case'.gsub(
  /will be the same/, 'will be on the same host')

Anything you write to should have a mutex around it, unless you don’t care
about lost updates.

you mean anything outside the local scope of the method called?

eg. @var, @@var, $var, etc.

Let me clear this up with a little code:

[code very helpful - thanks!!!]

When you use Undumped, DRb passes a proxy of the object across to
the client, so everything done through the object takes place on
the server side, not the client side. The client only needs to load
DRb.

ahhh. i wondered if this were the case. can anyone explain the ‘un’ in
UnDumped? ‘un’ because it is not actually dumped, but proxies are returned
instead?

If you don’t use Undumped, you need to load all the libraries used by the
class on the client side.

in my case - the cgi process would need postgres.so, etc.

For a DB driver, this would be the postgres drivers, and you’d get the whole
driver on each Apache process. The more Apache processes you have, the more
overhead this takes up.

as i mention above, the client and server side would be the same in my case,
since the cgi (client) would be using a pool (server) running on my web/db
server machine(s). i wonder though, if in this case the cost would actually
be as high as you think, since postgres.so would be shared by all processes -
surely ‘require’ does a shared mapping and not a private one?

in fact, it seems like in the case of using the client and server on the same
machine - at least where *.so’s are concerned - there would be little
difference in overhead since the one image would be in memory regardless. have
you done any performance tests of this situation?

-a

thanks again for the code.


Let me clear this up with a little code:

** server side:
$ ruby server.rb
I’m doing stuff, and I’m not dumped!

so - raising exceptions from objects whose class includes ‘undumped’ is
probably a bad idea since it would only raise an error on the server side?
how can one propagate errors back to the client in this case?

-a
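As far as the DRb library goes, an exception raised inside a remote method is rescued on the server, marshalled, and re-raised in the caller, so it should reach the client. It is output such as puts that stays on the server, because it goes to the server's stdout. A rough sketch, assuming hypothetical QueryError, FakeConn, and Pool classes and using fork with a pipe (Unix only) to get a separate server process:

```ruby
require 'drb'

# Hypothetical error and connection classes, for illustration only.
class QueryError < StandardError; end

class FakeConn
  include DRb::DRbUndumped

  def query(sql)
    raise QueryError, "bad sql: #{sql}" unless sql.start_with?('select')
    "rows for #{sql}"
  end
end

class Pool
  # Keep a reference so the served object is not garbage collected.
  def conn
    @conn ||= FakeConn.new
  end
end

# Run a DRb server for +front+ in a child process and yield its URI.
def with_server(front)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    trap('TERM') { exit!(0) }
    DRb.start_service('druby://127.0.0.1:0', front)
    writer.puts DRb.uri
    writer.close
    DRb.thread.join
  end
  writer.close
  yield reader.gets.chomp
ensure
  reader.close if reader
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)
  end
end

# Call a failing remote method and report what the client sees.
def remote_error_message
  with_server(Pool.new) do |uri|
    conn = DRbObject.new_with_uri(uri).conn   # proxy to the server's FakeConn
    begin
      conn.query('drop everything')
      'no error'
    rescue QueryError => e
      "#{e.class}: #{e.message}"
    end
  end
end

puts remote_error_message
```

The exception class has to be known on both sides for this to work, which is the case here because both processes come from the same script.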


Here I meant client and server processes.

me too.

'well, the client and server will be the same in my case'.gsub(
  /will be the same/, 'will be on the same host')

Anything you write to should have a mutex around it, unless you don’t care
about lost updates.

you mean anything outside the local scope of the method called?

eg. @var, @@var, $var, etc.

Let me clear this up with a little code:

[code very helpful - thanks!!!]

When you use Undumped, DRb passes a proxy of the object across to
the client, so everything done through the object takes place on
the server side, not the client side. The client only needs to load
DRb.

ahhh. i wondered if this were the case. can anyone explain the ‘un’ in
UnDumped? ‘un’ because it is not actually dumped, but proxies are returned
instead?

If you watch the transactions with tcpdump, you can see the actual
object coming across without undumped. Normally, DRb uses Marshal.dump
to send the object across.
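The ‘un’ can also be seen without any networking. In the sketch below, PlainConn and UndumpedConn are hypothetical classes used only for illustration: including DRb::DRbUndumped makes Marshal.dump refuse the object, and DRb reacts to that refusal by sending a DRbObject reference in its place.

```ruby
require 'drb'

# Hypothetical connection-like classes, used only for this illustration.
class PlainConn
  def initialize(db)
    @db = db
  end
end

class UndumpedConn < PlainConn
  include DRb::DRbUndumped
end

# True when Marshal can serialize the object, false when it refuses.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError
  false
end

puts dumpable?(PlainConn.new('db1'))     # true
puts dumpable?(UndumpedConn.new('db1'))  # false
```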

If you don’t use Undumped, you need to load all the libraries used by the
class on the client side.

in my case - the cgi process would need postgres.so, etc.

For a DB driver, this would be the postgres drivers, and you’d get the whole
driver on each Apache process. The more Apache processes you have, the more
overhead this takes up.

as i mention above, the client and server side would be the same in my case,
since the cgi (client) would be using a pool (server) running on my web/db
server machine(s). i wonder though, if in this case the cost would actually
be as high as you think, since postgres.so would be shared by all processes -
surely ‘require’ does a shared mapping and not a private one?

in fact, it seems like in the case of using the client and server on the same
machine - at least where *.so’s are concerned - there would be little
difference in overhead since the one image would be in memory regardless. have
you done any performance tests of this situation?

Well, if they are separate Apache processes, then it would fall to your
OS’s ld.so to do a shared mapping (which I’m sure is smart). The
overhead would be for anything you’d do in ruby before it hits the DB,
the code for querying, quoting, etc.

I would expect that DRb would be slower, because each Apache process can
have its own DB connection, and one process won’t block the others
during I/O. DRb adds the overhead of marshalling and dumping objects
across a TCP/IP socket.

How much slower that will be though, I don’t know.


Eric Hodel - drbrain@segment7.net - http://segment7.net