Two Advanced Ruby Performance Questions

Hi All and thanks for the responses.

3. I address the overhead of web services operations constantly. Even
when working within the SAME language, I benchmark overhead and test the
cost of each operation. For example, ColdFusion has a native WDDX (XML
like) conversion format that we used to use to fold multiple fields into
a single field for caching in a database. After testing, I wrote a
custom encoding/decoding format that executes about 1-2 orders of
magnitude faster if I remember correctly. I also weigh the cost of
calling methods. In fact, one of the major reasons for wanting a switch

It is said that method call overhead in Ruby is pretty high, partially
due to the possibility of overriding an existing method
later.
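
Whether that overhead matters for a given workload can be timed directly. What follows is only a minimal sketch using Ruby's standard Benchmark module; the Adder class and the measure_call_overhead helper are made-up names for illustration, and the numbers will differ between interpreters and machines.

```ruby
require 'benchmark'

# Hypothetical class, used only to have a method to call.
class Adder
  def add(a, b)
    a + b
  end
end

# Times the same addition done inline and through an extra method call.
# In Ruby `+` is itself a method, so this measures the cost of one more
# call layered on top. Returns both wall-clock durations in seconds.
def measure_call_overhead(iterations)
  adder = Adder.new
  inline = Benchmark.realtime { iterations.times { |i| i + 1 } }
  called = Benchmark.realtime { iterations.times { |i| adder.add(i, 1) } }
  { inline: inline, called: called }
end

timings = measure_call_overhead(1_000_000)
puts format('inline: %.3fs, via method: %.3fs', timings[:inline], timings[:called])
```

Running it a few times and comparing the two figures gives a rough idea of the per-call cost before deciding whether to restructure anything.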

I found some of the information I wanted, not in the eRuby or ERB pages
but in the mod_ruby pages.

Both ERB and eRuby will compile your template into a string that will
get 'eval'ed. The difference is that eRuby uses print statements, so its
output is hard to capture.
Have a look at erubis -- an ERB implementation in C (I haven't tried it myself).
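
The compile step can be inspected from Ruby itself. This is a small sketch using ERB from the standard library (not eRuby or erubis); compiled_source is a hypothetical helper name. ERB#src returns the Ruby code that the template was turned into, which is what later gets evaluated.

```ruby
require 'erb'

# Hypothetical helper: returns the Ruby source ERB generates for a template.
def compiled_source(template_text)
  ERB.new(template_text).src
end

# Print the generated code so it can be read before it is ever evaluated.
puts compiled_source('Hello, <%= name %>!')
```

Reading the generated source is a quick way to see where the output of a template actually goes.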

It suggests that ONE instance of Ruby executes to handle all the
threads; however, it doesn't go into too much detail about how this is
handled. So, for example, if I call

The usual way to deploy a Ruby application is either to use
mod_fcgi(d)/FastCGI or to use Mongrel or WEBrick as a container and
proxy to it. (Proxying to Mongrel seems to be the easiest one.)

Mongrel is able to run in multiple threads; Rails is not. The problem is
somewhere in the metaprogramming magic (though I don't know precisely
where). This limitation is worked around by running multiple instances
of Mongrel, each with its own Ruby interpreter. These interpreters share
nothing by default among them (except the db). You can have them share
sessions, etc. by using the db or shared memory for it.

Within one interpreter, classes are persistent. I.e. the life cycle
is: the interpreter starts, initializes, and serves requests in a
loop. The classes you define stay there until the interpreter is stopped.
Instances/objects are created on the fly as needed. They are not
recycled. However, it may be possible to create a factory method that
will cache created instances and reuse them. That will possibly make
your code more complicated and error prone...
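
A caching factory of that kind could look roughly like this. It is only a sketch: the Formatter class and its for method are hypothetical names, and it assumes the cached objects hold no per-request state (otherwise sharing them between requests is exactly the error-prone part).

```ruby
# Hypothetical class whose instances are expensive enough to reuse.
class Formatter
  @cache = {}
  @lock = Mutex.new

  # Factory method: returns the cached instance for a locale, creating it
  # on first use. The mutex guards the cache if requests run in threads.
  def self.for(locale)
    @lock.synchronize { @cache[locale] ||= new(locale) }
  end

  attr_reader :locale

  def initialize(locale)
    @locale = locale
  end
end

# Two simulated requests in the same interpreter get the same object back.
first = Formatter.for('en')
second = Formatter.for('en')
puts first.equal?(second)
```

Because the class (and therefore its cache) lives as long as the interpreter does, the instance created during one request is still there for the next one.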

I still don't know whether eRuby can be called from within Ruby or if
it has to be called from the command line or through some sort of
adapter.

You can call it from Ruby as well. There's not a lot of documentation,
but it's certainly possible. Ask me when you need it and I'll send
an example to you.
(OT: I sent a doc patch to Shugo Maeda, but got no response.)
However, eRuby will not work as is with Rails because of the print statements.
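
For comparison, here is how a template can be rendered from inside Ruby with ERB from the standard library. This is not the eRuby API mentioned above, just a stand-in to show the general shape; PAGE_TEMPLATE and render_page are hypothetical names.

```ruby
require 'erb'

# Hypothetical template; in a real app this would be read from a file.
PAGE_TEMPLATE = "<h1><%= title %></h1>\n<p><%= items.join(', ') %></p>\n"

# Renders the template to a String. The binding gives the template access
# to this method's local variables (title and items).
def render_page(title, items)
  ERB.new(PAGE_TEMPLATE).result(binding)
end

html = render_page('Products', %w[apples pears])
puts html
```

Since the result comes back as an ordinary String, it can be cached, post-processed, or written to a response as needed.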

···

On 11/27/06, Sunny Hirai <sunny@citymax.com> wrote:

I feel like Ruby needs a "High Performance Ruby" book. There is one for
MySQL and that is the only reason I had the confidence to make the
decision to switch out of MS SQL Server. Knowing what I'm up against
would help tremendously.

Thanks for your feedback. If anybody knows anything more about the guts
of mod_ruby and/or Ruby, please let me know.

All the best,

Sunny Hirai
CEO, MeZine Inc.

Hey Sunny-

  The whole thing about Xen is tied to the idea of easy scalability. The production apps I have seen running on a cluster of mongrels have a sweet spot: about 3 or 4 mongrels behind a front webserver like nginx, all running in one Xen instance, can serve quite a bit of traffic. Any more than 5 mongrels in a small cluster like this and you start to see diminishing returns. When you add a shared filesystem like GFS to the mix, you can add and remove nodes from an app cluster at will, just by cloning the Xen instance and having it join the cluster and mount the Rails applications on a GFS mount. Doing it this way, you can have up to 16 Xen instances sharing a GFS filesystem and serving the same Rails application. When it is time to deploy new code, you only have to deploy it to one of these Xen nodes and then restart the mongrels on all nodes that share the same GFS mount. Then all nodes are serving the new code. All of these nodes run behind hardware load balancers, so you can bring them down and up in a piggyback fashion and never have downtime for the website users.

  We have found that Xen only imposes a performance overhead of 5-9% compared to non-virtualized Linux. Going with Xen makes it so much easier to scale across boxes that the tradeoff is well worth it. The only boxes you may not want Xen on are the database box or boxes. If you have super heavy database traffic then you may want to go non-virtualized on that box. That said, we still run our MySQL clusters on Xen instances and have not had performance problems with it. We use a Coraid AoE SAN, which is block-level ATA disk access over Ethernet without the overhead of TCP/IP. So none of our servers have hard disk drives. They only have 128MB flash ATA chips for the main Xen dom0s to boot off of. Then they load all the domUs off of the SAN.

  Rails tends to use a lot of memory, but the CPU usage is not so bad. Our CPUs are sleeping and it's always RAM that needs to be increased. We are going to start buying big 4-processor boxes with 32 or 64 gigs of RAM and virtualizing on top of those. Currently we use boxes with two dual-core AMD Opterons and 8 gigs of RAM.

  Using many smaller Rails app server nodes behind load balancing, with all of them sharing state through the database and the shared filesystem, makes applications feel responsive and distributes the load in a nice way.

Cheers-
-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- ez@engineyard.com
-- Engine Yard, Serious Rails Hosting
-- (866) 518-YARD (9273)

···

On Nov 28, 2006, at 10:08 PM, Sunny Hirai wrote:

Max Muermann wrote:

I know you don't intend to use Rails, but there's a
performance-specific blog dealing with Rails that might be of interest
to you:

RailsExpress.blog

Maybe you can get in touch with those guys and get them to share some
of their experiences.

Thanks for the recommendation. The website looks to be some more of what
I'm looking for.

Hi Ezra,

Could you comment on Xen instances yielding better performance than
instances of Ruby on a single server? Is this a mild improvement, a
significant one, or is it for redundancy/ease of deployment?

By the way, the ideas in engineyard are fascinating. Offering a hosting
platform with scalability built in. Very Nice.

I'd be willing to pay for an early beta of your book. :)

Sunny Hirai

To M. Edward

Thanks for the info. In terms of VM, basically I'm looking for something
that is significantly faster than Ruby is right now. Ultimately it would
have been nice to start clean on Ruby 2.0 semantics and its upcoming VM
but I don't have much confidence on timelines here, especially to a
stable build. I'd prefer JRuby because it would make it easier to hook
into Java, which a lot of reference implementations for integration are
written in; however, I'm worried about differences between it and the
reference implementation.

Thanks also for the referral to "Ruby for Rails." I have read the book
once but I wasn't thinking of scoping when I did. I will read it again.
The information on scoping will probably be very helpful.

I have read the Pickaxe book (a couple of times now) but am not much of
a C programmer though I have learned it in the past and I know much of
its semantics are similar to Java/C# style without garbage collection,
native OOP, etc.

I do have to disagree a little with your "one last comment" however. I
find it necessary to learn everything I need to know about a language to
scale. I find it uncomfortable when I don't know what is happening under
the hood because things can take me by surprise.

I agree that not knowing what's going on under the hood can be a "good
thing" if your application doesn't need to scale largely and, quite
frankly, for about 99% of apps, you really don't need to worry that much
about performance. But it is absolutely essential in our applications. A
wrong choice early on or a lack of knowledge could mean we run into
possibly insurmountable problems later.

As an example, I know that MS SQL server keeps statistics on all of its
tables and makes optimization decisions on which indexes to use based on
those statistics. One night, our application slowed to a complete crawl.
It stopped serving pages and yet we couldn't recall any change we made
to the code that would cause it. We traced it to the DB and what
happened was that one of our partners added a huge number of products to
our db. This in itself wasn't a problem as our indexes are designed to
scale to a large number of products; however, SQL Server incorrectly
started choosing the wrong index and performance went down by something
like 100x - 1000x. Obviously, it was using bad logic to decide which
index to use; however, if I didn't know that the optimizer used table
stats to make decisions on which index to use, we would have likely been
stuck looking in the wrong area. The change in table size changed the
stats, and with them the index that was used. As it was, we rewrote the
query to give the database more hints, and then MS SQL Server
started using the correct index again.

I like knowing this type of stuff so I know what happened when things go
wrong and to prevent it from happening in the first place.

Jan Svitok,

Thanks for the incredible information. I feel like you've got an
understanding of how Ruby works underneath and have some interesting
approaches to boot. Just the names of the useful projects have helped
immensely.

It is said that method call overhead in Ruby is pretty high, partially
due to the possibility of overriding an existing method
later.

Thanks for the warning. Actually, I think I misspoke a little. I should
have said the overhead of specific methods. Although I have timed method
calls in ColdFusion and the call time differs depending on where the
methods are called from (e.g. methods in objects take longer to call
than local methods), I haven't found the overhead to be a problem. Like
Ruby, method calls in ColdFusion are done through a lookup and they can
be modified at runtime so I expect similar call times. Object
instantiation, however, was crippling and certain specific methods took
too long to execute and were rewritten (like the WDDX call).

Both ERB and eRuby will compile your template into a string that will
get 'eval'ed. The difference is that eRuby uses print statements, so its
output is hard to capture.
Have a look at erubis -- an ERB implementation in C (I haven't tried it
myself).

Thank you. I will take a look at erubis.

Thanks also for the information on Mongrel. This project sounds
interesting. I am disappointed to learn that Rails is not thread safe. I
am still hoping that ActiveRecord will work well in a multi-threaded
environment however. The approach of multiple instances of mongrel and
multiple ruby interpreters is a good workaround. That said, I think I'd
have to rewrite ActiveRecord anyways as it relies on config files to set
datasources and such. Our application will probably need to set
datasources at run time so that we can split a table across multiple db
servers and let it know, at run-time, which server the data resides on.
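
Choosing a server at run time can be as simple as mapping a key to a host. The following is only a sketch of that idea under made-up assumptions: the host names, SHARD_HOSTS, and shard_for are all hypothetical, and it uses a plain CRC32 modulo scheme rather than anything ActiveRecord provides.

```ruby
require 'zlib'

# Hypothetical list of database servers that a table is split across.
SHARD_HOSTS = %w[db1.example.internal db2.example.internal db3.example.internal].freeze

# Picks the server holding the rows for a given key. CRC32 is used because
# it gives the same answer in every process, unlike Object#hash.
def shard_for(key, hosts = SHARD_HOSTS)
  hosts[Zlib.crc32(key.to_s) % hosts.size]
end

puts shard_for(42)
puts shard_for('customer-1001')
```

The chosen host could then be handed to whatever opens the connection, for example ActiveRecord::Base.establish_connection with a settings hash, instead of relying only on the config file. Note that a modulo scheme like this moves most keys when the number of servers changes, so it is a starting point rather than a complete design.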

Also, your information on persistence is useful; however, I'm still
unclear about a few things.

I can't wrap my head around when something becomes bound to the "global"
scope and when it is bound to a "request" scope. I'm defining "global"
to mean from the application start to its end and "request" scope to
mean the life of one request.

For example, if I "require" a file with a class, that class will now
become part of the global scope. But what if I define a method in the
"require"d file as well? Does that method become part of the global
scope?

If a "require"d file becomes part of the global scope always, is there
any way to create a class that IS NOT part of the global scope.

Also, if a single request say modifies the "class" at runtime by adding
methods to it, does this change persist into all the other request or
does the change only persist for the one request? What if the class is
modified in non-"require"d code?

I understand if this is too many questions for you. Just wanted to say
thanks either way for the information provided. It's nice to have lots
of useful experts online.

Sunny Hirai
CEO, MeZine Inc.

···

--
Posted via http://www.ruby-forum.com/.

Hi Ezra,

Thanks for the detailed reply.

Sunny Hirai
CEO, MeZine Inc.

···

As far as threadedness goes, I believe activerecord can be used thread
safely (backgroundrb used to do that before they switched to
process-based workers instead of thread-based workers).

If you are using something like fastcgi or mongrel, then each instance
of those has its own ruby interpreter: changes made in one of those
don't change what's happening in another process.

So if I add a method to a class, change a class variable etc... that
will only affect the mongrel/fastcgi process that that statement
executed in. Subsequent requests to the same mongrel will see that
change, but those that get handled by a different mongrel won't see the
change.
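
A small experiment makes this visible. It is a sketch only: Greeter and the helper methods are hypothetical names, and fork stands in for a second mongrel/fastcgi process (so it needs a Unix-like system).

```ruby
# Hypothetical class standing in for any application class.
class Greeter; end

# Same process: a change made while handling one "request" is still
# there for the next one.
def first_request
  Greeter.class_eval { def hello; 'hi'; end }
end

def second_request
  Greeter.new.respond_to?(:hello)
end

# Separate process: fork a child, add a method there, and report what
# each side sees. Returns { child: true, parent: false }.
def change_in_child_process
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    Greeter.class_eval { def child_only; end }
    writer.puts Greeter.new.respond_to?(:child_only)
    writer.close
    exit!(0) # leave the child without running the parent's exit handlers
  end
  writer.close
  child_saw_it = reader.read.strip == 'true'
  reader.close
  Process.wait(pid)
  { child: child_saw_it, parent: Greeter.new.respond_to?(:child_only) }
end

first_request
puts second_request
puts change_in_child_process.inspect
```

The first check shows the change persisting within one interpreter; the second shows that a change made in another process never reaches this one.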

Fred

···

Sunny Hirai wrote:

To M. Edward

Thanks for the info. In terms of VM, basically I'm looking for something that is significantly faster than Ruby is right now. Ultimately it would have been nice to start clean on Ruby 2.0 semantics and its upcoming VM but I don't have much confidence on timelines here, especially to a stable build. I'd prefer jRuby because it would allow us to hook into Java easier which a lot of reference implementations for integration are done in; however, I'm worried about difference between it and the reference implementation.
  

From what I've seen posted on the list, plus my own profiling of the Ruby interpreter, there's probably at least a 30 percent performance improvement easily available in the current interpreter. And I think jRuby will be better than that on average because it has the JIT compiler and knows a lot (or can be taught a lot) about the x86-64 architecture. I *hope* you're using the x86 architecture. :)

From what I heard in Denver at RubyConf 2006, there is little risk of jRuby diverging in syntax and semantics from the current Ruby 1.8.5.

I do have to disagree a little with your "one last comment" however. I find it necessary to learn everything I need to know about a language to scale. I find it uncomfortable when I don't know what is happening under the hood because things can take me by surprise.
  

[snip]

I like knowing this type of stuff so I know what happened when things go wrong and to prevent it from happening in the first place.
  

Now *you're* preaching to the choir. :) I do that sort of thing (performance engineering) for a living (on the Linux platform, though).

Thanks also for the information on Mongrel. This project sounds interesting. I am disappointed to learn that Rails is not thread safe. I am still hoping that ActiveRecord will work well in a multi-threaded environment however. The approach of multiple instances of mongrel and multiple ruby interpreters is a good workaround. That said, I think I'd have to rewrite ActiveRecord anyways as it relies on config files to set datasources and such. Our application will probably need to set datasources at run time so that we can split a table across multiple db servers and let it know, at run-time, which server the data resides on.
  

You definitely should spend some time with Zed Shaw (Mongrel's inventor). He's done some things that most of us thought were impossible in pure Ruby. And he's -- well -- "zealous" about performance and scalability. :)

···

--
M. Edward (Ed) Borasky, FBG, AB, PTA, PGS, MS, MNLP, NST, ACMC(P)
http://borasky-research.blogspot.com/

If God had meant for carrots to be eaten cooked, He would have given rabbits fire.

<snip>

Thanks also for the information on Mongrel. This project sounds
interesting. I am disappointed to learn that Rails is not thread safe. I
am still hoping that ActiveRecord will work well in a multi-threaded
environment however. The approach of multiple instances of mongrel and
multiple ruby interpreters is a good workaround. That said, I think I'd
have to rewrite ActiveRecord anyways as it relies on config files to set
datasources and such. Our application will probably need to set
datasources at run time so that we can split a table across multiple db
servers and let it know, at run-time, which server the data resides on.

<snip>

It's an opinion of the Rails developers that processes are the new
threads -- scaling with processes is better/easier than scaling with
threads.

Joe

···

On 11/27/06, Sunny Hirai <sunny@citymax.com> wrote: