I was wondering if anyone could give me some advice/thoughts/input
regarding scaling/designing in Ruby for an enterprise level app. Mainly
I am looking for suggestions for existing Ruby code/apps I can use to
address my design concerns. I would particularly like feedback on
FastCGI vs. running things the .NET or J2EE way. In a way I'm building
a framework, but a project specific one for an enterprise app, not for
the average personal dynamic website. I may release it after the
project is complete.
Please excuse the length of this post but I think I should explain the
situation a bit since scalability and design are broad and complex
issues. I know that many do not consider Ruby to be an enterprise level
language, but I am a believer in rapid development and scaling through
good design/programming. Language does not matter to me. I'll also
mention that I know performance != scalability, though I will say it
can help scalability, particularly vertically.

Background:
I'll try to oversimplify this in the interest of this post not being
more of a book than it already is. The software is actually a new idea
that I'm not going to reveal for creative reasons, but it is probably
closest to some of the existing social networking/dating apps. The app
must support over a million users (yes, this is realistic if not a
gross underestimate if all goes well). We expect to have several
thousand concurrent users. AJAX and querying DB data will play a heavy
role in the system. Users will have lots of preferences/settings to
manage/configure, most of which will be stored to disk as XML
(serialized objects perhaps?) rather than the database.
Our reason for selecting Ruby is because we want to continue to bring
more attention to what we feel is a wonderful language (thank you Rails
and others for contributing already) and also since we want to reduce
our time to market as much as possible. We also want to try to give
back some of our end results to the Ruby community in terms of code.
Our development team is very experienced in web apps, but new to the
Ruby world as far as serious applications. We are trying to find out
more about what is already out there, what still needs to be done, and what
limitations we might face. We've done extensive research but I would
like some opinions directly from rubyists, rubycons, whatever. We'd
like to scale horizontally and vertically and leverage cheap hardware
to start until the site gets going more.

Current Specs/Design:
Our design uses MVC/n-tier. Business logic should be able to run
independently on its own servers and not care about the web. Some of
our basic design/plans are as follows:
1. Web Server -- lighttpd - seems to work well with FastCGI, very
quick. Load balanced to send user to best server. Dedicated servers for
things like serving images.
2. FastCGI or SCGI - We would like to replace FastCGI with something
else if possible since we have concerns about all of our processes
being constantly occupied by AJAX polling back to server code. We're
not entirely convinced FastCGI is a great architecture for us but if we
do use it, we would like to scale it across many servers and use a
SessionID to bind a user to a server. I worry about running out of
available FastCGI processes, even with multiple machines.
3. Database -- MySQL or Postgres - We need transaction support and data
integrity -- doubts about MySQL in these areas. We also need good join
performance - database is heavily normalized, may denormalize if
needed. Separate DBs for things like Logging/Auditing vs. Content. Want
to cluster DBs. We want connection pooling. Considering making some
modifications to DBI. Use stored procedures in the database, no SQL in
the code and no dynamic SQL. I loathe O/R mappers for complex databases and
they would likely bring our system to a screeching halt. Unfortunately
we cannot afford to use Oracle right now.
4. Caching -- We'll cache user settings, DB data, etc. Need a good Ruby
caching solution. Considering using memcached or something else
existing.
5. Templating -- Not satisfied with anything. Considered Clearsilver
but now developing our own system, mainly in C++, that is more
accommodating to the way our site works/AJAX.
6. Unit Testing and Performance numbers for every procedure. Is there a
good code profiling tool for Ruby? Use both with makefile process.
Where we identify bottlenecks, we may refactor or write in C++. We may
move some of the critical components to C++ where possible. We've
looked into using SWIG to help our C++ efforts.
7. Sessions - Separate server for managing Sessions if possible. I
would like to persist things in memory and share if possible. If not
possible, we'll settle for persisting info to the database. Been
looking at Session affinity for Ruby some.
8. AJAX - Polling and Queuing system for AJAX interactions in
JavaScript. Queuing server side. Likely we will use and possibly extend
an existing AJAX library such as MochiKit or base it off another
existing one.
9. Remote - Looked at DRb for some remote things.
10. External Services - We may expose some web services in the future
or have internal web services. We may also be providing RSS feeds from
things like blogs or lists. We'll use REXML for our XML needs, but we
may switch to C++. I've heard performance concerns about REXML but have
yet to test for myself.
11. Events/Threading - I am worried about Ruby here, especially after
reading about Myriad's problems with libevent. We will definitely need
something similar to delegates and events and some good queueing and
threading functionality.
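
To make item 2's idea of binding a user to a server by SessionID a bit more concrete, here is a minimal sketch in Ruby. The backend names and the server_for helper are hypothetical, and CRC32 modulo the pool size is only one simple way to pick a server; it remaps most users whenever the pool changes, which something like consistent hashing would reduce.

```ruby
require 'zlib'

# Hypothetical pool of FastCGI backends; the host names are placeholders.
BACKENDS = %w[app1.internal app2.internal app3.internal].freeze

# Map a session ID to a backend deterministically, so the same session
# always lands on the same server (session affinity).
def server_for(session_id, backends = BACKENDS)
  raise ArgumentError, 'no backends configured' if backends.empty?

  # CRC32 is stable across processes and machines, unlike String#hash,
  # which is seeded differently in each Ruby process.
  backends[Zlib.crc32(session_id) % backends.size]
end
```

In practice the load balancer would usually make this decision, so the sketch only shows the mapping itself.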
We looked at Rails like everyone else but after using it a bit, reading
the author's blog, and from previous experience in other languages it
is clearly not suitable for the size and complexity of our application.
I also will reiterate that I hate O/R mapping unless it's for a quick
personal app.
Concerns: Our main concern is of course scalability. Our AJAX controls
(many already finished) will need to poll the server in some cases over
a specified interval. We know that this is going to create some new
demands that we did not have to worry about in the purely synchronous
webpage development model. We have already seen these issues some in
our .NET, J2EE, PHP, and Python apps, but none of them have to deal
with this much traffic. We feel that many of our AJAX controls are
going to create unique demands on the application, particularly given
the environment Ruby runs in and its threading limitations.
For instance, a purely hypothetical example might be we have a control
that lists online users along with status information. The control
would need to requery information every 20 seconds to obtain fresh
information about the users (are they online? what is their current
mood? what did they last do?). This means that our server is going to
get hit a lot harder than your typical web application that serves up a
dynamic page, then sits idle until the user moves on to a new page.
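
To put a rough number on this: if we assume 5,000 concurrent users (my stand-in for "several thousand"), each polling one control every 20 seconds, that is 5000 / 20 = 250 requests per second for that control alone. One way to keep those polls off the database is a short-lived cache, so that every poll inside a time window shares a single query. Below is a minimal in-process sketch; the TtlCache class and its names are hypothetical, and a shared store such as memcached would be needed once several processes or machines are involved.

```ruby
# Tiny time-based cache: all callers within the TTL window share one result.
class TtlCache
  # clock is injectable so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @mutex = Mutex.new
    @store = {}
  end

  # Returns the cached value for key, or runs the block to refresh it.
  # The lock is held while the block runs, so concurrent callers wait
  # for one refresh instead of each running the query themselves.
  def fetch(key)
    @mutex.synchronize do
      now = @clock.call
      entry = @store[key]
      return entry[:value] if entry && now - entry[:at] < @ttl

      value = yield
      @store[key] = { value: value, at: now }
      value
    end
  end
end

# Usage: three polls in quick succession trigger a single "query".
cache = TtlCache.new(20)
db_hits = 0
3.times { cache.fetch(:online_users) { db_hits += 1; %w[alice bob] } }
puts db_hits # the block ran only for the first poll
```

The trade-off is staleness: with a 20 second TTL a user may see status information up to 20 seconds old on top of the polling interval.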
Final Thoughts:
I believe my greatest concerns are shared memory, caching, sessions,
and the FastCGI dynamic. We need to support a lot of simultaneous users
and I'm worried the process model needs to be perhaps replaced with a
thread pool/queuing model (one that responds quickly, though). Caching on
all levels will have to take a lot of the pressure off of us. I've
never been a fan of Session variables but we do need to manage a lot of
session info. We need a lot of user to user interaction so this is
another concern with the FastCGI setup.
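
For reference, the kind of thread pool/queuing model I have in mind looks roughly like the sketch below: a fixed set of worker threads pulling jobs off a shared queue. The WorkerPool class is hypothetical, not an existing library. Note that in the standard Ruby interpreter threads do not run Ruby code in parallel, so this helps with queuing and waiting on I/O rather than with CPU-bound work.

```ruby
# Fixed-size pool of worker threads fed by a thread-safe queue.
class WorkerPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until a job arrives; nil is the stop signal.
        while (job = @jobs.pop)
          begin
            job.call
          rescue StandardError => e
            # A failing job must not take its worker thread down with it.
            warn "job failed: #{e.message}"
          end
        end
      end
    end
  end

  # Enqueue a block of work; returns immediately.
  def submit(&block)
    raise ArgumentError, 'block required' unless block

    @jobs << block
  end

  # Send one stop signal per worker, then wait for queued jobs to drain.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end
```

A request handler would call submit and return, with the queue absorbing bursts instead of tying up one process per pending request.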
The read-write ratio in the app is probably about 70% read 30% write.
In absolute terms the app will be doing a significant number of writes
to the DB at any one time, though still fewer than the reads, so our
entire design needs to factor this in.
I hope that even though I'm new here I can get some good suggestions.
I'm excited to write something on this level in Ruby and get away from
the .NET, PHP, and J2EE stuff I've worked a lot with in the past. FYI,
I started doing C, Pascal, COBOL, etc. in desktop and client/server
apps for many years before moving on to mainly web apps so forgive me
if I'm naturally skeptical of everything.
Thanks and I would deeply appreciate any input no matter how big or
small.