Large-scale sites? Anyone? Anyone?

I have an upcoming project for a news site with daily and hourly content updates. Currently everything is done with static HTML. Obviously it's a lot of work to get their stuff up there.

I've had a great time working in Ruby and Rails on other web projects, but I'm just wondering if it's possible to do large-scale sites with RoR.

The site needs to be able to handle a minimum of 200 req/sec. I figure RoR, with a proper caching configuration and maybe Squid in front to do some short-term page caching, could possibly handle the project, but I've never used RoR for anything this large. It might be possible, since most of the site doesn't need to keep user state: it's only news pages, with no logins (except for the writers/editors, which is a small fraction of the traffic).

Has anyone had any success running sites this large or larger with RoR? If so, were there any special requirements you needed to get it running? Was more than one server needed for such applications? What were the machine specs?

Thanks for any input.

Sean

A very simple solution would be to just have the system output static HTML.
This would scale infinitely :)

Why worry about caching when you can do even better by just serving static
HTML?

"SEan Wolfe" <nospam@nowhere.com> wrote in message
news:11j90hjesob623e@news.supernews.com...

···

I have an upcoming project that is for a news site with daily and hourly
content updates. Currently everything is done by static HTML. Obviously
it's a lot of work to get their stuff up there.

I've had a great time working in Ruby and Rails of rother web projects.
But I'm just wondering if it's possible to do large scale sites with RoR.

The site needs to be able to handle a minimum of 200 req/sec. I'd figure
RoR, with a proper caching configuration, and maybe Squid in front to do
some short term page caching could possibly handle the project, but I've
never used RoR for anything this large. It' might be possible since most
of the site doen't need to keep a user state, since it's only news pages,
no logins (except for the writers/editors which is a small fraction of the
traffic).

Has anyone had any success running sites this large or larger with RoR? If
so, were there any special requirements you needed to get it running? Were
more than one server needed for such applications? What were the machine
specs?

Thanks for anyone's input.

Sean

Hey Sean-
     I just finished and launched a similar site for the newspaper I work for last month. Here's the site: <http://yakimaherald.com>. I get about 55-60,000 hits a day right now; I'm not sure offhand how many concurrent users at this time. But the traffic it serves right now is only taking up about 10-16% of the processing power on a dual 2.3GHz G5 Xserve that is dedicated to the site. I started with apache1.3/fcgi but quickly switched to lighttpd/fcgi, and it is working great and using much less CPU than apache did. Feel free to contact me off list if you want to discuss more details of what I am doing for caching strategies and all the different data sources I am collecting data from.
     Here is a link to a more complete write-up of the complexity of this application: <http://permalink.gmane.org/gmane.comp.lang.ruby.rails/20637>. I describe most of the dev process and the trials and tribulations I went through in that post, and I also mention my deployment scheme. I also have very good config files for lighttpd and other aspects of the install.

HTH-
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster

509-577-7732
ezra@yakima-herald.com


Do you mean simultaneous users or actual completed requests per second?
200 req/sec is over 8 million requests in a 12-hour period. 200 simultaneous
connections is another story, and shouldn't be a real issue for
apache/lighttpd and rails.

The main issues you will have are setting up load balancing and failover. We
use ServerIrons a lot as front-end load balancers. Personally I love them, as
they take a lot less time to manage than most open source solutions and have
a lower failure rate. Distributing access to your databases will probably be
one of the main challenges, as well as having some type of failover
mechanism. For a good general caching system you might take a look at
memcached. We use it a lot and it's a great tool for taking the load off of
your database servers.
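
The read-through pattern is typical. A rough sketch (the Article model and the cache key are made up for illustration, and the exact API varies between the Ruby memcached client libraries):

```ruby
# Rough sketch of read-through caching with memcached. The Article
# model and the cache key are made up for illustration; the exact
# API varies between Ruby memcached client libraries.
require 'memcache'

CACHE = MemCache.new('localhost:11211')

def cached_headlines
  headlines = CACHE.get('headlines')
  if headlines.nil?
    # Cache miss: hit the database, then store the result for 5 minutes.
    headlines = Article.find(:all, :limit => 10, :order => 'published_at DESC')
    CACHE.set('headlines', headlines, 300)
  end
  headlines
end
```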

In our case everything is dynamic, so we don't have a use for Squid, although
you could use it in conjunction with a ServerIron, or in place of one if you
just want to throw up several Linux boxes with Squid and use round-robin
DNS. Personally I don't care much for messing with DNS techniques unless you
need to geographically distribute your servers.

Chris

···

On 9/23/05, SEan Wolfe <nospam@nowhere.com> wrote:

The site needs to be able to handle a minimum of 200 req/sec.

It might be possible since most of the site doesn't need to keep user
state, since it's only news pages, no logins (except for the
writers/editors, which is a small fraction of the traffic).

That's your key. As long as there's no state, scaling doesn't really
have anything to do with Rails. Page caching will allow you to scale
high and mighty with very little effort. lighttpd is able to push out
thousands of static HTML pages per second.
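
To make that concrete: page caching is a one-line declaration in the controller. A minimal sketch (the controller and model names are hypothetical):

```ruby
# Minimal page-caching sketch; controller and model names are hypothetical.
# On the first request Rails renders the page and writes it to public/,
# and from then on the web server serves it as a plain static file.
class ArticleController < ApplicationController
  caches_page :index, :show

  def show
    @article = Article.find(params[:id])
  end
end
```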

···

--
David Heinemeier Hansson
http://www.loudthinking.com -- Broadcasting Brain
http://www.basecamphq.com -- Online project management
http://www.backpackit.com -- Personal information manager
http://www.rubyonrails.com -- Web-application framework

snacktime wrote:

Do you mean simultaneous users or actual completed requests per second?
200 req/sec is over 8 million requests in a 12-hour period. 200 simultaneous
connections is another story, and shouldn't be a real issue for
apache/lighttpd and rails.

Well, basically the site currently serves about 160,000 visits/day, 8,000,000 pages/day, and 18,000,000 hits/day, and pushes out about 135GB/day. All of this is served by apache and static HTML with a lot of includes.

Their box is a 4-CPU Linux box with 1.5GB of RAM. I took some activity samples, and it seems that the load on the box averages about 35-40%. MySQL is pretty much idle, since currently it's barely used.

They update about every hour, but if they had some sort of CMS, they would like to update more frequently and be even more dynamic.

The idea of keeping things on static pages seems fine. I just don't know how responsive it would be to frequent updates. Each update changes the navigation in several locations (always keeping the freshest content within a click, from any location on the site).

The main issues you will have are setting up load balancing and failover. We
use ServerIrons a lot as front-end load balancers. Personally I love them, as
they take a lot less time to manage than most open source solutions and have
a lower failure rate. Distributing access to your databases will probably be
one of the main challenges, as well as having some type of failover
mechanism. For a good general caching system you might take a look at
memcached. We use it a lot and it's a great tool for taking the load off of
your database servers.

Currently, their emphasis is on cost to implement. Their hosting service provides them with one dedicated box and a block of bulk bandwidth. The provider thinks that a CMS would require a second box to split the load, which would add monthly costs to their plan. So any dedicated load balancer is up to the colocation company, and the cost to implement. It's hard to believe that such a large site has such a small budget! :P

Because of cost, they were also looking into off-the-shelf open source solutions, but most of the ones that I've seen didn't seem to fit well, or were just a big mess of PHP.

There is also the issue of my own costs, since I don't want to spend a great deal of time developing the site. Ruby makes a great choice in this respect.

In our case everything is dynamic, so we don't have a use for Squid, although
you could use it in conjunction with a ServerIron, or in place of one if you
just want to throw up several Linux boxes with Squid and use round-robin
DNS. Personally I don't care much for messing with DNS techniques unless you
need to geographically distribute your servers.

The only DNS trick I was thinking of doing is having two sites: one for the CMS application, where the editors and writers add their content, and one site that the visitors see. The CMS site would then publish content to the live site. The CMS site could be heavier on the app side and not need any serious hardware specs, since it would only have to serve, at most, about 50 users a day.

Anyway, this is still in the very early stages. I would like to hear more input!

thanks,

Sean


The ad at the start that pretends to be a Windows system message is a
little irritating. I actually closed the tab when it showed up because I
was unsure what clicking any of the buttons would do.

M.

···

On Sat, 24 Sep 2005 07:56:40 +0900, Ezra Zygmuntowicz wrote:

Here's the site: <http://yakimaherald.com>.

Yeah, having a CMS app where writers and editors can add/edit content, and
publishing to a static HTML website, sounds like it'd solve all your
issues. As long as there isn't anything in the site that has to appear
dynamic to the user, I'd say a webapp would be overkill.

martin


Rails + caching should be a perfect fit for you. Just expire the
caches whenever new content is generated, and it should be almost as
fast as plain static pages.
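
For example, something along these lines (a sketch only; the controller and model names are hypothetical):

```ruby
# Sketch of expiring cached pages when an editor publishes content.
# Controller and model names are hypothetical.
class Admin::ArticleController < ApplicationController
  def update
    @article = Article.find(params[:id])
    if @article.update_attributes(params[:article])
      # Delete the stale cached files; they regenerate on the next hit.
      expire_page(:controller => '/article', :action => 'show', :id => @article.id)
      expire_page(:controller => '/article', :action => 'index')
      redirect_to :action => 'show', :id => @article.id
    end
  end
end
```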


You might want to look into things like gzip compression and maybe a site
redesign with CSS.
Both can bring your bandwidth down a lot, which could bring a better TCO to
the site.

If you go with the static site idea, you can also get a low-end server with
mysql / rails / php / whatever to generate the pages.
The main server would then only have to serve the static pages.

"SEan Wolfe" <nospam@nowhere.com> wrote in message
news:11j95pb80n02rcd@news.supernews.com...

···

snacktime wrote:

On 9/23/05, SEan Wolfe <nospam@nowhere.com> wrote:
Do you mean simultaneous users or actually completed requests per second?
200 req/sec is over 8 million requests in a 12 hour period. 200
simultaneous
connections is another story, and shouldn't be a real issue for
apache/lighttpd and rails.

Well, basically the site currently serves about 160,000 visits/day,
8,000,000 pages/day, and 18,000,000 hits/day and pushes out about
135GB/day. This is all currently being served by apache and static HTML
with a lot of includes.

Their box is a 4x CPU, linux box, with 1.5GB of RAM. I took some activity
samples, and it seems that their load on the box averages about 35-40%.
MySql is pretty much idle, since currently it's barely used.

They update about every hour, but if they had some sort of CMS, they would
like to update more frequently, be even more dynamic.

The idea of keeping things on static pages seems fine. I just don't know
how responsive it would be to frquent updates. Each update changes
different navigation in several locations (always keeping the freshest
content within a click, from any location on the site).

The main issues you will have is setting up load balancing and failover.
We
use ServerIrons a lot as front end load balancers. Personally I love them
as
they take a lot less time to manage then most open source solutions and
have
a lower failure rate. Distributing access to your databases will probably
be
one of the main challenges, as well as having some type of failover
mechanism. For a good general caching system you might take a look at
memcached. We use it a lot and it's a great tool for taking the load off
of
your database servers.

Currently, their emphasis is on Cost to Implemenet at the moment. Their
Hosting service currently provides them with one dedicated box, and a
block of bulk bandwith. The provider thinks that a CMS would require a
second box to split the load. This would add additional monthly costs to
their plan. So any dedicated load balancer is up to the location company,
and the cost to implement. It's hard to believe that such a large site has
such a small bugget! :stuck_out_tongue:

Becasue of cost, they were also looking into off-the-shelf open source
solutions. But most of the ones that I've seen didn't seem to fit well, or
were just a big mess of PHP.

Also there is the issue with my costs as well, since I don't want to spend
a great deal of time developing the site. Ruby makes a great choise for
this aspect.

In our case everything is dynamic so we don't have a use for squid,
although
you could use that in conjunction with a ServerIron, or in place of if
you
just want to throw up several linux boxes with squid and use round robin
dns. Personally I don't care much for messing with dns techniques unless
you
need to geographically distribute your servers.

The only DNS trick I guess I was thinking of possibly doing is having two
sites, one for the CMS application where the editors and writers add their
content, and then one site that the visitors see. The CMS site would then
publish content to the live site. The CMS site could be more heavy on the
app side, and not need any serious hardware specs, since it would only
have to serve at most, about 50 users a day.

Anyways, this is still in the very early stages yet. I would like to hear
more input!

thanks,

Sean

Are you talking about the pop-under ad? God, I hate that thing. I am just the developer, so I don't get to make revenue decisions about the site, but I have been campaigning to get rid of it. Hopefully I will be able to remove it at the end of the month. Don't you hate ad campaigns like that? I have been telling my boss that it is a major turn-off for a lot of people and that we are not making enough money off it to justify its existence. Anyway, my sincerest apologies for that f*&#ng thing. I hate it as much as you do.

-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
ezra@yakima-herald.com
509-577-7732

···

On Sep 23, 2005, at 10:21 PM, Michael Vondung wrote:

The ad at the start that pretends to be a Windows system message is a
little irritating. I actually closed the tab when it showed up because I
was unsure what clicking any of the buttons would do.

M.

Depending on the structure of the site, maybe you could just start with a
simple content editor that stores articles or news items in the database, and
a publish mechanism that wraps the content in an HTML template and publishes
it as a static page to the website. If the pages on the site that need updating
are the same ones day in and day out, I don't think it would be that much
work.

For instance, you have an HTML template made up of a header and footer. To
create the final page you start with the header, query the database to
get any content items for that page, then add the footer. The editor writes
out the final HTML page right there, and it's ready to be pushed to the
server when it's time to publish.
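
A rough sketch of that publish step (all names are hypothetical; the real templates and schema would differ):

```ruby
# Rough sketch of the publish step described above: start with the header,
# pull the page's content items from the database, append the footer, and
# write out a finished static file. All names here are hypothetical.
HEADER = File.read('templates/header.html')
FOOTER = File.read('templates/footer.html')

def publish_page(page)
  items = ContentItem.find(:all, :conditions => ['page_id = ?', page.id],
                           :order => 'position')
  body  = items.map { |item| item.html }.join("\n")
  File.open("output/#{page.slug}.html", 'w') do |f|
    f.write(HEADER + body + FOOTER)
  end
end
```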

I would definitely use something other than apache for serving the static
pages. Squid, thttpd, lighttpd, Zeus (which also supports FastCGI), and
others will probably cut down on server resources being used and boost
performance.

Chris

You might want to look into things like gzip compression and maybe a
site redesign with CSS.
Both can bring your bandwidth down a lot, which could bring a better
TCO to the site.

But keep in mind that gzip compression increases load on the server, especially with dynamic content!
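
One way to get the bandwidth savings without the per-request cost: since the pages are static, compress each file once at publish time. Whether the server actually picks up the .gz variant depends entirely on its configuration, so treat this as a sketch:

```ruby
# Sketch: pre-compress published static files once, instead of gzipping
# on every request. Serving the .gz variants depends on server config.
require 'zlib'

def write_compressed(path)
  Zlib::GzipWriter.open("#{path}.gz") do |gz|
    gz.write(File.read(path))
  end
end

Dir.glob('output/**/*.html').each { |path| write_compressed(path) }
```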

If you go with the static site idea, you can also get a low-end
server with mysql / rails / php / whatever to generate the pages.
The main server would then only have to serve the static pages.

My first attempt at this would probably be to create two web apps: one for content management and one for the site (as static HTML). The CM web app updates the static HTML of the other. Whether this will work depends mostly on the number of editors and the average change frequency. If not every change has to be made public immediately, you can even think of a batched update of the static part every 5 minutes or so.
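
The batched update could be as simple as a script run from cron every 5 minutes that republishes only what changed since the last run. A sketch, reusing the hypothetical names from the earlier publish example:

```ruby
# Sketch of a batched publish run from cron every 5 minutes, e.g.:
#   */5 * * * * ruby publish.rb
# Page and publish_page are hypothetical, as in the earlier sketch,
# and the script assumes the app's models are already loaded.
STAMP = 'tmp/last_publish'
last_run = File.exist?(STAMP) ? File.mtime(STAMP) : Time.at(0)

Page.find(:all, :conditions => ['updated_at > ?', last_run]).each do |page|
  publish_page(page)
end

# Touch the stamp file so the next run only picks up newer changes.
File.open(STAMP, 'w') { |f| f.write(Time.now.to_s) }
```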

Just my 0.02 EUR...

Kind regards

    robert

···

Aemca <none@none.com> wrote: