zachary.com

personal pages

All ad proceeds donated to charity.

Millions of dynamic web application hits a day with open source

At my day job, our traffic has grown quite a bit over the last few months, and we're now serving web pages to hundreds of thousands of visitors per day. When we designed our basic system architecture, we planned for this and more traffic, and I'm happy to say that our design seems to be scaling well, using a modest hardware investment.

In many ways, our application is a "traditional" three-tiered web application -- front-end web-servers connected to mid-tier application servers connected to back end databases. Our front-end essentially consists of a web server (lighttpd), content cache (Squid), and load-balancing proxy (HAProxy). Lighttpd, though not without its warts, is a reliable very fast webserver. All of our static web content is server by lighttpd, and the source data fits within the OS's page cache. We've seen peaks of more than 1000 requests / second handled by the front end without any trouble. Any complete page that can be cached for a reasonable amount of time is requested through the Squid cache. All requests that need to go to the middle tier -- both cache misses and completely dynamic pages are routed through the HAProxy load balancer. HAProxy is a very, very nice product -- it handles millions of requests/day, proxying to and monitoring the next tier, and reporting and allowing configuration of everything anyone could want -- all without placing much of a load on the CPU. We used to use Pound for this task, but switch to HAProxy to fix some timeout issues. Pound worked OK, but HAProxy provides a much greater level of monitoring and control.

Our application server middle tier consists of -- well -- our application servers, written in Python. What makes this layer interesting from a systems-architecture perspective is its statelessness. Client sessions are not stored in this tier, and so we can add appserver capacity by simply adding more application servers to our cluster. The application servers are managed as a unit, so things like pushing software releases can be accomplished with a single click. The application server's main function is to process web requests and return result pages (and AJAX page content). Much of this result data is aggressively cached in a shared memcached pool -- all of the application servers share the same pool and can therefore benefit from the work of their peers.

Finally we come to the back-end database. We run PostgreSQL on x86_64 Linux, and I'm very pleased with it's performance. As our load has grown, we have migrated to a more capable database configuration. We're currently running our primary database on an 8-CPU, 32GB Penguin Computing server, with lots of very fast disks. We work hard to minimize our database load, and our current system should handle our load well through at least the next 5x growth in traffic. In the grand scheme of things, RAM is pretty cheap -- even 32GB, and our system has room for another 32GB. With some help from Varlena LLC we learned about many of the ins and outs of the Slony replication system, and replicate our database in near-real time to both a co-located backup system as well as an offsite backup, which we also use for analytics.

The whole thing is monitored with Munin using and a collection of standard and custom plugins, and the system sends us SMS notifications of anomalous events. Putting together the whole system has been a ton of fun. I'm especially happy that we here able to do it entirely with open source software, contributing fixes and enhancements as we go along.

Categories: technology python


This page last modified Wednesday 24 January, 2007 by David Creemer
All content Copyright 2003-2005, David Z Creemer