Whether you're administering a single, dedicated Web server, or one serving hundreds of hosts, each site's client expects and deserves a certain level of support and service.
Customer Number One (Web Techniques, Nov 2001)
Certainly, the normal requirements (for instance, that the server is robust and secure and that there are sufficient hardware and network resources for the site) are ones I've discussed in previous articles. However, increasingly I hear from clients who want the benefits of dedicated hosting, but in a shared server environment. The two main benefits they want are: guaranteed quality of service (assurance that other sites or virtual hosts won't gobble up all of the server's resources), and secure, dedicated handling of local server-side applications, whether they're CGI scripts or mod_perl/PHP/JSP, even if they only serve as front-ends to backend application servers.
Fortunately, with a little know-how, these demands are relatively easy to address. A few simple techniques can have you on your way to treating every customer as if he or she is your only customer.
Finding a Balance
Ideally, no one site or client takes more than its fair share of what it needs. With no overlap and no one hogging resources, no clients are jilted, and we can use the server to its full potential. Of course, this is seldom the case.
As frequent readers know, I'm a hard and fast fan of Apache (and, I believe, with good reason). One incredibly useful Apache feature is its support for virtual hosts; one instance of Apache can handle several separate and distinct domains. Typically, all virtual hosts are considered equal, at least as far as Apache is concerned. There's no way to tell Apache, "don't let this virtual host use more than X gigabytes of bandwidth," for example. Because of this limitation, it's possible for one site to use so many server resources that, for all intents and purposes, all other virtual hosts are blocked out. In such cases, the shared server effectively becomes a dedicated server.
For example, several years ago a user of my ISP posted pictures of his girlfriend in a bikini on his personal Web page. The increased traffic to his page basically took over the server, and everyone else's pages were adversely affected.
The Apache module mod_throttle is ideally suited to the task of keeping any one site from monopolizing the server. A few other modules perform similar functions, but I've found mod_throttle to be the most reliable and best suited to the differing setups I've encountered over the years. It has a long and comprehensive list of items and parameters that it can monitor and allocate for each server, virtual host, or user. This means that you can use one tool for everything from dedicated servers (where you want to provide finer control over how Apache uses resources), to traditional shared server virtual hosts, to multiuser setups (with URLs like www.foobar.com/~jim).
Installing mod_throttle is relatively easy. But because it needs to keep track of various parameters, such as the number of requests or the total bandwidth used across all Apache processes, mod_throttle must create its own personal and private shared memory allocation. Editing the mod_throttle.c file to specify which method to use for your particular platform is probably the least intuitive part of the build.
For FreeBSD and Linux systems in particular, I've achieved the best results using the System V shared memory implementation; with Solaris 7 and 8, the POSIX implementation works best.
Fundamentally, mod_throttle limits either the number of requests Apache will handle or the total bandwidth used. As mentioned above, these limits can be set per virtual host, so it's easy, for example, to give one virtual host an allocation of 5 GB per day and another 10 GB per day. Or, if you prefer, you can limit the throughput of the virtual hosts to something like 128Kb per second. The choice is yours, depending on whether you want to control the total usage, the peaks, or some combination of both.
The module allows four areas of control, each represented by a distinct directive:
ThrottlePolicy. Allows for throttling control in a traditional Apache container, such as <VirtualHost>, <Directory>, or <Location>, as well as server wide.
ThrottleUser. Useful in ISP setups where you want to limit on a per-local-user basis.
ThrottleRemoteUser. Used to control the way specific outside users load the site. For example, you can allow specific authenticated users faster throughput than non-authenticated ones.
ThrottleClientIP. Lets you track the IP address of the requesting client and apply separate control over a designated period of time. This lets you do things like assign a different limit to recurring visitors than to sporadic or unique ones.
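As a rough sketch of the per-virtual-host allocations mentioned earlier (5 GB per day for one host, 10 GB per day for another), ThrottlePolicy could be placed inside each <VirtualHost> container. The IP addresses, host names, and document roots below are hypothetical, and the G and d unit suffixes are assumed by analogy with the k and s suffixes used elsewhere in this article; check them against the mod_throttle documentation for your version.

```apache
# Hypothetical hosts; addresses and paths are placeholders.
<VirtualHost 192.0.2.10>
    ServerName www.foo.com
    DocumentRoot /home/foo/htdocs
    # Assumed syntax: policy, limit, period (5 GB per day)
    ThrottlePolicy Volume 5G 1d
</VirtualHost>

<VirtualHost 192.0.2.11>
    ServerName www.bar.com
    DocumentRoot /home/bar/htdocs
    # 10 GB per day
    ThrottlePolicy Volume 10G 1d
</VirtualHost>
```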
I've found that I use ThrottlePolicy almost exclusively. ThrottleUser is mainly useful if you run an ISP that supports user home pages. The directive below, for example, limits the throughput of each user's page to 560Kb every 10 seconds (56Kb per second):
ThrottleUser * Original 560k 10s
"Original" is a mod_throttle policy that describes what mod_throttle should monitor and limit. In this case, we're controlling the throughput (or volume) that's allowed. Other policies limit the number of requests (Document, Concurrent, and Request), accept only a percentage of requests (Random), enforce a minimum idle time between requests (Idle), or control throughput in other ways (Speed and Volume).
Normally, mod_throttle adds a small delay to incoming requests as it approaches the throughput limit. If it reaches the limit, then its default action is to refuse requests entirely, which isn't what you want in a production environment. I usually set mod_throttle to let the delay keep on growing, so that requests aren't refused. I do this by setting the ThrottleMaxDelay directive to zero. With this set, mod_throttle doesn't deny requests, and as page demand decreases, mod_throttle also decreases the delays.
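A minimal sketch of that setting, assuming the directive is placed at the server level of httpd.conf alongside the throttle policies:

```apache
# Let the delay keep growing instead of refusing requests at the limit.
# The value comes from the text above; the placement is an assumption.
ThrottleMaxDelay 0
```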
With its rich feature set, mod_throttle proves to be a very robust way of implementing a Quality of Service plan for your hosting environment.
Giving CGI Scripts Unique IDs
Now that we can monitor, control, and limit basic resource usage per site, next we must let each site operate under its own user ID. This is important in a shared environment, such as the traditional Apache virtual host setup. When Apache starts, it gives up its privileged (or root) user ID for a much safer one (traditionally, the Unix nobody account). It does this because, should any security holes or exploits be discovered, a compromised unprivileged account does far less damage than a compromised root account.
The disadvantage is that internal scripts (such as PHP or mod_perl code) or external CGIs all operate under this same user ID. Even if each script is for a separate virtual host, the operating system has no clue that they're intended for different entities. Thus, if virtual host www.foo.com has a script that writes information to a specific file, it would be fairly easy for a virtual host on the same shared server to read or overwrite the data in that file as well, because all of the hosts run as the same user ID.
For CGI scripts, this is easily fixed. Apache has a capability called suExec, which is a smart CGI wrapper implementation. When Apache needs to call an external CGI script, it uses the suExec program to execute it, instead of doing it directly. Because the suExec program is itself suid root (meaning that it runs under the permissions and privileges of the root user ID), it can run the script under any user or group ID that you'd like.
Thus, each virtual host can be given a unique user and group ID, using the traditional User and Group Apache directives. When a CGI script is called for that virtual host, suExec runs the script, but as the specific user and group, not as the generic IDs under which Apache runs. This prevents the security problem described above.
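A minimal sketch of such a virtual host in Apache 1.3 syntax, assuming suExec has been built in and enabled. The account name, address, and paths are hypothetical. (Apache 2.x replaces the per-host User and Group with the SuexecUserGroup directive.)

```apache
# Hypothetical host; "foo" is a placeholder account and group.
<VirtualHost 192.0.2.10>
    ServerName www.foo.com
    DocumentRoot /home/foo/htdocs
    # CGI scripts for this host run as foo:foo via suExec,
    # not as the server-wide Apache user.
    User foo
    Group foo
    ScriptAlias /cgi-bin/ /home/foo/cgi-bin/
</VirtualHost>
```

Keep in mind that suExec runs a number of safety checks before executing anything; for example, the script and its directory must belong to the target user and must not be writable by others.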
Unfortunately, this solution works only with CGI scripts. Server-side scripting extensions, such as JSP, PHP, or mod_perl, are unaffected, as these run in the Apache process rather than as external programs (which is why they're so fast and efficient). PHP has a mode of operation called Safe Mode, which helps a bit, but it isn't entirely safe (nor is it described as such). When I want, or need, better protection, I have to use another solution.
Running Multiple Instances
Apache's support for multiple virtual hosts on a single instance sometimes makes people forget that they can also run several instances of Apache at once. Although it's true that this is a less efficient way to use Apache, it provides many worthwhile features that are directly applicable to our present topic.
Of course, there are some caveats about running multiple instances. First, it isn't feasible if you're running many virtual hosts at the same time, unless you have lots of memory to spare. You'll also have to bump up various kernel-level limits (such as the number of available processes) to higher than normal values. Second, this technique doesn't work at all with name-based virtual hosts. Each instance requires its own IP address or port, which also gobbles up resources.
If a given virtual host really requires its own, unique identity, however, I can create a separate Apache instance just for that host, with its own configuration file and layout. This lets me set the server-wide User and Group settings to values unique to just that host. Once you've done this, there's no risk of compromise or overlap, even with server-side languages. I can also use mod_throttle to control and manage the resources of each host from a server-wide aspect.
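As a sketch of what one instance's configuration file might contain, assuming Apache 1.3 directives and the per-host directory layout used in this section. The account, address, document root, and file names are all hypothetical.

```apache
# /usr/local/apache/configs/www.foo.com/httpd.conf (hypothetical sketch)
ServerName www.foo.com
# Bind only this host's address so the instances don't collide.
Listen 192.0.2.10:80
# Server-wide identity unique to this host; it applies to
# in-process code (PHP, mod_perl) as well as to CGI scripts.
User foo
Group foo
DocumentRoot /home/foo/htdocs
# Per-instance runtime files keep the instances from clobbering each other.
PidFile /usr/local/apache/configs/www.foo.com/httpd.pid
LockFile /usr/local/apache/configs/www.foo.com/httpd.lock
ErrorLog /usr/local/apache/configs/www.foo.com/error_log
```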
To make it easy to administer this setup, I keep Apache in its traditional location: /usr/local/apache. For each instance of Apache, I create a separate configuration directory, such as /usr/local/apache/configs/www.foo.com and /usr/local/apache/configs/www.bar.com. Once the config files in those directories are in place, I can start up the Apache instance from the command line:
This makes it easy to identify each instance at a glance in a ps listing, because the configuration directory location provides that information.
If I have only a handful of instances to worry about, then I can start, stop, or restart the server by hand. But when there are more than a few, I use scripts to make my life easier. I make a copy of the apachectl script and modify it to reflect the changes for each instance (where that instance's PID file is located, and the startup command line). I then place the modified copy in each instance's configuration directory.
With that done, it's a simple matter to administer each instance. If I see that www.foo.com needs to be restarted, I simply run:
% cd /usr/local/apache/configs/www.foo.com
% ./apachectl restart
I've also used this solution in client environments that required very high levels of configuration and access control. When this is the case, you can actually let customers maintain their own configuration files. I certainly wouldn't recommend this in many cases, but having a separate instance for each virtual host at least makes it possible.
Sometimes you can better allocate resources using separate instances of Apache as well. Instead of having 500 Apache processes running, each responsible for ten virtual hosts, you could have ten separate instances, each configured for a maximum of fifty processes. Being able to control the maximum number of Apache processes for each virtual host supplements the control that mod_throttle provides and lets you better use your server resources.
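As a sketch, the per-instance cap in that example corresponds to the MaxClients directive in each instance's configuration file. The spare-server values are illustrative assumptions, not recommendations.

```apache
# One of ten instances, each limited to fifty child processes.
MaxClients 50
# Illustrative values only; tune them to each host's traffic.
StartServers 5
MinSpareServers 5
MaxSpareServers 10
```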
Now That's Satisfaction
To give customers dedicated server class performance and quality guarantees, you don't have to provide a dedicated server for each client. We've looked at two simple yet powerful methods that, when combined, provide you with a high level of control and manageability, while customers receive the service they expect. Certainly there are other methods available, but the ones I've presented are simple, easy to add, and even easier to use. There's nothing wrong with keeping your clients happy while keeping yourself sane.
Jim is best known as one of the core developers of Apache, and a member of the board of the ASF. He's a senior consultant for Covalent Technologies. Contact him at jim@jaguNET.com.