Look at That 800 Pound Apache Hiding in the Corner

Windows is not usually the operating system that comes to mind when deploying Apache. Not all of us, however, have a choice in what operating system we run, and the fact that open source software like Apache and PHP runs on the Windows platform allows more people to get acquainted with the open source stack without having to switch operating systems.

Still, sites that run Apache in production on Windows are few and far between. This means that Windows-specific parts of the Apache code receive far less attention and use than code specific to platforms like Linux, which leads to interesting and hard-to-analyze bugs.

For instance, there’s this Windows box running Apache, MySQL and PHP to support a couple of popular PHP applications like Joomla and Gallery. No, I won’t tell you where it is, even though it’s all better now. It was crashing on a regular basis, say every half hour or so. Normally, it’s not so bad when an Apache child process crashes: especially with the Prefork MPM (still the one recommended by the PHP folks), only one client connection will die and there are plenty of other children available, so there is no interruption in service. The Windows MPM has only one child process, which serves every connection from its own threads, so when that crashes the server is offline for a couple of seconds while the parent spins up a new child. This is very frustrating and clearly not acceptable in a production situation. But how to debug?

The account that runs the Apache service (perhaps I should put up a deployment best practices guide at some point) was not allowed to write crash dumps. Even though I enlisted the help of a very experienced Windows programmer, we were not able to make Apache dump core. I did observe, though, that the child process never seemed to grow beyond 256 MB before it wrote a whole bunch of out-of-memory errors to the PHP error log and then crashed with either an access violation or a terse message about how the zend_mm_heap was corrupted. This led me to a crazed and unsuccessful Google search for process memory limits on Windows 2003, and to equally unsuccessful attempts to recycle the process before it crashed by setting MaxRequestsPerChild.

So how did I solve the problem? I didn’t solve it, but made it go away (which is something different, although the immediate result is the same) by lowering the ThreadsPerChild value from the configuration file default of 256 to a more conservative 100. This lower number must have prevented PHP memory management from stepping on its own toes, and the result was that the server stayed up for 19 days straight before it was manually restarted. Better? You bet! The only slightly worrisome thing: the child process ballooned to a working set of 800 MB of RAM, and has even been up to 1.2 GB before settling down. Good thing the server has 2 GB installed. Since the server ran for 19 days, I am convinced that the situation was stable, and even if there were a slow leak I could always put MaxRequestsPerChild back in. It just goes to show that PHP applications like Joomla are very large, and cause Apache/PHP to allocate an enormous amount of thread-local storage.
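For reference, the change amounts to something like the following in httpd.conf. This is only a sketch: the ThreadsPerChild value of 100 is the one described above, while the MaxRequestsPerChild line and its value of 0 (meaning the child is never recycled) are an assumption added to show where that setting would go if a slow leak ever made it necessary again.

```apache
# Number of worker threads in the single Windows child process.
# Lowered from the configuration file default of 256.
ThreadsPerChild 100

# Assumed value for illustration: 0 means the child is never recycled.
# A positive number makes the parent replace the child after that many
# requests, which is the fallback if memory use keeps creeping up.
MaxRequestsPerChild 0
```

Apache has to be restarted for the new thread count to take effect.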

I would still like to know what caused the crashes, but making them go away is almost as good as actually solving the problem.


2 thoughts on “Look at That 800 Pound Apache Hiding in the Corner”

  1. While it makes sense to me to allocate a request-specific bucket brigade out of memory that is tied to that request, this particular patch won’t do you any good because ptrans is itself a subpool of pchild. In any case, perhaps I wasn’t clear enough in my original post: the server rapidly balloons to 800 MB and then stays there. It does not actually leak memory, although on a box that doesn’t have the RAM to accommodate such an elephant this will end badly before it hits its comfort zone.
