What’s happening when Apache’s httpd %mem climbs 20+ and the CPU’s %wa is running high?


I’m encountering unfamiliar Apache symptoms and I’m curious if anyone here knows how to diagnose them. I’ve got a pair of app servers running mod_python and Apache, recently upgraded to Django 1.2.3. They plug into a db server that runs PostGIS and memcached.

Here’s what I’m seeing in ‘top’:

  • The app servers' httpd processes climb to a %MEM in the
    low 20s.

  • The app servers' CPU %wa (I/O wait), which in the past
    had almost always been near zero,
    starts dancing around 50%.

When I restart Apache, the problems go away. It's only recurred once so far, but I'm worried it might happen again, and I'm curious to get to the bottom of it. Has anyone seen this before? Do you know a smart way to deal with it? I'm planning to examine the I/O operations closely if it crops up again, but I don't have a good grip on how to do that.

Solution :

Use strace -T -f -p 1154, where 1154 is the process ID of the offending process. Then use grep, sed/awk, and lsof to work out which system calls are taking a long time. You will likely find that a variant of read() or write() against a particular file is slow. First inspect the list of open files with lsof to get the file descriptor (e.g. 5), then search the strace output for read(5, and look at the number in angle brackets at the end of the line (e.g. <0.00056>), which is the time in seconds spent in that call. The larger this number, the more you need to investigate the device the file is on, which is why lsof is so handy: it tells you which file, and therefore which device, sits behind the descriptor.
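To make the grep/sed step concrete, here is a rough sketch of how the slow calls could be pulled out of a saved strace log. The function name slowest_syscalls and the log path /tmp/httpd.strace are made up for illustration; the only assumption about the input is that each completed call ends with the <seconds> field that strace -T appends.

```shell
#!/bin/sh
# slowest_syscalls LOGFILE [COUNT]
# Hypothetical helper: print the COUNT slowest calls from an `strace -T` log,
# slowest first. Lines without a trailing <seconds> field are ignored.
slowest_syscalls() {
    log=$1
    count=${2:-10}
    # Move the trailing <seconds> value to the front of each line,
    # then sort numerically in descending order and keep the top rows.
    sed -n 's/^\(.*\) <\([0-9][0-9.]*\)>$/\2 \1/p' "$log" \
        | LC_ALL=C sort -rn \
        | head -n "$count"
}

# Typical use (1154 is the example PID from the answer; run as root):
#   strace -T -f -p 1154 -o /tmp/httpd.strace   # stop with Ctrl-C after a while
#   slowest_syscalls /tmp/httpd.strace 20
#   lsof -p 1154                                # map fd numbers to files
```

Once a descriptor keeps showing up near the top of that list, the lsof output for the same PID shows which file it refers to.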

By the way, on some systems I have to send a SIGCONT to the process and its children afterwards, because strace left them stopped with a SIGSTOP. As root, type: cd /proc/1154/task; kill -CONT *; cd /
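If you would rather check which tasks are actually stopped before sending signals, something along these lines might help. It is only a sketch: list_stopped_tasks is a made-up name, and the second argument (the proc root) exists purely so the function can be tried against a fake directory tree. It relies on the State: line in /proc/PID/task/TID/status reading "T (stopped)" for a stopped task.

```shell
#!/bin/sh
# list_stopped_tasks PID [PROC_ROOT]
# Hypothetical helper: print the task IDs under PID whose state is T (stopped).
# PROC_ROOT defaults to /proc and is a parameter only to allow dry runs.
list_stopped_tasks() {
    pid=$1
    proc_root=${2:-/proc}
    for status in "$proc_root/$pid"/task/*/status; do
        [ -r "$status" ] || continue
        # The State line looks like "State:<tab>T (stopped)".
        if grep -q '^State:[[:space:]]*T ' "$status"; then
            basename "$(dirname "$status")"
        fi
    done
}

# Possible use, as root (1154 is the example PID from the answer):
#   for tid in $(list_stopped_tasks 1154); do kill -CONT "$tid"; done
```

If the function prints nothing, none of the tasks are in the stopped state and the SIGCONT step is probably unnecessary on that system.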
