How to debug an unresponsive web server

Tags: linux, apache-2.2, ubuntu, php, amazon-web-services
We have a Medium EC2 instance running Ubuntu 12.04, serving about a dozen small PHP web applications via Apache.
Approximately every other day, the server becomes unresponsive and rebooting the instance is required to restore functionality. During this time, the server cannot be accessed via HTTP or SSH.
Every time, the last logged Apache request is to a PHP application that serves a 4 MB PDF document. The User-Agent always identifies the client as an iPad (specifically
Mozilla/5.0 (iPad; CPU OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25), and the requests often come from the same IP address, so it is likely the same user.
The PHP application is a legacy application that checks some permissions before echoing the contents of a file from disk to the client. We have not been able to reproduce the issue ourselves, either with an iPad or by accessing the file any other way.
We’ve tried a few monitoring solutions to get a better picture of what’s happening when the server goes down, but none of them shows any issue with system resources.
My question is: what strategies can we use to troubleshoot, and hopefully resolve, this issue?
Start by monitoring system resources (CPU load, memory, disk), for example with collectd or sysstat.
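If you want something even simpler than collectd or sysstat while you wait for the next hang, a tiny snapshot script run from cron can leave a trail in a log file. This is just a sketch: the /tmp/resource.log path and the choice of fields are illustrative, and it assumes a Linux /proc filesystem.

```shell
#!/bin/sh
# Minimal resource snapshot (a rough sketch; collectd or sysstat's sar keep
# much richer history). Schedule it from cron every minute and inspect the
# log after the next freeze to see what the box looked like just before.
ts=$(date '+%Y-%m-%d %H:%M:%S')
load=$(cut -d' ' -f1-3 /proc/loadavg)                  # 1/5/15-minute load averages
memfree=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)  # free RAM in kB
echo "$ts load=$load memfree_kb=$memfree" >> /tmp/resource.log
tail -n 1 /tmp/resource.log
```

Because the log lives on disk, it survives the reboot, which is exactly what in-memory monitoring dashboards often don't give you for a machine that locks up.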
Keep in mind that I’m going out on a limb here, but the problem you are describing might result from exhaustion of a resource (most likely memory). Run
egrep -i 'killed process' /var/log/*
to look for OOM-killer invocations.
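To show what you are grepping for, here is the same search run against a hypothetical sample of the kind of kernel lines the OOM killer writes to syslog (the timestamps, hostname, PID, and sizes below are made up):

```shell
# Hypothetical sample of OOM-killer entries as they appear in syslog
cat > /tmp/sample-syslog <<'EOF'
Jun 12 03:14:07 web1 kernel: [86400.123456] Out of memory: Kill process 1234 (apache2) score 456 or sacrifice child
Jun 12 03:14:07 web1 kernel: [86400.123457] Killed process 1234 (apache2) total-vm:524288kB, anon-rss:262144kB, file-rss:0kB
EOF

# The same search suggested above, run against the sample; it prints the
# "Killed process ..." line, naming the victim and its memory footprint
egrep -i 'killed process' /tmp/sample-syslog
```

If you find entries like that on the real box, with apache2 or php as the victim, memory exhaustion is very likely your culprit.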
System logs might contain traces of the cause (/var/log/messages, Apache’s error logs).
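If memory exhaustion does turn out to be the cause, one common failure mode on a modest instance is Apache’s prefork MaxClients allowing more PHP-handling children than RAM can hold: a slow client downloading a large file ties up a worker for a long time, workers pile up, and the box swaps until it is unreachable. The values below are purely illustrative, not a recommendation; size MaxClients roughly to (RAM available to Apache) divided by the typical per-child resident size.

```apache
# Hypothetical prefork tuning in /etc/apache2/apache2.conf (Apache 2.2) —
# the numbers are placeholders, derive yours from observed per-child RSS
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           40
    MaxRequestsPerChild 500
</IfModule>
```

A finite MaxRequestsPerChild also limits damage from any slow memory leak in the legacy PHP application.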
Try enabling more detailed logging and pay close attention to the system while testing it.
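One way to watch Apache closely in real time is mod_status, which shows what every worker is doing (and would show whether workers are stuck serving that PDF). A sketch for Apache 2.2 on Ubuntu, assuming the status module is enabled (a2enmod status); the access restriction here only allows localhost, adjust to taste:

```apache
# Hypothetical snippet, e.g. in /etc/apache2/conf.d/status
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```

Then, over SSH, hit http://localhost/server-status while the problem is building up; a scoreboard full of workers in the "W" (sending reply) state on the same URL would strongly support the slow-client theory.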