Instance freezing/locking up without detailing why in the logs

I was hoping someone could shed some light on this recurring problem.

We have a medium-tier instance running on AWS, but it goes down at random times, normally once a week. I've spent many hours looking through Apache's access and error logs trying to spot why, but it's proving to be a nightmare to diagnose. In the EC2 console the instance has a green tick icon, indicating it's running, but I can't SSH in and have to stop and then start the instance for it to come back online.

From the Apache access logs around the time it went down earlier:

127.0.0.1 - - [22/Jan/2012:06:25:03 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"
127.0.0.1 - - [22/Jan/2012:06:25:03 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"
127.0.0.1 - - [22/Jan/2012:06:25:03 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"
127.0.0.1 - - [22/Jan/2012:06:25:03 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"
127.0.0.1 - - [22/Jan/2012:07:19:46 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"
127.0.0.1 - - [22/Jan/2012:07:19:47 +0000] "OPTIONS * HTTP/1.0" 200 152 "-" "Apache/2.2.20 (Ubuntu) (internal dummy connection)"

From the error logs:

[Sun Jan 22 06:25:03 2012] [notice] Apache/2.2.20 (Ubuntu) PHP/5.3.6-13ubuntu3.2 with Suhosin-Patch configured -- resuming normal operations
[Sun Jan 22 10:01:50 2012] [notice] Apache/2.2.20 (Ubuntu) PHP/5.3.6-13ubuntu3.2 with Suhosin-Patch configured -- resuming normal operations
[Sun Jan 22 10:11:26 2012] [notice] caught SIGTERM, shutting down

Can someone advise what my next steps in diagnosing this problem might be?

Thanks

Solution:

Don't treat this as purely an Apache problem; the symptoms you've given don't warrant that. Those logs contain nothing unusual, just that Apache didn't shut down cleanly at the time of the incident, which you already knew, since you bounced the machine. I'm not saying you can entirely rule Apache out, but I wouldn't check it first if a machine is pingable but not SSH-able. (Green in EC2 means pingable, right? If not, then you should definitely ping it!)

Check the system logs in addition to Apache's: the messages log (sometimes found at /var/log/messages), the other files in /var/log and any other log locations, and the logs for any other applications you run on that system, including things like sshd.
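If the system logs are large, it can help to pull out only the entries close to the time of the incident. Below is a minimal Python sketch of that idea, not a definitive tool. It assumes classic syslog-style lines that begin with a timestamp such as "Jan 22 10:11:26" (no year) and an English locale for month names; the helper name lines_near and its parameters are hypothetical, made up for illustration.

```python
from datetime import datetime, timedelta


def lines_near(lines, incident, minutes=10):
    """Return syslog-style lines stamped within `minutes` of `incident`.

    Assumption: each line starts with a 15-character timestamp like
    "Jan 22 10:11:26". Lines in any other format are skipped.
    """
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        try:
            # Classic syslog timestamps carry no year, so borrow it
            # from the incident time.
            stamp = datetime.strptime(
                f"{incident.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a timestamped line, or a different log format
        if abs(stamp - incident) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

You could feed it an open file, for example lines_near(open("/var/log/messages"), datetime(2012, 1, 22, 10, 11, 26)), adjusting the path to wherever your distribution keeps its system log. Run it over several logs with the same incident time and compare what each service recorded just before the freeze.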

Also, when you couldn't SSH in, was the connection refused, or did ssh just hang? Just curious. If/when this issue happens again, check whether any other services are accessible, if you have any open besides httpd and sshd (and check whether it's pingable!).
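To tell "refused" apart from "hanging" without guessing, you can probe each port yourself from another machine. Here is a rough Python sketch using only the standard library; the function name probe and its return strings are hypothetical, chosen for this example. Broadly, a refused connection means something on the network path actively answered, while a timeout means nothing answered at all, though firewalls and security groups can blur that distinction.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try a TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error".
    """
    try:
        # create_connection does the DNS lookup and TCP handshake for us.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing is listening
    except socket.timeout:
        return "timeout"  # no answer within `timeout` seconds
    except OSError:
        return "error"  # e.g. DNS failure or no route to host
```

For example, calling probe(host, 22) and probe(host, 80) with your instance's address the next time it locks up would show whether sshd and httpd fail in the same way or differently.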

Hope you track down the issue! 🙂
