Our Red Hat Enterprise 5 Server is swapping itself to death – need a plan for detecting the cause
At seemingly random intervals, memory usage on our server climbs past the available RAM and the machine swaps until CPU usage also reaches 100%. Once it runs out of swap it starts killing off processes, and we have to restart the server.
When this happens our website and internal systems become unresponsive. I also cannot SSH into the server at that point, so I have no way of identifying the processes that are killing it.
I don’t have a huge amount of experience with server admin but I’m looking for ideas for how to detect the problem. Let me know what extra information you may need.
Could be a fork-bomb tbh (i.e. a process that’s infinitely forking children and hence exhausting the resources). Could also be a memory leak type issue.
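One rough way to tell the two cases apart, as a sketch rather than a definitive test: a fork bomb tends to show up as one command name with a runaway number of processes, while a leak shows up as a few processes whose memory keeps growing. The helper name count_by_name below is made up for illustration; it only tallies whatever command names it is fed.

```shell
# Tally processes per command name, most numerous first.
# Reads one command name per line on stdin and prints "count name".
# count_by_name is a hypothetical helper, not a standard tool.
count_by_name() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Fed from ps, for example with ps -eo comm= | count_by_name | head -n 10, a count in the hundreds or thousands for a single name would point towards runaway forking.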
Identifying the responsible process(es) is the key here. Try this:
When you next restart the server, leave a console open as root and run top from it at priority -20, so that it keeps getting CPU time when everything else is starved. Then watch to see what’s causing the issue. (To raise the priority of a shell that is already running, use renice on its PID instead.)
This command ought to do it:
sudo nice -n -20 top
When things start looking tight, resort to the killall command, or kill the parent process first and then any children that remain.
At -20 you should be able to keep an active connection over SSH and still do your work; it’s the highest priority that nice can assign.
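Because the machine may become unreachable before anyone is watching top, it can also help to have it record its own evidence. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names, the log path and the interval are all illustrative choices, not part of any standard tool.

```shell
# Print "rss_kB pid name" for each process, largest resident set first.
# $1: proc root (default /proc); a parameter only so the function can be
#     tried against a fake directory tree.
# $2: how many lines to print (default 10).
top_rss() {
    for status in "${1:-/proc}"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Kernel threads have no VmRSS line and are skipped.
        awk '/^Name:/  { name = $2 }
             /^Pid:/   { pid  = $2 }
             /^VmRSS:/ { rss  = $2 }
             END { if (rss != "") print rss, pid, name }' "$status" 2>/dev/null
    done | sort -rn | head -n "${2:-10}"
}

# Append a timestamped snapshot to the file in $1 every $2 seconds
# (default 60). Runs until it is killed.
log_mem_snapshots() {
    while :; do
        { date; top_rss; echo; } >> "$1"
        sleep "${2:-60}"
    done
}
```

Started in the background after a reboot, for example with log_mem_snapshots /var/log/mem-snapshots.log 30 &, it leaves a record that can be read after the next forced restart to see which processes were growing.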
Don’t forget to look in the logs (web server and otherwise in /var/log) as well since they can be quite revealing.
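Since the server is killing processes when swap runs out, the kernel’s out-of-memory killer has probably logged which ones it chose. A small sketch for pulling those lines out follows; the search patterns are an assumption about how the messages are worded on your kernel, and oom_events is a made-up helper name.

```shell
# Print log lines that look like OOM-killer activity.
# $1: log file to search (default /var/log/messages, where Red Hat's syslog
#     normally puts kernel messages). The patterns are an assumption about
#     the message wording and may need adjusting.
oom_events() {
    grep -iE 'out of memory|oom-killer|killed process' "${1:-/var/log/messages}"
}
```

The process names that appear in those lines are the victims, not necessarily the culprit, but they do give a timestamp to line up against the web server logs.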
If you identify the problem, let us know what it is and whether you need further help.
See the renice man page and top man page.
Install sysstat (and read the documentation carefully!), configure it, and analyze the collected data after such an incident.
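As a sketch of what that analysis might look like: once sysstat’s collector is running, sar can replay the figures from around the time of the incident. The helper names sa_file and review_incident are hypothetical, and the /var/log/sa location is the usual Red Hat default rather than something confirmed for this server.

```shell
# Path of the sysstat data file for a day of the month (1-31).
# /var/log/sa is the usual Red Hat location; treat it as an assumption.
sa_file() {
    # Strip one leading zero so that 08 and 09 are not read as octal.
    printf '/var/log/sa/sa%02d\n' "${1#0}"
}

# Replay memory, swap and load figures around an incident.
# $1: day of month, $2: start time, $3: end time (both hh:mm:ss).
review_incident() {
    f="$(sa_file "$1")"
    sar -r -f "$f" -s "$2" -e "$3"   # memory utilisation
    sar -W -f "$f" -s "$2" -e "$3"   # pages swapped in and out per second
    sar -q -f "$f" -s "$2" -e "$3"   # run queue length and load averages
}
```

For an incident on the 14th at around 03:00, review_incident 14 02:00:00 03:30:00 would show whether memory filled up gradually (more like a leak) or within a single sampling interval (more like runaway forking).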
Review the security policies in place (SELinux active, ulimit for the various users, …). Check that everything is up to date (a malfunctioning program certainly can cause this).
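To illustrate what a ulimit does in practice, here is a minimal sketch that runs a single command under a cap on its virtual memory. The wrapper name run_limited is made up, and the right limit value depends entirely on the workload.

```shell
# Run a command with a cap on its virtual memory, given in kB.
# The limit is set inside a subshell, so the calling shell is unaffected.
# run_limited is a hypothetical wrapper name.
run_limited() {
    limit_kb="$1"
    shift
    ( ulimit -v "$limit_kb" && exec "$@" )
}
```

A process that hits the cap gets allocation failures instead of pushing the whole machine into swap. Persistent per-user limits are normally set in /etc/security/limits.conf rather than by hand.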
Check any homebrew systems for possible loops or other resource exhaustion. Read all logs, even for databases and such.