Tuesday, January 10, 2012

OOM - Who gets thrown off the plane first?

Just today, one of our production servers exhausted all 48G of its memory, and the OOM killer threw the largest offender, mysqld at 38G, off the plane.

So how do we avoid this? How can we tell the OOM killer that mysqld is the pilot of the plane, and that if it throws the pilot off we'll crash for sure?

I've researched this before (here and here), and Linux has an option under /proc to adjust how likely the OOM killer is to pick a given process.

Users and system administrators have often asked for ways to control the behavior of the OOM killer. To facilitate control, the /proc/<pid>/oom_adj knob was introduced to save important processes in the system from being killed, and define an order of processes to be killed. The possible values of oom_adj range from -17 to +15. The higher the score, more likely the associated process is to be killed by OOM-killer. If oom_adj is set to -17, the process is not considered for OOM-killing.

So, we want the OOM killer to ignore mysqld, which means setting the value to -17.
echo -17 > /proc/`pgrep ^mysqld$`/oom_adj

Searching for "oom_adj", I ran across this from kernel.org, which suggests that sometime in 2012 or later the oom_adj option will be replaced by oom_score_adj.
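For kernels that already expose the newer knob, here is a rough sketch of a helper that prefers oom_score_adj and falls back to oom_adj. On oom_score_adj the scale runs from -1000 to 1000, and -1000 plays the same "never kill" role that -17 does on oom_adj. The function name oom_exempt and the PROC_ROOT override are my own inventions for illustration (PROC_ROOT only exists so the function can be pointed at a scratch directory), and writing a negative value normally requires root.

```shell
#!/bin/sh
# Sketch only: exempt one process from the OOM killer using whichever
# knob the running kernel provides.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# oom_exempt PID  (hypothetical helper name)
oom_exempt() {
    pid="$1"
    if [ -e "$PROC_ROOT/$pid/oom_score_adj" ]; then
        # Newer interface: -1000 disables OOM killing for this process.
        echo -1000 > "$PROC_ROOT/$pid/oom_score_adj"
    elif [ -e "$PROC_ROOT/$pid/oom_adj" ]; then
        # Older interface: -17 disables OOM killing for this process.
        echo -17 > "$PROC_ROOT/$pid/oom_adj"
    else
        echo "no OOM knob found for pid $pid" >&2
        return 1
    fi
}
```

Something like `oom_exempt "$(pgrep -x mysqld)"` would then cover both old and new kernels, assuming exactly one mysqld is running.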

My easiest fix for the OOM problem is a cron job that runs every 5 minutes, checks the oom_adj value of the mysqld process, and sets it to -17. I believe Percona may add a daemon to do this for us in the future.
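The cron job described above could look roughly like this. It is a minimal sketch, not a hardened script: the function names, the PROC_ROOT override (there only so the check can be tried against a scratch directory), and the assumption that `pgrep -x mysqld` finds the right process are all mine. It has to run as root to write a negative value.

```shell
#!/bin/sh
# Sketch only: re-apply oom_adj = -17 to every running mysqld.
# PROC_ROOT is a hypothetical override for dry runs; it defaults to /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# protect_pid PID: set oom_adj to -17 unless it is already -17.
protect_pid() {
    pid="$1"
    adj_file="$PROC_ROOT/$pid/oom_adj"
    # Skip processes that vanished or that we are not allowed to modify.
    [ -w "$adj_file" ] || return 1
    current=$(cat "$adj_file")
    if [ "$current" != "-17" ]; then
        echo -17 > "$adj_file"
    fi
}

# protect_mysqld: apply protect_pid to each process named exactly "mysqld".
protect_mysqld() {
    for pid in $(pgrep -x mysqld); do
        protect_pid "$pid"
    done
}

protect_mysqld
```

Saved under a path of your choosing (say /usr/local/sbin/protect-mysqld.sh, a made-up location), a root crontab line such as `*/5 * * * * /usr/local/sbin/protect-mysqld.sh` would run it every 5 minutes. The re-check matters because a restarted mysqld gets a new pid and starts over with the default value.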
