Debug : AWS High CPU Load on MySQL – AWS, Amazon RDS
This is a case study for solving AWS’s high CPU load on RDS MySql DB instances.
Let us go through step by step to understand the situation and the process to solve the issue of the high CPU load of 100% utilization.
Table of Contents
One of our new customers, Came to us with a high CPU load in his MySql DB instances even while running an m1.large MySQL RDS server.
As per the log from last week or so CPU utilization has been in the high 90s, reaching 100% or thereabouts consistently from 5 days.
The quick solution that customers recommend us is to upgrade instances to xlarge however, we were able to solve this by debugging slow Mysql query and MySQL sleep query.
Let us now understand the debug process.
I’ve run a show process list on the MySQL server and have been monitoring the same through MySQL admin.
There seem to be any particularly long-running queries or a high volume of queries.
There’re a couple of processes lying in a sleep state for a long time… these are isolated workers’ daemons running outside of our main app which communicates with the database.
I’ve copied in the process list output below with server names changed to give a description of what they are:
rdsadmin | localhost:43513 | mysql | Sleep | 14 | | NULL | | 15 | proddbuser | app-server-1.eu-west-
1.compute.internal:36460 | proddb | Sleep | 46 | | NULL | | 451 | proddbuser | app-server-1.eu-west-
1.compute.internal:55512 | proddb | Sleep | 29 | | NULL | | 912 | proddbuser | app-server-1.eu-west-
1.compute.internal:45171 | proddb | Sleep | 13 | | NULL | | 941 | proddbuser | app-server-1.eu-west-
You may notice in the “Host” field of SHOW PROCESSLIST output not only host but also port is specified, showing you something like
This port can be used to identify the process which owns the connection in question.
Login into ec2 or any server which is running the query into this overloaded RDS.
In our case it was a Linux based ec2, for Linux run the following command:
netstat -ntp | grep :36460
tcp 0 0 192.168.1.70:36460 192.168.1.82:3306 ESTABLISHED 28540/php-cgi
We can find php-cgi is holding the connection in question, Now you know the process and you can use your favorite tools to check what that process is doing.
[root@w1 ~]# netstat -ntp | grep 36460
tcp 0 0 192.168.1.70:58555 192.168.1.90:11211 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:52711 192.168.1.88:8080 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:45384 192.168.1.82:3306 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:45399 192.168.1.82:3306 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:45407 192.168.1.82:3306 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:45408 192.168.1.82:3306 ESTABLISHED 36460/php-cgi
tcp 0 0 192.168.1.70:35556 192.168.1.92:11211 ESTABLISHED 36460/php-cgi
Using the same netstat command and filtering on the PID we can find which connections does this process has.
Here you can see it has a couple of memcached connections.
Few MySQL connections (to the same host, which is usually a bad idea) and connections to some external web server.
You can use strace -p to see what the host is doing, it often gives a clue.
In this case, I, for example, found the process is stuck in pool() system call reading from a network. Using netstat can give you an idea of what it can be but if you do not like guessing you can use gdb -p.
It will not print your exact line of code in PHP which is running but can give you some good ideas – for example in this case I could find stack trace originated from PHP stream functions not from libmysql or memcache.so, which means it is not MySQL or memcache connections leaving the last candidate as the only choice.
I also could see some of the variables in GDB “bt” command output which also hinted what could be the problem.
Yet another great tool that you can use is server-status if you’re running apache.
This way you will see the URL which that process is processing and so get a few more hints on what may be happening or even get a repeatable example in some cases.
The tools I have mentioned regarding figuring out what is happening with the process are not only helpful to debug sleeping connections with MySQL but many other cases when you see web applications locking up or starting to run in the tight loop consuming too much CPU time.
You can also enable slow_query_log in DB Parameter Groups on AWS RDS, This will keep a log of all slow query which can be helpful to you.
If you happen to find more ways to debug high CPU load on AWS RDS, feel free to share the same in the comments below.
Leave a Comment
Subscribe: Trusted By 1M+ Readers
Get the weekly Tech Update straight to your inbox.