
Debugging High CPU Load on Amazon RDS MySQL

Amazon's cloud hosting platform is genuinely impressive, but managing AWS hosting well requires solid technical know-how.

This is a case study in diagnosing and resolving high CPU load on an Amazon RDS MySQL DB instance.

Let us go through, step by step, the situation and the process we followed to resolve CPU utilization that was pinned at 100%.


Scenario

One of our new customers came to us with high CPU load on his MySQL DB instance, even though it was running on an m1.large RDS server.

According to the logs from the last week or so, CPU utilization had been in the high 90s, consistently hitting 100% or thereabouts for five days.

The quick fix the customer suggested was to upgrade the instance to xlarge; instead, we were able to solve the problem by debugging slow MySQL queries and long-idle (sleeping) MySQL connections.

[Figure: CPU utilization graph showing sustained ~100% load]

Let us now understand the debug process. 

Debug

I ran SHOW PROCESSLIST on the MySQL server and kept monitoring it over time through mysqladmin.
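
For reference, from a shell that looks roughly like this (the endpoint and credentials below are placeholders, not values from this incident):

mysql -h mydb.xxxxx.eu-west-1.rds.amazonaws.com -u proddbuser -p -e "SHOW FULL PROCESSLIST;"

# Re-run the processlist every 10 seconds to watch connections over time
mysqladmin -h mydb.xxxxx.eu-west-1.rds.amazonaws.com -u proddbuser -p -i 10 processlist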

There did not seem to be any particularly long-running queries, nor an unusually high volume of queries.

There were, however, a couple of processes sitting in a sleep state for a long time. These were isolated worker daemons, running outside of our main app, that communicate with the database.

I've copied the processlist output below, with server names changed, to show what they are:

 Id  | User       | Host                                           | db     | Command | Time | State | Info
-----+------------+------------------------------------------------+--------+---------+------+-------+------
  …  | rdsadmin   | localhost:43513                                | mysql  | Sleep   |   14 |       | NULL
  15 | proddbuser | app-server-1.eu-west-1.compute.internal:36460  | proddb | Sleep   |   46 |       | NULL
 451 | proddbuser | app-server-1.eu-west-1.compute.internal:55512  | proddb | Sleep   |   29 |       | NULL
 912 | proddbuser | app-server-1.eu-west-1.compute.internal:45171  | proddb | Sleep   |   13 |       | NULL
 941 | proddbuser | app-server-1.eu-west-…                         |        |         |      |       |

You may notice that the “Host” field of the SHOW PROCESSLIST output specifies not only the host but also the port, showing you something like

“app-server-1.eu-west-1.compute.internal:36460”

This port can be used to identify the process which owns the connection in question.
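
If you prefer to filter on the server side first, the same information is available from the information_schema.PROCESSLIST table; a minimal sketch, again with a placeholder endpoint and user:

# List the connections that have been sleeping the longest
mysql -h mydb.xxxxx.eu-west-1.rds.amazonaws.com -u proddbuser -p \
  -e "SELECT id, user, host, db, time FROM information_schema.processlist WHERE command = 'Sleep' ORDER BY time DESC;"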

Log in to the EC2 instance, or whichever server is running queries against the overloaded RDS instance.

In our case it was a Linux-based EC2 instance; on Linux, run the following command:

netstat -ntp | grep :36460

tcp 0 0 192.168.1.70:36460 192.168.1.82:3306 ESTABLISHED 28540/php-cgi

We can see that php-cgi is holding the connection in question. Now that you know the process, you can use your favorite tools to check what that process is doing.

[root@w1 ~]# netstat -ntp | grep 28540

tcp 0 0 192.168.1.70:58555 192.168.1.90:11211 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:52711 192.168.1.88:8080 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:36460 192.168.1.82:3306 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:45399 192.168.1.82:3306 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:45407 192.168.1.82:3306 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:45408 192.168.1.82:3306 ESTABLISHED 28540/php-cgi
tcp 0 0 192.168.1.70:35556 192.168.1.92:11211 ESTABLISHED 28540/php-cgi

Using the same netstat command, this time filtering on the PID (28540), we can see which connections this process holds.

Here you can see it has a couple of memcached connections (port 11211), a few MySQL connections to the same host (which is usually a bad idea), and a connection to some external web server (port 8080).

You can use strace -p to see what the process is doing; it often gives a clue.

In this case, for example, I found the process was stuck in a poll() system call, waiting to read from a network socket. netstat can give you an idea of which socket that might be, but if you do not like guessing, you can use gdb -p.

It will not print the exact line of PHP code that is running, but it can give you some good ideas. For example, in this case I could see that the stack trace originated from PHP stream functions, not from libmysql or memcache.so, which ruled out the MySQL and memcached connections and left the external web-server connection as the only candidate.

I could also see some of the variables in the output of GDB's "bt" (backtrace) command, which further hinted at what the problem could be.
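
For reference, the attach commands look like this (28540 is the PID identified by netstat above; run them on the application server as root or as the process owner):

# Watch the system calls the process is making
strace -p 28540

# Attach a debugger, take a backtrace, then detach without killing the process
gdb -p 28540
(gdb) bt
(gdb) detach
(gdb) quit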

Yet another great tool you can use, if you are running Apache, is its server-status page (mod_status).

This way you can see the URL each process is handling, which gives a few more hints about what may be happening, and in some cases even a repeatable test case.
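
If mod_status is not already enabled, the configuration is small; here is a minimal sketch for Apache 2.4 (keep the page restricted, since it leaks request details):

# In httpd.conf: load mod_status and expose /server-status to local requests only
LoadModule status_module modules/mod_status.so
ExtendedStatus On
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>

Then check it from the web server itself:

curl http://localhost/server-status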

The tools mentioned above for figuring out what is happening inside a process are useful not only for debugging sleeping MySQL connections, but in many other cases too, such as when web applications lock up or start spinning in a tight loop and consuming too much CPU time.

Conclusion

You can also enable slow_query_log in the DB Parameter Group attached to your AWS RDS instance. This keeps a log of all slow queries, which can be very helpful.
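
With the AWS CLI, that looks roughly like this; the parameter group name here is a placeholder, and it must be a custom group attached to your instance (the default group cannot be modified):

aws rds modify-db-parameter-group \
  --db-parameter-group-name my-mysql-params \
  --parameters "ParameterName=slow_query_log,ParameterValue=1,ApplyMethod=immediate" \
               "ParameterName=long_query_time,ParameterValue=2,ApplyMethod=immediate"

Queries taking longer than long_query_time seconds are then logged; depending on the log_output parameter, they go either to a file viewable in the RDS console or to the mysql.slow_log table.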

If you happen to find more ways to debug high CPU load on AWS RDS, feel free to share them in the comments below.

Shashi kant Pandidhar

I've been helping businesses get online for over 15 years. Today my team and I focus on helping real businesses overcome real-life challenges and analyse data in ways that help them grow in the right direction in this digital age.
