How to debug DO managed DB throughput/latency spikes?

Posted February 15, 2020 1.1k views
DigitalOcean Managed MySQL Database

I recently moved my WordPress database to a DO managed MySQL db, so I now have a single droplet connected over a private network to a single DO db. Since then, I have been seeing occasional large spikes in db throughput and latency. The spikes are very short in duration, but significant in terms of load on the db. Here is a screenshot showing a recent spike.

The spikes appear to be triggered by some process on the droplet, since the droplet shows a load average spike at the same time as the db spike, along with a corresponding data transfer spike on the private network. Here is a screenshot showing the droplet spike corresponding to the db spike shown in the other screenshot. There is no spike in public network bandwidth, so the db data seems to be requested by some process internal to the droplet. There are no cron jobs (e.g., a db backup) scheduled at the time of the spikes. The spikes happen once or twice a day at varying times, and there are no corresponding entries in syslog.

I need to find out what process is requesting so much data from the db during these spikes, but I’m not sure how best to go about it. The spikes are too short in duration to catch them while they’re happening. Does anyone have any ideas?

1 answer

Unfortunately, there is very limited profiling and debugging information available for managed DBs. Unlike AWS RDS/Aurora, you cannot see much beyond the DB's CPU and network utilization.

In that respect it is more akin to AWS Lightsail's managed instances; I have tried both.

There are things you can do, however. First, I have successfully used MySQL Workbench (on Windows) with DO's managed DBs. With it, you can gather performance metrics such as cache utilization, key efficiency, and queries per second.
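If Workbench isn't convenient, you can collect similar numbers yourself by periodically sampling MySQL's cumulative status counters (e.g. `SHOW GLOBAL STATUS LIKE 'Questions'`) and computing per-second deltas between samples. A minimal sketch of the delta math; the snapshot dicts below are stand-ins for values your MySQL client would actually fetch, and the counter names (`Questions`, `Bytes_sent`) are real MySQL status variables:

```python
# Compute per-second rates from two SHOW GLOBAL STATUS snapshots.
# In practice you would fetch the snapshots with a MySQL client;
# here they are plain dicts so the sketch is self-contained.

def rates(prev, curr, interval_s):
    """Return the per-second rate for each cumulative counter."""
    return {k: (curr[k] - prev[k]) / interval_s
            for k in prev if k in curr}

if __name__ == "__main__":
    t0 = {"Questions": 1000, "Bytes_sent": 50_000}
    t1 = {"Questions": 1600, "Bytes_sent": 350_000}  # sampled 10 s later
    print(rates(t0, t1, 10))  # {'Questions': 60.0, 'Bytes_sent': 30000.0}
```

Logging these rates to a file every few seconds from the droplet would give you a history you can line up against the spike times in the DO graphs.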

The catch is that this is all real-time data, so you cannot go back through it after the fact.

Looking at your screenshot, I wonder whether queries per second also increase (I assume they do, since throughput goes up), which would explain why latency climbs and requests probably start to stack up.

If this is a website, this could be related to an uptick in search-bot activity: bots tend to crawl a lot of uncached content at once, then back off when latency goes up.

(Check your Apache or other web-server logs and try to match the time of the spike.)
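To test the bot theory, one quick check is to tally user agents in the access log for the minutes around a spike. A minimal Python sketch, assuming the common Apache/nginx "combined" log format; the sample log lines and the spike timestamp below are made up for illustration, so point it at your real access log instead:

```python
import re
from collections import Counter

# Matches the tail of a combined-format access log line:
# [timestamp] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"')

def agents_during(lines, ts_prefix):
    """Count user agents for requests whose timestamp starts with ts_prefix."""
    hits = Counter()
    for line in lines:
        m = COMBINED.search(line)
        if m and m.group("ts").startswith(ts_prefix):
            hits[m.group("ua")] += 1
    return hits

# Illustrative log lines; in practice, read them from your access log file.
LOG_LINES = [
    '1.2.3.4 - - [15/Feb/2020:03:41:02 +0000] "GET /page HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [15/Feb/2020:03:41:03 +0000] "GET /other HTTP/1.1" 200 734 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [15/Feb/2020:09:10:00 +0000] "GET / HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # Hypothetical spike observed around 03:41 on Feb 15.
    print(agents_during(LOG_LINES, "15/Feb/2020:03:41"))
```

If one crawler's user agent dominates the spike window but is absent the rest of the day, that is strong evidence for the bot explanation.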

  • Thanks for the tips, Zehubert! I’ll check out Workbench and see what it can do for me. Your observation about bots is interesting… that may well explain what’s happening, although I would expect a bump in public bandwidth in that situation. Google Analytics doesn’t measure bot traffic, but I may run my logs through a traditional log analyzer and see what it reveals.

    • Np! By the way, I meant that since the MySQL Workbench server status works in real time, you CANNOT use it to analyze after the fact. Sorry for the typo, and good luck!