Supervisord stopped working on 80% of droplets at the same time

Posted on May 17, 2020

I have supervisord (the latest version, 4.2.0) running on my Ubuntu 18.04 droplets in different regions.

Today I saw that, at exactly the same moment, almost all of my droplets dropped in CPU usage from 20-30% to almost zero. It turned out that supervisord had stopped working.

In the supervisor logs I can see that something sent a SIGTERM:

2020-05-16 12:33:59,831 WARN received SIGTERM indicating exit request

The only relevant answer I could find is https://stackoverflow.com/questions/28440543/supervisor-gets-a-sigterm-for-some-reason-quits-and-stops-all-its-processes

However, I checked, and the date of the unattended upgrade is different, though the minutes are the same, which is suspicious:

Start-Date: 2020-05-15  06:33:13
Commandline: /usr/bin/unattended-upgrade
Upgrade: libjson-c3:amd64 (0.12.1-1.3, 0.12.1-1.3ubuntu0.1)
End-Date: 2020-05-15  06:33:13

Notice that both the hour and the day are different.
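(For reference, the entry above is from apt's history log; these are the stock Ubuntu 18.04 locations for checking when unattended-upgrades last ran:)

grep -A3 'Start-Date' /var/log/apt/history.log
# unattended-upgrades also keeps its own per-run log:
less /var/log/unattended-upgrades/unattended-upgrades.log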

Is it possible to somehow figure out why supervisord stopped working, and how can I prevent this from happening in the future? It's crucial for me; I need it up and running 100% of the time :(




Hi @Akcium,

It seems supervisord may have been killed by the server itself: when the server runs out of memory, the kernel's OOM killer terminates processes. (Strictly speaking the OOM killer sends a SIGKILL rather than the SIGTERM you saw, so this is worth verifying.) I'd recommend checking by 'grep'-ing the system log for oom or kill; on Ubuntu 18.04 that log is /var/log/syslog rather than /var/log/messages. Here is an example:

grep -i oom /var/log/syslog
grep -i kill /var/log/syslog
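Since Ubuntu 18.04 uses systemd, the journal is another place to look; a quick sketch, assuming the unit name supervisor.service used by the stock Ubuntu package:

# kernel messages about the OOM killer (also visible via dmesg)
journalctl -k | grep -i -E 'oom|killed process'

# messages for the supervisor unit around the time of the SIGTERM
journalctl -u supervisor.service --since "2020-05-16 12:30" --until "2020-05-16 12:40"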

Most probably the timestamps in those logs will match what you saw in your supervisord log. Then you'll know the server killed the process, but you'll still need to find out why. If you see OOM-killer entries, it means the server was out of memory. You can confirm this with the sar command, like so:

sar -r 

It will show you what your memory usage was over a given period of time.
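One caveat: sar comes from the sysstat package, which isn't installed or enabled by default on Ubuntu; a minimal sketch of setting it up on 18.04:

sudo apt-get install sysstat
# enable periodic collection, otherwise sar has no history to report
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo systemctl restart sysstat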

That would be a good way to start troubleshooting.
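As for keeping supervisord up, that doesn't replace finding the root cause, but since the Ubuntu package runs it under systemd you could add a drop-in override so it's restarted automatically whenever it dies (again assuming the unit is named supervisor.service):

sudo systemctl edit supervisor
# in the editor that opens, add:
# [Service]
# Restart=always
# RestartSec=5
# save and exit; systemd reloads the unit automatically

Bear in mind that if the OOM killer is the culprit, an automatic restart only treats the symptom; the memory pressure itself still needs fixing.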

Regards, KDSys
