Question

Supervisord stopped working on 80% of droplets at the same time

I have supervisord (latest version, 4.2.0) on my Ubuntu 18.04 droplets in different regions.

Today I saw that, at exactly the same time, almost all of my droplets dropped in CPU usage from 20-30% to almost zero. It turned out that supervisord had stopped working.

In the supervisor logs I can see that something sent a SIGTERM:

2020-05-16 12:33:59,831 WARN received SIGTERM indicating exit request

The only relevant answer I found by googling is https://stackoverflow.com/questions/28440543/supervisor-gets-a-sigterm-for-some-reason-quits-and-stops-all-its-processes

However, I checked, and the date of the unattended upgrade is different, though the minutes are the same, which is suspicious:

Start-Date: 2020-05-15  06:33:13
Commandline: /usr/bin/unattended-upgrade
Upgrade: libjson-c3:amd64 (0.12.1-1.3, 0.12.1-1.3ubuntu0.1)
End-Date: 2020-05-15  06:33:13

Notice that both the hour and the day are different.

Is it possible to somehow figure out why supervisord stopped working, and how can I prevent this from happening in the future? It’s crucial for me; I need it up and running 100% of the time :(

Answer

Hi @Akcium,

It looks like supervisord was killed by the server itself: when a droplet runs out of memory, the kernel’s OOM killer terminates processes (with SIGKILL). I recommend checking whether this is what happened by grep-ing the system logs for oom or kill. On Ubuntu 18.04 the relevant logs are /var/log/syslog and /var/log/kern.log rather than /var/log/messages. Here is an example:

grep -i oom /var/log/syslog
grep -i kill /var/log/syslog
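
If those logs have already rotated, the systemd journal should still have the kernel messages (assuming persistent journaling is enabled on the droplet); the time window below is just taken from your supervisord log entry and can be adjusted:

journalctl -k --since "2020-05-16 12:25" --until "2020-05-16 12:40" | grep -iE "oom|killed process"
# dmesg only covers the current boot, so this helps only if the droplet has not rebooted since
dmesg -T | grep -iE "oom|killed process"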

Most probably the time in those logs will match what you saw in your supervisord log. Then you know the server killed the process, but you still need to find out why. If you see OOM messages, it means the server was out of memory. You can confirm this with the sar command (from the sysstat package) like so:

sar -r 

It will show you what your memory usage was over the day, sampled at regular intervals.
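
By default sar -r only reports today’s samples. On Ubuntu, sysstat keeps the previous days’ data under /var/log/sysstat/ (assuming the sysstat package is installed and collection is enabled in /etc/default/sysstat), so you can pull the report for the day of the incident, for example the 16th:

# saDD files are named after the day of the month
sar -r -f /var/log/sysstat/sa16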

That would be a good way to start troubleshooting.

Regards, KDSys