Question

Nginx & Passenger stopping / crashing on Ubuntu droplet intermittently

Posted March 20, 2021
Ruby on Rails · Nginx · Ubuntu 20.04

I have a small Rails app that I’m running on a $5/mo droplet. At the moment it gets almost no traffic: just me and a few other people. I first deployed it back in early December, and for about two months it worked without issue; for the last month or so, however, Nginx has been shutting down intermittently. I’ll restart it, and within 2-6 hours the site will be down again.

This is the nginx error log:

2021/03/18 01:21:55 [info] 283837#283837: Using 32768KiB of shared memory for nchan in /etc/nginx/nginx.conf:63
2021/03/18 01:21:56 [notice] 283841#283841: signal process started
[ N 2021-03-18 01:21:56.4329 269553/T6 age/Cor/CoreMain.cpp:670 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2021-03-18 01:21:56.4329 269553/T1 age/Cor/CoreMain.cpp:1245 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2021-03-18 01:21:56.4330 269553/T1 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 269652, application /srv/project/current (production)
[ N 2021-03-18 01:21:56.4332 269553/T9 Ser/Server.h:901 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-03-18 01:21:56.4332 269553/T9 Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-03-18 01:21:56.4332 269553/T6 Ser/Server.h:901 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2021-03-18 01:21:56.4333 269553/T6 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished
[ N 2021-03-18 01:21:56.4372 269553/T1 age/Cor/CoreMain.cpp:1146 ]: Checking whether to disconnect long-running connections for process 269652, application /srv/project/current (production)
[ N 2021-03-18 01:21:56.5257 283842/T1 age/Wat/WatchdogMain.cpp:1373 ]: Starting Passenger watchdog...
[ N 2021-03-18 01:21:56.5981 283848/T1 age/Cor/CoreMain.cpp:1340 ]: Starting Passenger core...
[ N 2021-03-18 01:21:56.5983 283848/T1 age/Cor/CoreMain.cpp:256 ]: Passenger core running in multi-application mode.
[ N 2021-03-18 01:21:56.6164 283848/T1 age/Cor/CoreMain.cpp:1015 ]: Passenger core online, PID 283848
[ N 2021-03-18 01:21:57.1347 269553/T1 age/Cor/CoreMain.cpp:1325 ]: Passenger core shutdown finished
2021/03/18 01:21:58 [notice] 283869#283869: signal process started
[ N 2021-03-18 01:21:58.7808 283848/T6 age/Cor/CoreMain.cpp:670 ]: Signal received. Gracefully shutting down... (send signal 2 more time(s) to force shutdown)
[ N 2021-03-18 01:21:58.7808 283848/T1 age/Cor/CoreMain.cpp:1245 ]: Received command to shutdown gracefully. Waiting until all clients have disconnected...
[ N 2021-03-18 01:21:58.7810 283848/Tb Ser/Server.h:901 ]: [ApiServer] Freed 0 spare client objects
[ N 2021-03-18 01:21:58.7810 283848/Tb Ser/Server.h:558 ]: [ApiServer] Shutdown finished
[ N 2021-03-18 01:21:58.7810 283848/T6 Ser/Server.h:901 ]: [ServerThr.1] Freed 0 spare client objects
[ N 2021-03-18 01:21:58.7810 283848/T6 Ser/Server.h:558 ]: [ServerThr.1] Shutdown finished

I also grepped through the logs to find out whether I was running out of memory. I know very little about devops, and frankly, if that’s the issue, I’m not sure where I would start. I did find this:

/var/log/kern.log:Mar 17 01:51:45 kernel: [8741879.980107] Out of memory: Killed process 243117 (node) total-vm:976588kB, anon-rss:232900kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:6136kB oom_score_adj:0
/var/log/kern.log:Mar 17 01:52:37 kernel: [8741931.472999] Out of memory: Killed process 227427 (ruby) total-vm:501472kB, anon-rss:114644kB, file-rss:892kB, shmem-rss:0kB, UID:1000 pgtables:432kB oom_score_adj:0
/var/log/kern.log:Mar 17 02:44:46  kernel: [8745060.656866] Out of memory: Killed process 250331 (node) total-vm:1017780kB, anon-rss:258104kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:6980kB oom_score_adj:0
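
For reference, a command along these lines (paths assume stock Ubuntu logging; adjust if your logs rotate elsewhere) will surface OOM-killer entries like the ones above:

# Search the kernel log and syslog for OOM-killer activity
grep -i "out of memory" /var/log/kern.log /var/log/syslog

# On systemd systems, the same events are also in the journal
journalctl -k | grep -i "out of memory"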

None of these timestamps line up exactly with when the site actually went down (or even fall within the 5-minute window between UptimeRobot checks), but it does seem like Ubuntu is killing processes due to a memory issue.

What I’m looking for, of course, is a solution that will keep Nginx running smoothly all the time instead of crashing every few hours. Short of that, I’m wondering whether there are any other clues or logs I should be looking into.
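
For context, here are a few commands that should show the state of things after a crash (assuming nginx is managed by systemd, which is the default on Ubuntu 20.04, and that Passenger’s CLI tools are on the PATH):

# Is nginx running, and why did it last stop?
systemctl status nginx

# Recent nginx-related journal entries
journalctl -u nginx --since "6 hours ago"

# Passenger's own view of its processes and their memory use
sudo passenger-status
sudo passenger-memory-stats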


1 answer

Hello @tfantinaStarfish,

Thanks for sharing the error logs. As you anticipated, it appears the application is running out of memory.

You can log in to the Droplet via the web console or SSH to explore CPU and memory usage:

https://www.digitalocean.com/community/tutorials/how-to-monitor-cpu-use-on-digitalocean-droplets

https://www.digitalocean.com/community/tutorials/how-to-use-ps-kill-and-nice-to-manage-processes-in-linux

This will give you better insight into what is consuming CPU and memory and contributing to the load on the Droplet.
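
For a quick first look, standard Linux tools are enough to see overall memory pressure and the heaviest processes; a minimal sketch:

# Overall memory and swap usage, human-readable
free -h

# Header row plus the top 10 processes by resident memory
ps aux --sort=-%mem | head -n 11

# Live, continuously updating view of CPU and memory per process
top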

If the process consuming the most resources is one your application genuinely needs, then I would recommend resizing your Droplet to the next size up.

https://www.digitalocean.com/pricing/

Please see our tutorial on how to resize Droplets:

https://www.digitalocean.com/docs/droplets/how-to/resize/

Please note that once you resize to a larger disk, you will no longer be able to resize to a smaller plan. This is because there will be a disk mismatch and we do not currently support shrinking a virtual server’s disk.

Hope this helps!

Cheers,
Lalitha

  • @Lalitha thank you for your comments. I’ve installed the monitoring and have been looking into my system. This just happened again a few minutes ago, and I did not receive any alerts. When I check the CPU usage graph for the last hour, nothing leading up to the shutdown even comes close to 2% usage. I’ve restarted and will continue to watch.
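
Since the CPU graph is flat, memory is probably the thing to watch instead. One way to do that from the shell (sar requires the sysstat package to be installed and collecting; the rest are standard tools):

# Refresh overall memory usage every 5 seconds
watch -n 5 free -m

# Memory statistics sampled every 5 seconds, 120 samples (10 minutes)
vmstat 5 120

# Historical memory utilization, if sysstat has been collecting
sar -r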