Question

My k8s cluster seems to die every few days

Nothing seems to trigger it. I have a cluster of 4 x 8GB nodes, so I don’t think it’s the resources. But every now and then the dashboard reports a 503 and my services become unresponsive.

I have to recycle the nodes before anything comes back up.

Has anyone else seen this?


Greetings!

I’m sorry about the trouble this is causing for you. My instinct is that this is a memory issue, though that’s hard to prove without logs. Memory exhaustion is so often the cause in scenarios like this that it’s reasonable to check for it first. You may want to look at the system logs on one of the nodes to see whether that is in fact the case. Here’s a quick read on the OOM killer, which is what I suspect to be the cause:

https://www.thegeekdiary.com/what-is-out-of-memory-oom-killer-in-linux-causes-troubleshooting-mitigation/
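
If it helps, here is a minimal Python sketch of what checking the logs for the OOM killer could look like. It only assumes that the kernel log contains the usual "Out of memory: Killed process ..." (or, for container limits, "Memory cgroup out of memory: ...") lines. The function name `find_oom_kills` and the sample lines are made up for illustration, so treat it as a starting point rather than a definitive check.

```python
import re

# Matches the kernel's OOM-kill message in its older ("Kill process") and
# newer ("Killed process") forms, system-wide or triggered by a cgroup limit.
OOM_KILL_RE = re.compile(
    r"(?:Out of memory|Memory cgroup out of memory): "
    r"Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_kills(lines):
    """Return a list of (pid, process_name) for every OOM kill found in lines."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group("pid")), match.group("name")))
    return kills


if __name__ == "__main__":
    # Illustrative sample lines only; real input would come from the node's
    # kernel log (for example the output of `dmesg` or `journalctl -k`).
    sample_log = [
        "kernel: java invoked oom-killer: gfp_mask=0x100cca, order=0",
        "kernel: Out of memory: Killed process 1234 (java) total-vm:8123456kB",
        "kernel: eth0: link up",
    ]
    for pid, name in find_oom_kills(sample_log):
        print(f"OOM killer terminated pid {pid} ({name})")
```

If lines like these show up around the time of the 503s, memory pressure on the node is a likely culprit. If nothing matches, that points away from the OOM killer.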

If that turns out not to be the cause, I would try to work out which layer is failing. Is one of the nodes not responding on the network at all, or is it only the application that is not responding? If it’s the first, there may be a larger conversation to have. If it’s the second, it’s worth reviewing the logs generated by the application.
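
One rough way to tell those two cases apart is to test the node and the application separately. The sketch below is only an illustration: it assumes the node accepts TCP connections on some known port (SSH on 22 is a guess) and that the service has an HTTP URL you can request. The helper names `tcp_reachable`, `http_status` and `diagnose_layer` are hypothetical.

```python
import socket
import sys
import urllib.error
import urllib.request


def tcp_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # Covers URLError, timeouts and refused connections.
        return None


def diagnose_layer(node_ok, status):
    """Map the two checks onto the layer that appears to be failing."""
    if not node_ok:
        return "node or network"
    if status is None or status >= 500:
        return "application"
    return "responding"


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage (hypothetical values): python check.py <node-ip> <service-url>
        node_ip, service_url = sys.argv[1], sys.argv[2]
        layer = diagnose_layer(tcp_reachable(node_ip, 22), http_status(service_url))
        print(f"Apparent failing layer: {layer}")
    else:
        print("usage: check.py <node-ip> <service-url>")
```

A result of "node or network" suggests the node itself has dropped off the network, while "application" suggests the node is up and the service on it is what needs a closer look.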

Jarland