
My k8s cluster seems to die every few days

Posted on March 14, 2019

Nothing seems to trigger it. I have a cluster of 4 x 8GB nodes, so I don’t think it’s the resources. But every now and then the dashboard reports a 503 and my services become unresponsive.

I have to recycle the nodes before anything comes back up.

Has anyone else seen this?


jarland

March 15, 2019

Greetings!

I’m sorry about the trouble this is causing for you. My instinct is that this is a memory issue, though that is hard to prove without logs. Running out of memory is so commonly the cause in similar scenarios that it is reasonable to check for it first. You may want to check the logs on one of the application servers to see whether this is in fact the case. Here’s a quick read on the OOM killer, which is what I suspect to be the cause:

https://www.thegeekdiary.com/what-is-out-of-memory-oom-killer-in-linux-causes-troubleshooting-mitigation/
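
If it helps, here is a rough Python sketch of the log check described above. It scans kernel log lines (for example the output of `dmesg` or `journalctl -k` on a node) for the "Killed process" message the kernel writes when the OOM killer acts. The function name, the `OomKill` type, and the sample lines are my own illustrations, not output from your cluster, and the exact message wording can differ between kernel versions, so treat the pattern as an assumption to adjust.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumption: the kernel logs "Killed process <pid> (<command>)" when the
# OOM killer terminates something. Wording may vary by kernel version.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


class OomKill(NamedTuple):
    pid: int
    command: str


def find_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return one entry per OOM kill found in the given kernel log lines."""
    kills = []
    for line in lines:
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append(OomKill(int(match.group(1)), match.group(2)))
    return kills


# Illustrative lines only, modelled on the usual kernel message shape.
SAMPLE_LINES = [
    "[12345.678901] Out of memory: Kill process 4321 (node) score 912 or sacrifice child",
    "[12345.678950] Killed process 4321 (node) total-vm:2097152kB, anon-rss:1048576kB, file-rss:0kB",
    "[12350.000000] eth0: link up",
]

for kill in find_oom_kills(SAMPLE_LINES):
    print(f"OOM killer terminated pid {kill.pid} ({kill.command})")
```

If this turns up entries around the time the services went unresponsive, memory pressure becomes a much more likely explanation. An empty result does not rule it out on its own, since the log may have rotated.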

If you find that it isn’t the cause, I would try to work out which layer this is occurring at. Is one of the nodes not responding on the network at all, or is it just the application that is not responding? If the former, there may be a larger conversation to have. If the latter, it may be worth reviewing the logs generated by the application.
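
One way to tell those two cases apart is to probe in two steps: first a plain TCP connection, then an HTTP request. The sketch below is only that, a sketch. `probe_layer` and its return labels are names I made up, and the host, port, and path you pass in depend on how your services are exposed.

```python
import http.client
import socket


def probe_layer(host: str, port: int, path: str = "/", timeout: float = 3.0) -> str:
    """Roughly classify where a failure sits: network, port, or application."""
    # Step 1: can we open a TCP connection at all?
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "node_unreachable"  # no answer at all within the timeout
    except ConnectionRefusedError:
        return "port_closed"  # host answered, but nothing is listening
    except OSError:
        return "node_unreachable"  # e.g. no route to host
    sock.close()

    # Step 2: the port is open, so does the application answer over HTTP?
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return "app_unresponsive"
    finally:
        conn.close()
    return "app_error" if status >= 500 else "ok"
```

A `node_unreachable` result points at the node or the network, while `app_unresponsive` or `app_error` (which would include a 503) points at the application, where its own logs are the next place to look. A `port_closed` result sits in between: the node is answering, but nothing is listening on that port.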

Jarland
