Question

k8s nodes are in NotReady state

I’ve woken up to find my services have all fallen over. Investigating that I’ve found all my k8s nodes are in NotReady state. Deploying isn’t working.

No notifications about this happening. No emails. Nothing from DO to say “by the way, your nodes have fallen over”.

Can someone from DO help me with this?


Hi @davidAngler - no idea why it did it. DO said my nodes were OOM, but I think that's inaccurate. A few other people's k8s clusters had a fit too, not just mine. I had to get onto Twitter before they'd give any support. Hope you got it sorted, dude.

@colinjohnriddell Getting the same answer from DO.

They seem okay, but they say the node OS went into "panic" mode, meaning it ran out of resources, even though my software is quite light.

I will try a stronger pool, move the resources there, and play with podAntiAffinities to prefer spreading pods across different nodes, because when the cluster came back online everything landed on the same node (rough sketch below). :|
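For reference, this is a minimal sketch of the preferred anti-affinity I have in mind, assuming the Deployment's pods carry a hypothetical app: my-service label (the names are made up for illustration). It goes under spec.template.spec of the Deployment:

  affinity:
    podAntiAffinity:
      # Prefer (but do not require) that replicas land on different nodes
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: my-service   # hypothetical label; match your pod template's labels
          topologyKey: kubernetes.io/hostname

Using preferred rather than required means the scheduler will still place pods on the same node if there is nowhere else to put them, which avoids unschedulable pods on small pools.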

Anyways, I'm sure it's probably DO playing around in the back and telling us a different story.

Hey bud, I have the same issue. Do you know why it happened?



Hello @colinjohnriddell ,

NotReady status on a node can be caused by multiple reasons:

  • The node kubelet service has stopped running.
  • The container runtime (Docker) has stopped running.
  • The node VM is no longer available.
  • Resource contention on the Nodes.

It is a best practice to treat Kubernetes nodes as ephemeral. Because of this, it is common to recycle a node that has an issue and replace it with a healthy one, which fixes many common node-specific problems. Generally, we see nodes in a NotReady state due to a lack of resources.
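If you do decide to recycle a node by hand, a minimal sketch looks like the following, assuming <name_of_node> comes from kubectl get nodes (the drain flags are for recent kubectl versions; older releases use --delete-local-data instead of --delete-emptydir-data):

  # Stop new pods from being scheduled onto the unhealthy node
  kubectl cordon <name_of_node>

  # Evict the pods currently running there (DaemonSet pods are skipped)
  kubectl drain <name_of_node> --ignore-daemonsets --delete-emptydir-data

  # Remove the node object from the cluster
  kubectl delete node <name_of_node>

On DOKS you can also recycle the node from the control panel or with doctl, which takes care of provisioning a replacement in the node pool.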

If you want to investigate the specific incident, you can review events around the nodes using the following commands:

  kubectl get nodes
  kubectl describe node <name_of_node>
  kubectl get events -n kube-system
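As a follow-up illustration (the jsonpath query is just one convenient way to do it), you can also print the Ready condition for every node in one go:

  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.conditions[?(@.type=="Ready")].reason}{"\n"}{end}'

In the kubectl describe node output, the Conditions section (Ready, MemoryPressure, DiskPressure, PIDPressure) and the Events list usually point at the cause.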

Coming to the notification option: at present, this feature is not available. However, it is already on our roadmap, though I don't have a specific ETA for it. Our product team is always looking at feature requests and product feedback, so I'd encourage you to add the idea (or vote on it) here and subscribe for updates: https://ideas.digitalocean.com/ideas/

We use that page to help gauge demand for new features, so adding it, or adding your vote, will help us to prioritize when we can implement this feature.

I hope this helps!

Best Regards,
Purnima Kumari
Developer Support Engineer II, DigitalOcean

No answer to this. Something happened with DO clusters. DO blamed my nodes for being OOM. They're fine now and they were fine before.

Hi, how are you defining the pods for your service(s)? It's not clear from your original post. If you have a public repository, can you drop a link in the comments? Well, I must go; I look forward to any additional feedback.

Think different and code well,

-Conrad