Question

Kubernetes nodes unhealthy but pods are still running, how do I troubleshoot?

I’ve been getting familiar with Kubernetes and have a two-node pool running a two-pod application behind a load balancer.

Lately, the health checks have been failing and the application is going down.

When I run kubectl get nodes, both nodes have a status of Ready. When I run kubectl get pods, all pods are Running with 0 restarts.

How does the health check work, and how can I debug/troubleshoot this to figure out why the nodes are becoming unhealthy so that I can resolve the issue?



I had this problem a while back, and at least for me the cause was that both instances of the ingress controller I was using (nginx) were running on the same node. Because of that, one node was responding and one was not.

To check whether you’re hitting the same problem, look at how many instances of your ingress controller are running and which nodes they are scheduled on.
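For example, a command along these lines will list the ingress pods together with the node each one landed on (the label selector here is just an assumption based on the app label used below; adjust it, and add a -n namespace flag if your controller runs in its own namespace):

    # List ingress controller pods and the node each one is scheduled on
    kubectl get pods -l app=nginx-ingress-pod -o wide

If both pods show the same value in the NODE column, you’re hitting this issue.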

If this is your problem, I solved it by adding this to my deployment spec:

      affinity:
        podAntiAffinity:
          # Don't schedule two pods carrying the app=nginx-ingress-pod label
          # onto the same node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                      - nginx-ingress-pod
              # One matching pod per hostname, i.e. per node
              topologyKey: "kubernetes.io/hostname"

However, I think the ideal solution would be to run the service as a DaemonSet: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/ (instead of running it as a Deployment), since a DaemonSet runs exactly one pod on each node.

I haven’t made that change yet myself, but would like to at some point.
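For reference, that change would look roughly like the sketch below. This is only an illustration under the assumption that you reuse the container spec from your existing Deployment; the name, labels, and image are placeholders, not the real nginx-ingress manifest:

    apiVersion: apps/v1
    kind: DaemonSet                      # one pod per node, no anti-affinity needed
    metadata:
      name: nginx-ingress
    spec:
      selector:
        matchLabels:
          app: nginx-ingress-pod
      template:
        metadata:
          labels:
            app: nginx-ingress-pod
        spec:
          containers:
            - name: nginx-ingress
              # Placeholder: copy the image, args, and ports from your existing Deployment
              image: your-ingress-controller-image
              ports:
                - containerPort: 80
                - containerPort: 443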

Hi there!

The LB health check typically targets a specific NodePort exposed for the Service, not just whether the node itself is healthy. I understand this can be confusing with the current iteration of the LB UI.

The externalTrafficPolicy setting on some Kubernetes services may be set to "Local". This means a node that is not running a pod for that service locally will reject the LB health check and show as down. In other words, to save the overhead of unnecessary network hops between nodes, only the nodes hosting pods for the LoadBalancer service will report as healthy.

With externalTrafficPolicy set to "Cluster", all nodes forward that traffic to the nodes that are hosting pods for the service. In this case, even a node not hosting a pod for that particular service would show as "Healthy", since it would simply forward the request to a node that can serve it.
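To see which policy a given Service currently uses (myservice is just an example name), you can read the field directly:

    # Print the current externalTrafficPolicy for the service
    kubectl get svc myservice -o jsonpath='{.spec.externalTrafficPolicy}'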

To change this setting for a particular service use the following command:

    kubectl patch svc myservice -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'

An important thing to note here is that with externalTrafficPolicy set to "Cluster" you will lose the original client IP address due to this extra network hop. So if your application checks for or depends on knowing the client IP, the "Local" setting is required.
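If you do need "Local", you can also set it declaratively in the Service manifest rather than patching. A minimal sketch, where the name, selector, and ports are placeholders for your own service:

    apiVersion: v1
    kind: Service
    metadata:
      name: myservice                  # example name from above
    spec:
      type: LoadBalancer
      # Preserves the client IP; only nodes running a matching pod pass the health check
      externalTrafficPolicy: Local
      selector:
        app: nginx-ingress-pod         # placeholder selector
      ports:
        - port: 80
          targetPort: 80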

You can find more information on externalTrafficPolicy here: https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-nodeport

Regards,

John Kwiatkoski
Senior Developer Support Engineer - Kubernetes