Using a DO managed Kubernetes cluster with the helm chart stable/prometheus results in some node_exporters being unreachable.

I have three nodes in the cluster. The Prometheus pods (server, alertmanager, node_exporter, etc.) start just fine. Unfortunately, 2 of the 3 node_exporters cannot be reached. This seems like it must be an issue with flannel, but I don’t know how to begin debugging it.

Prometheus itself (the dashboard) reports the error “context deadline exceeded” for the 2 node_exporter pods. When I create a single “curl” pod for curling ClusterIPs, the curl command hangs when trying to connect to these two.
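For reference, the throwaway curl pod described above can be created in one line; a minimal sketch, where the ClusterIP and port (9100 is the node-exporter default) have to be substituted from your own cluster:

```shell
# one-off pod that curls a node-exporter ClusterIP, then deletes itself;
# -m 5 turns the silent hang into a visible 5-second timeout
kubectl run curl --image=curlimages/curl --rm -it --restart=Never -- \
  curl -sv -m 5 http://<node-exporter-cluster-ip>:9100/metrics
```

On the two broken nodes this curl times out; on the working one it dumps the metrics page.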

So the question is how does one verify that flannel is functioning correctly?
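A rough starting point for that, assuming flannel is deployed as a DaemonSet in kube-system (the label and DaemonSet name below are typical for kube-flannel manifests and may differ on your cluster):

```shell
# confirm a flannel pod is Running on every node
kubectl -n kube-system get pods -l app=flannel -o wide

# look for subnet-lease or route errors in a flannel pod's logs
kubectl -n kube-system logs ds/kube-flannel-ds | tail -n 20

# basic pod-to-pod test: reach a pod IP that lives on another node
kubectl run nettest --image=busybox --rm -it --restart=Never -- \
  ping -c 3 <pod-ip-on-another-node>
```

If cross-node pod IPs are reachable but the node_exporters still are not, the problem is likely not the overlay network itself (see the firewall answers below).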


helm install --name prometheus-service stable/prometheus
kubectl port-forward prometheus-service-server-<id> 9090
Then open http://localhost:9090/targets in a browser.

You will see that some (perhaps all but one) of the node_exporter targets report “context deadline exceeded”.


Make sure to open tcp/9100 in the DO firewall panel of your cluster.
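The same rule can be added from the CLI instead of the panel; a sketch using doctl, where the firewall ID and the source range are placeholders you need to fill in from your own setup:

```shell
# find the firewall that DO created for the cluster
doctl compute firewall list

# allow inbound tcp/9100 from your chosen source range
# (<firewall-id> and <source-cidr> are placeholders, not real values)
doctl compute firewall add-rules <firewall-id> \
  --inbound-rules "protocol:tcp,ports:9100,address:<source-cidr>"
```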

You might also have noticed that Prometheus fails to get kubelet metrics. Watch this question for updates.

I opened port 9100 via the droplet networking control panel, and it now seems to work. Because there’s a single firewall config for the cluster, the rule is also applied to any nodes that are spun up later. The same sources as the rest of the k8s rules can be used.

Would be curious to see if this worked for you.

I ran into the same issue. The reason is probably that the node-exporter pods are configured with hostNetwork: true. This is required to scrape some networking metrics, but it also means the exporter runs in the host network namespace. It seems this traffic does not go through Cilium but directly over the private network, so a firewall rule is needed, as @manelpb already stated. Nevertheless you can leave out (I believe it’s the load balancer subnet), so allowing and as source ip range on port 9100 works.
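You can confirm the hostNetwork setting directly on the DaemonSet; a sketch assuming the release name `prometheus-service` from the question above (the DaemonSet name may differ depending on chart version):

```shell
# print whether the node-exporter pods share the host network namespace;
# "true" here explains why traffic bypasses the overlay network
kubectl get daemonset prometheus-service-node-exporter \
  -o jsonpath='{.spec.template.spec.hostNetwork}'
```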