The applications in our Kubernetes clusters intermittently fail to resolve DNS names. We see it regularly when our applications connect to the databases hosted in DO (Redis, Postgres and Mongo), and we have also seen it with name resolution for other services.
It appears to be an issue with CoreDNS. The CoreDNS logs suggest that CoreDNS itself isn't very healthy, as it prints a lot of the following:
[WARNING] plugin/health: Local health request to "http://:8080/health" failed: Get "http://:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[WARNING] plugin/health: Local health request to "http://:8080/health" took more than 1s: 1.153882521s
The following log entry is especially interesting:
[ERROR] plugin/errors: REDACTED.mongo.ondigitalocean.com. A: dial udp REDACTED:53: i/o timeout
CoreDNS also continuously spams the following two log lines:
[WARNING] No files matching import glob pattern: custom/*.override
[WARNING] No files matching import glob pattern: custom/*.server
We have done no configuration for CoreDNS - the cluster is a simple setup using Terraform for provisioning, and it’s all the default configuration.
Any help is appreciated!
Edit: Here is the specification for our cluster setup and CoreDNS:
Nodes:
  K8s version: 1.22.8
  OS: Debian GNU/Linux 10 (buster)
  Kernel Version: 5.10.0-0.bpo.9-amd64
  Container runtime: containerd://1.4.13
CoreDNS:
  Image: docker.io/coredns/coredns:1.8.4
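To put numbers on how often lookups fail, something like the following could be run from inside an affected pod. This is only a minimal sketch, not something we have deployed: the `probe` function, its parameters and the `example.com` placeholder hostname are all made up for illustration. It assumes the pod resolves names through the cluster DNS and that nothing in the pod caches results.

```python
import socket
import time


def probe(hostname, attempts=20, interval=0.5,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `hostname` repeatedly and count failed lookups.

    `resolver` and `sleep` are injectable so the function can be
    exercised without touching the network (illustrative design choice).
    """
    failures = 0
    latencies = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            resolver(hostname, None)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            failures += 1
        else:
            latencies.append(time.monotonic() - start)
        if i < attempts - 1:
            sleep(interval)
    return {
        "attempts": attempts,
        "failures": failures,
        # Slowest successful lookup, or 0.0 if every lookup failed.
        "max_latency_s": max(latencies, default=0.0),
    }


if __name__ == "__main__":
    # Placeholder hostname: substitute the database hostname that fails.
    print(probe("example.com"))
```

Comparing the failure count for an external name (such as a managed database hostname) against a cluster-internal service name might help tell whether the timeouts come from CoreDNS itself or from its upstream resolver.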
We are seeing the same issues with our DOKS cluster as well. The cluster had been running fine for almost two months; the issue started appearing in the last few weeks.