Question

Kubernetes apiserver down, causing network issues.

We’re experiencing issues in our cluster where the network seems unstable. Deploys sometimes fail because application pods cannot connect to dependent services (e.g. the database); this has happened a couple of times now. After some poking around I found that the Cilium Operator and CoreDNS pods show a high number of restarts. This seems to be because they could not reach the kube-apiserver (or because etcd is down; I’m not sure which).
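
In case it’s useful for anyone reproducing this, the restart counts and the logs below can be pulled with something like the following (the CoreDNS pod name is the one from my cluster; substitute your own):

# Restart counts for the system pods
kubectl -n kube-system get pods

# Logs from the affected components (use --previous to see the crashed container instance)
kubectl -n kube-system logs deployment/cilium-operator
kubectl -n kube-system logs coredns-9d6bf9876-t4jlt --previous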

Logs for Cilium Operator

level=info msg="Cilium Operator " subsys=cilium-operator
level=info msg="Starting apiserver on address :9234" subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://0a18c093-ee32-45d2-a8a6-d630a6242716.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=warning msg="Health check status" error="Not able to connect to any etcd endpoints" subsys=cilium-operator
level=error msg="Unable to contact k8s api-server" error="Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" ipAddr="https://10.245.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" subsys=cilium-operator
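
For what it’s worth, 10.245.0.1:443 is the in-cluster address of the default kubernetes Service, i.e. the apiserver. A quick way to check whether it is reachable at all from inside the cluster is something like the sketch below (the pod name and image are arbitrary examples); even a 401/403 response would mean the apiserver is up, while a hang or timeout matches the error above:

# Temporary pod that tries the apiserver service address directly
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk --max-time 5 https://10.245.0.1:443/version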

Logs for CoreDNS

.:53
2019-09-30T09:06:13.965Z [INFO] CoreDNS-1.3.1
2019-09-30T09:06:13.965Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-30T09:06:13.965Z [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
E0930 09:06:49.312873       1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
E0930 09:06:49.312873       1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-9d6bf9876-t4jlt.unknownuser.log.ERROR.20190930-090649.1: no such file or directory
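
One detail that stands out: CoreDNS fails with “connection refused” rather than an i/o timeout, which reads more like the apiserver process itself being down or restarting than a network-path problem. Once the apiserver responds again, its own view of the control plane can be checked with the same componentstatuses API the Cilium log above is querying (deprecated in newer releases, but still available):

# Control-plane component health as reported by the apiserver
kubectl get componentstatuses

# Which endpoints currently back the in-cluster kubernetes service (10.245.0.1)
kubectl get endpoints kubernetes -n default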

Any ideas what might be causing these issues? I’m not sure where to look: as I understand it, DigitalOcean manages the master node running etcd and the Kubernetes apiserver, so what could we be doing that causes those components to fail?



Same thing here after a 1.16 upgrade. Did you guys get a solution?

level=info msg="Connecting to kvstore..." address= kvstore=etcd subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Starting to synchronize k8s nodes to kvstore..." subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:36:34.408Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:36:49.409Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:04.411Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:19.412Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:34.413Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:49.414Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator
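
For anyone else who lands here: once the managed apiserver/etcd endpoint is reachable again the operator should recover on its own, but it can also be nudged manually. A sketch, assuming the standard cilium-operator Deployment in kube-system and kubectl 1.15+ for rollout restart:

# Restart the operator once the control plane responds again
kubectl -n kube-system rollout restart deployment cilium-operator

# Confirm the pod comes back and the restart counter stops climbing
kubectl -n kube-system get pods | grep cilium-operator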

Hitting the same issue after updating the cluster to 1.16:

level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2019-12-04T21:49:01.258Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://eece3570-12b4-40fa-8f9b-3a2b417d9cf9.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadli
ne exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator

Running into a similar issue at the moment. Is it worth raising another ticket or has #02979561 been investigated/resolved already?