Kubernetes apiserver down, causing network issues.

September 30, 2019 1.9k views
Kubernetes

We’re experiencing issues in our cluster where the network seems unstable. We’ve noticed deploys failing because application pods could not connect to dependent services (e.g., the database) a couple of times. After some digging I found that the Cilium Operator and CoreDNS pods show a high number of restarts. This seems to be because they could not reach the kube-apiserver (or because etcd is down; I’m not sure).

Logs for Cilium Operator

level=info msg="Cilium Operator " subsys=cilium-operator
level=info msg="Starting apiserver on address :9234" subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://0a18c093-ee32-45d2-a8a6-d630a6242716.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=warning msg="Health check status" error="Not able to connect to any etcd endpoints" subsys=cilium-operator
level=error msg="Unable to contact k8s api-server" error="Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" ipAddr="https://10.245.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" subsys=cilium-operator

Logs for CoreDNS

.:53
2019-09-30T09:06:13.965Z [INFO] CoreDNS-1.3.1
2019-09-30T09:06:13.965Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-30T09:06:13.965Z [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
E0930 09:06:49.312873       1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-9d6bf9876-t4jlt.unknownuser.log.ERROR.20190930-090649.1: no such file or directory

Any ideas what might be causing these issues? I’m not sure where to look. As I understand it, DO manages the master nodes running etcd and the Kubernetes API server, so what could we be doing that causes those components to fail?
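For context, here’s roughly how I checked the restart counts and probed the apiserver from inside the cluster. The `<cilium-operator-pod>`/`<coredns-pod>` names are placeholders for the actual pod names in `kube-system`, and the `api-check` pod and curl image are just what I happened to use:

```shell
# Look for pods with a high RESTARTS count:
kubectl -n kube-system get pods -o wide

# Logs from the previous (crashed) container survive a restart:
kubectl -n kube-system logs <cilium-operator-pod> --previous
kubectl -n kube-system logs <coredns-pod> --previous

# Probe the apiserver service IP from inside the cluster; 10.245.0.1:443
# is the address from the logs above. -k skips cert verification since
# we only care about reachability here:
kubectl run api-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://10.245.0.1:443/healthz
```

The `--previous` logs are what showed the `i/o timeout` and `connection refused` errors pasted above.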

1 Answer

Hi there,

Please open a support ticket so that we can look further into the cluster on your account.

Regards,

John Kwiatkoski
