Kubernetes apiserver down, causing network issues.

Question

We’re experiencing issues in our cluster where the network seems unstable. Deploys have failed a couple of times because application pods could not connect to dependent services (e.g. the database). After some poking around I found that Cilium Operator and CoreDNS show a high number of restarts. This seems to be because they could not reach the kube apiserver (or because etcd is down, not sure).

Logs for Cilium Operator

level=info msg="Cilium Operator " subsys=cilium-operator
level=info msg="Starting apiserver on address :9234" subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://0a18c093-ee32-45d2-a8a6-d630a6242716.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=warning msg="Health check status" error="Not able to connect to any etcd endpoints" subsys=cilium-operator
level=error msg="Unable to contact k8s api-server" error="Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" ipAddr="https://10.245.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" subsys=cilium-operator

Logs for CoreDNS

.:53
2019-09-30T09:06:13.965Z [INFO] CoreDNS-1.3.1
2019-09-30T09:06:13.965Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-30T09:06:13.965Z [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
E0930 09:06:49.312873       1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
E0930 09:06:49.312873       1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-9d6bf9876-t4jlt.unknownuser.log.ERROR.20190930-090649.1: no such file or directory

Any ideas what might be causing these issues? I’m not sure where to look. As I understand it, DO manages the master node running etcd and the Kubernetes API server, so what might we be doing that causes those components to fail?
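The two logs above fail in different ways: Cilium Operator reports "i/o timeout" while CoreDNS reports "connection refused" against the same address, 10.245.0.1:443. A minimal sketch for telling these cases apart from inside a pod, assuming Python is available in the container. The helper name probe is hypothetical, and the address is simply the one from the logs, so adjust it for your cluster.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The handshake was actively rejected (a reset came back).
        return "refused"
    except (socket.timeout, TimeoutError):
        # Nothing answered within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name resolution failure.
        return f"error: {exc}"
```

Calling probe("10.245.0.1", 443) in a loop from pods on different nodes may show whether the failures are tied to one node or happen everywhere. As a rough guide only: "timeout" usually means packets are being dropped somewhere on the path, while "refused" means something answered and rejected the connection. Neither result alone says whether the apiserver itself or the node's networking is at fault.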

QGBA · Answer

Hitting the same issue after updating the cluster to 1.16:

level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2019-12-04T21:49:01.258Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://eece3570-12b4-40fa-8f9b-3a2b417d9cf9.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator

antoinedao1 · Answer

Running into a similar issue at the moment. Is it worth raising another ticket or has #02979561 been investigated/resolved already?

Richard Thombs · Answer

Same thing here after a 1.16 upgrade. Did you guys get a solution?

level=info msg="Connecting to kvstore..." address= kvstore=etcd subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Starting to synchronize k8s nodes to kvstore..." subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:36:34.408Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:36:49.409Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:04.411Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:19.412Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:34.413Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:49.414Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator
