By jimvdv
We’re experiencing some issues in our cluster where the network seems unstable. Deploys have failed a couple of times because application pods could not connect to dependent services (e.g. the database). After some poking around I found that Cilium Operator and CoreDNS show a high number of restarts. This seems to be because they could not reach the kube-apiserver (or because etcd is down, not sure).
Logs for Cilium Operator
level=info msg="Cilium Operator " subsys=cilium-operator
level=info msg="Starting apiserver on address :9234" subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://0a18c093-ee32-45d2-a8a6-d630a6242716.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=info msg="Establishing connection to apiserver" host="https://10.245.0.1:443" subsys=k8s
level=warning msg="Health check status" error="Not able to connect to any etcd endpoints" subsys=cilium-operator
level=error msg="Unable to contact k8s api-server" error="Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" ipAddr="https://10.245.0.1:443" subsys=k8s
level=fatal msg="Unable to connect to Kubernetes apiserver" error="unable to create k8s client: unable to create k8s client: Get https://10.245.0.1:443/api/v1/componentstatuses/controller-manager: dial tcp 10.245.0.1:443: i/o timeout" subsys=cilium-operator
Logs for CoreDNS
.:53
2019-09-30T09:06:13.965Z [INFO] CoreDNS-1.3.1
2019-09-30T09:06:13.965Z [INFO] linux/amd64, go1.11.4, 6b56a9c
CoreDNS-1.3.1
linux/amd64, go1.11.4, 6b56a9c
2019-09-30T09:06:13.965Z [INFO] plugin/reload: Running configuration MD5 = 2e2180a5eeb3ebf92a5100ab081a6381
E0930 09:06:49.312873 1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
E0930 09:06:49.312873 1 reflector.go:251] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to watch *v1.Endpoints: Get https://10.245.0.1:443/api/v1/endpoints?resourceVersion=2667502&timeout=6m0s&timeoutSeconds=360&watch=true: dial tcp 10.245.0.1:443: connect: connection refused
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-9d6bf9876-t4jlt.unknownuser.log.ERROR.20190930-090649.1: no such file or directory
Any ideas what might be causing these issues? I’m not sure where to look. As I understand it, DO manages the master node running etcd and the Kubernetes API server, so what might we be doing that causes those components to fail?
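The two logs above fail in different ways: Cilium Operator reports an `i/o timeout` against 10.245.0.1:443, while CoreDNS reports `connection refused`. A small probe run from inside a pod might help tell those cases apart. This is only a minimal sketch in Python using the standard library; the function name `probe` and the 3 second timeout are assumptions for illustration, not part of any DigitalOcean or Cilium tooling.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something answered with a reset: the address is reachable,
        # but nothing is accepting connections on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: packets are likely being dropped.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, DNS failure, ...).
        return f"error: {exc}"
```

Calling `probe("10.245.0.1", 443)` from a pod on each node would show whether the service IP is unreachable everywhere or only on some nodes. A "timeout" would point more toward packet loss on the path, a "refused" more toward the API server itself not listening at that moment.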
Hitting the same issue after updating the cluster to 1.16:
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2019-12-04T21:49:01.258Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://eece3570-12b4-40fa-8f9b-3a2b417d9cf9.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator
Running into a similar issue at the moment. Is it worth raising another ticket or has #02979561 been investigated/resolved already?
Same thing here after a 1.16 upgrade. Did you guys get a solution?
level=info msg="Connecting to kvstore..." address= kvstore=etcd subsys=cilium-operator
level=info msg="Connecting to etcd server..." config=/var/lib/etcd-config/etcd.config endpoints="[https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379]" subsys=kvstore
level=info msg="Starting to synchronize k8s nodes to kvstore..." subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:36:34.408Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:36:49.409Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:04.411Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2020-01-08T19:37:19.412Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:34.413Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
{"level":"warn","ts":"2020-01-08T19:37:49.414Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://b0709a0f-c4b9-40d7-a65e-182eabbb3f1a.internal.k8s.ondigitalocean.com:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
level=warning msg="Health check status" error="not able to connect to any etcd endpoints" subsys=cilium-operator
level=fatal msg="Unable to start status api: http: Server closed" subsys=cilium-operator
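To check whether the etcd retries in a log like the one above come at a steady interval (which would suggest the endpoint is unreachable the whole time rather than flapping), the JSON lines can be parsed and the gaps between timestamps computed. A rough sketch, assuming the log has been saved to a text file and that the JSON lines keep the `ts` and `msg` fields shown above; the function name `retry_gaps` is hypothetical.

```python
import json
from datetime import datetime


def retry_gaps(lines):
    """Return the seconds between consecutive etcd retry warnings."""
    stamps = []
    for line in lines:
        line = line.strip()
        # Only the etcd client lines are JSON; skip the level=... lines.
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or wrapped line
        if entry.get("msg") == "retrying of unary invoker failed":
            stamps.append(datetime.strptime(entry["ts"], "%Y-%m-%dT%H:%M:%S.%fZ"))
    # Difference between each timestamp and the next one.
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]
```

For example, `retry_gaps(open("operator.log"))` (the file name is a placeholder) returns one number per pair of consecutive retry warnings.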