Over the last day (and on several days in recent weeks) we experienced several outages of our Kubernetes cluster, which impacted thousands of our clients in the PRODUCTION environment.
Our tech team monitored the situation and noticed that containers were being repeatedly restarted or terminated without our intervention. Our services became unavailable, the cluster's Droplets were removed from the Load Balancer (we had to re-add them manually over and over again), and we were completely unable to get logs from any running container (`kubectl logs container_name`) - the following error was returned:
Error from server: Get https://10.133.4.193:10250/containerLogs/OUR_NAMESPACE/OUR_POD_NAME-854bf7bc4f-vbxn6/gateway: net/http: TLS handshake timeout
Executing commands inside containers did not work either (`kubectl exec -it container_name sh`) - the command just hung.
When we tried to access services in the cluster (websites, web apps, etc.) from outside, we noticed that requests were not being forwarded from the Load Balancer to the cluster/Droplets.
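For anyone triaging a similar failure: both the `logs` and `exec` failures go through the kubelet on one node (port 10250 in the error), so a first step is to identify which node that IP belongs to. A minimal sketch, with the error string hard-coded for illustration (the follow-up `kubectl` commands are shown as comments since they require cluster access):

```shell
# Extract the kubelet IP from the error message to find the suspect node.
err='Error from server: Get https://10.133.4.193:10250/containerLogs/...: net/http: TLS handshake timeout'
node_ip=$(echo "$err" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n1)
echo "suspect node: $node_ip"

# Next steps (illustration only, need access to the cluster):
#   kubectl get nodes -o wide | grep "$node_ip"
#   kubectl describe node <node-name>   # look for NotReady / pressure conditions
```

If one node's kubelet is unreachable while others answer, that points at a node-level problem rather than an app-level one.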
To add: we did not change any certificates or any other configuration. Our apps had been running without any problems for several days.
We kindly ask you to investigate this issue, or to state whether there were any problems with the Kubernetes service, network infrastructure, etc. Thanks.
Same behavior on a cluster here: the control plane became unavailable and all nodes under the Load Balancer showed "Down", but the containers kept working and processing jobs.
```
doctl kubernetes cluster list
ID         Name    Region    Version        Auto Upgrade    Status      Node Pools
some-id    k8s     ams3      1.16.2-do.1    false           degraded    k8s-std k8s-cpu
```
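Since `doctl` already reports the cluster as `degraded`, that Status column can be watched for early warning. A minimal sketch, with the status hard-coded to match the output above; in practice it would come from `doctl kubernetes cluster list --format Status --no-header`:

```shell
# Alert on any cluster status other than "running".
status="degraded"   # illustration; normally read from doctl
if [ "$status" != "running" ]; then
  echo "cluster unhealthy: status=$status"
fi
```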
Greetings!
I’m sorry that I didn’t reply here earlier. I want you to know that we saw this and began discussing it, but I didn’t have anything solid to report until now. This cluster should now be healthy, after several hours of discussion between our engineers. Credit to John K and Nan Z for resolving this, I just wanted to share the news.
Should anyone else find themselves in a similar situation, please don’t hesitate to write to our support team here: https://www.digitalocean.com/company/contact/#support
Jarland