This is the third time that my Kubernetes cluster has become unreachable via kubectl. Sometimes I cannot connect to the cluster using the Kubernetes CLI, and I get one of the following errors:

Unable to connect to the server: dial tcp x.x.x.x:443: i/o timeout

Or

Unable to connect to the server: net/http: TLS handshake timeout

The indicator next to the k8s logo turns yellow, whereas it is green when everything is OK. When I try to add extra nodes, the operation gets stuck in a “loading” state with no progress. We rely on your servers and clusters, but they let us down every week. We host the production environments of our projects and our monitoring infrastructure on your servers.
I cannot find any information about this issue on Google.
What are we doing wrong? How can we avoid these problems in the future?
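For reference, this is roughly how I check whether the API server is answering at all (a rough sketch; x.x.x.x stands for the API endpoint from my kubeconfig, and /healthz is the standard Kubernetes apiserver health endpoint):

kubectl cluster-info --request-timeout=10s    # fail fast instead of hanging
curl -kv https://x.x.x.x:443/healthz          # -k skips certificate verification

When the indicator is yellow, both of these just hang and then time out for me.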

2 comments
  • Did anyone find a solution to this? I’m having the exact same issue with one of my clusters today.

  • I’m having this issue as well, and not with anything crazy or particularly API-heavy. I can’t imagine what would be causing these kinds of problems. I’ve enabled auto-scaling on my pool, and all it’s running is a proxy and monitoring.


15 answers

Same problem here.
The cluster has been inaccessible for the last 2 hours.
The apps running on the cluster are not accessible either.

Managed Kubernetes on DO is not production-ready at all :(

Did you find a solution to this? I’ve had this issue twice today, along with Droplets disappearing from the load balancer. I popped in a ticket and have had no reply.

Feels like DigitalOcean’s Kubernetes solution is not quite production-ready.

As of today this is still happening: when installing the prometheus-operator chart, the cluster loses connectivity to the point where no kubectl command works.

Even a single kubectl apply -f for a single resource triggered a loss of connectivity to the cluster for more than 2 hours.

And adding nodes to the current pool took more than 3 hours to complete.

I hope we get more transparency into what is happening with our clusters.
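In case it helps anyone, I was at least able to watch the node additions from the DO API side with doctl while kubectl was dead (a rough sketch; <cluster-name> is a placeholder for your own cluster’s name or ID):

doctl kubernetes cluster get <cluster-name>               # overall cluster state
doctl kubernetes cluster node-pool list <cluster-name>    # per-pool node status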

Hi there!

This can occur if you have any API-heavy workloads deployed that put strain on the master node. If you want, you can open a support ticket and I can dig in to see what’s occurring on your cluster’s control plane.
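In the meantime, a quick way to gauge whether the control plane is responding, and how much API traffic it is serving, is to query the standard apiserver endpoints through kubectl (a rough sketch; exact metric names vary between Kubernetes versions):

kubectl get --raw='/healthz'                             # returns "ok" when the apiserver is healthy
kubectl get --raw='/metrics' | grep apiserver_request    # request counts by verb and resource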

Regards,

John Kwiatkoski
Senior Developer Support Engineer - Kubernetes

Same here.

The cluster has been inaccessible for 16 hours now (sometimes it works, but you have to try about 50 times to get a single lucky connection).

Support tickets are simply ignored or answered with “everything seems to work”.

Sorry, but managed Kubernetes on DO seems to be a toy, and so does the support.

Same problem. Tasks like helm upgrade for a simple Prometheus installation cause DO k8s to go yellow in the DO panel, and kubectl commands get no answer. There was no heavy use of the k8s API.

It seems to me that DO k8s is not production-ready yet.

The same issue: clusters change from ready to unready, and no way to debug is available, not even a shell in the browser. Total invisibility into what is going on.

Completely agree. Managed Kubernetes on DO is not production-ready at all :(
I have the same problem.
Fortunately, I was only testing it.

This issue still exists; it looks like I won’t be able to use the Kubernetes offering for the time being.

Happened twice already today when installing the Prometheus operator on a cluster with more than enough capacity. Had to create a new cluster both times as it was completely stuck: no kubectl connectivity, no Kubernetes dashboard, and adding more nodes / replacing existing ones didn’t work. It was simply unusable afterwards.

Really hope this gets fixed :)

This happened to me for the 4th or 5th time last night when upgrading the Prometheus Operator helm chart, which is set up according to the DO guide. No response to my ticket for the last 15 hours. This would be utterly unacceptable if I were in production, so I probably won’t be able to use DOKS for my production needs.

Last time I put in a ticket about this, they restarted the master and basically said I’d need to spend more money on nodes to get a better master instance (without saying where the breakpoints for that are).

Same here: sometimes it works, sometimes it doesn’t. We already have some small apps in production, but fortunately it is affecting “only” the release process. It wasn’t like this before.


attempt 1:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 2:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 3:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 4:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... OK

Notice: Adding cluster credentials to kubeconfig file found in "/root/.kube/config"
Notice: Setting current-context to do-lon1-xxxx

kubectl get no
Error: Get https://api.digitalocean.com/v2/kubernetes/clusters/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/credentials: net/http: TLS handshake timeout
Error: Get https://api.digitalocean.com/v2/kubernetes/clusters/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/credentials: net/http: TLS handshake timeout
Error: Get https://api.digitalocean.com/v2/kubernetes/clusters/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/credentials: net/http: TLS handshake timeout
error: the server doesn't have a resource type "no"

attempt 5:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 6:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 7:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... invalid token

Error: Unable to use supplied token to access API: Get https://api.digitalocean.com/v2/account: net/http: TLS handshake timeout

attempt 8:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... OK

Error: Get https://api.digitalocean.com/v2/kubernetes/clusters?page=1&per_page=200: net/http: TLS handshake timeout

attempt 9:
Using token [ xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx xxxxx ]

Validating token... OK
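
(Side note: the “the server doesn't have a resource type "no"” error in attempt 4 is just API discovery failing behind the scenes; “no” is the normal kubectl shorthand for “nodes”.)

Until this is fixed, I’ve been scripting around the flakiness with a retry loop, roughly like this (a sketch; the cluster name is a placeholder for your own):

#!/bin/sh
# Retry fetching DOKS credentials until the DO API stops timing out.
CLUSTER="do-lon1-xxxx"   # placeholder: your cluster name or ID
for i in $(seq 1 20); do
  if doctl kubernetes cluster kubeconfig save "$CLUSTER"; then
    echo "attempt $i: credentials saved"
    break
  fi
  echo "attempt $i failed, retrying in 15s..."
  sleep 15
done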

It seems many of us are experiencing this, so I have raised an “idea” asking for this to be fixed, or for an explanation or workaround to be provided.
Please vote for it:
https://ideas.digitalocean.com/ideas/K8SX-I-34

Just experienced this now. This is severe: when a cluster is having issues, being able to connect to the master node to debug and repair it is essential. I am running a 3-node cluster with production-plan nodes.

Same issue here. Attempting to upgrade the prometheus-operator helm chart causes the cluster to go down, and everything starts timing out. Can’t access the Kube dashboard either.

This issue should really be addressed.

I’ve just experienced this issue for the 4th time, and each time I’ve ended up creating a new cluster. I’m personally using K8s for my own side project, and it’s still very frustrating even though it isn’t real production yet.

This time the issue kicked in immediately after I tried to install the Kubernetes Monitoring Stack.
It never even fully installed; it just created the Prometheus Operator namespace and its secret.
It seems like the issue is somehow related to the Prometheus Operator, as many others have already noticed.

So I googled both of these terms together and found this blog post, which I think actually reveals and solves the mystery. Although it is intended for GKE, it might still help newcomers deal with this problem.
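For anyone stuck with a half-installed release once the API server starts answering again, the cleanup usually suggested for the prometheus-operator chart looks roughly like this (a sketch; verify the CRD names on your cluster with kubectl get crd before deleting anything, and use helm delete --purge instead of helm uninstall on Helm 2):

# Remove the release, then the cluster-wide CRDs the chart leaves behind.
helm uninstall prometheus-operator
kubectl delete crd prometheuses.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com \
  alertmanagers.monitoring.coreos.com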
