Question

Can't schedule workloads after upgrade of Kubernetes from 1.12.7 to 1.12.8

Fortunately I don’t have mission-critical services running on my cluster, and now I likely never will :)

I started the automatic upgrade process of my DO-managed Kubernetes cluster from 1.12.7 to 1.12.8. After the upgrade of the master control plane went through, I expected the worker nodes to be upgraded as well; however, the process somehow got stuck and the nodes are not being upgraded.

So currently I can no longer schedule new workloads: all new pods go into the ContainerCreating state and get stuck there.
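For reference, this is roughly how I’ve been inspecting the stuck pods (the pod and namespace names below are placeholders):

```
# Pods stuck in ContainerCreating are still in the Pending phase
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# The events of one stuck pod usually show why the container can't be created
kubectl describe pod <pod-name> -n <namespace>

# Recent cluster events, oldest first, to spot kubelet or networking errors
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```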

I tried to resize the node pool, and this caused a new droplet to be spun up with the up-to-date Kubernetes version from DO (Debian do-kube-1.12.8-do.4). Using kubectl get nodes I can see the old nodes still in status Ready, while the new node (even after around 30 minutes) still shows up as NotReady and its latest event is “Kubelet starting”.
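This is roughly what I’m running to check on the new node (the node name is a placeholder):

```
# Node status plus kubelet version, to confirm which nodes are actually on 1.12.8
kubectl get nodes -o wide

# The conditions and events on the NotReady node often explain what kubelet is waiting for
kubectl describe node <new-node-name>
```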

In addition, the old but still-running droplets in the node pool no longer report metrics to the DO web interface.

Any idea what I can do? If nothing works I will probably have to tear down the whole cluster and set it up from scratch again. To be honest, this is very worrying to me.



Steven Normore
DigitalOcean Employee
June 7, 2019
Accepted Answer

Hi there,

I’m the Engineering Manager on the Kubernetes team at DO and I wanted to follow-up on Ethan’s previous posts as we’ve continued to work through the resolution of this issue.

A recent update to our auto-upgrade process introduced drift in our upgrade logic that resulted in the failed upgrades we’ve been discussing here. Our team has identified the problem and is in the process of rolling out a fix. The data and workloads in your clusters will remain intact. Affected clusters that are within their maintenance window will resume and complete the upgrade process when the fix is rolled out. If your cluster is now outside of its scheduled maintenance window, you can recycle your worker nodes in the cloud panel to trigger and resume their upgrade.
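If you’d like to move workloads off a node before recycling it, a sketch along these lines should work (the node name is a placeholder; the recycle itself still happens in the cloud panel):

```
# Stop new pods from being scheduled onto the node (node name is a placeholder)
kubectl cordon <node-name>

# Evict running pods; DaemonSet pods are skipped, and emptyDir data on the node is discarded
kubectl drain <node-name> --ignore-daemonsets --delete-local-data

# After recycling the node in the cloud panel, confirm its replacement comes up Ready
kubectl get nodes
```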

Unfortunately, our testing pipeline did not catch this issue. As is common procedure at DO, we will be following up on this incident with an internal retrospective that will help us evolve the system and testing pipeline to be more resilient going forward. We understand that you place your trust in our platform and sincerely apologize for any trouble this has caused. Our system and processes will get better as a result of this.

Best,
Steven Normore
Engineering Manager, DOKS

Can’t you just recycle the nodes?

I’m facing the same problem with a K8s 13.3 to 13.5 upgrade: the master upgraded, but the workers have been stuck upgrading forever.
