How to Upgrade DOKS Clusters to Newer Versions

DigitalOcean Kubernetes (DOKS) is a managed Kubernetes service that lets you deploy Kubernetes clusters without the complexities of handling the control plane and containerized infrastructure. Clusters are compatible with standard Kubernetes toolchains and integrate natively with DigitalOcean Load Balancers and block storage volumes.

You can upgrade DigitalOcean Kubernetes clusters to newer patch versions (e.g. 1.13.1 to 1.13.2) as well as new minor versions (e.g. 1.12.1 to 1.13.1) in the DigitalOcean Control Panel or with doctl, the DigitalOcean command-line interface (CLI) tool.

There are two ways to upgrade:

  • On demand. When an upgrade becomes available for DigitalOcean Kubernetes, you can manually trigger the upgrade process. You can upgrade to a new minor version using the manual process, provided you first perform all available patch-level upgrades for your current minor version.

  • Automatically. You can enable automatic upgrades for a cluster that happen within a maintenance window you specify. Automatic upgrades trigger on new patch versions of Kubernetes and new point releases of DigitalOcean Kubernetes subsystems, like the DigitalOcean Cloud Controller Manager or DigitalOcean Container Storage Interface. However, your cluster will not be automatically upgraded to new minor Kubernetes versions (e.g. 1.12.1 to 1.13.1).
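To make the patch versus minor distinction concrete, the following sketch classifies an upgrade from two version strings. The function name upgrade_kind is hypothetical and not part of doctl. It assumes versions of the form MAJOR.MINOR.PATCH, optionally followed by a suffix such as -do.3, and it does not detect downgrades.

```shell
# upgrade_kind CURRENT TARGET  (hypothetical helper, not part of doctl)
# Prints "patch" when only the patch number differs, "minor" when the
# MAJOR.MINOR prefix differs, and "none" when the Kubernetes versions match.
upgrade_kind() {
  # Drop any suffix such as "-do.3", leaving MAJOR.MINOR.PATCH.
  current=${1%%-*}
  target=${2%%-*}
  if [ "$current" = "$target" ]; then
    echo none
  elif [ "${current%.*}" = "${target%.*}" ]; then
    # Same MAJOR.MINOR prefix: eligible for automatic upgrades.
    echo patch
  else
    # Different MAJOR.MINOR prefix: requires the on-demand process.
    echo minor
  fi
}

upgrade_kind 1.13.1 1.13.2   # prints "patch"
upgrade_kind 1.12.1 1.13.1   # prints "minor"
```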

The Upgrade Process

During an upgrade, the control plane (Kubernetes master) is replaced with a new master running the new version of Kubernetes. This process takes a few minutes, during which API access to the cluster is unavailable but workloads are not impacted.

Once the master has been replaced, the worker nodes are replaced in a rolling fashion, one worker pool at a time. Kubernetes reschedules each worker node's workload, then replaces the node with a new node running the new version and reattaches any block storage volumes to the new nodes. The new worker nodes have new IP addresses.

Warning
Any data stored on the local disks of the worker nodes will be lost in the upgrade process. We recommend using persistent volumes for data storage, and not relying on local disk for anything other than temporary data.
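As a rough illustration of the persistent volume recommendation above, the sketch below prints a minimal PersistentVolumeClaim manifest. The claim name app-data and the 5Gi size are placeholders, and leaving out storageClassName assumes the cluster's default storage class is the one you want.

```shell
# Prints a minimal PersistentVolumeClaim manifest. The name "app-data" and
# the 5Gi size are placeholders; with no storageClassName set, the cluster's
# default storage class is used.
pvc_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
}

# Usage (requires kubectl configured for the cluster):
#   pvc_manifest | kubectl apply -f -
```

A Pod that mounts this claim keeps its data on a block storage volume, which is reattached when the node is replaced.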

During this process, workloads running on clusters with a single worker node will experience downtime because there is no additional capacity to host the node's workload during the replacement.
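One way to spot this situation before upgrading is to count the cluster's nodes. The sketch below is a hypothetical helper that reads the output of `kubectl get nodes --no-headers` on standard input. It assumes every listed node is a worker, which should hold on a managed control plane but is worth confirming for your cluster.

```shell
# worker_count: reads `kubectl get nodes --no-headers` output on stdin and
# prints the number of non-empty lines (one line per node).
worker_count() {
  awk 'NF { n++ } END { print n + 0 }'
}

# single_node_warning: prints a warning when one node or fewer is listed.
single_node_warning() {
  n=$(worker_count)
  if [ "$n" -le 1 ]; then
    echo "warning: $n worker node(s); workloads will see downtime during an upgrade"
  else
    echo "ok: $n worker nodes"
  fi
}

# Usage (requires kubectl configured for the cluster):
#   kubectl get nodes --no-headers | single_node_warning
```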

If security-related issues arise, we may need to force cluster upgrades, even on clusters with automatic upgrades disabled. In that case, we work to upgrade during the specified maintenance windows and give advance notice by email, in control panel notifications, and on our status page.

Upgrading via Control Panel

Upgrading On Demand

To upgrade a cluster manually, visit the Overview tab of the cluster in the control panel. Under Available Upgrades, you will see an Upgrade Now button if there is a new version available for your cluster. Click this button to begin the upgrade process.

Upgrading to a New Minor Version

The on-demand process is required when upgrading your cluster to a new minor version of Kubernetes. During this process, you can run our cluster linter before upgrading. The linter automatically checks that the cluster conforms to some common best practices and links to the fixes recommended in our documentation, which helps mitigate issues that might affect your cluster's compatibility with the newer version of Kubernetes. Click Run Linter on the upgrade modal to begin.

Screenshot of upgrade modal showing 'Run Linter' link.

Upgrading Automatically

To enable automatic upgrades for a cluster, visit the Settings tab of the cluster. In the Version Upgrades section, click Enable Auto Upgrades.

Automatic upgrades occur during a cluster's 4-hour maintenance window. The default maintenance window is chosen by the DigitalOcean Kubernetes backend to guarantee an even workload across all maintenance windows for optimal processing.

You can specify a different maintenance window in the Settings tab of a cluster. In the Maintenance Window section, click Edit to specify a different start time. Maintenance windows are made up of two parts: a time of day and, optionally, a day of the week. For example, you can set your maintenance window to 5am any day of the week or to 8pm on Mondays.
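The two example windows above can also be written as compact day=HH:MM strings. The sketch below uses a hypothetical helper, maintenance_window, to validate and format such a string. The doctl flags in the usage comment (--auto-upgrade and --maintenance-window on `doctl kubernetes cluster update`) and the assumption that times are in UTC are not confirmed by this page, so check `doctl kubernetes cluster update --help` before relying on them.

```shell
# maintenance_window DAY HH:MM  (hypothetical helper, not part of doctl)
# Validates the inputs and prints them as "day=HH:MM" (24-hour clock).
maintenance_window() {
  case $1 in
    any|monday|tuesday|wednesday|thursday|friday|saturday|sunday) ;;
    *) echo "invalid day: $1" >&2; return 1 ;;
  esac
  case $2 in
    [01][0-9]:[0-5][0-9]|2[0-3]:[0-5][0-9]) ;;
    *) echo "invalid time: $2" >&2; return 1 ;;
  esac
  echo "$1=$2"
}

maintenance_window any 05:00      # 5am any day: prints "any=05:00"
maintenance_window monday 20:00   # 8pm on Mondays: prints "monday=20:00"

# Usage (assumed flags; verify with --help first):
#   doctl kubernetes cluster update <cluster-id> --auto-upgrade \
#     --maintenance-window "$(maintenance_window monday 20:00)"
```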

Even if you have auto upgrades enabled, you can still upgrade on-demand by clicking the Upgrade Now button in the Overview tab.

Upgrading via CLI

Upgrading to the Latest Version

First, obtain your cluster ID:

doctl kubernetes cluster list

Then pass the cluster ID to the upgrade command to upgrade to the latest version:

doctl kubernetes cluster upgrade 41b74c5d-9bd0-5555-5555-a57c495b81a3
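If you know a cluster by name rather than by ID, the two steps above can be chained in a script. The sketch below is one possible approach: cluster_id_by_name is a hypothetical helper, prod is a placeholder cluster name, and the --format ID,Name and --no-header output flags are assumed to be available in your doctl version.

```shell
# cluster_id_by_name NAME  (hypothetical helper, not part of doctl)
# Reads rows of "ID NAME" on stdin and prints the ID of the first row whose
# second column equals NAME. Exits non-zero if no row matches.
cluster_id_by_name() {
  awk -v name="$1" '$2 == name { print $1; found = 1; exit } END { exit !found }'
}

# Usage (requires an authenticated doctl; "prod" is a placeholder name):
#   id=$(doctl kubernetes cluster list --format ID,Name --no-header | cluster_id_by_name prod)
#   doctl kubernetes cluster upgrade "$id"
```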

Upgrading to a Specific Version

To upgrade to a specific Kubernetes version, rather than just automatically upgrading to the latest version, you must first use your cluster ID to get a list of available upgrades for that cluster:

doctl kubernetes cluster get-upgrades 41b74c5d-9bd0-5555-5555-a57c495b81a3

Then, use the slug value returned by the get-upgrades call to perform the upgrade:

doctl kubernetes cluster upgrade 41b74c5d-9bd0-5555-5555-a57c495b81a3 --version 1.15.3-do.3
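In a script, it can help to confirm that the target slug is actually offered before starting the upgrade. The following sketch assumes the slug is the first column of each row in the get-upgrades output. The helper name slug_available is hypothetical.

```shell
# slug_available SLUG  (hypothetical helper, not part of doctl)
# Reads get-upgrades output on stdin and succeeds only if some row's first
# column equals SLUG. Assumes the slug is the first column of each row.
slug_available() {
  awk -v slug="$1" '$1 == slug { found = 1 } END { exit !found }'
}

# Usage (requires an authenticated doctl):
#   id=41b74c5d-9bd0-5555-5555-a57c495b81a3
#   if doctl kubernetes cluster get-upgrades "$id" | slug_available 1.15.3-do.3; then
#     doctl kubernetes cluster upgrade "$id" --version 1.15.3-do.3
#   else
#     echo "1.15.3-do.3 is not an available upgrade" >&2
#   fi
```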

Enabling Disruption-Free Upgrades

Upgrading your cluster can cause disruptions in the availability of services running in your workloads. Consider the following measures to ensure service availability during upgrades.

Configure a PodDisruptionBudget

A PodDisruptionBudget (PDB) limits how many replicas of an application can be unavailable at the same time during a voluntary disruption, relative to how many it is intended to have. For example, if you set the replicas value for a deployment to 5 and set the PDB's maxUnavailable value to 1, potentially disruptive actions like cluster upgrades and resizes will occur with no fewer than four pods running.
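The five-replica example can be expressed with the PDB's maxUnavailable field. The sketch below prints such a manifest. The names my-app-pdb and the app: my-app label are placeholders that must match your own deployment's Pod labels, and the policy/v1 API version assumes Kubernetes 1.21 or later (older clusters used policy/v1beta1).

```shell
# Prints a PodDisruptionBudget that allows at most one matching Pod to be
# unavailable during voluntary disruptions. "my-app-pdb" and the
# "app: my-app" label are placeholders.
pdb_manifest() {
  cat <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # With replicas set to 5, at least four Pods stay running.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
EOF
}

# Usage (requires kubectl configured for the cluster):
#   pdb_manifest | kubectl apply -f -
```

An equivalent budget for this case could use minAvailable: 4 instead. maxUnavailable has the advantage of staying correct if the replica count changes later.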

For more information, see Specifying a Disruption Budget for your Application in the Kubernetes documentation.

Implement Graceful Shutdowns

Ensure that the containers in your workload respond to shutdown requests without abruptly interrupting service. You can use a preStop hook, which runs when a Pod is scheduled for shutdown, and specify a grace period other than the 30-second default.

This is important because cluster upgrades will result in Pod shutdowns, which follow the standard Kubernetes termination lifecycle:

  1. The Pod is set to the “Terminating” state and removed as an endpoint.
  2. The preStop hook is executed, if it exists.
  3. A SIGTERM signal is sent to the containers in the Pod, notifying them that they are going to be shut down soon. Your code should listen for this signal and start shutting down at this point.
  4. Kubernetes waits for the containers to exit, up to the end of the grace period; the default grace period is 30 seconds, and it includes any time spent in the preStop hook.
  5. A SIGKILL signal is sent to any containers that still haven't shut down, and the Pod is removed.
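For step 3, the following is a minimal sketch of a container entrypoint written in shell that reacts to SIGTERM. The function name graceful_worker and the work it does are placeholders. A real application would do the same thing with its own language's signal handling.

```shell
# graceful_worker: hypothetical main loop that stops cleanly on SIGTERM.
graceful_worker() {
  stop=0
  # On SIGTERM, set a flag instead of exiting in the middle of a task.
  trap 'stop=1' TERM
  while [ "$stop" -eq 0 ]; do
    sleep 1   # placeholder for one unit of real work
  done
  # Placeholder for cleanup: close connections, flush buffers, and so on.
  echo "shutdown complete"
}

# Usage, as the last line of an entrypoint script:
#   graceful_worker
```

The shell runs the trap only after the current command finishes, so each unit of work should be short relative to the grace period. Otherwise the container may still be busy when SIGKILL arrives.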

For more information, see Termination of Pods in the Kubernetes documentation.
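The preStop hook and a custom grace period are both set in the Pod spec. The sketch below prints an illustrative manifest. The Pod name, container name, nginx image, 60-second grace period, and 10-second sleep are all placeholder choices, not recommendations from this guide.

```shell
# Prints a Pod manifest with a preStop hook and a 60-second grace period.
# All names and durations are placeholders.
graceful_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  # Overrides the 30-second default grace period.
  terminationGracePeriodSeconds: 60
  containers:
    - name: web
      image: nginx
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is sent; a short pause gives in-flight
            # requests time to finish after the Pod stops being an endpoint.
            command: ["/bin/sh", "-c", "sleep 10"]
EOF
}

# Usage (requires kubectl configured for the cluster):
#   graceful_pod_manifest | kubectl apply -f -
```

In practice, these fields usually go in the Pod template of a Deployment rather than in a standalone Pod.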

Set up Readiness Probes

Readiness probes are useful when an application is running but not yet able to serve traffic, for example because external services are still starting up or a large data set is still loading. You can configure a readiness probe to report such a status. Think of a command that you could execute in the container every few seconds that would indicate readiness if it returns 0, and specify the command and the schedule in your Pod spec.
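As a sketch of such a command-based probe, the manifest printed below runs cat /tmp/ready every five seconds and treats exit status 0 as ready. The marker file /tmp/ready is a hypothetical convention that the application would have to create once it can serve traffic. The Pod name, container name, image, and timings are placeholders.

```shell
# Prints a Pod manifest with an exec readiness probe. The probe succeeds
# (exit status 0) only while the file /tmp/ready exists in the container.
readiness_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: web
      image: nginx
      readinessProbe:
        exec:
          command: ["cat", "/tmp/ready"]
        # Wait 5 seconds before the first check, then check every 5 seconds.
        initialDelaySeconds: 5
        periodSeconds: 5
EOF
}

# Usage (requires kubectl configured for the cluster):
#   readiness_pod_manifest | kubectl apply -f -
```

While the probe fails, the Pod is left out of Service endpoints, so new nodes created during an upgrade receive traffic only once their Pods report ready.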

For more information, see Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation.