Example of Cluster Autoscaling Working With Horizontal Pod Autoscaling

DigitalOcean Kubernetes (DOKS) is a managed Kubernetes service that lets you deploy Kubernetes clusters without the complexities of handling the control plane and containerized infrastructure. Clusters are compatible with standard Kubernetes toolchains and integrate natively with DigitalOcean Load Balancers and block storage volumes.

Cluster Autoscaling (CA) manages the number of nodes in a cluster. It watches for pods that cannot be scheduled and remain in the Pending state, and uses that information to determine the appropriate cluster size.

Horizontal Pod Autoscaling (HPA) adds or removes pod replicas in response to observed metrics, such as sustained CPU spikes. HPA uses the spare capacity of the existing nodes and does not change the cluster’s size.

CA and HPA can work in conjunction: if the HPA attempts to schedule more pods than the current cluster size can support, then the CA responds by increasing the cluster size to add capacity. These tools can take the guesswork out of estimating the needed capacity for workloads while controlling costs and managing cluster performance.
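The interaction can be sketched with some rough arithmetic. The replica formula below is the one given in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The node estimate is a deliberate simplification added for illustration: it assumes identical pods and nodes and ignores the HPA's tolerance band, system pods, and memory. All function names and numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of any DOKS tooling.

# ceil_div A B: integer ceiling of A / B
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# desired_replicas CURRENT_REPLICAS CURRENT_CPU_PCT TARGET_CPU_PCT
# Applies the HPA formula ceil(current * currentMetric / targetMetric).
desired_replicas() { ceil_div $(( $1 * $2 )) "$3"; }

# nodes_needed REPLICAS POD_CPU_MILLICORES NODE_ALLOCATABLE_MILLICORES
# Simplified estimate: how many nodes are needed to fit the replicas by CPU request.
nodes_needed() {
  local pods_per_node=$(( $3 / $2 ))
  ceil_div "$1" "$pods_per_node"
}

# Assumed scenario: 4 pods running at 160% of their CPU request, target 80%.
replicas=$(desired_replicas 4 160 80)
echo "HPA wants $replicas replicas"

# Assumed sizes: each pod requests 500m CPU, each node offers 1900m allocatable.
echo "Nodes needed: $(nodes_needed "$replicas" 500 1900)"
```

In this assumed scenario the HPA asks for 8 replicas, which need about 3 nodes. If the cluster has fewer, the extra pods stay Pending until the CA adds capacity.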

This article walks you through deploying an example application that simulates workloads so you can see how the interaction between the CA and the HPA works, both when scaling up in response to demand and scaling down as load decreases.

Test Cluster Autoscaling

To run the example application, you need to set up two tools:

  1. Install doctl, the DigitalOcean command-line tool, v1.32.2 or higher.

  2. Install kubectl, the Kubernetes command-line tool.

Once you have doctl and kubectl, create a DigitalOcean Kubernetes cluster with autoscaling enabled:

doctl k8s cluster create mycluster \
    --node-pool "name=mypool;auto-scale=true;min-nodes=1;max-nodes=10"

Install the Kubernetes metrics server from the DigitalOcean Marketplace so the HPA can monitor the cluster’s resource usage. Then confirm that the metrics-server is installed and has started reporting metrics:

kubectl top nodes

The metrics-server takes a few minutes to start reporting metrics. Once it does, the command outputs CPU and memory statistics like the following:

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
mypool-3z4hs   369m         36%    783Mi           49%       
mypool-3z4tz   84m          8%     791Mi           50%       
mypool-3z520   425m         42%    917Mi           58%       
mypool-3z52d   341m         34%    937Mi           59%       
mypool-3zhiq   324m         32%    856Mi           54%  

Deploy the CPU-spiking service and the HPA itself using hpa.yaml, which defines a HorizontalPodAutoscaler resource (a built-in Kubernetes API object, not a custom resource) configured to scale the service up to 20 replicas when its CPU utilization reaches or exceeds 80%:

kubectl apply -f <path-to-hpa-yaml-file>
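For orientation, an HPA with those settings might look roughly like the manifest written out below. This is a sketch, not the contents of the actual hpa.yaml: the file name hpa-sketch.yaml, the target Deployment name hello, the minReplicas value, and the use of the autoscaling/v2 API are all assumptions. Only the HPA name hello, the 20-replica ceiling, and the 80% CPU target come from this article.

```shell
#!/usr/bin/env bash
# Write a hypothetical HPA manifest to hpa-sketch.yaml (assumed file name).
cat > hpa-sketch.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:            # assumed: the workload is a Deployment named "hello"
    apiVersion: apps/v1
    kind: Deployment
    name: hello
  minReplicas: 1             # assumed lower bound
  maxReplicas: 20            # upper bound described in this article
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # CPU target described in this article
EOF

# To check the manifest without touching the cluster, you could run:
#   kubectl apply --dry-run=client -f hpa-sketch.yaml
```

Note that an HPA acts on the single workload named in scaleTargetRef, and CPU utilization is measured relative to the pods' CPU requests.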

Test the autoscaling behavior by scheduling the load generator using load-generator.yaml, which repeatedly sends requests to the CPU-spiking service:

kubectl apply -f <path-to-load-generator-yaml-file>

As the load generator runs, you can check the status of the HPA and CA:

kubectl describe hpa hello # Check HPA status
kubectl get configmap cluster-autoscaler-status -n kube-system -oyaml # Check CA status

Continue checking the status of the HPA and CA. You can apply pressure to the cluster capacity by scaling up the load generator:

kubectl scale deployment/load-generator --replicas 2

After 5 minutes of sustained CPU spiking, the HPA starts scheduling more and more pods. Another 5 minutes after that, when the cluster runs out of capacity and the unscheduled pods start piling up, the CA kicks in to add more nodes.
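To watch this sequence unfold, it can help to log replicas, Pending pods, and node count side by side. The helper below is a minimal sketch: the function name snapshot is hypothetical, and it assumes the HPA is named hello and that the workload runs in the current namespace.

```shell
#!/usr/bin/env bash
# snapshot: print one line with the HPA's replica count, the number of
# Pending pods in the current namespace, and the number of nodes.
snapshot() {
  local replicas pending nodes
  replicas=$(kubectl get hpa hello -o jsonpath='{.status.currentReplicas}')
  # With no matching pods, kubectl prints its notice to stderr, so the count is 0.
  pending=$(kubectl get pods --field-selector=status.phase=Pending --no-headers 2>/dev/null | wc -l | tr -d ' ')
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  echo "$(date +%T) replicas=$replicas pending=$pending nodes=$nodes"
}

# Example usage (runs until interrupted with Ctrl-C):
#   while true; do snapshot; sleep 30; done
```

A rising pending count followed by a rising nodes count is the signature of the CA reacting to pods the HPA created but the scheduler could not place.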

Conversely, you can scale down the load generator and watch the number of pods decrease in your workload:

kubectl scale deployment/load-generator --replicas 1

After 5 minutes of lowered CPU use, the HPA starts to remove pods that are no longer needed. Another 5 minutes after that, the CA notices the excess capacity and begins scaling down the number of nodes in the cluster as well.

Going Further

You can customize many parts of this example’s configuration, including the kinds of events that trigger an action from the HPA and how long they need to last to trigger a response. In general, you need to configure the HPA to balance responsiveness (being sensitive enough for timely responses to load changes) against thrashing (being too sensitive and causing wild fluctuations). For more details on configuring HPAs, see Horizontal Pod Autoscaler in the Kubernetes documentation.
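As one illustration of that trade-off, the autoscaling/v2 API exposes a behavior section whose scaleDown.stabilizationWindowSeconds field controls how long the HPA waits before acting on lower recommendations (the Kubernetes default is 300 seconds). The sketch below lengthens it to 600 seconds to damp thrashing. The patch file name, the function name, and the chosen value are assumptions, and it presumes your cluster serves autoscaling/v2 and the HPA is named hello.

```shell
#!/usr/bin/env bash
# Write a hypothetical merge patch that lengthens the scale-down window.
cat > hpa-behavior-patch.yaml <<'EOF'
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # assumed value; the default is 300
EOF

# apply_behavior_patch: apply the patch to the HPA named "hello".
# Defined but not called here, so nothing changes until you invoke it.
apply_behavior_patch() {
  kubectl patch hpa hello --type merge --patch-file hpa-behavior-patch.yaml
}
```

A longer window makes scale-down less jumpy at the cost of holding extra pods (and potentially extra nodes) for longer after load drops.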