Published on August 24, 2022
Kubernetes aims to provide both resilience and scalability. It achieves this by deploying multiple pods with different resource allocations, to provide redundancy for your applications. Although you can grow and shrink your own deployments manually based on your needs, Kubernetes provides first-class support for scaling on demand, using a feature called Horizontal Pod Autoscaling. It is a closed-loop system that automatically adds or removes application Pods based on current demand. You create a HorizontalPodAutoscaler (or HPA) resource for each application deployment that needs autoscaling, and let it take care of the rest.

At a high level, HPA does the following:

  1. It keeps an eye on resource metrics (such as CPU and memory usage) coming from your application workloads (Pods), by querying the metrics server.
  2. It compares the target threshold value that you set in the HPA definition with the average resource utilization observed for your application workloads (CPU and memory).
  3. If the target threshold is reached, then HPA will scale up your application deployment to meet higher demands. Otherwise, if below the threshold, it will scale down the deployment. To see what logic HPA uses to scale your application deployment, you can review the algorithm details page from the official documentation.
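
The comparison in steps 2 and 3 can be sketched in a few lines of Python. This is a simplified illustration of the formula described in the official algorithm details page (desired replicas = ceil(current replicas × current metric / target metric)), not the controller's actual code. The function name and the fixed 10% tolerance are assumptions made for the example, and the sketch ignores missing metrics, Pods that are not ready, and scaling policies.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count suggested by the documented HPA formula (simplified)."""
    ratio = current_utilization / target_utilization
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))

# One replica at 240% of requested CPU against a 50% target, bounded to 1..3.
print(desired_replicas(1, 240, 50, 1, 3))
# Three replicas idling at 10% against the same target.
print(desired_replicas(3, 10, 50, 1, 3))
```

The clamping step is why a heavily loaded deployment stops growing once it reaches the configured maximum, however high the measured utilization is.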

Under the hood, a HorizontalPodAutoscaler is a built-in Kubernetes API resource (part of the autoscaling API group, not a Custom Resource Definition). It drives a control loop implemented by a dedicated controller that runs within the Control Plane of your cluster. You create a HorizontalPodAutoscaler YAML manifest targeting your application Deployment, and then use kubectl to apply the HPA resource in your cluster.

In order to work, HPA needs a metrics server available in your cluster to scrape required metrics, such as CPU and memory utilization. One straightforward option is the Kubernetes Metrics Server. The Metrics Server works by collecting resource metrics from Kubelets and exposing them via the Kubernetes API Server to the Horizontal Pod Autoscaler. The Metrics API can also be accessed via kubectl top if needed.
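
To make the Metrics API a little more concrete, here is a minimal Python sketch that reads CPU usage out of a PodMetrics object, the kind of item returned under /apis/metrics.k8s.io/v1beta1/namespaces/<namespace>/pods (which you can fetch with kubectl get --raw). The sample payload, the Pod name, and the helper names are hypothetical and only illustrate the shape of the data. The quantity parser handles only the CPU suffixes n, u, and m, plus plain core counts.

```python
import json

def cpu_to_millicores(quantity):
    """Convert a Kubernetes CPU quantity string (e.g. "48m") to millicores."""
    divisors = {"n": 1_000_000, "u": 1_000, "m": 1}
    if quantity and quantity[-1] in divisors:
        return float(quantity[:-1]) / divisors[quantity[-1]]
    # A plain number means whole cores.
    return float(quantity) * 1000.0

def pod_cpu_millicores(pod_metrics):
    """Sum CPU usage over all containers of one PodMetrics object."""
    return sum(cpu_to_millicores(c["usage"]["cpu"])
               for c in pod_metrics["containers"])

# Hypothetical sample payload, shaped like a single PodMetrics item.
sample = json.loads("""
{"kind": "PodMetrics",
 "metadata": {"name": "example-pod", "namespace": "default"},
 "containers": [{"name": "busybox",
                 "usage": {"cpu": "48000000n", "memory": "1024Ki"}}]}
""")

print(pod_cpu_millicores(sample))
```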

In this tutorial, you will:

  • Deploy Metrics Server to your Kubernetes cluster.
  • Learn how to create Horizontal Pod Autoscalers for your applications.
  • Test each HPA setup, using two scenarios: constant and variable application load.

To follow this tutorial, you will need:

Step 1 – Install Metrics Server via Helm

You’ll start by adding the metrics-server repository to your helm package lists. You can use helm repo add:

  1. helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server

Next, use helm repo update to refresh the available packages:

  1. helm repo update metrics-server

Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "metrics-server" chart repository
Update Complete. ⎈Happy Helming!⎈

Now that you’ve added the repository to helm, you’ll be able to add metrics-server to your Kubernetes deployments. You could write your own deployment configuration here, but this tutorial will follow DigitalOcean’s Kubernetes Starter Kit, which includes a configuration for metrics-server.

To do that, clone the Kubernetes Starter Kit Git repository:

  1. git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git

The metrics-server configuration is located in Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml. You can view or edit it by using nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v3.8.2.yaml

It contains a few stock parameters. Note that replicas is a fixed value, 2.

## Starter Kit metrics-server configuration
## Ref: https://github.com/kubernetes-sigs/metrics-server/blob/metrics-server-helm-chart-3.8.2/charts/metrics-server

# Number of metrics-server replicas to run
replicas: 2

apiService:
  # Specifies if the v1beta1.metrics.k8s.io API service should be created.
  # You typically want this enabled! If you disable API service creation you have to
  # manage it outside of this chart for e.g horizontal pod autoscaling to
  # work with this release.
  create: true

hostNetwork:
  # Specifies if metrics-server should be started in hostNetwork mode.
  # You would require this enabled if you use alternate overlay networking for pods and
  # API server unable to communicate with metrics-server. As an example, this is required
  # if you use Weave network on EKS
  enabled: false

Refer to the Metrics Server chart page for an explanation of the available parameters for metrics-server.

Note: You need to be fairly careful when matching Kubernetes deployments to your running version of Kubernetes, and the Helm charts themselves are also versioned to enforce this. At the time of writing, the upstream Helm chart for metrics-server is 3.8.2, which deploys version 0.6.1 of metrics-server itself. From the Metrics Server Compatibility Matrix, you can see that version 0.6.x supports Kubernetes 1.19+.

After you’ve reviewed the file and made any changes, you can proceed with deploying metrics-server, by providing this file along with the helm install command:

  1. HELM_CHART_VERSION="3.8.2"
  2. helm install metrics-server metrics-server/metrics-server --version "$HELM_CHART_VERSION" \
       --namespace metrics-server \
       --create-namespace \
       -f "Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/metrics-server-values-v${HELM_CHART_VERSION}.yaml"

This will deploy metrics-server to your configured Kubernetes cluster:

NAME: metrics-server
LAST DEPLOYED: Wed May 25 11:54:43 2022
NAMESPACE: metrics-server
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
* Metrics Server *
***********************************************************************
  Chart version: 3.8.2
  App version:   0.6.1
  Image tag:     k8s.gcr.io/metrics-server/metrics-server:v0.6.1
***********************************************************************

After deploying, you can use helm ls to verify that metrics-server has been added to your deployment:

  1. helm ls -n metrics-server

NAME             NAMESPACE        REVISION   UPDATED                                STATUS     CHART                  APP VERSION
metrics-server   metrics-server   1          2022-02-24 14:58:23.785875 +0200 EET   deployed   metrics-server-3.8.2   0.6.1

Next, you can check the status of all of the Kubernetes resources deployed to the metrics-server namespace:

  1. kubectl get all -n metrics-server

Based on the configuration you deployed with, both the deployment.apps and replicaset.apps values should count 2 available instances.

NAME                                  READY   STATUS    RESTARTS   AGE
pod/metrics-server-694d47d564-9sp5h   1/1     Running   0          8m54s
pod/metrics-server-694d47d564-cc4m2   1/1     Running   0          8m54s

NAME                     TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/metrics-server   ClusterIP                <none>        443/TCP   8m54s

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metrics-server   2/2     2            2           8m55s

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/metrics-server-694d47d564   2         2         2       8m55s

You have now deployed metrics-server into your Kubernetes cluster. In the next step, you’ll review some of the parameters of a HorizontalPodAutoscaler resource.

Step 2 - Getting to Know HPAs

So far, your configurations have used a fixed value for the number of replicas to deploy. In this step you will learn how to define a HorizontalPodAutoscaler resource so that this value can dynamically grow or shrink.

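A typical HorizontalPodAutoscaler manifest looks like this. The examples in this tutorial use the autoscaling/v2beta2 API; Kubernetes 1.23 and later also serve the stable autoscaling/v2 API, and autoscaling/v2beta2 was removed in Kubernetes 1.26: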

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

The parameters used in this configuration are as follows:

  • spec.scaleTargetRef: A named reference to the resource being scaled.
  • spec.minReplicas: The lower limit for the number of replicas to which the autoscaler can scale down.
  • spec.maxReplicas: The upper limit.
  • spec.metrics.type: The metric to use to calculate the desired replica count. This example is using the Resource type, which tells the HPA to scale the deployment based on average CPU (or memory) utilization. averageUtilization is set to a threshold value of 50.
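
As a rough illustration of how these fields fit together, the sketch below assembles the same kind of manifest as a Python dictionary and prints it as JSON, which kubectl apply -f accepts alongside YAML. The build_hpa helper is hypothetical and written only for this example. The check that minReplicas is at least 1 reflects the default behavior and is a simplification.

```python
import json

def build_hpa(name, deployment, min_replicas, max_replicas, cpu_percent):
    """Return an HPA manifest as a plain dict (hypothetical helper)."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= minReplicas <= maxReplicas")
    return {
        "apiVersion": "autoscaling/v2beta2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count the HPA manages.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # One Resource metric: average CPU utilization across Pods.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_percent,
                    },
                },
            }],
        },
    }

manifest = build_hpa("my-app-hpa", "my-app-deployment", 1, 3, 50)
print(json.dumps(manifest, indent=2))
```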

You have two options to create an HPA for your application deployment:

  1. Use the kubectl autoscale command on an existing deployment.
  2. Create a HPA YAML manifest, and then use kubectl to apply changes to your cluster.

You’ll try option #1 first, using another configuration from the DigitalOcean Kubernetes Starter Kit. It contains a deployment called myapp-test.yaml which will demonstrate HPA in action by creating some arbitrary CPU load.

You can review that file by using nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-test
spec:
  selector:
    matchLabels:
      run: myapp-test
  replicas: 1
  template:
    metadata:
      labels:
        run: myapp-test
    spec:
      containers:
        - name: busybox
          image: busybox
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command: ["sh", "-c"]
          args:
            - while [ 1 ]; do
              echo "Test";
              sleep 0.01;
              done

Note the last few lines of this file. They contain some shell syntax to repeatedly print “Test” roughly a hundred times a second, to simulate load. Once you are done reviewing the file, you can deploy it into your cluster using kubectl:

  1. kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/myapp-test.yaml

Next, use kubectl autoscale to create a HorizontalPodAutoscaler targeting the myapp-test deployment:

  1. kubectl autoscale deployment myapp-test --cpu-percent=50 --min=1 --max=3

Note the arguments passed to this command: the HPA will keep your deployment between 1 and 3 replicas, adding or removing Pods to hold average CPU utilization near 50 percent of the CPU that the Pods request.

You can check if the HPA resource was created by running kubectl get hpa:

  1. kubectl get hpa

The TARGETS column of the output will eventually show a figure of current usage%/target usage%.

NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapp-test   Deployment/myapp-test   240%/50%   1         3         3          52s

Note: The TARGETS column value will display <unknown>/50% for a while (around 15 seconds). This is normal, because HPA needs to collect average values over time, and it won’t have enough data before the first 15 second interval. By default, HPA checks metrics every 15 seconds.

You can also observe the logged events that a HPA generates by using kubectl describe:

  1. kubectl describe hpa myapp-test

Name:                                                  myapp-test
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 28 May 2022 10:10:50 -0800
Reference:                                             Deployment/myapp-test
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  240% (48m) / 50%
Min replicas:                                          1
Max replicas:                                          3
Deployment pods:                                       3 current / 3 desired
...
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  17s   horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37s   horizontal-pod-autoscaler  New size: 3; reason: cpu resource utilization (percentage of request) above target
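
The 240% (48m) figure in this output is CPU usage expressed as a percentage of the CPU request (20m in myapp-test.yaml). The short Python sketch below reproduces that arithmetic. It assumes three Pods that each use 48m, and it assumes the percentage is truncated to a whole number; both are assumptions made for this illustration rather than something the output confirms.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """Average utilization: total usage as a percentage of total requests."""
    total_usage = sum(usage_millicores)
    total_request = sum(request_millicores)
    # Integer arithmetic, truncating any fractional percent (an assumption).
    return (total_usage * 100) // total_request

# Three Pods, each using 48m of CPU against a 20m request.
print(cpu_utilization_percent([48, 48, 48], [20, 20, 20]))
```

Because the percentage is relative to the request rather than the limit, values well above 100% are expected whenever a container is allowed to use more CPU than it requests.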

This is the kubectl autoscale method. In a production scenario, you should usually instead use a dedicated YAML manifest to define each HPA. This way, you can track changes by having the manifest committed to a Git repository, and modify it as needed.

You will walk through an example of this in the last step of this tutorial. Before moving on, delete the myapp-test deployment and corresponding HPA resource:

  1. kubectl delete hpa myapp-test
  2. kubectl delete deployment myapp-test

Step 3 - Scaling Applications Automatically via Metrics Server

In this last step, you’ll experiment with two different ways of generating server load and scaling via a YAML manifest:

  1. An application deployment that creates constant load by performing some CPU intensive computations.
  2. A shell script that simulates external load by performing fast successive HTTP calls to a web application.

Constant Load Test

In this scenario, you will create a sample application implemented using Python, which performs some CPU intensive computations. Similar to the shell script from the last step, this Python code is included in one of the example manifests from the starter kit. You can open the constant-load-deployment-test.yaml using nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: python-test-code-configmap
data:
  entrypoint.sh: |-
    #!/usr/bin/env python

    import math

    while True:
      x = 0.0001
      for i in range(1000000):
        x = x + math.sqrt(x)

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: constant-load-deployment-test
spec:
  selector:
    matchLabels:
      run: python-constant-load-test
  replicas: 1
  template:
    metadata:
      labels:
        run: python-constant-load-test
    spec:
      containers:
        - name: python-runtime
          image: python:alpine3.15
          resources:
            limits:
              cpu: 50m
            requests:
              cpu: 20m
          command:
            - /bin/entrypoint.sh
          volumeMounts:
            - name: python-test-code-volume
              mountPath: /bin/entrypoint.sh
              readOnly: true
              subPath: entrypoint.sh
      volumes:
        - name: python-test-code-volume
          configMap:
            defaultMode: 0700
            name: python-test-code-configmap

The Python code, which repeatedly computes square roots in a loop, is contained in the ConfigMap at the top of the manifest. The deployment will fetch a Docker image hosting the required Python runtime, and then mount the ConfigMap into the application Pod as the script that the container runs.

First, create a separate namespace for this deployment (for better observation), then deploy it via kubectl:

  1. kubectl create ns hpa-constant-load
  2. kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-deployment-test.yaml -n hpa-constant-load

configmap/python-test-code-configmap created
deployment.apps/constant-load-deployment-test created

Note: The sample deployment also configures resource requests and limits for the sample application Pods. This is important because HPA calculates utilization as a percentage of each Pod's resource requests, so it cannot scale on CPU utilization if requests are not set. In general, it is advisable to set resource requests and limits for all your application Pods, to avoid unpredictable bottlenecks.

Verify that the deployment was created successfully, and that it’s up and running:

  1. kubectl get deployments -n hpa-constant-load

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
constant-load-deployment-test   1/1     1            1           8s

Next, you’ll need to deploy another HPA to this cluster. There is an example matched to this scenario in constant-load-hpa-test.yaml, which you can open with nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: constant-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constant-load-deployment-test
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Deploy it via kubectl:

  1. kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/constant-load-hpa-test.yaml -n hpa-constant-load

This will create a HPA resource, targeting the sample Python deployment. You can check the constant-load-test HPA state via kubectl get hpa:

  1. kubectl get hpa constant-load-test -n hpa-constant-load

Note the REFERENCE column targeting constant-load-deployment-test, as well as the TARGETS column showing current CPU utilization versus the threshold value, as in the last example.

NAME                 REFERENCE                                  TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
constant-load-test   Deployment/constant-load-deployment-test   255%/50%   1         3         3          49s

You may also notice that the REPLICAS column value increased from 1 to 3 for the sample application deployment, up to the maxReplicas value set in the HPA spec. This happened quickly because the application used in this example saturates its CPU allowance almost immediately. As in the previous example, you can also inspect logged HPA events using kubectl describe hpa -n hpa-constant-load.
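
A small simulation helps explain why this deployment stays at three replicas. The sketch assumes that every Pod burns CPU up to its 50m limit, so utilization stays at about 250% of the 20m request no matter how many replicas exist. It applies the documented replica formula once per sync interval and ignores scale-up rate policies. The function name and the interval count are made up for the illustration.

```python
import math

def simulate_constant_load(intervals, limit_m=50, request_m=20,
                           target=50, lo=1, hi=3):
    """Replica count after each sync interval when every Pod saturates its limit."""
    replicas, history = 1, [1]
    for _ in range(intervals):
        # Each Pod uses its full limit, so adding Pods never lowers utilization.
        utilization = (limit_m * 100) // request_m
        wanted = math.ceil(replicas * utilization / target)
        replicas = max(lo, min(hi, wanted))
        history.append(replicas)
    return history

print(simulate_constant_load(3))
```

With this kind of load, adding Pods cannot bring the average back under the target, so the HPA pins the deployment at maxReplicas until the workload itself changes.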

External Load Test

A more interesting and realistic scenario is one where the load comes from outside the application. For this final example you’re going to use a different namespace and set of manifests to avoid reusing any data from the previous test.

This example will use the quote of the moment sample server. Every time an HTTP request is made to this server, it sends a different quote as a response. You’ll create load on your cluster by sending HTTP requests every 1ms. This deployment is included in quote_deployment.yaml. Review this file using nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quote
  template:
    metadata:
      labels:
        app: quote
    spec:
      containers:
        - name: quote
          image: docker.io/datawire/quote:0.4.1
          ports:
            - name: http
              containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
            limits:
              cpu: 200m
              memory: 100Mi

---
apiVersion: v1
kind: Service
metadata:
  name: quote
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
  selector:
    app: quote

Note that the actual HTTP query script is not contained within the manifest this time: this manifest only provisions the app that will receive the queries. When you are done reviewing the file, create the namespace and the quote deployment using kubectl:

  1. kubectl create ns hpa-external-load
  2. kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote_deployment.yaml -n hpa-external-load

Verify that the quote application deployment and services are up and running:

  1. kubectl get all -n hpa-external-load

NAME                        READY   STATUS    RESTARTS   AGE
pod/quote-dffd65947-s56c9   1/1     Running   0          3m5s

NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/quote   ClusterIP                <none>        80/TCP    3m5s

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/quote   1/1     1            1           3m5s

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/quote-6c8f564ff   1         1         1       3m5s

Next, you’ll create the HPA for the quote deployment. This is configured in quote-deployment-hpa-test.yaml. Review the file in nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: external-load-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 60
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 20

Note that in this case there’s a different threshold value set for the CPU utilization resource metric (20%). There is also a different scaling behavior. This configuration alters the scaleDown.stabilizationWindowSeconds behavior, and sets it to a lower value of 60 seconds. This is not always needed in practice, but in this case you may want to speed up things to see more quickly how the autoscaler performs the scale down action. By default, the HorizontalPodAutoscaler has a cool down period of 5 minutes. This is sufficient in most cases, and should avoid fluctuations when replicas are being scaled.
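
The effect of the stabilization window can be sketched as follows. During scale down, the autoscaler acts on the highest replica recommendation it has seen within the window, so a short dip in load does not remove Pods immediately. The class below is a simplified model written for this tutorial, not the controller's implementation; the class name, the 15-second tick, and the exact handling of the window boundary are assumptions.

```python
from collections import deque

class ScaleDownStabilizer:
    """Simplified model of scaleDown.stabilizationWindowSeconds."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._history = deque()  # (timestamp, recommended replicas)

    def stabilize(self, now, current_replicas, recommendation):
        self._history.append((now, recommendation))
        # Forget recommendations that have aged out of the window.
        while self._history and self._history[0][0] <= now - self.window_seconds:
            self._history.popleft()
        # Scaling up (or holding steady) is not delayed in this model.
        if recommendation >= current_replicas:
            return recommendation
        # Scaling down: go no lower than the highest recent recommendation.
        highest = max(r for _, r in self._history)
        return min(current_replicas, highest)

stabilizer = ScaleDownStabilizer(window_seconds=60)
replicas, observed = 3, []
# Load stops right after t=0; the raw recommendation then drops to 1.
for t, raw in [(0, 3), (15, 1), (30, 1), (45, 1), (60, 1)]:
    replicas = stabilizer.stabilize(t, replicas, raw)
    observed.append(replicas)
print(observed)
```

With the default 300-second window, the same sequence would keep three replicas for roughly five minutes before scaling down.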

When you’re ready, deploy it using kubectl:

  1. kubectl apply -f Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/manifests/hpa/metrics-server/quote-deployment-hpa-test.yaml -n hpa-external-load

Now, check if the HPA resource is in place and alive:

  1. kubectl get hpa external-load-test -n hpa-external-load

NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          108s

Finally, you will run the actual HTTP queries, using the shell script quote_service_load_test.sh. The reason that this shell script was not embedded into the manifest earlier is so that you can observe it running in your cluster while logging directly to your terminal. Review the script using nano or your favorite text editor:

  1. nano Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh
#!/usr/bin/env sh

echo "[INFO] Starting load testing in 10s..."
sleep 10
echo "[INFO] Working (press Ctrl+C to stop)..."
kubectl run -i --tty load-generator \
    --rm \
    --image=busybox \
    --restart=Never \
    -n hpa-external-load \
    -- /bin/sh -c "while sleep 0.001; do wget -q -O- http://quote; done" > /dev/null 2>&1
echo "[INFO] Load testing finished."

For this demonstration, open two separate terminal windows. In the first, run the quote_service_load_test.sh shell script:

  1. Kubernetes-Starter-Kit-Developers/09-scaling-application-workloads/assets/scripts/quote_service_load_test.sh

Next, in the second window, run a kubectl watch command using the -w flag on the HPA resource:

  1. kubectl get hpa -n hpa-external-load -w

You should see the load tick upwards and scale automatically:

NAME                 REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
external-load-test   Deployment/quote   1%/20%    1         3         1          2m49s
external-load-test   Deployment/quote   29%/20%   1         3         1          3m1s
external-load-test   Deployment/quote   67%/20%   1         3         2          3m16s

You can observe how the autoscaler kicks in when load increases, and increases the replica count of the quote deployment. As soon as the load generator script is stopped, there’s a cool down period, and after 1 minute or so the replica count is lowered to the initial value of 1. You can press Ctrl+C to terminate the running script after navigating back to the first terminal window.
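
Unlike a workload that saturates every Pod, external traffic is shared between replicas, so average utilization falls as Pods are added. The sketch below models that under stated assumptions: the total CPU demand figures are hypothetical, demand is split evenly across Pods, each Pod requests 100m as in quote_deployment.yaml, and the stabilization window is ignored, so the scale down shows up immediately rather than after a delay.

```python
import math

def next_replicas(replicas, demand_m, request_m=100, target=20,
                  lo=1, hi=3, tolerance=0.1):
    """One autoscaling decision for a total CPU demand shared by all Pods."""
    # Average utilization: total usage as a percentage of total requested CPU.
    utilization = (demand_m * 100) // (replicas * request_m)
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        return replicas
    return max(lo, min(hi, math.ceil(replicas * ratio)))

def simulate(demands_m, replicas=1):
    """Replica count after each interval for a series of total demands."""
    history = [replicas]
    for demand in demands_m:
        replicas = next_replicas(replicas, demand)
        history.append(replicas)
    return history

# Two intervals of traffic (60m total), then the load generator stops (1m).
print(simulate([60, 60, 1, 1]))
```

Once three Pods share the 60m of demand, utilization sits at the 20% target and the replica count holds steady until the traffic stops.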


In this tutorial, you deployed and observed the behavior of Horizontal Pod Autoscaling (HPA) using Kubernetes Metrics Server under several different scenarios. HPA is an essential component of Kubernetes that helps your infrastructure handle more traffic on an as-needed basis.

Metrics Server has a significant limitation in that it cannot provide any metrics beyond CPU or memory usage. You can further review Metrics Server documentation to understand how to work within its use cases. If you need to scale using any other metrics (such as disk usage or network load), you can use Prometheus via a special adapter, named prometheus-adapter.

