As cloud-native applications grow in scale and sophistication, the ability to dynamically allocate computing resources becomes critical—especially for workloads that require GPUs. Kubernetes provides built-in autoscaling, but it often falls short when you need to scale based on external or custom metrics. This is where KEDA (Kubernetes Event-Driven Autoscaling) excels. KEDA allows your Kubernetes workloads to scale based on real-time metrics from sources like Prometheus.
In this tutorial, we’ll walk through how to autoscale an AMD GPU-based workload running on DigitalOcean Kubernetes (DOKS) using Prometheus and KEDA. This setup allows you to react to live metrics and optimize GPU utilization efficiently and cost-effectively.
By the end of this tutorial, you’ll be able to deploy and autoscale GPU workloads on DOKS with confidence, leveraging event-driven scaling to optimize performance and cost for demanding AI/ML and compute-intensive applications.
Step 1: Create a DOKS Cluster
You can create a new DOKS cluster by clicking the Create button on the top-right of your DigitalOcean dashboard, or by navigating to Resources within your project and selecting Kubernetes.
Note: We recommend creating two node pools: one AMD GPU node pool for the GPU workload and one standard CPU node pool for peripheral services.
Why two node pools?
Isolating GPU and non-GPU workloads allows better scaling control, reduces resource waste, and avoids scheduling conflicts. Peripheral services don’t need expensive GPU nodes.
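If you prefer the CLI, a sketch of creating such a cluster with doctl is shown below; the cluster name, region, and size slugs are placeholders, so substitute values from doctl kubernetes options sizes and doctl kubernetes options regions:
doctl kubernetes cluster create gpu-keda-demo \
  --region <your-region> \
  --node-pool "name=default-pool;size=s-4vcpu-8gb;count=2" \
  --node-pool "name=amd-gpu-pool;size=<amd-gpu-size-slug>;count=1;auto-scale=true;min-nodes=1;max-nodes=2"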
Note: For AMD GPUs, the AMD GPU device plugin is enabled by default. To enable the AMD GPU Device Metrics Exporter as well, you can use the DigitalOcean API:
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-api-token>" \
-d '{"name": "<your_cluster_name.", "amd_gpu_device_metrics_exporter_plugin": {"enabled": true}}' \
"https://api.digitalocean.com/v2/kubernetes/clusters/<your-cluster-uuid>"
Step 2: Install Prometheus with Helm
Prometheus will scrape and store the custom metric that drives autoscaling. Install the kube-prometheus-stack, which bundles Prometheus and the Prometheus Operator (the operator provides the ServiceMonitor resource we use later):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n kube-system
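The stack takes a minute or two to come up. Before moving on, check that its pods are running and that the prometheus-operated service exists; this is the in-cluster endpoint the KEDA trigger will query later:
kubectl get pods -n kube-system | grep -E 'prometheus|grafana|alertmanager'
kubectl get svc prometheus-operated -n kube-system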
Step 3: Install KEDA with Helm
KEDA will evaluate Prometheus metrics and scale your workloads accordingly.
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda
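Since no namespace is specified, the chart installs into your current namespace (default, unless you changed it). A quick sanity check that the KEDA pods and CRDs are in place:
kubectl get pods | grep keda
kubectl get crd | grep keda.sh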
Step 4: Deploy a Sample GPU Workload
We’ll deploy a sample Go application that exposes a Prometheus gauge metric. This workload requests a GPU and will be scaled based on the metric value.
Create a file named resources.yaml and paste the following manifests:
apiVersion: v1
kind: ConfigMap
metadata:
  name: go-program
data:
  main.go: |
    package main

    import (
        "log"
        "net/http"
        "strconv"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // customGauge is the metric KEDA will scale on.
    var customGauge = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "my_custom_gauge",
        Help: "This is a custom gauge metric",
    })

    func init() {
        prometheus.MustRegister(customGauge)
    }

    // handler sets the gauge from the ?value= query parameter.
    func handler(w http.ResponseWriter, r *http.Request) {
        value := r.URL.Query().Get("value")
        floatVal, err := strconv.ParseFloat(value, 64)
        if err != nil {
            http.Error(w, "Bad Request: invalid input", http.StatusBadRequest)
            return
        }
        customGauge.Set(floatVal)
        w.Write([]byte("Hello, from the handler!"))
    }

    func main() {
        http.HandleFunc("/", handler)
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8080", nil))
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-from-configmap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-from-configmap
  template:
    metadata:
      labels:
        app: go-from-configmap
    spec:
      nodeSelector:
        doks.digitalocean.com/gpu-brand: amd
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - key: amd.com/gpu
          operator: Exists
      containers:
        - name: go-runner
          image: golang:1.24
          command: ["/bin/sh", "-c"]
          resources:
            requests:
              amd.com/gpu: "1"
            limits:
              amd.com/gpu: "1"
          ports:
            - containerPort: 8080
          args:
            - |
              set -e
              cp /config/*.go /app/
              cd /app
              [ -f go.mod ] || go mod init temp
              go mod tidy
              go run .
          volumeMounts:
            - name: go-code
              mountPath: /config
            - name: app-dir
              mountPath: /app
      volumes:
        - name: go-code
          configMap:
            name: go-program
        - name: app-dir
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: go-from-configmap
  labels:
    app: go-from-configmap
spec:
  selector:
    app: go-from-configmap
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-from-configmap
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: go-from-configmap
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
  namespaceSelector:
    matchNames:
      - kube-system
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-from-configmap
spec:
  scaleTargetRef:
    name: go-from-configmap
  pollingInterval: 30
  cooldownPeriod: 30
  minReplicaCount: 1
  maxReplicaCount: 8
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.kube-system.svc.cluster.local:9090
        metricName: my_custom_gauge
        threshold: '80'
        query: sum(my_custom_gauge{}) / sum(kube_deployment_status_replicas{deployment="go-from-configmap"})
Apply the resources:
kubectl apply -f resources.yaml -n <your-namespace>
Note: The ServiceMonitor above looks for the Service in the kube-system namespace. If you deploy to a different namespace, update the namespaceSelector in the ServiceMonitor to match.
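You can confirm the objects were created (the ServiceMonitor and ScaledObject kinds exist only because the Prometheus Operator and KEDA CRDs were installed in the previous steps):
kubectl get configmap,deployment,service,servicemonitor,scaledobject -n <your-namespace>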
We created a ConfigMap that stores a simple Go HTTP server as a string. This app:
- Listens on port 8080.
- Exposes a custom Prometheus gauge named my_custom_gauge. This metric represents a simulated load value and will be the core driver for our autoscaling logic.
- Serves a /metrics endpoint compatible with Prometheus scraping.
- Serves a / endpoint where a user can update the gauge metric by calling /?value=123.
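You can exercise the app directly by port-forwarding the Service (in a separate terminal) and confirming that the gauge appears on the metrics endpoint:
kubectl port-forward svc/go-from-configmap 8080:8080 -n <your-namespace>
curl "http://localhost:8080/?value=42"
curl -s http://localhost:8080/metrics | grep my_custom_gauge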
Note: This setup is for demo purposes only and not intended for production use.
We define a Deployment that uses the official golang:1.24 image. On startup, it:
- Mounts the Go source from the ConfigMap and compiles the Go server code dynamically.
- Requests one AMD GPU (amd.com/gpu: 1).
The Service resource allows Prometheus to access our /metrics endpoint. It selects the pods labeled app: go-from-configmap.
To make Prometheus aware of our scraping endpoint, we define a ServiceMonitor. This resource:
- Selects the Service with the app: go-from-configmap label.
- Scrapes the /metrics path.
- Uses a scrape interval of 30s.
Prometheus will now continuously collect the latest values of my_custom_gauge.
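To double-check that Prometheus is actually scraping the metric, port-forward the Prometheus service (in a separate terminal) and run the query through its HTTP API:
kubectl port-forward svc/prometheus-operated 9090:9090 -n kube-system
curl -s 'http://localhost:9090/api/v1/query?query=my_custom_gauge'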
The KEDA ScaledObject ties everything together. It configures KEDA to:
- Query the my_custom_gauge metric stored in Prometheus every 30 seconds.
- Compare the per-replica value against a threshold of 80.
- Keep the replica count between 1 and 8.
In short, KEDA will add or remove replicas of the go-from-configmap deployment based on the metric value.
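Under the hood, KEDA creates and manages a Horizontal Pod Autoscaler for the target deployment. Inspecting both objects is a useful sanity check; the HPA name below follows KEDA's usual keda-hpa-<scaledobject-name> convention, so adjust it if yours differs:
kubectl get scaledobject go-from-configmap -n <your-namespace>
kubectl get hpa keda-hpa-go-from-configmap -n <your-namespace>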
Get the pod name and exec into it:
kubectl get pods -l app=go-from-configmap
kubectl exec -it <go-pod-name> -- bash
Set the gauge to a high value (e.g. 160):
curl http://127.0.0.1:8080/?value=160
exit
Watch for pod scaling:
kubectl get pods -w -l app=go-from-configmap
KEDA will evaluate the query:
160 (gauge value) / 1 (replica) = 160, which exceeds the threshold of 80, so a second replica is triggered.
At this point, the new pod should be in a Pending state, since there aren't enough GPUs to satisfy the deployment's requested AMD GPU resources. In the cloud UI, you should see a new autoscaling event (a new node coming up) under the AMD GPU node pool. Wait until the new GPU node finishes bootstrapping and check the pod again; it should have transitioned from Pending to Running.
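You can watch the GPU node pool grow using the node label the deployment's nodeSelector targets, and inspect the pending pod's events to see why it could not be scheduled at first:
kubectl get nodes -w -l doks.digitalocean.com/gpu-brand=amd
kubectl describe pod <new-go-pod-name> -n <your-namespace>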
The query is continuously evaluated and scaling stops or continues depending on the metric’s behavior.
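To watch scale-in, set the gauge back to a low value and keep an eye on the replica count; the HPA that KEDA manages applies its default downscale stabilization window, so expect the scale-in to take several minutes:
kubectl exec <go-pod-name> -n <your-namespace> -- curl -s "http://127.0.0.1:8080/?value=10"
kubectl get pods -w -l app=go-from-configmap -n <your-namespace>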
Can I use KEDA to autoscale other GPU workloads?
Yes. KEDA is agnostic to the type of workload as long as you can expose a metric (such as GPU utilization, queue length, or custom application metrics) that Prometheus can scrape. For AMD GPU workloads, ensure the AMD device plugin (enabled by default on DOKS GPU node pools) and the GPU metrics exporter are enabled.
Which regions offer AMD GPU nodes?
As of this writing, AMD MI300X GPU nodes are available in select regions such as TOR1, NYC2, and ATL1. Always check the DigitalOcean documentation for the latest supported regions and GPU types.
How is KEDA different from the built-in Horizontal Pod Autoscaler?
KEDA extends Kubernetes autoscaling by allowing you to scale workloads based on external or custom metrics (like Prometheus queries, queue length, or cloud events), not just CPU or memory. This is especially useful for GPU workloads, where utilization patterns may not correlate with CPU usage.
What happens if scaling requests more GPUs than are available?
If your scaling logic triggers more pods than there are available GPUs, new pods will remain in the Pending state until additional GPU nodes are provisioned. DOKS will automatically scale the GPU node pool (if autoscaling is enabled) to accommodate the increased demand, subject to your configured limits.
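If the GPU pool was created without autoscaling, you can enable it after the fact with doctl; the IDs and bounds below are placeholders:
doctl kubernetes cluster node-pool list <your-cluster-id>
doctl kubernetes cluster node-pool update <your-cluster-id> <gpu-pool-id> --auto-scale --min-nodes 1 --max-nodes 3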
Can I adapt this setup for other GPU vendors or platforms?
Yes. The KEDA + Prometheus pattern is platform- and vendor-agnostic. You can adapt this approach for NVIDIA GPUs or other accelerators by using the appropriate device plugin and metrics exporter for your hardware and Kubernetes distribution.
This guide demonstrated how to build an autoscaling GPU workload on DOKS using KEDA and Prometheus, all with native Kubernetes constructs and open-source tooling.
This architecture empowers your workloads to scale up and down automatically based on real-time, application-level metrics rather than static CPU or memory thresholds, keeping expensive GPU nodes busy only when they are needed.
By combining event-driven scaling with custom observability, teams can efficiently manage GPU resources and deliver high-performance compute workloads with minimal manual intervention.