How SMBs and startups scale on DigitalOcean Kubernetes: Best Practices Part IV - Scalability

Published: June 6, 2024
11 min read

Introduction

This article is part of a 6-part series on DigitalOcean Kubernetes best practices targeted at SMBs and startups.

In Part 1 of the series, we covered the challenges in adopting and scaling Kubernetes, and “Developer Productivity” best practices. We explored how Kubernetes can streamline development processes, increase efficiency, and enable faster application time-to-market.

In Part 2 of the series, we covered “observability” best practices. We discussed the importance of monitoring, logging, and tracing in a Kubernetes environment and how these practices contribute to maintaining a healthy and performant system.

In Part 3 of the series, we covered “reliability” best practices. We discussed right-sizing your nodes and pods, defining appropriate Quality of Service (QoS) for pods, utilizing probes for health monitoring, employing suitable deployment strategies, optimizing pod scheduling, enhancing upgrade resiliency, and leveraging tags in your container images.

In this segment (Part 4), we focus on scalability. Scalability means something different for a small or medium-sized business (SMB) than for an enterprise. SMBs often have limited resources and smaller-scale deployments, which present unique challenges in ensuring the scalability of their Kubernetes clusters. We start with a broad overview of the challenges. Then, we provide a set of checklists and best practices that SMBs can follow to ensure scalability in their Kubernetes environments. Our focus is primarily on SMB-scale clusters, typically consisting of fewer than 500 nodes.

Scalability Challenges

“There became a point where we had revenue coming in and there was a fear that we couldn’t deliver at scale for a large event, with many concurrent users and more than a single stream per user. CTO.ai came in and saw what our current infrastructure was and highlighted the solutions such as DigitalOcean Kubernetes that could scale up and down automatically.” - Andrew Lombardi, Snipitz Chief Product Officer

Scalability challenges are common within Kubernetes and occur when the cluster fails to scale or scales too slowly to meet the demand. These issues can include:

Insufficient resources: Lack of available nodes or resources to accommodate the growing workload.
Slow autoscaling: Autoscaling mechanisms do not react quickly enough to sudden spikes in traffic.
Inefficient resource utilization: Suboptimal resource allocation or overprovisioning leads to wasted resources and increased costs.
Database scalability: Scaling databases to handle increased read/write operations is difficult.

To address these challenges, follow well-established Kubernetes and cloud-native practices while accounting for your application’s specific requirements and characteristics.

Defining Kubernetes Scale

Scaling brings its own set of challenges and is highly dependent on the dimension of scale. Our definition of Kubernetes scale focuses on cluster sizes up to 500 nodes and running different types of applications at scale (e.g., data analytics, web scraping, metaverse, video streaming, etc.).

We can define scale at different layers as follows. The list below is not exhaustive but contains some key points to keep in mind when evaluating scalability.

1. Cluster Scalability:

Cluster auto-scaling: Does the cluster auto-scaler rapidly and seamlessly scale from 10 nodes to 100+ nodes when needed?
Kubernetes API server performance: Can the API server handle the increased load and maintain low latency when scaling to 100+ nodes?
Cluster DNS scalability: Does the cluster DNS scale effectively to meet the demands of your growing applications?
etcd scalability: Can the etcd cluster scale horizontally to handle the increased data and traffic while ensuring high performance and availability?
Network scalability: Does the cluster network scale seamlessly with the growing number of nodes and pods, maintaining low latency and high throughput?

2. Application Scalability:

Horizontal pod auto-scaling: Can you leverage the Horizontal Pod Autoscaler (HPA) to automatically scale your application pods based on metrics like CPU utilization or custom metrics?
Buffer nodes: Are pre-provisioned buffer nodes available to accommodate sudden spikes in application traffic?
Load balancer scalability: Does the load balancer scale to meet your application demands and distribute traffic evenly across the application pods?
Database scalability: Can your database solution scale based on the application demand using techniques like sharding, replication, or distributed databases?
Caching and content delivery: Do you employ caching mechanisms and Content Delivery Networks (CDNs) to reduce backend load and improve response times?
Message queue scalability: If message queues are used, can they scale to handle the increased message volume, and can you horizontally scale the message queue consumers?

Scalability is not just about adding more resources but also about designing your applications and infrastructure to be scalable from the ground up. By asking the right questions and proactively addressing scalability concerns, startups can build Kubernetes environments that can more effectively handle the demands of their growing business. Regular testing, monitoring, and optimization are crucial to ensure a smooth scaling experience.

Scaling Best Practices

Scaling is a complex and multifaceted domain encompassing various system aspects, including API, network, DNS, load balancer, and application. In this section, we will focus on the top problem areas that users commonly encounter when scaling their applications on DigitalOcean Kubernetes. We assume you have already implemented observability measures and followed the appropriate checklist described in the reliability section. Additionally, this series will cover disaster recovery and security topics in future blog posts.

Checklist: Know your Default Account Limits

DigitalOcean accounts have default limits, with newer accounts having tighter restrictions than established ones. Kubernetes users should pay attention to the limits on Droplets, load balancers, firewalls, firewall rules, and Volumes.

Teams can visit their respective team page to review and request adjustments to Droplet limits. Limits on other resources are not publicly exposed but can be modified by contacting DigitalOcean support.

As Kubernetes customers enable autoscaling, they may need to adjust their account limits to accommodate their cluster’s expected maximum size. When considering account limits, keep the following points in mind:

Droplet account limits: Ensure that your nodes can scale out sufficiently to handle the expected maximum load, taking into account the max_nodes setting of each node pool and the account’s Droplet limit.
Volume limits: Be aware of the total volume limit and the hard limit of seven volumes per Droplet. Verify if your applications can scale to their maximum respective limits considering the total volumes limit and the per-Droplet volume limit. Pay close attention to the affinity and anti-affinity rules in StatefulSets that have persistent volume claim templates.
Load balancer limits: Adjust load balancer limits if necessary to allow the creation of Kubernetes services of type LoadBalancer.
Firewall limits: DOKS provisions two firewalls per Kubernetes cluster by default. Services of type NodePort automatically add firewall rules.
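
The limit checks above come down to simple arithmetic, which can be scripted before enabling autoscaling. The following is a minimal sketch, not an official tool: the NodePool type, the function names, and the example numbers are assumptions made for illustration, and the only value taken from this article is the limit of seven volumes per Droplet.

```python
from dataclasses import dataclass

VOLUMES_PER_DROPLET = 7  # per-Droplet hard limit stated above


@dataclass
class NodePool:
    name: str
    max_nodes: int  # autoscaler upper bound for this pool


def droplet_headroom(pools, droplet_limit, other_droplets=0):
    """Droplets left over if every pool scales to max_nodes.

    A negative result means the autoscaler could hit the account
    limit before reaching the configured maximum.
    """
    worst_case = sum(p.max_nodes for p in pools) + other_droplets
    return droplet_limit - worst_case


def min_nodes_for_volumes(volume_count):
    """Fewest nodes that can attach volume_count volumes at once."""
    return -(-volume_count // VOLUMES_PER_DROPLET)  # ceiling division


pools = [NodePool("web", 20), NodePool("batch", 40)]
print(droplet_headroom(pools, droplet_limit=50, other_droplets=5))  # -15
print(min_nodes_for_volumes(30))  # 5
```

In this example the two pools could request 60 Droplets on top of 5 existing ones, which is 15 more than a hypothetical limit of 50, so the limit would need to be raised before the pools could reach their configured maximum.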

Checklist: Use Horizontal Pod Autoscaler

DigitalOcean supports the automatic scaling of both pods and worker nodes through the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler (CA), respectively. Here’s how these components work together to optimize your deployment:

The HPA automatically adjusts the number of pods in a deployment or replication controller based on observed CPU utilization or other specified metrics. When the HPA triggers an increase in the pod count to maintain set resource utilization thresholds, it ensures that your application maintains performance without manual intervention.
The CA monitors the need for additional worker nodes. If the increased pod count requires more resources than currently available, CA will provision additional worker nodes to accommodate the new pods. This ensures that the cluster adapts dynamically to the load.
Benefits of using HPA and CA together: This combination allows for a highly responsive and resilient Kubernetes environment that can scale out at both the pod level (more pods) and the node level (more nodes) as needed.

By leveraging HPA and CA, you can ensure that your Kubernetes cluster scales efficiently in response to application demands, optimizing both resource usage and cost.
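
To make the HPA behavior concrete, the sketch below reproduces the core scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target, rounded up. It is a simplification that ignores pod readiness, missing metrics, and stabilization windows. The function name and the default minimum and maximum are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the HPA rule:
    ceil(current_replicas * current_metric / target_metric).
    """
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
print(desired_replicas(4, current_metric=62, target_metric=60))  # 4
```

With four pods at 90% average CPU against a 60% target, the rule asks for six pods. If those two extra pods do not fit on the existing nodes, they stay pending, and that is the point at which the Cluster Autoscaler adds a node.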

Checklist: Scaling applications rapidly using a buffer node

When applications need to scale rapidly, the time it takes to create a new node (approximately 5 minutes) can be too long. To address this issue, a good practice is to run low-priority buffer pods that utilize the Cluster Autoscaler to proactively create one or more buffer nodes.

Here’s how it works:

Deploy low-priority buffer pods: Create a deployment or replica set of placeholder pods that do no real work but request the CPU and memory you want to hold in reserve. Set their priorityClassName to a PriorityClass whose value is lower than the priority of your main application pods.
Configure cluster autoscaler: Enable autoscaling on the node pool with a maximum node count high enough to hold both the buffer pods and your application at peak. When a buffer pod cannot be scheduled on the existing nodes, the Cluster Autoscaler adds a node for it, which becomes the buffer node.
Scaling behavior: When the Horizontal Pod Autoscaler (HPA) triggers the scaling of your main application pods, Kubernetes will prioritize scheduling them on the available nodes. If there is insufficient capacity on the existing nodes, the scheduler preempts the low-priority buffer pods to make room for the higher-priority application pods. The evicted buffer pods become pending, the Cluster Autoscaler provisions a new node in response, and the buffer pods are rescheduled onto it, restoring the headroom.

By employing this strategy, your applications can scale out almost immediately instead of waiting for new nodes to be provisioned. The buffer nodes act as a pre-provisioned resource pool, allowing for rapid scaling when demand increases. This approach helps maintain the responsiveness and availability of your applications during sudden spikes in traffic or workload. Note that a buffer node is a regular worker node and will incur additional charges.
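
As a rough illustration of the buffer pattern, the sketch below builds the two objects involved, a PriorityClass and a Deployment of placeholder pods, and prints them as JSON, which kubectl accepts. The object names, the priority value of -1, the replica count, and the resource requests are all assumptions chosen for the example. The placeholder container does no work, and its resource requests are what hold capacity in reserve, so size them to the headroom you want.

```python
import json


def buffer_manifests(replicas=1, cpu="1", memory="2Gi",
                     priority_class="buffer-low", namespace="default"):
    """Build a PriorityClass and a Deployment of placeholder pods."""
    priority = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": priority_class},
        "value": -1,  # below the default pod priority of 0
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    labels = {"app": "buffer"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "buffer", "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": priority_class,
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                        # Requests reserve the headroom; the pause
                        # container itself uses almost nothing.
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return [priority, deployment]


# kubectl accepts JSON, so the output can be piped to `kubectl apply -f -`.
print(json.dumps({"apiVersion": "v1", "kind": "List",
                  "items": buffer_manifests()}, indent=2))
```

Application pods that use the default priority of 0, or any higher PriorityClass, can then preempt these placeholders when the cluster is full.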

Checklist: Optimize Application Start-up Time

When rapidly scaling your cluster from a small number of nodes to a large number (e.g., from 5 to 50 nodes) to run batch jobs or handle increased workload, fetching many container images from the registry can become a bottleneck. Even though your cluster may scale quickly, the applications must fetch the required images and be ready to serve requests. This process can introduce latency and impact the overall start-up time of your applications.

To optimize the application start-up time and mitigate the impact of image fetching, consider the following strategies:

Use a registry in the same region: Ensure that your container image registry (or a mirror) is located in the same region as your Kubernetes cluster. This proximity helps reduce network latency and speeds up the image-fetching process.
Implement a pull-through cache: Deploy a local Docker registry within your Kubernetes cluster as a pull-through cache. Configure your applications to pull images from this local registry instead of the remote registry. The local registry will cache the images pulled from the remote registry, helping to reduce the network latency for subsequent image pulls.
Optimize image size: Minimize the size of your container images by removing unnecessary files and layers. Use minimal base images and leverage multi-stage builds to keep the final image size small. Smaller image sizes result in faster image downloads and shorter start-up times. For those seeking highly optimized and secure base images, consider using offerings from providers like Chainguard. These images are designed with security and minimal size in mind, which can further enhance the performance and security of your applications.
Utilize image pre-pulling: Consider implementing image pre-pulling techniques to proactively fetch and cache the required images on the nodes. This can be achieved using a DaemonSet that pulls the images on each node.
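
One possible shape for the pre-pulling DaemonSet mentioned in the last item is sketched below. It is not the only way to pre-pull images. Each image is listed as an init container that exits immediately, which is enough for the kubelet to pull it onto the node, and a small pause container keeps the pod running afterwards. The function name, object names, and registry URL are hypothetical, and the init command assumes each image contains a shell.

```python
import json


def prepull_daemonset(images, name="image-prepull", namespace="default"):
    """DaemonSet whose init containers pull each image, then idle."""
    labels = {"app": name}
    init_containers = [
        {
            "name": f"pull-{i}",
            "image": image,
            # Exits immediately; the kubelet has pulled the image by then.
            # Assumes the image ships a shell.
            "command": ["sh", "-c", "true"],
        }
        for i, image in enumerate(images)
    ]
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": init_containers,
                    # A tiny long-running container keeps the pod alive.
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                    }],
                },
            },
        },
    }


print(json.dumps(prepull_daemonset(["registry.example.com/app:1.0"]), indent=2))
```

Because a DaemonSet schedules one pod on every node, including nodes added later by the autoscaler, new nodes begin pulling the listed images as soon as they join the cluster.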

Checklist: DNS Scaling

To ensure efficient DNS resolution and handle increased traffic in your Kubernetes cluster, consider scaling DNS at three layers:

Cluster CoreDNS: Scale CoreDNS by increasing the number of replicas in the deployment. Adjust the replicas based on CPU utilization and DNS latency.
Node-Local DNS cache: Deploy the NodeLocal DNSCache DaemonSet to enable DNS caching on each node. Node-local caching reduces the load on CoreDNS and improves DNS performance.
Cloud provider DNS forwarder: DigitalOcean manages and scales the DNS forwarder in each region. This layer handles external DNS resolution and is automatically scaled by the cloud provider.

To optimize DNS performance in your cluster:

Monitor CoreDNS performance and scale the number of replicas as needed.
Deploy the NodeLocal DNSCache DaemonSet to enable node-local DNS caching.
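
When deciding how many CoreDNS replicas to run, one common heuristic is to size the deployment in proportion to the cluster. The sketch below is modeled on the linear mode of the cluster-proportional-autoscaler project. The ratios and bounds are illustrative assumptions rather than recommendations, and should be tuned against the CPU utilization and DNS latency you observe.

```python
import math


def coredns_replicas(nodes, cores, nodes_per_replica=16,
                     cores_per_replica=256, min_replicas=2, max_replicas=20):
    """Linear sizing: scale with whichever of nodes or cores needs more."""
    by_nodes = math.ceil(nodes / nodes_per_replica)
    by_cores = math.ceil(cores / cores_per_replica)
    return max(min_replicas, min(max_replicas, max(by_nodes, by_cores)))


print(coredns_replicas(nodes=50, cores=200))  # 4
```

Whether and how you apply the resulting number depends on how CoreDNS is managed in your cluster.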

Checklist: Caching and Database Scaling

Databases need particular care when scaling. Help ensure your system remains efficient and responsive under heavy loads by implementing these key strategies:

Caching: Implement Redis, Memcached, or some other caching mechanism to store frequently accessed data, reducing database load and improving response times.
Connection pooling: Use connection pooling to manage database connections more efficiently, minimizing overhead and enhancing throughput.
Database scaling: For horizontal scaling (sharding), partition data across multiple servers to effectively manage large datasets and high traffic volumes. You can also use read replicas to distribute read queries and balance the load in read-heavy applications.
Managed services: Consider DigitalOcean Managed Databases for automated scaling, maintenance, and security, simplifying database management.
Monitoring: Continuously monitor and optimize your database and caching setup to promptly identify and address performance bottlenecks.
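
The caching item above is often implemented as a cache-aside (read-through) pattern: check the cache first, fall back to the database on a miss, and store the result with an expiry. The sketch below shows the idea with an in-process dictionary standing in for Redis or Memcached. The class, the loader function, and the 30-second TTL are assumptions for illustration.

```python
import time


class TTLCache:
    """In-process stand-in for Redis or Memcached with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader(key) on a miss."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the database is not touched
        value = loader(key)  # miss or expired: read through
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row is written."""
        self._store.pop(key, None)


db_reads = []


def load_user(user_id):  # hypothetical database query
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=30)
cache.get_or_load(7, load_user)
cache.get_or_load(7, load_user)
print(len(db_reads))  # 1
```

The second lookup is served from the cache, so the database sees one read instead of two. In a multi-pod deployment the cache would normally live in a shared service so that all replicas benefit from the same entries.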

Checklist: Network Load Testing

Network load testing is essential for validating the performance and scalability of your application’s network infrastructure under realistic and peak load conditions. This testing helps identify potential bottlenecks and performance constraints, providing insights in advance.

Implement realistic testing scenarios: Utilize tools like Apache JMeter, Locust, or Gatling to simulate real-world user behavior and traffic patterns. Ensure these scenarios cover both expected traffic and extreme load conditions to thoroughly test the network’s capability.
Incorporate chaos testing: In addition to standard load testing, integrate chaos testing methods to evaluate how your network handles unexpected disruptions. Tools such as Chaos Mesh, LitmusChaos, or Chaos Monkey can introduce random failures into your network components (like randomly terminating instances) to test the resilience and failover mechanisms.
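
Dedicated tools such as those named above are the right choice for real tests, but the core measurement is straightforward: issue many concurrent requests and look at the latency distribution and error count. The sketch below illustrates that with the Python standard library only. The function names are hypothetical, and fake_request is a stand-in that you would replace with a call to your own service.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def run_load(target, requests=100, concurrency=10):
    """Call target() `requests` times across worker threads."""
    def timed_call(_):
        start = time.perf_counter()
        try:
            target()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(requests)))

    latencies = sorted(latency for latency, _ in results)
    return {
        "requests": requests,
        "errors": sum(1 for _, ok in results if not ok),
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
    }


def fake_request():  # stand-in for an HTTP call to your service
    time.sleep(0.01)


print(run_load(fake_request, requests=50, concurrency=5))
```

Tail percentiles such as p95 tend to be more informative than the average during a load test, because a bottleneck usually shows up in the slowest requests first.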

Checklist: Resilience to Kubernetes API Latency and Failure

Interacting with the Kubernetes control plane, particularly the kube-apiserver, is a common requirement for applications that dynamically manage resources within the cluster. Given the complexity and distributed nature of the control plane, these interactions can be susceptible to various failure modes, including network latency and system errors.

Implement strong communication practices:

Retries: Start by implementing retry mechanisms in your applications. This basic resilience strategy can handle intermittent failures by attempting the request multiple times.
Backoffs: To prevent overwhelming the API server during high latency or failure periods, integrate exponential backoff logic in your retries. This approach gradually increases the wait time between retries, reducing the load on the server and improving the chances of recovery.
Circuit Breaking: As a more advanced strategy, implement circuit breakers to stop cascading failures in an interconnected system. Circuit breakers can temporarily halt operations to a failing service until stability is restored, preventing failures from spreading across the system.

Handle failure responses gracefully:

Be proactive in handling HTTP error responses from the Kubernetes API. Common errors include 5xx (server-side problems) and 4xx (client-side request issues). For example, a 503 Service Unavailable error might occur when requests to the underlying etcd database fail.
Design your application to understand and react appropriately to these errors, potentially logging incidents for further investigation or triggering alternative workflows to maintain operational stability.
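
The retry, backoff, and circuit-breaking practices above can be combined in a few dozen lines. The following is a minimal sketch and not a production client. ApiError is a hypothetical exception type standing in for whatever your Kubernetes client raises, and the attempt counts, delays, and thresholds are illustrative assumptions. It retries only server-side errors and 429 (throttling) responses, and uses exponential backoff with random jitter so that many clients do not retry in lockstep.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # Server-side errors and throttling may clear up; other 4xx will not.
    return status >= 500 or status == 429


def call_with_backoff(call, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Retry call() with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(err.status):
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))


class CircuitBreaker:
    """Fail fast after repeated failures, then allow a trial call."""

    def __init__(self, threshold=3, cooldown=10.0, clock=time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None  # cooldown over: permit one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


responses = iter([503, 503, 200])


def flaky_list_pods():  # stand-in for a Kubernetes API request
    status = next(responses)
    if status != 200:
        raise ApiError(status)
    return "ok"


print(call_with_backoff(flaky_list_pods, sleep=lambda s: None))  # ok
```

The two pieces compose: wrapping the retried call in the breaker, for example breaker.call(lambda: call_with_backoff(flaky_list_pods)), stops the application from continuing to send requests to an API server that has failed several times in a row.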

In conclusion, scaling your applications on DigitalOcean Kubernetes requires careful consideration of various factors, including account limits, autoscaling mechanisms, application start-up time, DNS scaling, caching and database optimization, network load testing, and resilience to Kubernetes API latency and failures. By following the checklists and best practices outlined in this guide, you can more effectively scale your applications, handle increased traffic, and ensure optimal performance and reliability in your Kubernetes cluster.

Next Steps

As we continue to explore the ISV journey of Kubernetes adoption, our ongoing blog series will delve deeper into the resilience, efficiency, and security of your deployments.

Scalability (this blog): Explore how to manage application scaling. By asking the right questions and proactively addressing scalability concerns, SMBs can build Kubernetes environments that can effectively handle the demands of their growing business.
Disaster preparedness (Part 5): Discuss the importance of having a solid disaster recovery plan, including backup strategies, practices, and regular drills to ensure business continuity.
Security (Part 6): Delve into securing your Kubernetes environment, covering best practices for network policies, access controls, and securing application workloads.

Each of these topics is crucial for navigating the complexities of Kubernetes and enhancing your infrastructure’s resilience, scalability, and security. Stay tuned for insights to help empower your Kubernetes journey.

Ready to embark on a transformative journey and get the most from Kubernetes on DigitalOcean? Sign up for DigitalOcean Kubernetes here.

About the author(s)

Bikram Gupta

Oliver Love
