Over the past several years, containers in general, and Docker specifically, have become quite prevalent across the industry. Containerization offers isolated, reproducible build and runtime environments in a simple, developer-friendly form, smoothing the entire software development process from initial development to deploying services in production. Orchestration frameworks like Kubernetes and Mesos offer robust abstractions over service components, which simplifies deployment and management.
Like many other tech companies, DigitalOcean uses containers internally to run production services. Quite a few of our services run inside Kubernetes, and a large slice of those run on an internal platform that we've built to abstract away some of the pain points for developers new to Kubernetes. We also use containers for CI/CD in our build systems, and locally for development. In this post, I’ll describe how we redesigned our Docker registry architecture for better performance. (You can find out more about how DigitalOcean used both containers and Kubernetes in a talk by Joonas Bergius, and more about our internal platform, DOCC, in this talk by Mac Browning.)
Initially, to host our private Docker images, we set up a single server running the official Docker registry, backed by object storage. This is a common, simple pattern for private registries, and it worked well early on. By relying on a consistent object store for backing storage, the registry itself doesn't have to worry about consistency. However, a single registry instance is both a performance and an availability bottleneck, and every client depends on being able to reach the region running the registry.
As our use of containers grew, we started to experience general performance issues such as slow or failing image pushes. A simple solution for this would be to increase the number of registry instances running, but we’d still have a dependency on the single region being available and reachable from every server.
Additionally, the default behavior of the official Docker registry is to serve the actual image data via a redirect to the backing store. This means a request from a client arrives at the registry server, which returns an HTTP redirect to object storage (or whatever remote backend you have configured the registry to use). One notable issue that we encountered was a large deployment of large Docker images (~10GB) spiking bandwidth to our storage backend. Hundreds of clients requested a new, large image at the same time, saturating our connection to storage from our data center. Running multiple instances of the registry wouldn't solve this issue—all the data would still come from the backing store.
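To put rough numbers on that incident (the client count below is a hypothetical, not the actual size of the deployment): with redirects enabled, every client pulls the full image from the backing store, so traffic scales linearly with the number of clients.

```python
# Back-of-envelope sketch: traffic to the backing store when every client
# is redirected there directly. The client count is hypothetical.
image_size_gb = 10    # roughly the size of the large images mentioned above
clients = 300         # "hundreds of clients" pulling at once

traffic_gb = image_size_gb * clients
print(traffic_gb)  # 3000 GB served from the backing store for a single deploy
```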
We decided it was time to overhaul our Docker registry architecture, with a few primary goals in mind: removing the dependency on a single region, keeping image data close to the clients pulling it, and scaling out to handle large, simultaneous deploys.
We operate relatively large Kubernetes clusters in every DigitalOcean region, so using the fundamental building blocks that Kubernetes and our customizations offer was a logical choice. Kubernetes provided us with great primitives like scaling deployments and simple rolling deploys. Additionally, we have lots of internal tooling for running, monitoring, and managing services running inside Kubernetes.
For caching, we decided to take advantage of the Docker registry’s ability to disable redirects. Disabling redirection causes the registry server to retrieve image data, and then send it directly to the client, instead of redirecting the request to the backend store. This adds a bit of latency to the initial response, but enables us to put a caching proxy like Squid in front of the registry and serve cached data without transiting to the backing store on subsequent requests.
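The difference between the two modes can be sketched as follows. This is illustrative pseudocode of the request flow, not the registry's actual implementation; the URL and store here are made up:

```python
# Illustrative sketch of the two blob-serving modes; not real registry code.

def serve_with_redirect(digest):
    # Default behavior: answer with an HTTP redirect pointing at the
    # backing store, and let the client fetch the blob from there.
    return {"status": 307,
            "location": f"https://object-storage.example/blobs/{digest}"}

def serve_without_redirect(digest, store):
    # With redirects disabled: fetch the blob from the backing store and
    # return it directly, so a caching proxy in front can cache the body.
    return {"status": 200, "body": store[digest]}

store = {"sha256:abc123": b"layer-bytes"}
print(serve_with_redirect("sha256:abc123")["status"])          # 307
print(serve_without_redirect("sha256:abc123", store)["body"])  # b'layer-bytes'
```

In the second mode the response body passes through the registry, which is exactly what lets a proxy like Squid cache it.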
At this point, we had a good idea of how to run multiple caching registries in every region, but we still needed a way to direct clients to request Docker images from the registry in their region, instead of a single global one. To accomplish this, we created a new DNS zone that was not shared between regions, so that clients in each region could resolve the DNS address of our registry to the local region's registry deployment, instead of to a single registry located in a different region.
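As a sketch of the idea (all names and addresses here are hypothetical, not our actual zone), each region's resolvers serve their own copy of the zone, with the registry record pointing at that region's deployment:

```
; Region A's copy of the zone
dockerregistry.internal.example.  300  IN  A  10.1.0.10   ; region A ingress

; Region B's copy of the same zone, same name, different answer
dockerregistry.internal.example.  300  IN  A  10.2.0.10   ; region B ingress
```

Clients everywhere use the same registry hostname, but each region resolves it to its local deployment.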
The registry configuration we ended up using was rather standard, using a storage backend configured with an access key and secret key. The one important bit, as previously mentioned, was disabling `redirect`.
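A minimal configuration along those lines might look like this (the bucket, keys, and region are placeholders; the `redirect: disable: true` stanza is the important part):

```yaml
version: 0.1
storage:
  s3:                          # or whichever object storage driver you use
    accesskey: ACCESS_KEY      # placeholder
    secretkey: SECRET_KEY      # placeholder
    region: region-1           # placeholder
    bucket: docker-registry    # placeholder
  redirect:
    disable: true              # serve blobs through the registry, no redirects
http:
  addr: :5000
```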
For caching image data locally with the registry, we chose to use Squid. Each instance of the registry would be deployed with its own Squid instance, with its own cache storage. This approach was simple to set up and configure, but does have drawbacks: notably, that each instance of the registry has its own independent cache. This means that in a deployment of multiple instances, multiple identical requests directed to different backing instances could result in several cache misses, one for each instance of the registry and cache. There's room for future improvement here: a larger, shared cache that all registry instances in a region sit behind would eliminate those duplicate misses. Still, any local caching at all was a big improvement over our original setup, so it was an acceptable tradeoff to make in our initial work.
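A toy simulation of that drawback (a sketch with made-up numbers): identical requests spread round-robin across N instances produce up to N misses, while a single shared cache would miss only once.

```python
# Toy model: count backing-store fetches for identical requests under
# per-instance caches vs. one shared cache. Numbers are illustrative.

def fetches(requests, caches):
    misses = 0
    for i, blob in enumerate(requests):
        cache = caches[i % len(caches)]   # round-robin across instances
        if blob not in cache:
            misses += 1                   # cache miss: fetch from backing store
            cache.add(blob)
        # cache hit: served locally, no backing-store traffic
    return misses

requests = ["layer-a"] * 6                       # six identical pulls
print(fetches(requests, [set(), set(), set()]))  # 3 misses: one per instance
print(fetches(requests, [set()]))                # 1 miss with a shared cache
```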
To configure Squid, we wrote a simple configuration to listen for HTTPS connections and to send all cache misses to the local registry:
https_port 443 accel defaultsite=dockerregistry no-vhost cert=cert.pem key=key.pem
cache_peer 127.0.0.1 parent 5000 0 no-query originserver no-digest forceddomain=dockerregistry name=upstream login=PASSTHRU ssl
acl site dstdomain dockerregistry
http_access allow site
cache_peer_access upstream allow site
cache allow site
Once we had written the registry and Squid configuration, we combined the two pieces of software to run together in a Kubernetes deployment. Each pod would run an instance of the registry and an instance of Squid, with its own temporary disk storage. Deploying this across our regional Kubernetes clusters was straightforward.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: docker-registry
spec:
  replicas: 3
  selector:
    matchLabels:
      app: docker-registry
  template:
    metadata:
      labels:
        app: docker-registry
    spec:
      volumes:
        - name: registry-config
          configMap:
            name: registry-config
        - name: squid-config
          configMap:
            name: squid-config
        - name: cache
          emptyDir: {}          # temporary per-pod disk storage for Squid's cache
      containers:
        - name: registry
          image: registry:2     # official Docker registry image
          volumeMounts:
            - name: registry-config
              mountPath: /etc/docker/registry
        - name: squid
          image: squid          # placeholder; substitute your Squid image
          ports:
            - containerPort: 443
          volumeMounts:
            - name: squid-config
              mountPath: /etc/squid
            - name: cache
              mountPath: /var/spool/squid
The last remaining piece of work was enabling ingress to our new registry, which we did using our existing HAProxy ingress controllers. Since we terminate TLS at Squid, HAProxy is only responsible for forwarding TCP traffic to our deployment.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: docker-registry
spec:
  rules:
    - host: dockerregistry
      http:
        paths:
          - path: /
            backend:
              serviceName: docker-registry   # placeholder service name
              servicePort: 443
In conclusion, this registry architecture has been working well, providing much quicker pulls and pushes across all of our data centers. With this setup, we now have Docker registries running in all of our regions, and no region depends on reaching another region to serve data. Each registry instance is now backed by a Squid caching proxy, allowing us to keep many requests for the same data entirely in cache, and entirely local to the region. This has enabled larger deploys and much higher pull performance.
Future improvements will be made around metrics instrumentation and monitoring. While we currently compute metrics by scraping the registry logs, we're looking forward to the Docker registry including Prometheus metrics natively. Additionally, creating a shared regional cache for our registry deployments should provide a nice performance boost and reduce the number of cache misses we see in operation.
Jeff Zellner is a Senior Software Engineer on the Delivery team, where he works on providing infrastructure and automation around Kubernetes to the DigitalOcean engineering organization at large. He's a long-time remote worker, startup-o-phile, and incredibly good skier.