How Urlbox runs its website screenshot API on DigitalOcean Kubernetes

"I like the simplicity of DigitalOcean compared with AWS and Google Cloud.”
Chris Roebuck, Founder and CEO of Urlbox

Urlbox is a ‘screenshots as a service’ API. With Urlbox, you can automate website screenshots, convert HTML to PDF or PNG files, and more. Urlbox makes it easy to capture full or partial webpage screenshots as if you were using the Chrome browser on a real desktop or mobile device.

Incredibly, Urlbox is a bootstrapped, profitable company whose single employee, Chris Roebuck, is its Founder and CEO.

This is the story of why and how Urlbox uses DigitalOcean – specifically our managed Kubernetes service and Container Registry.

“DigitalOcean Kubernetes’ ease of use, low pricing, and responsive support make it the right choice for me and Urlbox. It costs me about half as much as Google Kubernetes Engine, and whenever I’ve gotten stuck and opened tickets, I’ve usually gotten responses back in under an hour.”

— Chris Roebuck, Founder and CEO of Urlbox

Urlbox’s early days: is ‘screenshots as a service’ even a business?

Urlbox’s story closely reflects Chris’ personal path, and it is proof that great businesses aren’t always obviously successful ideas at the start.

In 2012, after working for big banks in London, Chris decided to follow his entrepreneurial dreams in Silicon Valley. After participating in Y Combinator (YC) with his college friends, Chris returned to London and quietly launched a minimum viable Urlbox in 2013. For the next few years, Urlbox would sit silently on the open internet, largely in maintenance mode, initially running on Amazon EC2 VMs.

The rise of the API economy, and a second act for Urlbox

Over the next several years, disruptive innovations would dramatically alter the ways in which developers run and build software. Rather than operating infrastructure and building every software component themselves, developers shifted workloads to the cloud, and sought to utilize developer-friendly API services like Stripe and Twilio whenever possible.

Rather quietly, developers at the likes of BBC, Booking.com, NYTimes, and Yahoo searched Google for phrases like ‘website screenshot API’. They found Urlbox, and signed on as customers.

By 2016, Chris began to realize that Urlbox might be a pretty good business after all, and in late 2018 he sought to reduce his cloud computing costs. He migrated Urlbox to more than one hundred of DigitalOcean’s Standard Droplet virtual machines, most of them with 2 vCPUs, 4GB of RAM, and an 80GB SSD. By migrating from AWS to DigitalOcean, Urlbox cut its core compute and network costs roughly in half.

Seeking a simple way to scale: enter DigitalOcean Kubernetes

Not long after, in early 2019, Chris decided to further rethink his approach to the Urlbox infrastructure. One of the main issues with his VM-based setup was that there was no easy way to scale in accordance with user demand. Consequently, he either overprovisioned resources and left machines idle, or underprovisioned and introduced latency and problems that frustrated users. He decided it was time to put in place a modern, cloud native foundation that could automatically scale and support future growth.

As he had already switched to DigitalOcean, he suspected that the easiest way to adopt Kubernetes would be to use DigitalOcean’s managed service. In his evaluations of DigitalOcean Kubernetes, he found its developer experience much simpler than that of the alternatives. He also appreciated that DigitalOcean did not charge a fee for the cluster control plane and its master node.

Having made the decision to use DigitalOcean Kubernetes to run Urlbox’s services, Chris now operates Urlbox in production using an architecture that looks like this:

Independently scalable, containerized microservices

Urlbox now utilizes several different Node.js microservices, all of which run on DigitalOcean Kubernetes:

  • a publicly accessible dashboard service (urlbox.io) to provide a graphical user interface (written with React) for developers to use and manage their Urlbox account
  • a publicly accessible API service (api.urlbox.io) to provide developer-friendly endpoints for generating screenshots and PDF files, written in TypeScript
  • a private website rendering service that uses headless Chrome via the Puppeteer Node.js library to generate screenshots and store them in users’ AWS S3 object storage accounts
  • private monitoring and analytics services that use Prometheus and Grafana, and DigitalOcean Block Storage as the backing Persistent Volume Claims
  • a set of private auxiliary services: cert-manager to auto-renew SSL certificates; ExternalDNS to automatically update DNS records in Cloudflare; logspout containers to send logs to Papertrail; descheduler to occasionally reschedule pods and optimize resource usage on each node; and overprovisioner to run a few paused pods that reserve nodes for future use
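
The source does not say which autoscaling mechanism Urlbox uses. As one illustration of how Kubernetes can scale a single service independently of the others, the sketch below implements the replica calculation documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). It is a simplified version that leaves out the controller's tolerance band and stabilization behavior. The function name, the replica bounds, and the sample numbers are assumptions made for illustration, not Urlbox's settings.

```typescript
// Simplified Horizontal Pod Autoscaler replica calculation:
//   desired = ceil(currentReplicas * currentMetric / targetMetric)
// All names and bounds here are hypothetical, chosen for illustration only.
interface ScalingInput {
  currentReplicas: number;
  currentMetric: number; // e.g. average CPU utilization, in percent
  targetMetric: number;  // utilization the autoscaler aims for, in percent
  minReplicas: number;
  maxReplicas: number;
}

function desiredReplicas(input: ScalingInput): number {
  if (input.targetMetric <= 0) {
    throw new Error("targetMetric must be positive");
  }
  const raw = Math.ceil(
    (input.currentReplicas * input.currentMetric) / input.targetMetric
  );
  // Clamp the result to the configured replica bounds.
  return Math.min(input.maxReplicas, Math.max(input.minReplicas, raw));
}
```

With four replicas running at 90% utilization against a 60% target, the calculation asks for six replicas; when load falls well below the target, the result drops to the configured minimum rather than to zero.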

Three right-sized node pools per Kubernetes cluster

To run this architecture and scale independent microservices in response to user demand, Urlbox uses two identically configured DigitalOcean Kubernetes clusters, one in NYC and the other in SF. Each cluster has three node pools, with differently sized VMs matched to their use case:

  1. a node pool to run the dashboard and API services using Standard Droplets with 4 vCPU and 8GB RAM
  2. a node pool to run the rendering service using more powerful Standard Droplets with 6 vCPU and 16GB RAM
  3. a node pool to run monitoring and analytics using Standard Droplets with 4 vCPU and 8GB RAM
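
To make "right-sized" concrete, the sketch below estimates how many pods of a given size fit on one node in each pool. The vCPU and RAM figures come from the list above. The pool labels and the per-pod resource requests are hypothetical, and the calculation ignores the capacity that real nodes reserve for system components.

```typescript
interface NodeSize { vcpu: number; ramGb: number; }
interface PodRequest { vcpu: number; ramGb: number; }

// Node sizes as described in the article; the pool labels are hypothetical.
const pools = {
  web: { vcpu: 4, ramGb: 8 },        // dashboard and API services
  rendering: { vcpu: 6, ramGb: 16 }, // headless Chrome rendering service
  monitoring: { vcpu: 4, ramGb: 8 }, // monitoring and analytics
};

// A node can hold as many pods as its scarcest resource allows.
function podsPerNode(node: NodeSize, pod: PodRequest): number {
  if (pod.vcpu <= 0 || pod.ramGb <= 0) {
    throw new Error("pod requests must be positive");
  }
  return Math.floor(Math.min(node.vcpu / pod.vcpu, node.ramGb / pod.ramGb));
}

// Hypothetical request for one rendering pod: 1 vCPU and 4GB of RAM.
const renderPodsPerNode = podsPerNode(pools.rendering, { vcpu: 1, ramGb: 4 });
```

Under these assumed requests, memory rather than CPU limits the rendering pool, which is the kind of imbalance that separate, differently sized node pools are meant to address.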

A safe, speedy public interface with the internet

To publicly expose its screenshot service to the internet, Urlbox utilizes Cloudflare for DNS and CDN. Cloudflare in turn routes legitimate user requests to a DigitalOcean Load Balancer, which passes them on to a HAProxy Kubernetes Ingress Controller. This controller then passes requests to the API or dashboard services, as appropriate.
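
The last hop in that chain, where the ingress controller chooses between the API and dashboard services, can be pictured as a lookup on the request's host name. The sketch below is an assumption about how such a rule might look, not Urlbox's actual ingress configuration: the host names come from the service list earlier in the article, while the function name and the choice to route on the Host header are hypothetical.

```typescript
type Backend = "dashboard" | "api";

// Host names are taken from the article; routing on them is an assumption.
const hostRules = new Map<string, Backend>([
  ["urlbox.io", "dashboard"],
  ["api.urlbox.io", "api"],
]);

// Returns the backend for a Host header value, or undefined if no rule matches.
function routeByHost(hostHeader: string): Backend | undefined {
  // Normalize: trim whitespace, lowercase, and strip an optional port suffix.
  const host = hostHeader.trim().toLowerCase().replace(/:\d+$/, "");
  return hostRules.get(host);
}
```

A request for an unknown host returns undefined, which a real ingress controller would typically answer with a default backend or an error response.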

A continuous build pipeline utilizing DigitalOcean Container Registry

To release new features, Urlbox runs a build pipeline that stitches together services from various vendors.

When new code is pushed to GitHub, it triggers a build in Google Cloud Build, which pushes the resulting images to DigitalOcean Container Registry. Once the container images are ready, they are deployed to the cluster, without any downtime, using helmfile.
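
One small piece of such a pipeline is naming the image so that the build step and the deploy step agree on what was built. The sketch below composes an image reference for DigitalOcean Container Registry, whose images are addressed as registry.digitalocean.com/<registry-name>/<image>:<tag>. Tagging by short commit SHA is an assumption, as are the function name and the registry name used in the example; the article does not describe Urlbox's tagging scheme.

```typescript
const DOCR_HOST = "registry.digitalocean.com";

// Builds an image reference tagged with the first seven characters of a commit SHA.
// The tagging scheme is hypothetical, shown only to illustrate the idea.
function imageRef(registry: string, service: string, commitSha: string): string {
  if (!/^[0-9a-f]{7,40}$/.test(commitSha)) {
    throw new Error("expected a lowercase hex commit SHA of 7 to 40 characters");
  }
  const tag = commitSha.slice(0, 7);
  return `${DOCR_HOST}/${registry}/${service}:${tag}`;
}

// "example-registry" is a placeholder registry name, not Urlbox's.
const apiImage = imageRef(
  "example-registry",
  "api",
  "0123456789abcdef0123456789abcdef01234567"
);
```

Tying each tag to a commit means a deploy tool can roll forward or back by changing only the tag value it is given.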

Growing together: DigitalOcean ❤️ Urlbox

DigitalOcean is proud to have Urlbox as a customer, and is committed to supporting Chris and his ambitions. It’s our hope that sharing the Urlbox story inspires other software developers and aspiring entrepreneurs to build the applications and businesses of their dreams.

Urlbox

Industry: Developer API
HQ: London
Savings: Roughly 50% of the cost of GCP and AWS
