Daniel Ni, CEO & Founder of Scraper API
Scraper API is an API that allows developers to build scalable web scrapers. If you’re running a site like Expedia and need to look up 20 million flights a day, doing so requires a lot of infrastructure: IP addresses, browsers, and more. Scraper API manages this kind of platform, which allows data-intensive companies to easily get the data they need.
Scraper API’s chief struggle has been scaling. As the company passed a thousand customers around the world, in niches that include e-commerce, SEO, and travel, founder Daniel Ni began losing sleep. As a freelance developer, he used to take on scraping jobs in which he sourced proxy providers and tested each of their proxies himself. Scraper API started out on a small service called NodeChef, a NoOps tool like Heroku but solely for Node.js®. Eventually, they were processing half a billion requests and growing by 10-20% each month, but their tool was falling apart. They were manually writing much of the code that sorted through user agents and rotated IP addresses, and they needed more control and reliability.
Scraper API was initially attracted to DigitalOcean because of its simplicity. They first moved to Kubernetes because they were hitting the limits of running without a containerized solution. They hired a contractor who uses Kubernetes on several platforms, and he insisted DigitalOcean has the “most straightforward experience compared to other cloud providers.” Specifically, he found value in a separate Kubernetes monitoring dashboard that allows users to see things at the individual pod level. Having different Droplet types is also very useful for Scraper API. Because the product is CPU intensive, they spun up several high-CPU Droplets, which gave them the flexibility and performance they were looking for.
Daniel says they’ve been “pleasantly surprised by just how robust the platform is – both DigitalOcean Kubernetes and in general.” He recalled a particular time that they were blown away by the durability of the platform, “It's almost impossible to take down. We had a crashing issue that we didn't even notice until we looked at our logs and our servers were crashing every 20 seconds. It didn't even disrupt service – which is mind boggling – that you could push a bug into production and even that couldn't take it down.”
While Scraper API has a few microservices, it’s still a pretty “monolithic app,” according to Daniel. Today, they’re at a scale where they never hit the actual database anymore. Everything is done out of Redis, which Daniel says isn’t too complex. But for a bandwidth-heavy service handling 6-7 billion requests each month, it all goes back to scalability. Scraper API now bundles everything into a nice, neat package and allows customers to scale up and down elastically, fixing a pain point for a lot of developers.
Because they’ve been so happy with DigitalOcean’s service offerings, Scraper API has begun consolidating a lot of their services to DigitalOcean. Daniel says he expects that within a few months they’ll have all their Redis™ and Postgres architecture on DigitalOcean, because it just makes sense to have it all in one place.
Pricing transparency is also something Daniel says he and his team appreciate. They originally got started with DigitalOcean through Hatch, DigitalOcean’s startup program. In the beginning, they were much more price-sensitive than they are today: because they’ve become so profitable, cloud costs are now a small percentage of their spend as a company, though cost and predictability of cost are still important to them. “There’s no separate billing,” Daniel says. “You can see exactly what the costs are, which is great. On some cloud providers, you think you're paying some amount and then you have a thousand-dollar bill.”
DigitalOcean has been central to Scraper API's success – and will be with the team for the next billion requests. And they have plans to continue growing their business using DigitalOcean products.
Daniel says, “We've been consolidating a lot of our services to DigitalOcean. I think within a few months we'll have all our Redis™ and Postgres architecture on DigitalOcean, because it just makes sense to have it all in one place.”
Listen to more about Daniel's experiences as he described them during DigitalOcean's deploy 2020 conference spot: