Coming To Port 53 Near You

By Sharon Campbell

Posted: November 18, 2014

Over the past few months, our engineering team has been hard at work replacing our DNS resolvers with a lightning-fast solution. We’ve replaced our old architecture with a much more scalable and reliable system for creating and resolving DNS entries.

The main concern with rolling out this new system was the potential for downtime. Our resolvers handle many thousands of queries per second, which means any downtime would inconvenience our users.

Our requirements were:

Keep both DNS systems in sync and check for inconsistencies in order to mitigate them
Be able to fall back to the old system in the event that the new one contained a hidden demon (performance problems, bugs under load, etc.)
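To make the first requirement concrete, here is a minimal sketch of an inconsistency check between two record sets. It is an illustration only, not the tooling we actually ran: the `Compare` function, the `Inconsistencies` type, and the `"name/type"` key format are all assumptions made for the example.

```go
package main

import "sort"

// Inconsistencies describes how the new system's records differ from
// the old system's. All names here are hypothetical.
type Inconsistencies struct {
	MissingInNew []string // keys present only in the old system
	ExtraInNew   []string // keys present only in the new system
	Mismatched   []string // keys present in both, with different data
}

// Compare takes two maps from a record key (assumed here to be
// "name/type") to record data and reports every difference.
func Compare(oldRecs, newRecs map[string]string) Inconsistencies {
	var out Inconsistencies
	for key, oldVal := range oldRecs {
		newVal, ok := newRecs[key]
		switch {
		case !ok:
			out.MissingInNew = append(out.MissingInNew, key)
		case newVal != oldVal:
			out.Mismatched = append(out.Mismatched, key)
		}
	}
	for key := range newRecs {
		if _, ok := oldRecs[key]; !ok {
			out.ExtraInNew = append(out.ExtraInNew, key)
		}
	}
	// Map iteration order is random, so sort for stable reports.
	sort.Strings(out.MissingInNew)
	sort.Strings(out.ExtraInNew)
	sort.Strings(out.Mismatched)
	return out
}
```

A report like this separates records that still need to be backfilled (`MissingInNew`) from records whose data has drifted (`Mismatched`), which call for different fixes.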

The New Architecture

The new system architecture now looks like this:

new architecture diagram

Here’s the updated application flow when users add a DNS entry from a DigitalOcean application, API, or Control Panel:

Add the record to the DNS database via a RESTful API written in Go
The API verifies the entry and, if it is valid, creates the record in the new DNS database
After that, when a query comes into our resolvers, they will query the database for the entry and respond accordingly

Keeping Two Systems Alive

As mentioned above, we wanted to be able to fall back to the old system should the new one fall over. We performed a full backfill of the DNS entries into the new system by using the new DNS API endpoints. This did two things for us: 1) it stress tested the application with a high volume of requests; and 2) it backfilled all of the data into the new application.

We also had to convert the names in our DNS entries from BIND syntax into Fully Qualified Domain Names (FQDNs), which are a requirement in our new system. This proved to be a challenge: many records ended up inconsistent with the old implementation of DNS. We solved this by creating a small conversion library that accepts BIND syntax and returns an FQDN.
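As an illustration of what such a conversion involves, here is a minimal sketch based on standard BIND zone-file conventions: `@` stands for the zone origin, a name ending in a dot is already absolute, and any other name is relative to the origin. The `ToFQDN` function is hypothetical and is not the actual library; lowercasing the result is also an assumption made here for normalization.

```go
package main

import (
	"errors"
	"strings"
)

// ToFQDN converts a record name written in BIND zone-file style into a
// fully qualified domain name (with a trailing dot), relative to the
// zone origin. Hypothetical helper for illustration only.
func ToFQDN(name, origin string) (string, error) {
	// Normalize the origin so it always ends in exactly one dot.
	origin = strings.TrimSuffix(strings.TrimSpace(origin), ".")
	if origin == "" {
		return "", errors.New("zone origin must not be empty")
	}
	origin = strings.ToLower(origin) + "."

	name = strings.ToLower(strings.TrimSpace(name))
	switch {
	case name == "":
		return "", errors.New("record name must not be empty")
	case name == "@":
		// "@" is BIND shorthand for the zone origin itself.
		return origin, nil
	case strings.HasSuffix(name, "."):
		// A trailing dot means the name is already absolute.
		return name, nil
	default:
		// Anything else is relative to the origin.
		return name + "." + origin, nil
	}
}
```

With this sketch, `ToFQDN("www", "example.com")` returns `www.example.com.`, while `ToFQDN("mail.example.com.", "example.com")` returns the name unchanged. A forgotten trailing dot on a name meant to be absolute is a common way for a record to expand differently than its author intended.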

While our users were adding or updating DNS entries, we were concurrently writing to the new service, preparing it for prime time. If the new service could not accept a record, say because of a failed validation, the record was logged to a separate list of entries that existed in the old system but not the new one. This allowed us to triage issues separately and notify customers when they had invalid DNS entries.
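The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the production code: `ShadowWriter`, `MemStore`, and `RequireFQDN` are hypothetical names, the stores are in-memory stand-ins, and the mirrored write happens inline here rather than concurrently to keep the example short.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a simplified DNS record (hypothetical shape).
type Record struct{ Name, Type, Data string }

// Store is anything that can persist a record.
type Store interface {
	Put(r Record) error
}

// MemStore is an in-memory stand-in for a DNS backend.
type MemStore struct {
	Validate func(Record) error // optional validation hook
	Records  map[string]Record
}

func NewMemStore(validate func(Record) error) *MemStore {
	return &MemStore{Validate: validate, Records: make(map[string]Record)}
}

func (s *MemStore) Put(r Record) error {
	if s.Validate != nil {
		if err := s.Validate(r); err != nil {
			return err
		}
	}
	s.Records[r.Name+"/"+r.Type] = r
	return nil
}

// RequireFQDN is an example validation: names must end in a dot.
func RequireFQDN(r Record) error {
	if !strings.HasSuffix(r.Name, ".") {
		return fmt.Errorf("record name %q is not fully qualified", r.Name)
	}
	return nil
}

// Failure is a record the new system rejected, with the reason.
type Failure struct {
	Record Record
	Reason string
}

// ShadowWriter treats the old system as the source of truth and
// mirrors every write to the new one.
type ShadowWriter struct {
	Old, New Store
	Failed   []Failure // entries in the old system but not the new
}

// Write fails only if the old system fails. A rejection by the new
// system is recorded for later triage and never reaches the user.
func (w *ShadowWriter) Write(r Record) error {
	if err := w.Old.Put(r); err != nil {
		return fmt.Errorf("old system: %w", err)
	}
	if err := w.New.Put(r); err != nil {
		w.Failed = append(w.Failed, Failure{Record: r, Reason: err.Error()})
	}
	return nil
}
```

The key design choice is that a failure in the new system is invisible to the user during this phase, so the new service can be exercised with real traffic without putting the existing one at risk.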

new architecture diagram

After we were confident that we had a reliable system, we switched the concurrent writes over to synchronous ones. Creating a domain record, for example, would now be written to both systems synchronously. If either write failed, the transaction was rolled back and the error was presented to the user. This was great because it allowed us to populate both systems with a high degree of confidence that they matched each other.
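A synchronous dual write with rollback might look something like the sketch below. The `WriteBoth` function and `MemStore` type are hypothetical, and the rollback is a compensating delete, which only suits record creation (rolling back an update would mean restoring the previous value). The actual transaction handling may well differ.

```go
package main

import "fmt"

// Record is a simplified DNS record (hypothetical shape).
type Record struct{ Name, Type, Data string }

// Store can persist a record and remove it again.
type Store interface {
	Put(r Record) error
	Delete(r Record) error
}

// MemStore is an in-memory stand-in for a DNS backend. Setting Fail
// makes every Put return an error, to simulate a rejected write.
type MemStore struct {
	Fail    bool
	Records map[string]Record
}

func key(r Record) string { return r.Name + "/" + r.Type }

func (s *MemStore) Put(r Record) error {
	if s.Fail {
		return fmt.Errorf("store rejected %s", key(r))
	}
	s.Records[key(r)] = r
	return nil
}

func (s *MemStore) Delete(r Record) error {
	delete(s.Records, key(r))
	return nil
}

// WriteBoth creates the record in both systems or in neither. If the
// second write fails, the first is undone and the error is returned
// so it can be shown to the user.
func WriteBoth(oldSys, newSys Store, r Record) error {
	if err := oldSys.Put(r); err != nil {
		return fmt.Errorf("old system: %w", err)
	}
	if err := newSys.Put(r); err != nil {
		if undoErr := oldSys.Delete(r); undoErr != nil {
			return fmt.Errorf("new system: %v; rollback failed: %w", err, undoErr)
		}
		return fmt.Errorf("new system: %w", err)
	}
	return nil
}
```

Unlike the earlier phase, an error from either system now reaches the caller, which is what keeps the two systems from drifting apart.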

Turning It Up To 11

On the 27th of October, we slowly rolled out changes to the first nameserver, fixed minor configuration issues, and then continued to flip over each nameserver slowly. Now all of our DNS is served off the new architecture and we’re very pleased with it. Propagation is nearly instant from the moment you hit Submit on a domain entry.

Takeaways

We found that splitting DNS out into its own service made it far more scalable and reliable. Also, writing concurrently to the new service instead of doing a hard cutover surfaced issues that we likely would have missed had we switched over without a proper release plan.

We hope you enjoy a much faster DNS!

by Robert Ross