By Shamim Raashid and Anish Singh Walia
When a cloud region goes dark, will your app survive? In today’s always-on digital world, downtime isn’t just a minor inconvenience; it can be a major blow to your business. Whether it’s a regional outage, infrastructure failure, or even a natural disaster, having a solid disaster recovery (DR) strategy is critical for keeping your application running smoothly and ensuring minimal impact on your users.
In this guide, we’ll walk you through the process of designing a disaster recovery plan using DigitalOcean Managed Databases for PostgreSQL and MySQL.
By following the strategies and steps outlined here, you can build a robust, multi-region disaster recovery plan that balances cost, complexity, and business requirements, so your application stays available even in the face of unexpected disruptions.
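Before you dive into this tutorial, make sure you have a DigitalOcean account. To follow the command-line examples, you also need the doctl CLI installed and authenticated, or a DigitalOcean API token exported as DIGITALOCEAN_TOKEN for the cURL examples.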
As cloud infrastructure becomes more integral to our operations, even short outages can have a significant impact.
Disaster Recovery (DR) is all about ensuring your services keep running, even when things go wrong. It’s about having a plan in place that lets your app keep serving users from another region if one goes down, minimizing the impact and ensuring continuity.
While High Availability (HA) setups are designed to keep operations running smoothly within a single region with the help of standby nodes, DR goes a step further by replicating data across multiple regions to protect against bigger, regional outages.
To design a solid DR plan, you need to understand two key metrics that define its recovery goals and expected performance. Recovery Time Objective (RTO) is the maximum acceptable time to restore service after an outage. Recovery Point Objective (RPO) is the maximum acceptable amount of data loss, measured as the time between the last recoverable state and the failure.
One of the most effective ways to reduce both RTO and RPO is through replication. Replicating your database across regions ensures that there is always a copy of your data that can quickly take over in the event of a failure. However, this comes with trade-offs.
The decision between replication and periodic backups comes down to your risk tolerance and business requirements. Replication can ensure minimal data loss but increases cost, while backups are cheaper but have a higher RPO.
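To make these objectives concrete, the sketch below does the arithmetic for a hypothetical workload. The function names and the figures in the comments are assumptions chosen for illustration; they are not values published by DigitalOcean.

```shell
# Rough estimate of the writes that could be lost at failover:
# write rate (writes per second) multiplied by the RPO window (seconds).
writes_at_risk() {
  echo $(( $1 * $2 ))
}

# Rough RTO estimate in seconds: time to detect the outage,
# promote a replica, and redirect application traffic.
estimate_rto() {
  echo $(( $1 + $2 + $3 ))
}

# Hypothetical workload of 50 writes per second:
# writes_at_risk 50 5        # 5 s of replication lag   -> 250
# writes_at_risk 50 86400    # one backup every 24 h    -> 4320000
# estimate_rto 60 120 30     # detect, promote, redirect -> 210
```

Even with rough inputs, this kind of estimate shows why replication suits workloads with a tight RPO, while periodic backups may be enough when some data loss is acceptable.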
In self-managed environments, implementing a Disaster Recovery (DR) strategy usually involves manually configuring and maintaining replication, failover, and backups yourself.
While this approach gives you complete control over your setup, it also comes with a significant amount of overhead. You’ll need deep expertise in replication, clustering, and DR best practices, and all of that has to be managed and maintained by your team.
DigitalOcean’s Managed Databases for PostgreSQL and MySQL take the headache out of building disaster recovery solutions. Here’s how they make it easier:
DigitalOcean’s read-only nodes are simple to set up across different regions. These nodes serve as near real-time replicas of your primary database. Data is replicated asynchronously, so it stays continuously synchronized and available in multiple locations.
Creating read-only nodes in regions geographically distant from your primary gives you a failover option in another region, which mitigates the risk of a regional outage.
Automated backups with point-in-time recovery (PITR) give you peace of mind, knowing your data is safe and easily recoverable. PITR is based on write-ahead logs (WALs) that are continuously backed up every few minutes.
This allows you to recover your data to any specific point in time within a 7-day window. In case of accidental or malicious data deletion, you can restore your database back to the last known good state. However, it’s important to note that PITR operates within a single region.
If the region where the backups are stored becomes unavailable due to a regional failure, the PITR feature won’t be able to help you recover data lost in that region.
Seamless replication ensures that your data is consistently synchronized between your primary node and read-only nodes across different regions.
Unlike PITR, which restores data within a single region, seamless replication addresses regional failures. If one region experiences downtime, replication ensures that up-to-date data is available in another region. This minimizes the risk of data loss and enables a quick failover to another region, ensuring that your application remains operational despite a regional outage.
If a regional outage causes the primary database to fail, replica promotion enables you to promote a read-only node to become the new primary. This significantly reduces downtime, as your application can immediately switch to the newly promoted primary database.
The promoted read-only node becomes fully writable, ensuring continuity of operations. This process allows your application to quickly recover from a regional failure and minimizes downtime.
DigitalOcean’s managed databases drastically reduce management overhead. Automated backups, PITR, and seamless replication run without manual configuration, so your database stays protected with little ongoing intervention. You can focus on other aspects of your application and business development, knowing that your disaster recovery system is working behind the scenes.
With DigitalOcean’s managed databases, you can set up a multi-region disaster recovery strategy with minimal effort, letting you focus on what really matters. For more information about Managed Databases and their features, take a look at the official documentation: DigitalOcean Managed Databases Overview
If you haven’t already set up your primary database cluster, this is the first step in building your disaster recovery strategy. You can do this via the DigitalOcean Control Panel, the doctl CLI, or the API.
To create a cluster with doctl, pass the `--engine`, `--region`, and `--size` flags. Use the `doctl databases options engines`, `doctl databases options regions`, and `doctl databases options slugs` commands, respectively, to get a list of available values. The following example creates a MySQL database cluster named `example-database` in the `nyc3` region with a single 1 GB node (basic usage looks like this, but you can read the usage docs for more details):
```shell
doctl databases create example-database \
  --engine mysql \
  --region nyc3 \
  --size db-s-1vcpu-1gb \
  --num-nodes 1
```
To create a cluster through the API, send a POST request to the `/v2/databases` endpoint. The following cURL example creates a two-node PostgreSQL 14 cluster named `backend` in the `nyc3` region:
```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"name": "backend", "engine": "pg", "version": "14", "region": "nyc3", "size": "db-s-2vcpu-4gb", "num_nodes": 2, "storage_size_mib": 61440, "tags": ["production"]}' \
  "https://api.digitalocean.com/v2/databases"
```
When creating the cluster, make sure to choose the right database engine (PostgreSQL or MySQL), size, and region. If you’re aiming for automatic failover within the same region, you can enable High Availability (HA), although for a basic DR setup, this isn’t a must.
Note: If your cluster is already set up, the primary node of your existing database will act as the main point of entry for your application’s write operations.
Once your primary cluster is set up, the next step is to create a read-only node in a different geographic region. This node will serve as a replica of your primary database, ensuring data continuity if the primary region experiences issues. Communication between primary and read-only nodes is SSL-encrypted and sent over the public network. Read-only nodes differ from standby nodes, which are exact copies of the primary node that are automatically moved into place in the event of a primary node failure.
To create a read-only node with doctl, pass the `--region` and `--size` flags, which specify the node’s datacenter and its configuration (number of CPUs, amount of RAM, and hard disk space). Use the `doctl databases options regions` and `doctl databases options slugs` commands, respectively, to get a list of available values. The following example creates a read-only replica named `example-replica` for a database cluster with the ID `ca9f591d-f38h-5555-a0ef-1c02d1d1e35` (basic usage looks like this, but you can read the usage docs for more details):
```shell
doctl databases replica create ca9f591d-f38h-5555-a0ef-1c02d1d1e35 example-replica --size db-s-1vcpu-1gb
```
To create a read-only node through the API, send a POST request with the `region` and `size` fields, which specify the new node’s datacenter and its configuration (number of CPUs, amount of RAM, and hard disk space). Use the `/v2/databases/options` endpoint to get a list of available values. For example, using cURL:
```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"name":"read-nyc3-01", "region":"nyc3", "size": "db-s-2vcpu-4gb", "storage_size_mib": 61440}' \
  "https://api.digitalocean.com/v2/databases/9cc10173-e9ea-4176-9dbc-a4cee4c4ff30/replicas"
```
Choose a region that is geographically distant from the primary (e.g., if your primary is in `nyc3`, choose `lon1` or `sgp1` for redundancy). This read-only node will asynchronously replicate data from the primary cluster, ensuring that it is up-to-date with the latest changes.
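As a rough aid for picking a region, the sketch below filters a list of region slugs down to those whose location code differs from the primary’s. It assumes slugs are a location code followed by a number (as in `nyc3` or `lon1`); the function name is hypothetical, and a different location code only tells you the datacenter is in another metro area, not how far away it is.

```shell
# Print the candidate regions whose location code differs from the primary's.
# Usage: dr_candidates <primary-region> <candidate-region>...
dr_candidates() {
  local primary_prefix region
  # Strip the trailing digits: nyc3 -> nyc
  primary_prefix=$(printf '%s' "$1" | sed 's/[0-9]*$//')
  shift
  for region in "$@"; do
    case "$region" in
      "$primary_prefix"[0-9]*) ;;   # same location as the primary: skip it
      *) echo "$region" ;;
    esac
  done
}

# Example with a primary in nyc3:
# dr_candidates nyc3 nyc1 lon1 sgp1   # prints lon1 and sgp1, one per line
```

You could feed it the slugs returned by `doctl databases options regions` and then choose among the remaining candidates based on latency and data residency requirements.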
Once the read-only node is set up, it’s essential to verify that it is fully operational and ready to handle your application’s read-heavy queries.
Check Read-Only Node Accessibility: Start by confirming that your application can successfully connect to the read-only node. You can do this by running manual queries or directing your application’s read queries to the read-only node. This ensures the node is correctly synchronized with the primary database and is available to handle your application’s read traffic.
Ensure Read-Only Node Can Handle Expected Traffic: The read-only node is designed to offload read-heavy queries from the primary node, reducing the primary’s workload and improving the overall efficiency of the application. Confirm that it can handle the expected amount of read traffic without issues.
Confirm Read-Only Functionality: Keep in mind that the read-only node is designed to handle only read operations, not write operations. This means it will accept queries but will not process writes. Monitoring its load can help you confirm whether the node is effectively reducing the load on the primary node and maintaining overall performance.
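For PostgreSQL, one way to script these checks is to ask the node whether it is in recovery, which is how a PostgreSQL replica reports itself. The following is a minimal sketch, assuming `psql` is installed; the function name and the `REPLICA_URI` variable are hypothetical, and MySQL would need a different query.

```shell
# Report whether a PostgreSQL node is a replica or a primary.
# pg_is_in_recovery() returns t on a read-only replica and f on a primary.
# Usage: pg_node_role <connection-uri>
pg_node_role() {
  local in_recovery
  # -A: unaligned output, -t: tuples only, -c: run a single command
  in_recovery=$(psql "$1" -Atc "SELECT pg_is_in_recovery()") || return 1
  case "$in_recovery" in
    t) echo "replica" ;;
    f) echo "primary" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Example (REPLICA_URI is a placeholder for your node's connection string):
# pg_node_role "$REPLICA_URI"
```

A successful call also confirms that the node is reachable and accepts queries, which covers the accessibility check as well.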
Once the read-only node is verified as operational, it’s important to monitor replication lag, a critical KPI to ensure that the read-only node stays in sync with the primary database.
Replication lag is the delay between when data is written to the primary node and when it appears on the read-only node. If the replication lag grows too large, it could impact your Recovery Point Objective (RPO) and result in data loss or inconsistency in case of failover. Monitoring this metric is essential for ensuring your disaster recovery setup is working as expected.
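The relationship between lag and RPO can be turned into a simple check. The sketch below is illustrative only: the function name is hypothetical, and the warning threshold at half of the RPO is an arbitrary assumption that you would tune for your own workload.

```shell
# Classify replication lag against an RPO target (both in whole seconds).
# Usage: lag_status <lag-seconds> <rpo-seconds>
lag_status() {
  local lag="$1" rpo="$2"
  if [ "$lag" -ge "$rpo" ]; then
    echo "CRITICAL"      # a failover now would lose more data than the RPO allows
  elif [ $(( lag * 2 )) -ge "$rpo" ]; then
    echo "WARNING"       # lag has used up at least half of the RPO budget
  else
    echo "OK"
  fi
}

# Example with a 60-second RPO:
# lag_status 2 60    # -> OK
# lag_status 30 60   # -> WARNING
# lag_status 75 60   # -> CRITICAL
```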
DigitalOcean offers multiple tools to help you monitor replication lag and ensure your database cluster is functioning as expected.
For PostgreSQL, replication delay is reported in bytes in the Metrics tab of the Control Panel. This metric shows how far behind the read-only node is in terms of unreplicated data. For more information on how to monitor PostgreSQL databases, refer to this guide on PostgreSQL metrics.
You can also access detailed performance data, including replication lag, via the metrics endpoint of the DigitalOcean API. You can read more about accessing the metrics endpoint for PostgreSQL.
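If you want to cross-check the byte-based lag yourself, you can compare write-ahead log positions (LSNs) from both nodes. This is a sketch under assumptions: it relies on the standard PostgreSQL functions `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()` (PostgreSQL 10 and later), the shell function names and the `PRIMARY_URI` and `REPLICA_URI` variables are hypothetical, and because the two queries run at slightly different moments the result is only approximate.

```shell
# Convert a PostgreSQL LSN such as "16/B374D848" to a byte position.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
lsn_to_bytes() {
  local hi="${1%%/*}" lo="${1##*/}"
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Byte lag between the primary's current LSN and the replica's replayed LSN.
# Usage: lsn_lag_bytes <primary-lsn> <replica-lsn>
lsn_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example (connection URIs are placeholders):
# primary_lsn=$(psql "$PRIMARY_URI" -Atc "SELECT pg_current_wal_lsn()")
# replica_lsn=$(psql "$REPLICA_URI" -Atc "SELECT pg_last_wal_replay_lsn()")
# lsn_lag_bytes "$primary_lsn" "$replica_lsn"
```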
For MySQL, you can monitor replication lag via the metrics endpoint. The key metric for replication lag is `mysql_slave_seconds_behind_master`, which indicates the number of seconds the replica is behind the primary node. You can scrape these metrics from the metrics endpoint. More details on how to access and scrape MySQL metrics can be found in this guide. Additionally, MySQL users can monitor replication delay using the `SHOW SLAVE STATUS` command, which will provide the details. For more information, refer to the official MySQL documentation on replication delay.
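To use the command’s output in a script, you can pull out the lag field. The sketch below assumes the vertical (`\G`) output format and a hypothetical function name. It also accepts `Seconds_Behind_Source`, the field name used by `SHOW REPLICA STATUS`, which replaces `SHOW SLAVE STATUS` from MySQL 8.0.22 onward.

```shell
# Read `SHOW SLAVE STATUS\G` (or `SHOW REPLICA STATUS\G`) output on stdin
# and print the replication lag in seconds.
# Prints NULL when MySQL cannot compute the lag, for example when
# replication is not running.
seconds_behind() {
  awk -F': *' '/Seconds_Behind_(Master|Source):/ { print $2; exit }'
}

# Example (connection options are placeholders):
# mysql --host "$REPLICA_HOST" --user "$DB_USER" -p \
#   -e 'SHOW SLAVE STATUS\G' | seconds_behind
```

Treat a `NULL` result as an alert condition rather than as zero lag, since it usually means the replica is not applying changes at all.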
In addition to replication lag, you can also monitor other key metrics, such as CPU usage, memory usage, and disk I/O, to ensure your nodes are performing well. DigitalOcean provides easy-to-understand metrics and data points to track resource consumption.
If replication lag increases or is higher than expected, check the performance of your nodes, including CPU usage, memory usage, and disk I/O.
In the event of an actual regional outage, you may need to promote your read-only replica to become the new primary database. This can be done quickly via the doctl CLI or API. You can also promote a read-only node to create a new database cluster in a different datacenter region. This option can help you maintain uptime if a cluster is experiencing issues in another region.
Using `doctl`, the following example promotes a read-only replica named `example-replica` for a database cluster with the ID `ca9f591d-f38h-5555-a0ef-1c02d1d1e35`:
```shell
doctl databases replica promote ca9f591d-f38h-5555-a0ef-1c02d1d1e35 example-replica
```
Using the API, send a PUT request to `https://api.digitalocean.com/v2/databases/{database_cluster_uuid}/replicas/{replica_name}/promote`. For example, using cURL:
```shell
curl -X PUT \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/databases/9cc10173-e9ea-4176-9dbc-a4cee4c4ff30/replicas/read-nyc3-01/promote"
```
For more information, refer to:
Promote a MySQL Read-Only Node to Become a Primary Node
Promote a PostgreSQL Read-Only Node to Become a Primary Node
Once promoted, the read-only node becomes an independent database cluster and can now accept write operations.
Important: After promotion, the replication relationship between the original primary and the new primary is broken, and they will no longer remain in sync.
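Because promotion breaks replication, you may want a guard that promotes only after the primary has failed several health checks in a row. The following is a minimal sketch of that idea, not a production failover tool: the function name, the failure count, and the health-check command are all assumptions, and only the `doctl databases replica promote` call comes from the steps above.

```shell
# Promote a replica only after the primary fails N consecutive health checks.
# Usage: promote_after_failures <cluster-id> <replica-name> <max-failures> \
#          <interval-seconds> <health-check-command> [args...]
# The health-check command must exit 0 while the primary is reachable.
promote_after_failures() {
  local cluster_id="$1" replica_name="$2" max_failures="$3" interval="$4"
  local failures=0
  shift 4
  while [ "$failures" -lt "$max_failures" ]; do
    if "$@"; then
      echo "primary healthy; not promoting"
      return 1
    fi
    failures=$(( failures + 1 ))
    sleep "$interval"
  done
  doctl databases replica promote "$cluster_id" "$replica_name"
}

# Example: three failed checks, 10 seconds apart, using pg_isready
# (CLUSTER_ID and PRIMARY_URI are placeholders):
# promote_after_failures "$CLUSTER_ID" example-replica 3 10 \
#   pg_isready -d "$PRIMARY_URI"
```

Requiring consecutive failures reduces the chance of promoting during a brief network blip, at the cost of a slightly longer RTO.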
With the read-only node now promoted, the next step is to redirect your application’s traffic to the new primary database. Update your application configuration to point to the new database’s connection string or endpoint.
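If your application reads its database location from a connection URI, the switch can be as small as replacing the host. The sketch below assumes a URI of the form `scheme://user:password@host:port/dbname?params` with a percent-encoded password; the function name and hostnames are hypothetical.

```shell
# Replace the host in a connection URI, keeping the user, password,
# port, database name, and query parameters unchanged.
# Usage: swap_db_host <connection-uri> <new-host>
swap_db_host() {
  printf '%s\n' "$1" | sed "s|@[^:/]*|@$2|"
}

# Example with placeholder hostnames:
# swap_db_host \
#   "postgresql://doadmin:secret@old-primary.example.com:25060/defaultdb?sslmode=require" \
#   "new-primary.example.com"
```

In practice, check the promoted node’s connection details in the Control Panel rather than assuming that only the hostname differs.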
Once the original primary region is back online, you can revert to your original setup, so that your application fails back to the original region after the outage has been resolved.
Aspect | Self-Managed Database | Managed Database (DigitalOcean) |
---|---|---|
Disaster Recovery Setup | Requires manual configuration of replication, failover, and backup. | Simple setup with automated replication, failover, and backups. |
Replication & Failover | Manual setup and maintenance of replication and failover processes. Failover requires custom scripts. | Automated replication across regions with seamless failover in the same region and read-only node promotion in other regions. |
Backups & Data Recovery | Manually configure backup schedules, PITR, and restore processes. Responsibility for ensuring reliability of backups. | Automated backups with point-in-time recovery (PITR) for easy restoration to any specific time in the last 7 days. |
Recovery Time | Recovery time is longer due to manual intervention (replica promotion, backup restoration). | Fast recovery with automatic failover and immediate promotion of read-only nodes to primary. |
Data Synchronization | You must monitor and ensure synchronization between primary and replica nodes, managing replication health manually. | Data is automatically synchronized across regions with minimal lag, ensuring consistency. |
Monitoring & Alerts | Requires manual setup for monitoring replication status, database health, and performance. Alerts are custom. | Built-in monitoring with automatic alerts for replication, health, performance, and backup issues. |
Ease of Scaling | Scaling is manual and requires careful planning for additional replicas or resources. | Scaling is seamless with DigitalOcean’s infrastructure. You can add more nodes and scale as needed with minimal effort. |
Operational Overhead | High overhead in setting up, managing, and maintaining DR systems, including manual testing and ongoing updates. | Low overhead. Most DR-related operations (replication, failover, backups) are automated and managed by DigitalOcean. |
DigitalOcean Managed Databases make disaster recovery simple and reliable by automating replication, failover, backups, and monitoring—minimizing manual effort and reducing the risk of downtime. While self-managed databases offer more control, they require significant expertise and ongoing maintenance. For most teams, especially those seeking efficiency and peace of mind, managed solutions are the fastest path to resilient, multi-region DR.