How to Build a Multi-Region Disaster Recovery Strategy Using Read-Only Nodes on DigitalOcean Managed Databases

Published on July 23, 2025

Introduction

When a cloud region goes dark, will your app survive? In today’s always-on digital world, downtime isn’t just a minor inconvenience; it can be a major blow to your business. Whether it’s a regional outage, infrastructure failure, or even a natural disaster, having a solid disaster recovery (DR) strategy is critical for keeping your application running smoothly and ensuring minimal impact on your users.

In this guide, we’ll walk you through the process of designing a disaster recovery plan using DigitalOcean Managed Databases for PostgreSQL and MySQL. You’ll learn:

  • Why Disaster Recovery is Essential: The importance of building a resilient infrastructure to safeguard your app and its users.
  • Challenges with DIY DR Solutions: We’ll compare self-managed DR strategies with DigitalOcean’s managed approach and highlight why managed services make life easier.
  • Setting Up Read-Only Nodes for DR: A step-by-step guide to creating and managing read-only nodes, an essential component of your DR strategy.
  • Failover and Recovery: How to quickly promote a read-only node to primary in the event of a failure, ensuring your app remains functional and available.

You’ll gain the knowledge needed to use DigitalOcean’s managed databases for building a disaster recovery plan that reduces risk and keeps your app available, even in the face of unexpected disruptions.

Key Takeaways

  • Disaster Recovery (DR) ensures business continuity by keeping your application and data available, even during regional outages, infrastructure failures, or natural disasters.
  • DigitalOcean Managed Databases for PostgreSQL and MySQL simplify DR by automating complex tasks like cross-region replication, failover, and backups—removing the need for manual configuration and custom scripts.
  • Read-only nodes can be deployed in multiple regions with just a few clicks or API calls, providing geographically distributed replicas that can be promoted to primary in case of a regional failure.
  • Near real-time, asynchronous replication keeps your data synchronized between the primary node and all read-only nodes, minimizing data loss (low RPO) and keeping replicas in other regions closely in step with the primary.
  • Point-in-Time Recovery (PITR) allows you to restore your database to any specific moment within the last 7 days, providing a safety net against accidental data loss or corruption.
  • Automated failover within a region and on-demand promotion of read-only nodes in other regions enable fast recovery (low RTO) by allowing you to quickly switch your application to a healthy region with minimal downtime.
  • Built-in monitoring and alerting help you proactively detect issues with replication, backups, or database health, reducing operational overhead and risk of unnoticed failures.
  • Managed DR reduces operational complexity and human error compared to self-managed solutions, freeing your team to focus on application development rather than infrastructure maintenance.
  • Cost and trade-offs are important: While cross-region replication increases infrastructure costs, it dramatically improves your ability to recover quickly and with minimal data loss compared to periodic backups alone.
  • This tutorial covers: the importance of DR, key metrics (RTO/RPO), the differences between self-managed and managed DR, step-by-step setup of read-only nodes, how to promote a replica during failover, and best practices for testing and maintaining your DR strategy on DigitalOcean.

By following the strategies and steps outlined in this guide, you can build a robust, multi-region disaster recovery plan that balances cost, complexity, and business requirements—ensuring your application remains resilient in the face of unexpected disruptions.

Prerequisites

Before you dive into this tutorial, make sure you have:

Why is Disaster Recovery Important?

As cloud infrastructure becomes more integral to our operations, even short outages can have a significant impact, including:

  • Revenue loss from downtime.
  • Breach of SLAs (Service-Level Agreements).
  • Damage to your brand’s reputation and user trust.
  • Legal and compliance issues (e.g., GDPR, HIPAA).

Disaster Recovery (DR) is all about ensuring your services keep running, even when things go wrong. It’s about having a plan in place that lets your app keep serving users from another region if one goes down, minimizing the impact and ensuring continuity.

While High Availability (HA) setups are designed to keep operations running smoothly within a single region with the help of standby nodes, DR goes a step further by replicating data across multiple regions to protect against bigger, regional outages.


What are the Key DR Metrics: RTO and RPO?

In order to design a solid DR plan, understanding key metrics like Recovery Time Objective (RTO) and Recovery Point Objective (RPO) is crucial. These two terms help define the recovery goals and expected performance of your DR plan.

  • Recovery Time Objective (RTO): The maximum amount of downtime that is acceptable in the event of a failure. RTO helps you define how quickly your system needs to recover before business operations are significantly impacted.
  • Recovery Point Objective (RPO): The maximum amount of data loss you can tolerate in the event of a failure. RPO helps determine how frequently you need to back up your data and how up-to-date the replica copies need to be.
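
To make the two objectives concrete, here is a minimal sketch that compares a measured value with its target. The helper name check_objective and the sample numbers are illustrative assumptions, not part of any DigitalOcean tooling.

```shell
# Hypothetical helper: compare a measured value (in seconds) with its objective.
# Usage: check_objective <label> <measured_seconds> <target_seconds>
check_objective() {
  label=$1
  measured=$2
  target=$3
  if [ "$measured" -le "$target" ]; then
    echo "$label OK: ${measured}s is within the ${target}s target"
  else
    echo "$label MISSED: ${measured}s exceeds the ${target}s target"
    return 1
  fi
}

# Assumed sample values: a failover drill took 240s against a 300s RTO,
# and the replica was 45s behind against a 30s RPO.
check_objective RTO 240 300
check_objective RPO 45 30 || echo "Revisit replication monitoring or the RPO target."
```

Feeding real drill timings and lag measurements into a check like this turns RTO and RPO from abstract goals into numbers you can track over time.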

Replication and the Cost/Benefit Trade-Off

One of the most effective ways to reduce both RTO and RPO is through replication. Replicating your database across regions ensures that there is always a backup copy of your data that can quickly take over in the event of a failure. However, this comes with trade-offs:

  • Cost: Replication across regions incurs additional costs. You’re essentially running multiple copies of your database, which increases your infrastructure costs. The more replicas you have, the more resources you need.
  • RPO Improvement: Replication improves RPO by synchronizing data across regions in near-real-time. If your database is replicated continuously, the impact of a failure is minimized because only the most recent changes could be lost. This is a significant improvement compared to periodic backups, which can leave you vulnerable to larger data losses.

The decision between replication and periodic backups comes down to your risk tolerance and business requirements. Replication can ensure minimal data loss but increases cost, while backups are cheaper but have a higher RPO.
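
The trade-off can be sketched as a small decision helper. It assumes the worst-case data loss for periodic backups is the full backup interval, and for replication is the current replication lag. The function name and the numbers are hypothetical.

```shell
# Hypothetical helper: pick the cheapest strategy whose worst-case data loss
# fits the RPO. All inputs are in seconds and are illustrative assumptions.
# Usage: pick_strategy <rpo_target> <backup_interval> <replication_lag>
pick_strategy() {
  rpo=$1
  backup_interval=$2   # worst case: everything since the last backup is lost
  replication_lag=$3   # worst case: changes not yet applied on the replica
  if [ "$backup_interval" -le "$rpo" ]; then
    echo "backups"
  elif [ "$replication_lag" -le "$rpo" ]; then
    echo "replication"
  else
    echo "none"
  fi
}

# Assumed inputs: 5-minute RPO, daily backups, 5 seconds of replication lag.
pick_strategy 300 86400 5
```

With these sample inputs only replication fits the target; a workload that can tolerate a day or more of data loss would be served by backups alone.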

How Is DR Typically Handled in Self-Managed Setups?

In self-managed environments, implementing a Disaster Recovery (DR) strategy usually involves:

  • Setting up database replication across multiple regions manually.
  • Provisioning Virtual Machines (VMs) and configuring failover mechanisms.
  • Monitoring and maintaining database health, including backups and replication status.
  • Writing custom scripts for handling failover and database promotion.
  • Regularly testing failover to ensure everything works as expected.

While this approach gives you complete control over your setup, it also comes with a significant amount of overhead. You’ll need deep expertise in replication, clustering, and DR best practices, and all of that has to be managed and maintained by your team.

How DigitalOcean’s Managed Databases Simplify DR

DigitalOcean’s Managed Databases for PostgreSQL and MySQL take the headache out of building disaster recovery solutions. Here’s how they make it easier:

Read-only Nodes

  • DigitalOcean’s read-only nodes are simple to set up across different regions. These nodes serve as real-time replicas of your primary database, ensuring that your data is continuously synchronized and available in multiple locations.

  • Creating read-only nodes in regions geographically distant from your primary gives you a failover option in another region, which mitigates the risk of a regional outage.

Point-in-Time Recovery (PITR)

  • Automated backups with point-in-time recovery (PITR) give you peace of mind, knowing your data is safe and easily recoverable. PITR is based on write-ahead logs (WALs) that are backed up every few minutes.

  • This allows you to recover your data to any specific point in time within a 7-day window. In case of accidental or malicious data deletion, you can restore your database to the last known good state. However, PITR operates within a single region: if the region where the backups are stored becomes unavailable due to a regional failure, PITR cannot help you recover the data held in that region.

Seamless Replication Across Regions

  • Seamless replication ensures that your data is consistently synchronized between your primary node and read-only nodes across different regions.

  • Unlike PITR, which restores data within a single region, seamless replication addresses regional failures. If one region experiences downtime, replication ensures that up-to-date data is available in another region. This minimizes the risk of data loss and enables a quick failover to another region, ensuring that your application remains operational despite a regional outage.

Replica Promotion for Quick Failover

  • If a regional outage causes the primary database to fail, replica promotion enables you to promote a read-only node to become the new primary. This significantly reduces downtime, as your application can switch to the newly promoted primary database as soon as its connection settings are updated.

  • The promoted read-only node becomes fully writable, ensuring continuity of operations. This process allows your application to quickly recover from a regional failure and minimizes downtime.

Low Management Overhead

  • DigitalOcean’s managed databases drastically reduce management overhead. Automated backups, PITR, and seamless replication run without manual configuration or custom scripts, so your data stays protected with little ongoing intervention. You can focus on other aspects of your application and business, knowing that your disaster recovery system is working behind the scenes.

  • With DigitalOcean’s managed databases, you can set up a multi-region disaster recovery strategy with minimal effort, letting you focus on what really matters. For more information about Managed Databases and its features, take a look at the official documentation: DigitalOcean Managed Databases Overview

Setting Up a Multi-Region DR Strategy Using Read-Only Nodes

Step 1: Set Up Your Managed Database Cluster

If you haven’t already set up your primary database cluster, this is the first step in building your disaster recovery strategy. You can do this via the DigitalOcean Control Panel, the doctl CLI, or the API.

  • To create a database using doctl, you need to provide values for the --engine, --region, and --size flags. Use the doctl databases options engines, doctl databases options regions, and doctl databases options slugs commands, respectively, to get a list of available values. The following example creates a MySQL database cluster named example-database in the nyc3 region with a single 1 GB node. This is basic usage; see the doctl usage docs for more details:
doctl databases create example-database \
  --engine mysql \
  --region nyc3 \
  --size db-s-1vcpu-1gb \
  --num-nodes 1
  • To create a database using the API, you need to provide values for the engine, region, and size fields, which specify the database’s engine, its datacenter, and its configuration (number of CPUs, amount of RAM, and hard disk space). Use the /v2/databases/options endpoint to get a list of available values. For example, the following cURL request creates a two-node PostgreSQL cluster named backend:
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"name": "backend", "engine": "pg", "version": "14", "region": "nyc3", "size": "db-s-2vcpu-4gb", "num_nodes": 2, "storage_size_mib": 61440, "tags": ["production"]}' \
  "https://api.digitalocean.com/v2/databases"
  • You can also create a database cluster from the Control Panel at any time. Open the Create menu and click Databases to go to the database cluster creation page. This is where you choose your database cluster’s configuration, like the number and size of nodes and the datacenter region.

For more information, refer to:

  1. How to Create MySQL Database Clusters

  2. How to Create PostgreSQL Database Clusters

When creating the cluster, make sure to choose the right database engine (PostgreSQL or MySQL), size, and region. If you’re aiming for automatic failover within the same region, you can enable High Availability (HA), although for a basic DR setup, this isn’t a must.

Note: If your cluster is already set up, the primary node of your existing database will act as the main point of entry for your application’s write operations.
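
Before adding replicas, it helps to wait until the new cluster is ready. The following is a minimal sketch of a polling helper. The function name wait_for_online is hypothetical, and it assumes that the status command you pass in prints the single word online once the cluster is ready.

```shell
# Poll a status command until it prints "online" or the attempts run out.
# Usage: wait_for_online <max_attempts> <sleep_seconds> <command...>
wait_for_online() {
  attempts=$1
  pause=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failing status command is treated the same as "not online yet".
    status=$("$@" 2>/dev/null) || status=""
    if [ "$status" = "online" ]; then
      echo "cluster is online after $i check(s)"
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "cluster not online after $attempts check(s)" >&2
  return 1
}

# Example invocation (not run here); CLUSTER_ID is a placeholder you supply,
# and the Status column name is an assumption about doctl's output format:
# wait_for_online 30 10 doctl databases get "$CLUSTER_ID" --format Status --no-header
```

Passing the status command as an argument keeps the helper independent of any one tool, so the same loop can wrap an API call instead of doctl.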

Step 2: Add a Read-Only Node in a Different Region

Once your primary cluster is set up, the next step is to create a read-only node in a different geographic region. This node will serve as a replica of your primary database, ensuring data continuity if the primary region experiences issues. Communication between primary and read-only nodes is SSL-encrypted and sent over the public network. Read-only nodes differ from standby nodes, which are exact copies of the primary node that are automatically moved into place in the event of a primary node failure.

  • To create a read-only node using doctl, you need to provide values for the --region and --size flags, which specify the node’s datacenter and its configuration (number of CPUs, amount of RAM, and hard disk space). Use the doctl databases options regions and doctl databases options slugs commands, respectively, to get a list of available values. The following example creates a read-only replica named example-replica in the lon1 region for a database cluster with the ID ca9f591d-f38h-5555-a0ef-1c02d1d1e35. This is basic usage; see the doctl usage docs for more details:
doctl databases replica create ca9f591d-f38h-5555-a0ef-1c02d1d1e35 example-replica --region lon1 --size db-s-1vcpu-1gb
  • To create a read-only node using the API, you need to provide values for the region and size fields, which specify the new node’s datacenter and its configuration (number of CPUs, amount of RAM, and hard disk space). Use the /v2/databases/options endpoint to get a list of available values. For example, using cURL:
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  -d '{"name":"read-nyc3-01", "region":"nyc3", "size": "db-s-2vcpu-4gb", "storage_size_mib": 61440}' \
  "https://api.digitalocean.com/v2/databases/9cc10173-e9ea-4176-9dbc-a4cee4c4ff30/replicas"
  • To add a read-only node from the Cloud Control Panel, click the name of the cluster to go to its Overview. At the bottom of the page, click the Add a read-only node link to go to the read-only node creation page. Select the size, which must be equal to or larger than the primary node, then select the datacenter and name the node. When you’re done, click Create a read-only node to provision the node.

For more details, refer to:

  1. How to Add Read-Only Nodes to MySQL Database Clusters

  2. How to Add Read-Only Nodes to PostgreSQL Database Clusters

Choose a region that is geographically distant from the primary (e.g., if your primary is in nyc3, choose lon1 or sgp1 for redundancy). This read-only node will asynchronously replicate data from the primary cluster, ensuring that it is up-to-date with the latest changes.
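
As a guard against accidentally placing the DR replica next to the primary, a small wrapper can refuse matching regions before building the command. This is a sketch only: replica_create_cmd is a hypothetical helper, it prints the doctl command rather than running it, and the cluster ID is a placeholder.

```shell
# Print the doctl command for a cross-region read-only node, refusing to
# place the replica in the same region as the primary.
# Usage: replica_create_cmd <cluster_id> <replica_name> <primary_region> <replica_region> <size>
replica_create_cmd() {
  if [ "$3" = "$4" ]; then
    echo "replica region must differ from primary region ($3)" >&2
    return 1
  fi
  echo "doctl databases replica create $1 $2 --region $4 --size $5"
}

# Placeholder values; substitute your own cluster ID, name, and size slug.
replica_create_cmd "<cluster-id>" example-replica nyc3 lon1 db-s-1vcpu-1gb
```

Read-only nodes in the primary’s own region are still useful for read scaling, so a check like this belongs only in the DR provisioning path.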

Step 3: Verify Read-Only Node Connectivity

Once the read-only node is set up, it’s essential to verify that it is fully operational and ready to handle your application’s read-heavy queries.

  1. Check Read-Only Node Accessibility: Start by confirming that your application can successfully connect to the read-only node. You can do this by running manual queries or directing your application’s read queries to the read-only node. This confirms that the node is reachable and is returning data replicated from the primary database.

  2. Ensure Read-Only Node Can Handle Expected Traffic: The read-only node can offload read-heavy queries from the primary node, reducing the primary’s workload and improving the overall efficiency of the application. Ensure that the node can handle the expected amount of read traffic, and monitor its load to confirm that it is effectively relieving the primary.

  3. Confirm Read-Only Functionality: Keep in mind that the read-only node handles only read operations. It accepts queries such as SELECT but rejects writes such as INSERT, UPDATE, or DELETE, so all write traffic must continue to go to the primary.
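
One way to confirm a node’s role from the command line is to ask the database itself. The sketch below interprets the result of a role query. describe_role is a hypothetical helper, and the MySQL variant assumes the read-only node reports the read_only system variable as enabled, which you should verify on your own cluster.

```shell
# Interpret the single-value output of a role query.
# PostgreSQL: SELECT pg_is_in_recovery();  -> "t" on a replica, "f" on a primary
# MySQL:      SELECT @@global.read_only;   -> "1" when writes are blocked
describe_role() {
  case "$1" in
    t|1) echo "read-only replica" ;;
    f|0) echo "writable primary" ;;
    *)   echo "unknown"; return 1 ;;
  esac
}

# Example invocations (not run here); the connection variables are placeholders.
# describe_role "$(psql "$REPLICA_URI" -Atc 'SELECT pg_is_in_recovery();')"
# describe_role "$(mysql -h "$REPLICA_HOST" -P "$REPLICA_PORT" -u "$DB_USER" -p -N -B -e 'SELECT @@global.read_only;')"
```

Running the same check after a promotion is a quick way to confirm that the node has switched from replica to writable primary.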

Step 4: Monitor Replication Lag

Once the read-only node is verified as operational, it’s important to monitor replication lag, a critical KPI to ensure that the read-only node stays in sync with the primary database.

Why Monitor Replication Lag?

Replication lag is the delay between when data is written to the primary node and when it appears on the read-only node. If the replication lag grows too large, it could impact your Recovery Point Objective (RPO) and result in data loss or inconsistency in case of failover. Monitoring this metric is essential for ensuring your disaster recovery setup is working as expected.

Using DigitalOcean’s Metrics to Monitor Replication Lag

DigitalOcean offers multiple tools to help you monitor replication lag and ensure your database cluster is functioning as expected.

For PostgreSQL, replication delay is reported in bytes in the Metrics tab of the Control Panel. This metric shows how far behind the read-only node is in terms of unreplicated data. For more information on how to monitor PostgreSQL databases, refer to this guide on PostgreSQL metrics.

You can also access detailed performance data, including replication lag, via the metrics endpoint of the DigitalOcean API. You can read more about accessing the metrics endpoint for PostgreSQL.

For MySQL, you can monitor replication lag via the metrics endpoint. The key metric is mysql_slave_seconds_behind_master, which reports the number of seconds the replica is behind the primary node. More details on how to access and scrape MySQL metrics can be found in this guide. You can also check replication delay from a MySQL client with the SHOW SLAVE STATUS command (SHOW REPLICA STATUS in MySQL 8.0.22 and later), whose Seconds_Behind_Master field (Seconds_Behind_Source in the newer command) reports the delay. For more information, refer to the official MySQL documentation on replication delay.
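
A scraped metrics payload can be checked against an RPO-derived threshold with a few lines of awk. This is a sketch under stated assumptions: check_mysql_lag is a hypothetical helper, it expects Prometheus text format on standard input with the value as the last field on the line (no trailing timestamp), and fetching and authenticating against the metrics endpoint is left out.

```shell
# Read Prometheus-format metrics on stdin and report the largest value of
# mysql_slave_seconds_behind_master. Exit status: 0 if within the limit,
# 1 if the limit is exceeded, 2 if the metric is missing.
# Usage: check_mysql_lag <max_lag_seconds> < metrics.txt
check_mysql_lag() {
  awk -v max="$1" '
    /^mysql_slave_seconds_behind_master([{ ]|$)/ {
      value = $NF + 0
      if (!seen || value > lag) lag = value
      seen = 1
    }
    END {
      if (!seen) { print "lag metric not found"; exit 2 }
      printf "replication lag: %ds (limit %ds)\n", lag, max + 0
      exit (lag > max + 0 ? 1 : 0)
    }'
}

# Example invocation (not run here); METRICS_URL is a placeholder:
# curl -s "$METRICS_URL" | check_mysql_lag 30
```

The non-zero exit status makes the helper easy to wire into cron jobs or alerting scripts that page someone when the lag exceeds the limit.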

Monitor Other Key Metrics to Avoid Replication Issues

In addition to replication lag, you can also monitor other key metrics, such as CPU usage, memory usage, and disk I/O, to ensure your nodes are performing well. DigitalOcean provides easy-to-understand metrics and data points to track resource consumption.

If replication lag increases or is higher than expected, check the performance of your nodes, including:

  • CPU and memory usage: Excessive CPU or memory usage could be slowing down replication.
  • Disk I/O: Slow disk reads or writes on the primary node can cause replication delays.
  • Query performance: Slow queries on the primary node can back up replication. Ensure your queries are optimized and do not lock resources unnecessarily.

What to do in Case of an Outage: Recovery Steps

Step 1: Promote the Read-Only Node to Primary

In the event of an actual regional outage, you can promote your read-only node to become the new primary database. Promotion turns the read-only node into a new, standalone database cluster in its own datacenter region, which helps you maintain uptime while the original region is experiencing issues. You can do this quickly via the doctl CLI or the API.

  • Using doctl, the following example promotes a read-only replica named example-replica for a database cluster with the ID ca9f591d-f38h-5555-a0ef-1c02d1d1e35:
doctl databases replica promote ca9f591d-f38h-5555-a0ef-1c02d1d1e35 example-replica
  • Using the API, send a PUT request to https://api.digitalocean.com/v2/databases/{database_cluster_uuid}/replicas/{replica_name}/promote

For example, using cURL:

curl -X PUT \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DIGITALOCEAN_TOKEN" \
  "https://api.digitalocean.com/v2/databases/9cc10173-e9ea-4176-9dbc-a4cee4c4ff30/replicas/read-nyc3-01/promote"

For more information, refer to:

  1. Promote a MySQL Read-Only Node to Become a Primary Node

  2. Promote a PostgreSQL Read-Only Node to Become a Primary Node

Once promoted, the read-only node becomes an independent database cluster and can now accept write operations.

Important: After promotion, the replication relationship between the original primary and the new primary is broken, and they will no longer remain in sync.
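
Because promotion breaks replication permanently, it is worth confirming that the primary is really unreachable before promoting. The following sketch retries a health probe and only reports the primary as down if every attempt fails. primary_is_down is a hypothetical helper, and the probe command, attempt count, and pause are assumptions to adapt.

```shell
# Run a health probe several times and report "down" only if every attempt
# fails, to avoid promoting on a single transient error.
# Usage: primary_is_down <attempts> <sleep_seconds> <probe command...>
primary_is_down() {
  attempts=$1
  pause=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 1            # probe succeeded: the primary is reachable
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  return 0                # every probe failed
}

# Example invocation (not run here); PRIMARY_URI and CLUSTER_ID are placeholders:
# if primary_is_down 5 10 pg_isready -d "$PRIMARY_URI"; then
#   doctl databases replica promote "$CLUSTER_ID" example-replica
# fi
```

If the old primary comes back after a promotion, it is an independent cluster with its own data, so make sure no application instance is still writing to it.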

Step 2: Redirect Traffic to the New Primary

With the read-only node now promoted, the next step is to redirect your application’s traffic to the new primary database. Update your application configuration to point to the new database’s connection string or endpoint.

  • If using DNS or a load balancer, update the routing rules to direct traffic to the new region.
  • Ensure that any database client libraries are using the correct connection parameters.
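
One simple way to make the switch quick is to keep both connection strings in configuration and select between them with a single setting. This is a minimal sketch; the variable names ACTIVE_REGION, DB_URI_PRIMARY, and DB_URI_DR are hypothetical.

```shell
# Pick the connection string for the region currently serving writes.
# ACTIVE_REGION defaults to "primary" when it is unset.
active_db_uri() {
  case "${ACTIVE_REGION:-primary}" in
    primary) echo "$DB_URI_PRIMARY" ;;
    dr)      echo "$DB_URI_DR" ;;
    *)       echo "unknown ACTIVE_REGION: $ACTIVE_REGION" >&2; return 1 ;;
  esac
}

# Example usage (not run here): after promoting the DR node, flip one setting
# and restart or reload the application.
# ACTIVE_REGION=dr
# DATABASE_URL=$(active_db_uri)
```

Keeping the DR connection string in place ahead of time means a failover changes one value rather than requiring a new secret to be distributed during an incident.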

Step 3: Restore the Original Region When Ready

Once the original primary region is back online, you can revert to your original setup by:

  1. Creating a new read-only node in the restored region, attached to the cluster that is currently serving traffic.
  2. Promoting the new read-only node to be the primary once it has caught up.
  3. Redirecting traffic back to the original region.
  4. (Optional) Adding a read-only node in the DR region to the restored primary, so that replication is in place for future DR readiness.

This ensures that your application can fail back to the original region after the outage has been resolved.
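
The failback sequence can be written down as a dry-run plan that prints the commands in order without executing them. This is a sketch: failback_plan is a hypothetical helper, and the cluster ID, replica name, and size slug are placeholders.

```shell
# Print (do not run) the commands for failing back to the original region.
# Usage: failback_plan <active_cluster_id> <replica_name> <original_region> <size>
failback_plan() {
  echo "doctl databases replica create $1 $2 --region $3 --size $4"
  echo "# wait until $2 is online and replication lag is near zero"
  echo "doctl databases replica promote $1 $2"
  echo "# point the application at the promoted cluster's connection string"
}

# Placeholder values; substitute your own.
failback_plan "<active-cluster-id>" failback-replica nyc3 db-s-1vcpu-1gb
```

Reviewing the printed plan before an actual failback, ideally during a scheduled drill, helps catch a wrong cluster ID or region before any command is run.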

What are the Differences Between Self-Managed and Managed Databases for Disaster Recovery?

| Aspect | Self-Managed Database | Managed Database (DigitalOcean) |
| --- | --- | --- |
| Disaster Recovery Setup | Requires manual configuration of replication, failover, and backup. | Simple setup with automated replication, failover, and backups. |
| Replication & Failover | Manual setup and maintenance of replication and failover processes. Failover requires custom scripts. | Automated replication across regions with seamless failover in the same region and read-only node promotion in other regions. |
| Backups & Data Recovery | Manually configure backup schedules, PITR, and restore processes. Responsibility for ensuring reliability of backups. | Automated backups with point-in-time recovery (PITR) for easy restoration to any specific time in the last 7 days. |
| Recovery Time | Recovery time is longer due to manual intervention (replica promotion, backup restoration). | Fast recovery with automatic failover and immediate promotion of read-only nodes to primary. |
| Data Synchronization | You must monitor and ensure synchronization between primary and replica nodes, managing replication health manually. | Data is automatically synchronized across regions with minimal lag, ensuring consistency. |
| Monitoring & Alerts | Requires manual setup for monitoring replication status, database health, and performance. Alerts are custom. | Built-in monitoring with automatic alerts for replication, health, performance, and backup issues. |
| Ease of Scaling | Scaling is manual and requires careful planning for additional replicas or resources. | Scaling is seamless with DigitalOcean’s infrastructure. You can add more nodes and scale as needed with minimal effort. |
| Operational Overhead | High overhead in setting up, managing, and maintaining DR systems, including manual testing and ongoing updates. | Low overhead. Most DR-related operations (replication, failover, backups) are automated and managed by DigitalOcean. |

Conclusion

DigitalOcean Managed Databases make disaster recovery simple and reliable by automating replication, failover, backups, and monitoring—minimizing manual effort and reducing the risk of downtime. While self-managed databases offer more control, they require significant expertise and ongoing maintenance. For most teams, especially those seeking efficiency and peace of mind, managed solutions are the fastest path to resilient, multi-region DR.


About the author(s)

Shamim Raashid, Author, Senior Solutions Architect
Anish Singh Walia, Editor, Sr Technical Writer

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.