Backups and Point In Time Restore (PITR)

Posted January 17, 2020
DigitalOcean Managed MySQL Database

I currently use AWS RDS. From your managed database pages and the documentation, it's not clear to me: like AWS, do you take snapshots every 5 minutes in addition to the daily backups, or is there only the daily backup?

What is the Recovery Point Objective (RPO) of your managed MySQL databases? I'm asking about recovery from user error, so a cluster doesn't help: the mistake would be replicated to every node.


Hey petersys 👋

The service takes a single full backup every 24 hours, and the write-ahead log (WAL) is backed up approximately every 5 minutes, which enables point-in-time restore operations.
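A minimal Python sketch of how a scheme like this supports point-in-time restore, assuming one full backup every 24 hours and WAL archived every 5 minutes as described above. The names `plan_restore` and `RestorePlan` are hypothetical and this is not DigitalOcean's implementation: a restore starts from the newest full backup taken at or before the target time, then replays archived WAL up to the target. Any target newer than the last archived WAL segment cannot be reached, which is where the roughly 5-minute RPO comes from.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical schedule, taken from the answer above: one full backup
# every 24 hours; WAL archived roughly every 5 minutes.
FULL_BACKUP_INTERVAL = timedelta(hours=24)
WAL_ARCHIVE_INTERVAL = timedelta(minutes=5)


@dataclass
class RestorePlan:
    base_backup: datetime   # full backup to restore first
    replay_until: datetime  # replay archived WAL up to this instant


def plan_restore(target: datetime, first_backup: datetime,
                 last_wal_archive: datetime) -> RestorePlan:
    """Pick the newest full backup at or before `target`, then replay WAL to `target`."""
    if target < first_backup:
        raise ValueError("target predates the oldest full backup")
    if target > last_wal_archive:
        # Writes after the last WAL upload were never archived (the RPO gap).
        raise ValueError("target is newer than the last archived WAL segment")
    # Whole backup intervals elapsed between the first backup and the target.
    n = (target - first_backup) // FULL_BACKUP_INTERVAL
    base = first_backup + n * FULL_BACKUP_INTERVAL
    return RestorePlan(base_backup=base, replay_until=target)


if __name__ == "__main__":
    first = datetime(2020, 1, 17)
    # Roll back to 14:32, just before a hypothetical bad query at 14:33.
    plan = plan_restore(datetime(2020, 1, 18, 14, 32), first,
                        last_wal_archive=datetime(2020, 1, 18, 14, 35))
    print(plan.base_backup)  # 2020-01-18 00:00:00
```

Restoring into a new service, as the next paragraph recommends, corresponds to applying such a plan to a fresh node instead of overwriting the original.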

If a user error needs to be rolled back, the best approach is to create a new service from a backup at a point in time before the error. Once the new service is online, the original service can be destroyed.

@abearfield, thanks, so the RPO is approximately 5 minutes.

edited by MattIPv4

For single-node, non-HA services, it is fair to assess the RPO as approximately 5 minutes. Users who require a lower RPO should configure their Managed Database services with one or more standby nodes.

  • Can you explain why a multi-node setup would have a reduced RPO?

    Are you saying DigitalOcean ensures each node backs up its WAL at a different time, so 2 nodes => 2.5-minute RPO, 3 nodes => 1.7-minute RPO, etc.?

Hey @petersys

Sorry for the delay here. A multi-node setup has a lower RPO because a standby node is promoted and takes over read and write operations right away. For a single-node service, a failure causes the loss of any WAL data that has not yet been committed to the backup service (5 minutes or less, as outlined above). For multi-node services, any data loss is a function of replication lag. Replication to standby nodes is asynchronous but close to real time.
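To make the comparison concrete, here is a hedged Python sketch that counts the writes a node failure would lose under each setup. It assumes a single node loses every commit after the last WAL upload, while an asynchronous standby loses only commits still inside the replication-lag window. The function names, workload, and 1-second lag are hypothetical. It models node failure only. As the question points out, a user error is replicated to the standbys too, so undoing it still requires a point-in-time restore.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def lost_single_node(commits: list[datetime], last_wal_upload: datetime,
                     failure: datetime) -> list[datetime]:
    """Single node: commits after the last WAL upload existed only on the failed node."""
    return [c for c in commits if last_wal_upload < c <= failure]


def lost_with_standby(commits: list[datetime], replication_lag: timedelta,
                      failure: datetime) -> list[datetime]:
    """Async standby: only commits newer than (failure - lag) had not reached it yet."""
    return [c for c in commits if failure - replication_lag < c <= failure]


if __name__ == "__main__":
    t0 = datetime(2020, 1, 17, 12, 0, 0)
    # Hypothetical workload: one commit every 30 seconds for 5 minutes.
    commits = [t0 + timedelta(seconds=30 * i) for i in range(1, 11)]
    failure = t0 + timedelta(minutes=5)
    # WAL last uploaded at t0, so every commit since then is lost on a single node.
    print(len(lost_single_node(commits, t0, failure)))  # 10
    # A standby lagging by 1 second has received all but the final commit.
    print(len(lost_with_standby(commits, timedelta(seconds=1), failure)))  # 1
```

The single-node loss grows with the WAL upload interval, while the standby loss grows only with replication lag, which is why adding standby nodes lowers the RPO for failures without changing the 5-minute backup cadence.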