Is a backup transactionally consistent?

November 29, 2018 664 views
Backups Databases Ubuntu

Assume a server is busy processing transactions (e.g. MySQL on Ubuntu).

When the DO backup is taken is it guaranteed to maintain transactional (ACID) consistency within the database files?

In other words, if I recover from a backup, will it represent a valid database (from some time ago), or could the block-copy process sometimes capture the log but not the corresponding buffer-cache flush, or vice versa?

I'm happy to lose some data; I just need consistent data files for a historic point in time :-)

Lastly, is the same true for DO snapshots?

Thanks.

1 Answer

Hey friend,

This is a great question; thank you for asking it here so that others can see the answer. The easy answer is that you should never trust a live disk backup of a running database. The more complicated answer is “maybe.” This document is a bit old, but it’s my favorite reference for how we do backups/snapshots:

https://kashyapc.fedorapeople.org/virt/lc-2012/snapshots-handout.html

Backups and snapshots are generally more consistent with the use of overlays, and it’s more likely that your MySQL database is fine. That said, it’s still far too likely that something like InnoDB corruption occurs on the backup image during a process like this (if the MySQL server is running). For this reason, I highly recommend that you regularly make use of mysqldump to make static backups of your database that you can restore. Keep a copy on your server, but also send a copy off somewhere else (Dropbox, S3, whatever).
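
As a rough illustration, a dump along these lines would do it (a minimal sketch, assuming an all-InnoDB database named mydb, credentials in ~/.my.cnf, and a /backups directory; adjust names and paths to your setup):

    # Dump a consistent point-in-time snapshot without locking InnoDB tables
    mysqldump --single-transaction --routines --triggers mydb | gzip > /backups/mydb-$(date +%F).sql.gz

    # Push a copy off the server as well, e.g. to an S3 bucket (bucket name is a placeholder)
    s3cmd put /backups/mydb-$(date +%F).sql.gz s3://my-backup-bucket/

Note that --single-transaction only gives a consistent view of transactional (InnoDB) tables; MyISAM tables would still need locking.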

While I trust our backups, it’s important to assume that they will fail and be useless. Two backup strategies are a good measure; three geographically diverse backups mean you’ve seen hard times and never want to see them again ;)

Hope that helps!

Jarland

  • Excellent response :-)

    Thank you very much - it helps me plan my disaster recovery strategy so much better.

    Just on the off-chance that you might have experience in the area: how long might I expect a mysqldump to take on a 200GB database? I'm not after exact figures (I know every database is different, and I’ll never hold you to it), but are we talking minutes, hours, or a good chunk of a night window? My environment would likely be a 320GB DO server with an attached 200-300GB SAN volume for the backup (I think DO calls them “Block Storage”), so no local storage for the backup ;-)

    • That’s a large database. Off the top of my head, I’m thinking that might take up to 30 minutes. Part of me is wondering if master-master replication to a second server might be of value, because you could then run mysqldump on the second server without harming performance on the first during the process (roughly along the lines of the sketch after this thread).

      • 30-60 mins (each week) sounds pretty reasonable to me.

        I’m actually after DR before true high availability [i.e. some downtime is acceptable]. If a future database drifts into the TB region (and several hours of recovery time), I think I might implement your master-master replication idea and do the physical backup from the 2nd master.

        Thanks again.
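
For anyone following the replica idea above, here is a rough sketch of what the weekly dump from the second master might look like (a sketch only; the hostname, the /mnt/backup-volume mount point for the Block Storage volume, and the database name are placeholders, and --single-transaction assumes InnoDB tables):

    # Run on the second master so the primary keeps serving traffic;
    # write straight to the attached Block Storage volume
    mysqldump --single-transaction --quick mydb | gzip > /mnt/backup-volume/mydb-$(date +%F).sql.gz

    # Or pull it over the network from the secondary while logged into a separate backup host
    mysqldump -h db2.example.com --single-transaction --quick mydb | gzip > /mnt/backup-volume/mydb-$(date +%F).sql.gz

Restoring a dump of that size usually takes longer than producing it, so it's worth timing a test restore as part of the DR plan as well.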
