Question

How much downtime should I expect from a single-node DB cluster?

I’m setting up a production PostgreSQL database. Before adding standby nodes for high availability, I want to know how much downtime I could expect if I keep only a single-node cluster.

I could not find any structured reports or statistics describing uptime or downtime for managed PostgreSQL databases. Is there anything I could use to inform my decision?

Thanks for any replies.



The amount of downtime to expect from a single-node database cluster varies with several factors, including the database engine, the size of the data, the complexity of the workload, and the maintenance procedures being performed.

A single-node cluster carries inherent risks of downtime from hardware failures, software bugs, and maintenance tasks. Here are some typical scenarios that can cause downtime:

  1. Hardware Failures: If the server or storage hosting the database experiences a hardware failure, it can lead to unplanned downtime until the hardware is replaced or repaired.

  2. Software Errors: Bugs, software updates, or misconfigurations can cause the database to crash, resulting in downtime.

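  3. Data Import/Export: Importing or exporting a large amount of data rarely requires taking PostgreSQL offline (pg_dump, for example, runs against a live database), but the extra load can slow the database enough to affect availability.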

  4. Backup and Restore: Regular backups are essential, but restoring from a backup can cause some downtime, depending on the size of the database and the backup strategy.

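  5. Index Rebuilds: Periodic maintenance tasks, like rebuilding indexes, can block writes to the affected table while they run. In PostgreSQL 12 and later, REINDEX CONCURRENTLY avoids most of this blocking.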

  6. Database Updates or Schema Changes: Major updates to the database or schema changes might require temporary downtime during the process.

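  7. Performance Optimization: Some tuning tasks require a restart or a heavy lock. In PostgreSQL, for example, changing shared_buffers requires a restart, and VACUUM FULL holds an exclusive lock on the table while it runs.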

  8. Hardware/Software Upgrades: Upgrading the server or database software could involve downtime during the upgrade process.

  9. Replication Lag (For Replicated Databases): This applies only once replicas are added. Replication lag means a replica can serve stale data; it does not by itself make the primary unavailable.

To minimize downtime in a single cluster database, consider implementing high availability solutions like failover clustering, replication, or using backup strategies to minimize data restoration time. Additionally, conducting regular maintenance and testing can help identify potential issues before they lead to significant downtime.

It’s essential to plan for downtime, communicate with users or clients about scheduled maintenance, and have a well-defined disaster recovery plan in place to limit the impact of unexpected outages. Each database environment is unique, so weigh your specific requirements and available resources when estimating downtime for a single-node cluster.
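
When estimating, it can help to convert an availability target into a downtime budget. Here is a minimal sketch of that arithmetic. The function name and the list of targets are illustrative assumptions, not figures published by any provider.

```python
def downtime_budget_hours(availability_percent: float,
                          period_hours: float = 365 * 24) -> float:
    """Hours of downtime allowed in a period at a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (100.0 - availability_percent) / 100.0 * period_hours


if __name__ == "__main__":
    # Illustrative targets only; substitute the target your application needs.
    for target in (99.0, 99.5, 99.9, 99.95, 99.99):
        hours = downtime_budget_hours(target)
        print(f"{target:>6}% -> {hours:6.2f} h/year ({hours * 60:7.1f} min/year)")
```

For instance, a 99.9% target over a 365-day year leaves a budget of roughly 8.76 hours. If a single outage on a node without a standby could use up most of that budget, that is a strong argument for adding one.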

Bobby Iliev
Site Moderator
July 19, 2023

Hi there,

All DigitalOcean managed database clusters have automated failover, meaning they automatically detect and replace degraded or failing nodes.

High availability requires redundancy in addition to automatic failover. Database clusters must have at least one standby node to be highly available because standby nodes provide redundancy for the primary node.

Without standby nodes, the primary node is a single point of failure, so the cluster is not highly available.

If the primary node fails, the service becomes unavailable until the primary node’s replacement is reprovisioned. The amount of time it takes to reprovision a node depends on the amount of data being stored; larger databases require more time.

In other words, the effect of a primary node’s failure on service availability depends on the cluster configuration. Provisioning a new replacement node takes time, but failing over to a standby node is immediate.

With one standby node, the cluster is highly available: if the primary node fails, the service remains available. The standby node is immediately promoted to primary and begins serving requests while a replacement standby node is provisioned in the background.
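
As a rough way to compare the two configurations, you can multiply an assumed failure rate by the outage length per failure. The sketch below does that. Every number in it (failure rate, restore throughput, fixed overhead, failover time) is a hypothetical placeholder to replace with your own measurements, not a DigitalOcean figure, and the linear restore model is a simplifying assumption.

```python
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Scenario:
    name: str
    failures_per_year: float    # assumption: primary-node failures per year
    minutes_per_failure: float  # assumption: outage length per failure


def reprovision_minutes(data_gb: float, restore_gb_per_minute: float,
                        fixed_overhead_minutes: float) -> float:
    """Outage estimate without a standby: fixed setup time plus data restore time."""
    if restore_gb_per_minute <= 0:
        raise ValueError("restore_gb_per_minute must be positive")
    return fixed_overhead_minutes + data_gb / restore_gb_per_minute


def expected_downtime_minutes(s: Scenario) -> float:
    """Expected downtime per year for a scenario."""
    return s.failures_per_year * s.minutes_per_failure


def availability_percent(downtime_minutes: float) -> float:
    """Availability over one year implied by a yearly downtime figure."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Hypothetical inputs: 200 GB of data, 2 GB/min restore, 10 min overhead,
    # one primary failure per year, 0.5 min to fail over to a standby.
    single = Scenario("no standby", 1.0, reprovision_minutes(200, 2.0, 10.0))
    standby = Scenario("one standby", 1.0, 0.5)
    for s in (single, standby):
        d = expected_downtime_minutes(s)
        print(f"{s.name}: {d:.1f} min/year, {availability_percent(d):.4f}% available")
```

The model reflects the point above that larger databases take longer to reprovision: the outage per failure grows with `data_gb` when there is no standby, while the standby case stays at a small placeholder value that stands in for detection and reconnection time.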

Hope that this helps!

Best,

Bobby
