Question

How do MongoDB standby nodes work?

I have a managed MongoDB server with two failover nodes. The promo says: “Add redundancy with standby nodes that start serving requests if your primary node fails.” What is meant by “primary node fails”? What exactly has to happen for the secondary nodes to start working? Previously, we had incidents where, because of unoptimized queries and too many concurrent users, our own MongoDB server hit 100% CPU load and became inoperable. Would that count as the primary node failing?



Hello,

I recently answered a similar question here:

https://www.digitalocean.com/community/questions/how-standby-node-work

Basically, in the event of a failure, Managed Databases automatically switch data handling to a standby node to minimize downtime.

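The data is replicated between the nodes using MongoDB's built-in replica set replication: the standby (secondary) nodes continuously apply the operations recorded in the primary's oplog.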

All database clusters have automated failover, meaning they automatically detect and replace degraded or failing nodes.

With one standby node, if the primary node fails, the service remains available. The standby node is immediately promoted to primary and begins serving requests while a replacement standby node is provisioned in the background.
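For the promoted standby to be picked up without manual changes, the client should connect to the cluster as a replica set rather than to a single host. Below is a minimal sketch of building such a connection string in Python. The host names, the replica set name `rs0`, and the credentials are hypothetical placeholders; in practice you would normally copy the connection string from your provider's control panel instead of assembling it yourself.

```python
from urllib.parse import quote_plus, urlencode


def build_replica_set_uri(hosts, replica_set, username, password, database="admin"):
    """Build a mongodb:// URI that lists every replica set member."""
    # Percent-encode credentials so characters such as '@' or ':' are safe.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    options = urlencode({
        "replicaSet": replica_set,  # lets the driver discover the current primary
        "retryWrites": "true",      # retry a supported write once after a primary change
        "w": "majority",            # acknowledge writes once a majority of nodes have them
        "tls": "true",              # encrypt the connection
    })
    return f"mongodb://{credentials}@{','.join(hosts)}/{database}?{options}"


# Hypothetical hosts, replica set name, and credentials, for illustration only.
uri = build_replica_set_uri(
    hosts=[
        "node-1.example.com:27017",
        "node-2.example.com:27017",
        "node-3.example.com:27017",
    ],
    replica_set="rs0",
    username="doadmin",
    password="p@ss:word",
)
print(uri)
```

Listing all members together with `replicaSet` means the driver can find whichever node is currently primary, and `retryWrites=true` lets it retry a supported write once if the primary changes mid-operation.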

If both nodes fail simultaneously, the service becomes unavailable until at least one of the nodes is reprovisioned.

To increase stability further, you could have two standby nodes. That way, the cluster would be highly available and very resilient against downtime.

Even if two nodes fail simultaneously, the service remains available while two replacements are provisioned in the background.

The service only becomes unavailable in the unlikely event of all three nodes failing at the same time.

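In your case, the server was under high CPU load, but it did not crash, so that by itself would generally not count as a node failure or trigger a failover. Standby nodes protect against a node becoming unavailable; they do not add capacity for an overloaded primary. What I would suggest in such cases is, as you mentioned, to optimize any slow queries and, if the CPU usage is still too high, to add more CPU power.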
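One common cause of this kind of CPU load is a query that scans an entire collection because no index fits it. MongoDB's `explain()` output reports this as a `COLLSCAN` stage in the winning plan. The sketch below checks an explain document for that stage. The helper name `has_collection_scan` and the two sample documents are hypothetical, hand-written, and heavily trimmed; real explain output contains many more fields.

```python
def _contains_collscan(node):
    """Recursively look for a COLLSCAN stage in a nested plan structure."""
    if isinstance(node, dict):
        if node.get("stage") == "COLLSCAN":
            return True
        return any(_contains_collscan(value) for value in node.values())
    if isinstance(node, list):
        return any(_contains_collscan(item) for item in node)
    return False


def has_collection_scan(explain_doc):
    """Return True if the winning plan of an explain() result scans the whole collection."""
    winning_plan = explain_doc.get("queryPlanner", {}).get("winningPlan", {})
    return _contains_collscan(winning_plan)


# Hand-written, trimmed examples of explain() output, for illustration only.
unindexed = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "direction": "forward"},
    }
}
indexed = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "email_1"},
        },
    }
}

print(has_collection_scan(unindexed))  # True
print(has_collection_scan(indexed))    # False
```

With a driver such as PyMongo, the input would typically come from calling `explain()` on a cursor for the query you suspect. A query flagged this way is usually a candidate for a new index.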

Hope that this helps.

Best,

Bobby