DOKS reliability?

Posted on February 13, 2021

Hey! I have been using DOKS for 1.5 years and in general it has been a great experience, but in the last 3 months I had 2 major downtimes lasting hours, both caused by problems with the master/network on the DO side.

I see gaps in the metrics on the Insights page, but I can't connect to the master; it fails for various network reasons.

I contacted support, but it takes a long time to get a response on the weekend. And there are no knobs I can turn to do anything in the meantime. Funny thing, a test cluster in the same region works fine :(

I was thinking about what I can do and only have 3 thoughts:

  1. Create a new cluster and restore from backups.
  2. Go into HA with another provider and double the cost.
  3. Ask DO to add metrics to detect such things more reliably.

How do other people deal with such problems?




Heya,

Just came across this question and decided to write some general guidelines for anyone who finds it in the future, even though the question is old.

If your master node is experiencing network issues, it’s a serious problem that needs to be addressed. Depending on your application/cluster configuration and availability needs, all three of your solutions could be viable. Here’s a bit about each:

1. Creating a new cluster and restoring from backups:

This could potentially address the current master node issue, since a new cluster would have a new master node.

2. High Availability (HA) with another provider:

This could increase your availability at the cost of added complexity, since it involves setting up and operating a multi-cloud Kubernetes system.

3. Asking DigitalOcean to add metrics:

Additional metrics for monitoring your cluster and its Droplets could help diagnose issues like this more quickly in the future.
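In the meantime, one thing you can run yourself is a small external probe against the control plane, so you find out about an outage from an alert rather than from gaps in the graphs. Below is a minimal sketch, assuming your cluster's API server exposes the standard Kubernetes `/readyz` health endpoint at the API URL from your kubeconfig. The function names (`classify`, `probe`) and the state labels are hypothetical, and whether unauthenticated requests are allowed depends on how the cluster is configured, which is why a 401/403 is treated as "reachable" rather than as a failure.

```python
import ssl
import urllib.error
import urllib.request


def classify(status_code):
    """Map an HTTP status from the API server to a coarse state (labels are illustrative)."""
    if status_code == 200:
        return "ready"
    if status_code in (401, 403):
        # The server answered but rejected the request, so the network
        # path and the API server process are at least up.
        return "reachable"
    return "unhealthy"


def _http_status(url, timeout, cafile=None):
    """Return the HTTP status code for url, including 4xx/5xx codes."""
    # cafile should point at the cluster CA certificate if the API server's
    # certificate is not signed by a publicly trusted CA.
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def probe(api_url, timeout=5.0, cafile=None, fetch=_http_status):
    """Probe <api_url>/readyz and return 'ready', 'reachable', 'unhealthy' or 'unreachable'."""
    url = api_url.rstrip("/") + "/readyz"
    try:
        return classify(fetch(url, timeout, cafile))
    except OSError:
        # Covers DNS failures, refused connections, TLS errors and timeouts.
        return "unreachable"
```

Run from somewhere outside the cluster (a cron job on another provider, for example), `probe("https://<your-api-server>")` returning `"unreachable"` several times in a row is a reasonable trigger for an alert and a good data point to include in a support ticket.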

I would suggest further reviewing these options and considering what works best for you based on your needs and resources. For future reference, you can also check the DigitalOcean status page for any ongoing incidents that might be affecting your services.
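If you want to check the status page automatically instead of by hand, something along these lines could work. This is only a sketch: it assumes the status page is served by Statuspage and exposes the usual `/api/v2/status.json` summary with a `status.indicator` field (`none`, `minor`, `major`, `critical`). The URL constant and the helper names (`has_incident`, `fetch_status`) are assumptions for illustration, so verify the endpoint before relying on it.

```python
import json
import urllib.request

# Assumed endpoint, based on the common Statuspage API layout; verify before use.
STATUS_URL = "https://status.digitalocean.com/api/v2/status.json"


def has_incident(payload):
    """Return True unless the payload explicitly reports indicator 'none'."""
    indicator = payload.get("status", {}).get("indicator", "unknown")
    # A missing or unknown indicator is treated as "needs attention".
    return indicator != "none"


def fetch_status(url=STATUS_URL, timeout=5.0):
    """Download and decode the status summary JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Combining `has_incident(fetch_status())` with your own checks helps tell a platform-wide incident apart from a problem that affects only your cluster.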

Also, while we aim to provide fast and effective support 24/7, responses can sometimes take a bit longer during peak times. We always suggest providing as much detail as possible in your initial request to speed up the troubleshooting process.

Hope that this helps!
