Hey! I have been using DOKS for 1.5 years and in general it has been a great experience, but in the last 3 months I have had 2 major downtimes lasting hours, caused by problems with the master/network on the DO side.

I see gaps in the metrics on the Insights page, but I can't connect to the master; the connection fails for various network reasons.

I contacted support, but it takes a long time to get a response on weekends, and there are no knobs I can turn in the meantime. Funny thing: a test cluster in the same region works fine :(

I was thinking about what I can do and only have 3 ideas:

  1. Create a new cluster and restore from backups.
  2. Go HA with another provider and double the cost.
  3. Ask DO to add metrics that detect such issues more reliably.
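
For the third idea, until DO exposes something better, I could run my own probe from outside the cluster so I at least find out quickly when the control plane is unreachable. A rough sketch, assuming the API server answers on the standard Kubernetes `/readyz` endpoint and that the probe is given a TLS context that trusts the cluster CA from my kubeconfig; `classify`, `probe` and `should_alert` are hypothetical helper names of mine, not part of any DO API:

```python
import urllib.error
import urllib.request


def classify(status, error):
    """Map one probe result to a coarse control-plane state."""
    if error is not None:
        return "unreachable"   # timeout, DNS failure, connection refused, TLS error
    if status == 200:
        return "healthy"
    if status in (401, 403):
        return "reachable"     # API server answered but rejected the anonymous request
    return "degraded"          # e.g. a 5xx from the API server or a load balancer


def probe(url, timeout=5.0, context=None, opener=urllib.request.urlopen):
    """Send a single GET to `url` and classify the outcome.

    `context` should be an ssl.SSLContext that trusts the cluster CA;
    otherwise certificate verification fails and the result is "unreachable".
    `opener` is injectable so the function can be tested without a network.
    """
    try:
        with opener(url, timeout=timeout, context=context) as resp:
            return classify(resp.status, None)
    except urllib.error.HTTPError as exc:
        # The server did respond, just with a non-2xx status code.
        return classify(exc.code, None)
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        return classify(None, exc)


def should_alert(history, threshold=3):
    """Alert only after `threshold` consecutive 'unreachable' results."""
    recent = history[-threshold:]
    return len(recent) == threshold and all(s == "unreachable" for s in recent)
```

I would call `probe("https://<api-server-host>/readyz", context=...)` once a minute from a machine outside DO, append each result to a list, and page myself when `should_alert` returns true. This only detects the outage faster; it does not fix it.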

How do other people deal with such problems?
