Question

SFO Blues - why were droplets not restarted automatically?

It appears that the recovery from the recent network/power problems in SFO2 had a hiccup.

I fully expect my droplets to go down due to hardware problems from time to time. However it would be really, really, really nice if the droplets were automatically powered back on when the problem has been resolved.

Looking through the history of my droplets, I see that there was an action, not initiated by me, to power on my droplets. Presumably this was part of an automated recovery process initiated by DO once SFO2 was operational again. The history for my droplets said “Action did not complete”, indicating that the recovery failed.

Once I “arrived at work” I found email from customers reporting the network problem (all my droplets were offline). I went to the DO site, logged in, and manually powered the droplets on, and everything recovered quickly. I feel this step should have been accomplished by DO automatically.

It appears (to me) that there is a flaw in the DO recovery from this type of problem: there was an attempt to power up the droplet, which failed, and another attempt was not made. I would have been happy with another attempt being made half an hour or an hour later. Surely that is better than simply abandoning the recovery of the customers' droplets?

So, finally, a question:

  1. Does anybody at DO recognize this as a problem?
  2. Could someone at DO acknowledge that the recovery process is being examined and will be improved?

Hey, I’m a software developer too, problems happen. But this may be an opportunity for DO to improve the recovery process to make next time a little less painful?

Thanks.



Accepted Answer

DO just posted a report detailing this incident.

In it they state:

20:15 UTC - All Droplets and services fully restored.

I restarted my droplet at approximately 18:00 UTC. It appears that if I had waited a little longer, DO would have restarted the droplet for me, and my cluster would have come up, recovered, and begun providing service again.

That’s awesome, and was the answer I was hoping for.

Thanks DO!

Honestly, working with various VPS providers over the years, most commonly a VPS will return to its last working state (so if online -> online). However, in failures like power it's a fresh boot, and 90% of the time you have to bear in mind you are a slice of cake. You can’t serve all the cake at once: you must cut it, serve some to the kids, cut some more for the old folks, and last but not least cut up the rest and serve it to the remaining guests.

I know that with other companies, as well as with DO, mine wasn’t returned to on. Heck, I couldn’t even see my droplet at first. I kicked into doctl, issued a power-on from my console at home, and was back online quickly.

If you are a software dev, look at the API. Maybe write something that queries your droplet state and, if it is still offline after 2-3 checks, issues a power-on? With major faults like this, though, honestly it is sometimes better to wait and let them bring load back up carefully rather than come online into a laggy window. Mine crawled to a start, but it did start.
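To make the "query state, then power on after a few checks" idea concrete, here is a rough sketch in Python using only the standard library. It assumes the DigitalOcean API v2 droplet endpoint (which reports a `status` such as `active` or `off`) and the droplet actions endpoint with a `power_on` action. The environment variable names `DO_TOKEN` and `DROPLET_ID`, the threshold of 3 checks, and the 5-minute interval are my own illustrative choices, not anything DO recommends.

```python
import json
import os
import time
import urllib.request

API = "https://api.digitalocean.com/v2"


def api_call(token, method, path, body=None):
    """Send one authenticated request to the API and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        API + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def get_status(token, droplet_id):
    """Return the droplet's reported status, e.g. "active" or "off"."""
    return api_call(token, "GET", f"/droplets/{droplet_id}")["droplet"]["status"]


def power_on(token, droplet_id):
    """Ask the API to power the droplet on."""
    return api_call(token, "POST", f"/droplets/{droplet_id}/actions",
                    {"type": "power_on"})


def watch(get_status, power_on, threshold=3, interval=300,
          max_checks=None, sleep=time.sleep):
    """Poll the status; after `threshold` consecutive "off" readings,
    request a power-on. Returns how many power-on requests were sent."""
    off_streak = 0
    issued = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            status = get_status()
        except OSError:
            status = None  # API unreachable: unknown state, not counted as "off"
        off_streak = off_streak + 1 if status == "off" else 0
        if off_streak >= threshold:
            try:
                power_on()
                issued += 1
            except OSError:
                pass  # request failed; the streak restarts and we try again later
            off_streak = 0
        if max_checks is None or checks < max_checks:
            sleep(interval)
    return issued


# Hypothetical entry point: only runs when the assumed variables are set.
if __name__ == "__main__" and os.environ.get("DO_TOKEN"):
    token = os.environ["DO_TOKEN"]
    droplet_id = os.environ["DROPLET_ID"]
    watch(lambda: get_status(token, droplet_id),
          lambda: power_on(token, droplet_id))
```

Requiring several consecutive "off" readings, and treating an unreachable API as "unknown" rather than "off", is meant to keep the script from piling on power-on requests while the provider is still bringing the region back up. Run it from somewhere outside the affected datacenter, or it goes down with everything else.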

Just another customer’s 2-cents. API though, highly suggest a look.
