Does anyone know how a superblock can be fixed?

May 19, 2015
DigitalOcean Backups

Our droplet got corrupted and DO support seems to be completely washing their hands ...

3 Answers


I have the same issue here, any help would be appreciated:
My droplet's filesystem got corrupted after DO migrated the hypervisor. DO's support said they're sorry but can't restore anything because they don't have any backup! (This is the first time I've seen a hosting provider perform a serious maintenance task without making a backup beforehand.)

I've been told to do the following:
1.) Power off your droplet and change your kernel to the DO-recovery-static-fsck image.
2.) Boot your droplet and access it via the console in the control panel.
3.) Run the command "fsck /dev/vda"
4.) Answer yes to any requests to repair errors
5.) Power off your droplet, change the kernel back to the original, and reboot

=> no success
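
For anyone in the same spot: when a plain fsck fails on the primary superblock, a common next step is to point it at a backup superblock. Below is a minimal sketch of where those backups usually sit. It assumes the filesystem is ext2/3/4 created with default mke2fs geometry (8 * block size blocks per group) and the sparse_super feature; none of that is confirmed for this droplet, and the function name is purely illustrative.

```python
def backup_superblock_locations(block_size, total_blocks, blocks_per_group=None):
    """List block numbers of ext2/3/4 backup superblocks (sparse_super layout).

    Assumption: default geometry unless blocks_per_group is given explicitly.
    """
    if blocks_per_group is None:
        # One block of bitmap covers 8 * block_size blocks per group.
        blocks_per_group = 8 * block_size
    # With 1 KiB blocks the first data block is block 1, otherwise block 0.
    first_data_block = 1 if block_size == 1024 else 0
    group_count = (total_blocks - first_data_block + blocks_per_group - 1) // blocks_per_group

    def is_power_of(n, base):
        while n % base == 0:
            n //= base
        return n == 1

    locations = []
    for group in range(1, group_count):
        # Backups live in group 1 and in groups that are powers of 3, 5 or 7.
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            locations.append(group * blocks_per_group + first_data_block)
    return locations


if __name__ == "__main__":
    # Hypothetical 4 KiB-block filesystem with 5,000,000 blocks.
    print(backup_superblock_locations(4096, 5_000_000))
```

Each number it prints is a candidate for `e2fsck -b <block> -B <block size> <device>`. If the real geometry differs from the defaults assumed here, the numbers will be wrong, so treat them as candidates to try rather than known locations.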

Then the following:
=> no success, since none of the commands (such as fdisk) were available.

Then the same with a recovery ISO mounted on my droplet
=> no success

DO tells me to restore my droplet from a snapshot, which I don't have. My backup and recovery strategy relied on having a working file system, and I would have expected DO to at least provide a DR plan. None exists, apparently not even for the backups and snapshots (i.e. if a DO engineer crashes your system, it's your problem).

I suspect it's only the partition table that is corrupted and that my raw data is still there.
I just need to access some directories (mysql and data), even in raw / lost+found format.
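
If the partition table really is the only casualty, the filesystem start can sometimes be found by scanning a copy of the disk for the superblock signature. Here is a rough sketch of that idea. It assumes an ext2/3/4 filesystem (magic 0xEF53 at byte 56 of a superblock that starts 1024 bytes into the filesystem) and a raw image of the disk saved to a file. The function name and the image path are hypothetical.

```python
import struct

SECTOR = 512          # scan granularity in bytes
SB_OFFSET = 1024      # ext superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 56     # s_magic field inside the superblock
LOG_BS_OFFSET = 24    # s_log_block_size field inside the superblock
EXT_MAGIC = 0xEF53


def find_ext_superblocks(path, max_hits=16):
    """Return sector numbers where an ext filesystem might start."""
    hits = []
    with open(path, "rb") as image:
        sector = 0
        while len(hits) < max_hits:
            image.seek(sector * SECTOR + SB_OFFSET)
            header = image.read(MAGIC_OFFSET + 2)
            if len(header) < MAGIC_OFFSET + 2:
                break  # reached the end of the image
            (magic,) = struct.unpack_from("<H", header, MAGIC_OFFSET)
            (log_block_size,) = struct.unpack_from("<I", header, LOG_BS_OFFSET)
            # The block size check is only a cheap filter against random matches.
            if magic == EXT_MAGIC and log_block_size <= 6:
                hits.append(sector)
            sector += 1
    return hits


if __name__ == "__main__":
    # "disk.img" is a placeholder for a raw copy of the damaged disk.
    print(find_ext_superblocks("disk.img"))
```

Some hits may be backup superblocks or coincidental byte patterns, so each one is only a candidate. A candidate at sector S can then be tried with a read-only loop mount at byte offset S * 512, always against the copy rather than the original disk.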

Any idea?

I just can't give up, the loss in terms of time and data is catastrophic.

Thank you for your help.

I tried all of those and other actions, including restoring from a backup and creating a new droplet from a backup converted into a snapshot, but all failed. Something in the underlying file structure seems to have been corrupted all along, so I am looking at ways to repair it.

This morning I received this message: "We've had to reboot your Droplet due to an issue on the underlying physical node where the Droplet runs. We are investigating the health of the physical node to determine whether this was a single incident or systemic." When did your problem occur?

We are in the same boat with unhappy schools, ambulances and other customers. Being able to rely on the underlying hardware platform of a hosted service together with bandwidth availability is the essence of any hosted service.

The problem here was that your expectations weren't properly managed. I recently changed positions after managing many machines on what the owner confirmed was the largest contiguous single-owner inter-net in the world. Neat.

Anyway, you unfortunately believed that your backup and disaster recovery was being handled, because it may not have been stressed enough for you to fully understand that it was NOT. Don't be upset about that, as I've seen it many times.

When all is said and done, you need a backup system you can rely on, and you need a DR plan that works. Had you tested your backups and your DR, as per every best-practice document ever written, you would have noticed problems like not having a backup, or not having a redundant copy of data in case of disaster. You should have been made more aware of these issues as they came up in testing, and it's not your fault you didn't know.

Identify who told you your backups and DR were being taken care of every time you asked, as a conscientious follower of best practice and standards. You provide service and support for ambulances and other emergency crews, and your IT staff assured you that, as per the SLA you agreed to with the emergency services, all their needs were being met by the provider and you had nothing to worry about. That's the crew I'm talking about. It's okay not to know everything all the time, but you obviously trusted your own internal staff, who WILL have been told what services you don't get and who didn't stress it well enough to you. Your trust was badly placed.

Fire your IT staff who lied to you. DigitalOcean will have stressed this well enough, and it's only your IT staff who didn't listen. That is where the blame lies, and all the recovery pain and expense and risk to life and limb is on your internal IT staff's shoulders. Terminate those staff members with cause, because had they raised the issues, you, as responsible management, would have addressed them and wouldn't be complaining today; you're better than that.
