Lots of unexplained errors with DO Spaces / object storage

December 28, 2017 237 views
Object Storage Ubuntu 16.04

Is DO Spaces stable and production-worthy? I get so many unexplained, seemingly random errors that I am beginning to worry about the bet I made on it.

The merely annoying problem is that some objects seem to become undeletable. I can delete several thousand items with similar key names -- either programmatically or through the web page -- and inevitably, it seems, a few will remain, and nothing can remove them.

Much more annoying, Spaces will just stop responding for some period of time. Even commands like "list objects" (performed with boto/python) return errors. If I have multiple processes going, they all start erroring at the same time. This is so substantial a problem that I wonder what the Spaces uptime is, and would love to know.

Then, tonight, a previously stable server started throwing errors on PutObject and DeleteObject calls. It's been going on for hours, and there is nothing in the DO status system (which has been very slow, in my experience, to report that problems are occurring). My application is toast as a result.

So I'd be interested to know if my experience is unusual. If it is, I'll stick with Spaces for a while longer, but my sense is that it's just too undependable to be useful. (I've been using Spaces since the last week of the beta, and it seems to be getting worse, not better.)

Thanks.

1 comment
  • That's my experience as well. 503/504 errors appear in batches, and even the DO Spaces page reports errors. Batch operations regularly get it into that state; deleting objects seems to be the worst. But it also happens when just getting objects, perhaps because someone else is overloading it with updates at that time. No DO status indication of any problems.

2 Answers

More information on the PutObject/DeleteObject errors. These were occurring with every call to the Spaces interface, over a period of a day. I had only one Space at this time. I created a new space, and the same code elicits no errors when used on the new Space.

So, somehow, one Space got so corrupted that no programmatic calls on it worked. I suspect this corruption occurred as a result of deleting millions of objects in a short time using multiple simultaneous processes.

If there are limits on certain actions that Spaces tolerates, it would be useful to know them.
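Until such limits are documented, one cautious approach is to delete in sequential, paced batches instead of from many simultaneous processes. The following is only a sketch under stated assumptions: `client` is assumed to be a boto3-style S3 client (the thread only says "boto"), the batch size of 1000 is the per-request cap of the S3 multi-object delete call and is assumed here to carry over to Spaces, and the helper names and the one-second pause are hypothetical choices, not anything DO has published.

```python
import time


def chunked(keys, size):
    """Yield successive lists of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def delete_in_batches(client, bucket, keys, batch_size=1000, pause=1.0,
                      sleep=time.sleep):
    """Delete `keys` one batch at a time, pausing between requests.

    `client` is assumed to expose a boto3-style delete_objects() method.
    Returns the keys that the service reported as failed, so the caller
    can retry them later instead of assuming they are gone.
    """
    failed = []
    batches = list(chunked(keys, batch_size))
    for index, batch in enumerate(batches):
        response = client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        # In quiet mode the response lists only the deletions that failed.
        failed.extend(error["Key"] for error in response.get("Errors", []))
        if index < len(batches) - 1:
            sleep(pause)  # pace the requests (hypothetical value, tune as needed)
    return failed
```

Collecting the failed keys makes the "few objects that remain" case visible in code: the returned list can be logged or fed into a later pass.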

Yes, I get lots of timeout errors. I opened a ticket with steps to reproduce, and they assured me they are working to make it more reliable. I am looking forward to that. Currently I would not recommend it for production. For now I'm just using it for backups, which can tolerate a few timeouts.
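For workloads like backups that can tolerate a few timeouts, a common workaround is to wrap each call in a retry with exponential backoff and jitter. This is a generic sketch, not a fix for the underlying problem: `call_with_retries`, its parameters, and the delay values are hypothetical, and the caller decides which exceptions count as transient.

```python
import random
import time


def call_with_retries(operation, is_transient, max_attempts=5,
                      base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call `operation()` and retry it when it fails transiently.

    `is_transient(exc)` returns True for errors worth retrying, such as
    timeouts. The wait before each retry is a random amount up to a cap
    that doubles with every attempt (exponential backoff with jitter).
    The last error is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts or not is_transient(exc):
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

With a boto3-style client, `operation` could be something like `lambda: client.list_objects(Bucket="my-space")` (the bucket name is a placeholder), and `is_transient` could check the exception for an HTTP 503 or 504 status. The random jitter matters when several processes fail at the same moment, as described in the question, because it keeps them from all retrying in lockstep.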
