Contents of ETag returned from Spaces isn't always MD-5

February 26, 2018 361 views
API Object Storage
AlephNull

https://developers.digitalocean.com/documentation/spaces/#list-bucket-contents says that the returned ETag contains an MD5 hash of the object. I've done some checking, and whilst in most cases the ETag contains the expected hex-encoded MD5 hash, about 20% of my objects have ETags in a different format, e.g. "6f60ac7c40a8e9c4408f89653c2243dc-2" or "4ee6bae1874dd445140acb8f8c4241f2-261", which bear no resemblance to the expected MD5 hash. The returned ETags are consistent from one listing to another.

Can someone explain why these tags aren't in the documented format, and how to get List Bucket Contents to return the expected MD5 hash?

1 Answer
asb MOD February 26, 2018
Accepted Answer

Thanks for flagging this for us. It does look like the docs are missing a bit of detail here and need an update. Spaces was designed to be compatible with the S3 API, and its usage of ETag is consistent with S3's behavior. Quoting from the S3 documentation:

The entity tag is a hash of the object. The ETag reflects changes only to the contents of an object, not its metadata. The ETag may or may not be an MD5 digest of the object data. [ ... ] If an object is created by either the Multipart Upload or Part Copy operation, the ETag is not an MD5 digest, regardless of the method of encryption.

So if it's not an MD5 digest, what exactly is it? Let's take a look at a real example. Here's the HEAD for an object stored in Spaces:

$ aws s3api --endpoint-url https://nyc3.digitaloceanspaces.com head-object --bucket example-bucket --key demo-h264.mov
{
    "AcceptRanges": "bytes", 
    "ContentType": "video/quicktime", 
    "LastModified": "Mon, 11 Dec 2017 16:30:22 GMT", 
    "ContentLength": 800032767, 
    "ETag": "\"abca46f3fae1b698571c0f08b98618e1-96\"", 
    "Metadata": {}
}

The ETag is abca46f3fae1b698571c0f08b98618e1-96. It is made up of two pieces: the value before the hyphen and the value after. The latter means that the object was uploaded using a multipart upload consisting of 96 parts. Each of these parts was 8 MB in size, except the last, which holds the remainder. To get the first value, the MD5 digests of the 96 parts are concatenated in binary format, and then the MD5 of that concatenation is taken.
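The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code Spaces or S3 runs: the function names `multipart_etag` and `etag_matches` are hypothetical, and it assumes the object was uploaded with a fixed part size that you already know. It also assumes "8 MB" means 8 × 1024 × 1024 bytes, which is consistent with the example above: 800032767 bytes divided by 8388608 is about 95.4, which rounds up to 96 parts.

```python
import hashlib

MIB = 1024 * 1024


def multipart_etag(path, part_size=8 * MIB):
    """Compute a multipart-style ETag for a local file.

    Assumes every part except the last was exactly part_size bytes.
    Empty files are not handled specially.
    """
    part_digests = []
    with open(path, "rb") as f:
        # Read one part at a time so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(part_size), b""):
            # Keep the raw 16-byte digest, not the hex string.
            part_digests.append(hashlib.md5(chunk).digest())
    # MD5 of the concatenated binary digests, then "-<number of parts>".
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"


def etag_matches(path, etag, part_size=8 * MIB):
    """Compare a local file against an ETag as returned by the API.

    The API returns the ETag wrapped in double quotes, so strip them first.
    """
    return multipart_etag(path, part_size) == etag.strip('"')
```

If the part-size assumption holds, calling `multipart_etag("demo-h264.mov")` on a local copy of the file should reproduce the ETag shown in the HEAD response. A mismatch does not necessarily mean the data differs; it can also mean the upload used a different part size.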

The s3md5 bash script is a useful tool for calculating and verifying the ETag for an object.

Looking at our object again, we can calculate what the ETag should be:

$ ./s3md5 8 demo-h264.mov
abca46f3fae1b698571c0f08b98618e1-96

Or verify the existing one:

$ ./s3md5 -e abca46f3fae1b698571c0f08b98618e1-96 8 demo-h264.mov
TRUE