My goal is to recursively download a directory from DigitalOcean Spaces to my local machine (nothing fancy).
If I use the AWS CLI for the job, aws s3 commands download only the first 1000 files from any directory that contains more than 1000 files.

To reproduce:

  • Create a directory test1 with 1007 files locally
  • Upload this directory to Spaces: aws s3 cp --recursive --endpoint=https://reg1.digitaloceanspaces.com ./test1/ s3://mybucket/test1/
  • The upload works as expected (all 1007 files are uploaded)
  • Try to download the directory back: aws s3 cp --recursive --endpoint=https://reg1.digitaloceanspaces.com s3://mybucket/test1/ ./download1/
  • The downloaded directory contains only 1000 files

Adding the argument --page-size=500 to the aws s3 cp command downloads only the first 500 files, i.e. only the first page.
Reproduced on Linux and macOS, with multiple AWS CLI and Python versions.
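
For what it's worth, the truncation can also be checked outside the CLI with a few lines of boto3 (which sits on the same botocore machinery as the AWS CLI). The bucket, prefix and endpoint below are just the placeholders from the steps above:

    # Quick check: list the prefix with the list_objects_v2 paginator and count the keys.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://reg1.digitaloceanspaces.com")

    paginator = s3.get_paginator("list_objects_v2")
    count = sum(
        len(page.get("Contents", []))
        for page in paginator.paginate(Bucket="mybucket", Prefix="test1/")
    )
    print(count)  # expected 1007; 1000 here would point at the listing itself, not the CLI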

Is downloading an entire bucket or directory not possible with the AWS CLI? This seems like a very basic feature that should just work.

1 comment
  • Uhh, how can you implement an “S3 compatible” option and call it PRODUCTION when it only lists 1000 files? This is a horribly crippling issue.

    Where is someone from DO on this? This should never have rolled out live. This breaks so many usage models.


6 answers

Facing the exact same problem… This is pretty critical; any ideas?

This is my command:
aws s3 sync s3://bucket-a s3://bucket-b --endpoint=https://fra1.digitaloceanspaces.com

It only copies over 1000 files… out of more than 30 million.

Same issue here. This is a pretty nasty problem.

My hunch is that it's related to the default behavior of the bucket list operation, which returns at most 1,000 keys per request.

https://developers.digitalocean.com/documentation/spaces/#list-bucket-contents
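
If that's the cause, a raw listing loop makes it visible: list_objects_v2 should report IsTruncated and hand back a NextContinuationToken after the first 1,000 keys, and the question is whether Spaces honours that token on the next request. A minimal boto3 sketch (the bucket and endpoint are the placeholders from the sync command above):

    # Drive list_objects_v2 by hand and watch the pagination fields.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://fra1.digitaloceanspaces.com")

    kwargs = {"Bucket": "bucket-a"}
    total = 0
    pages = 0
    while True:
        resp = s3.list_objects_v2(**kwargs)
        pages += 1
        total += len(resp.get("Contents", []))
        print(pages, resp.get("IsTruncated"), resp.get("NextContinuationToken"))
        if not resp.get("IsTruncated") or pages >= 50:  # cap in case the token is ignored
            break
        kwargs["ContinuationToken"] = resp["NextContinuationToken"]

    print("total keys listed:", total)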

Hi! Any news about this from DO?

If I try to use s3cmd to accomplish the same goal (recursively download a directory with many files), I get the error http.client.RemoteDisconnected: Remote end closed connection without response. A single empty file is created, the error appears after around 5 minutes, and memory usage grows to almost 1 GB during the attempt. This happens on a standard DO Ubuntu 18.04 droplet with around 500k files in the target DO Spaces directory.

So far I have not found any way to download a directory with many files from DO Spaces using standard tooling like the AWS CLI or s3cmd, which means there is no easy way to back up locally or migrate to another service. I am surprised nobody has mentioned these issues before.
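
The closest thing I have to a workaround is to drive the listing and the downloads directly from boto3, deliberately using the older marker-based list_objects call instead of list-objects-v2, on the guess that it is the v2 pagination that is broken. This is only a sketch and untested at scale; bucket, prefix, endpoint and the local target directory are placeholders:

    import os
    import boto3

    BUCKET = "mybucket"
    PREFIX = "test1/"
    DEST = "./download1"

    s3 = boto3.client("s3", endpoint_url="https://reg1.digitaloceanspaces.com")

    # v1 listing paginator (Marker-based) instead of list_objects_v2.
    paginator = s3.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip "directory" placeholder keys
                continue
            local_path = os.path.join(DEST, os.path.relpath(key, PREFIX))
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            s3.download_file(BUCKET, key, local_path)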

Running into the same issue; this is very frustrating since all s3 operations are silently capped at 1000 files.

I assume this is listed under ‘known issues’ at https://www.digitalocean.com/docs/spaces/

In the API, list-objects-v2 pagination does not work.

It basically means: do not use DO Spaces for anything serious.
