DigitalOcean Spaces: cannot download directory with more than 1000 files

Posted on December 12, 2019

My goal is to recursively download a directory from DigitalOcean Spaces to a local machine (nothing fancy). If I attempt to use the AWS CLI for the job, the aws s3 commands download only the first 1000 files from any directory that contains more than 1000 files.

To reproduce:

  • Create a local directory test1 containing 1007 files
  • Upload this directory to Spaces: aws s3 cp --recursive --endpoint-url=https://reg1.digitaloceanspaces.com ./test1/ s3://mybucket/test1/
  • The upload works as expected (all 1007 files land in the bucket)
  • Try to download the directory back: aws s3 cp --recursive --endpoint-url=https://reg1.digitaloceanspaces.com s3://mybucket/test1/ ./download1/
  • The downloaded directory contains only 1000 files

Adding --page-size=500 to the aws s3 cp command downloads only the first 500 files, so only the first page of results is ever fetched. Reproduced on Linux and macOS, with multiple AWS CLI and Python versions.
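
A minimal boto3 sketch to check whether the listing API itself returns more than 1000 keys when pagination is followed explicitly (bucket, prefix and endpoint are the ones from the steps above; credentials are assumed to be configured in the usual AWS config/environment):

```python
# Count keys under the prefix by following the listing pagination ourselves,
# to see whether Spaces returns more than 1000 keys across pages.
import boto3

s3 = boto3.client("s3", endpoint_url="https://reg1.digitaloceanspaces.com")

paginator = s3.get_paginator("list_objects_v2")
count = 0
for page in paginator.paginate(Bucket="mybucket", Prefix="test1/"):
    count += len(page.get("Contents", []))

print(f"keys under test1/: {count}")  # should report 1007 if paging works
```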

Is downloading an entire bucket or directory not possible with the AWS CLI? This seems like a very basic feature that should just work.

Facing the exact same problem… This is pretty critical, any ideas?

This is my command: aws s3 sync s3://bucket-a s3://bucket-b --endpoint-url=https://fra1.digitaloceanspaces.com

It only copies 1000 files… out of over 30 million.

My hunch is that it’s related to the default behavior of the bucket list operation, which is to return only 1,000 keys per request.

https://developers.digitalocean.com/documentation/spaces/#list-bucket-contents
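
For reference, that listing behavior matches S3-style ListObjectsV2 paging: each response carries at most 1,000 keys, and the client is expected to keep requesting pages via IsTruncated / NextContinuationToken. A rough boto3 sketch of following the token manually (endpoint and bucket name taken from the sync command above; credentials assumed to be configured):

```python
# Follow ListObjectsV2 continuation tokens by hand; each response holds at
# most 1000 keys, so the client must loop until IsTruncated is False.
import boto3

s3 = boto3.client("s3", endpoint_url="https://fra1.digitaloceanspaces.com")

keys = []
kwargs = {"Bucket": "bucket-a"}
while True:
    resp = s3.list_objects_v2(**kwargs)
    keys.extend(obj["Key"] for obj in resp.get("Contents", []))
    if not resp.get("IsTruncated"):
        break
    kwargs["ContinuationToken"] = resp["NextContinuationToken"]

print(f"total keys listed: {len(keys)}")
```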

If I try to use s3cmd for the same goal (recursively downloading a directory with many files), I get the error http.client.RemoteDisconnected: Remote end closed connection without response. A single empty file is created, the error appears after around 5 minutes, and memory usage grows to almost 1 GB before the failure. This happens on a standard DO Ubuntu 18.04 droplet with around 500k files in the target DO Spaces directory.

Right now I have not found any way to download a directory with many files from DO Spaces to a local machine using standard tooling like the AWS CLI or s3cmd, so there is no easy way to back up locally or migrate to another service. I am surprised nobody has mentioned these issues before.
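
One possible stopgap, sketched below with boto3 rather than the AWS CLI or s3cmd, is to paginate the listing yourself and download each key individually. This is an untested sketch for Spaces; the endpoint, bucket, prefix and destination directory are just the ones from the example above, and credentials are assumed to be configured:

```python
# Workaround sketch: list every key under the prefix page by page, then
# download each object to a matching local path.
import os
import boto3

s3 = boto3.client("s3", endpoint_url="https://reg1.digitaloceanspaces.com")

bucket, prefix, dest = "mybucket", "test1/", "./download1"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):  # skip directory placeholder objects
            continue
        local_path = os.path.join(dest, os.path.relpath(key, prefix))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        s3.download_file(bucket, key, local_path)
```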
