What is the right way to manage a huge number of files without committing to a specific cloud solution?

May 28, 2015
Clustering Node.js

Hi, I'm working on a project site that will manage thousands of files (mostly images and PDFs). I'll deploy the site in the coming weeks, after a year of working on it. I'll deploy it on DigitalOcean; in fact, a beta version is already up and running.

I wonder what the right way is to scale the site as the need for file storage increases (I expect 4 to 5 GB per month).

Yes, I know, I've checked Azure, Rackspace... but these cloud solutions include geo-redundancy (and other features that my solution doesn't need at this phase) and complicate development (in the sense that the code has to be tied to those specific solutions).
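One way to limit that coupling is to keep file access behind a small interface, so the backend can be swapped later. The following is only a minimal sketch under assumptions: `DiskStore`, `resolveKey`, `put`, and `get` are hypothetical names, not part of any library, and the root directory could be a local disk or a mounted network share. It uses synchronous calls for brevity; a real server would more likely use the asynchronous `fs` functions.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical backend that stores files under one root directory.
// The application only deals with keys, never with absolute paths.
class DiskStore {
  constructor(rootDir) {
    this.rootDir = path.resolve(rootDir);
  }

  // Map a key such as "private/invoices/1.pdf" to a path inside rootDir,
  // rejecting keys that would escape it (for example "../secret.txt").
  resolveKey(key) {
    const full = path.resolve(this.rootDir, key);
    if (!full.startsWith(this.rootDir + path.sep)) {
      throw new Error('Invalid key: ' + key);
    }
    return full;
  }

  // Write data under the given key, creating parent directories as needed.
  put(key, data) {
    const full = this.resolveKey(key);
    fs.mkdirSync(path.dirname(full), { recursive: true });
    fs.writeFileSync(full, data);
  }

  // Read the file stored under the given key and return it as a Buffer.
  get(key) {
    return fs.readFileSync(this.resolveKey(key));
  }
}

// Usage: a temporary directory stands in for the real storage root.
const store = new DiskStore(fs.mkdtempSync(path.join(os.tmpdir(), 'files-')));
store.put('public/logo.txt', 'hello');
console.log(store.get('public/logo.txt').toString()); // prints "hello"
```

If the rest of the site only calls `put` and `get`, moving to a hosted object store later would mean writing a second class with the same two methods rather than touching every upload and download route.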

Moreover, these file storage services are a better fit for CDNs, where it is difficult to differentiate between "public" files and "private" files.
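For context, if private files are served through the Node application itself, one common way to separate them from public ones is a time-limited signed link. This is a rough sketch, not a complete access-control scheme: `SECRET`, `sign`, `privateUrl`, `isValid`, and the `/files/` route are all assumptions made for illustration; only the `crypto` calls are real Node.js APIs.

```javascript
'use strict';
const crypto = require('crypto');

// Assumption: a server-side secret, loaded from configuration in a real app.
const SECRET = 'change-me';

// Sign a file key together with an expiry time (seconds since the epoch).
function sign(key, expires) {
  return crypto.createHmac('sha256', SECRET)
    .update(key + '\n' + expires)
    .digest('hex');
}

// Build a time-limited link for a private file.
// "/files/" is a hypothetical route; `now` is a timestamp in milliseconds.
function privateUrl(key, ttlSeconds, now) {
  const expires = Math.floor(now / 1000) + ttlSeconds;
  return '/files/' + encodeURI(key) +
    '?expires=' + expires + '&sig=' + sign(key, expires);
}

// Check a request: the link has not expired and the signature matches.
function isValid(key, expires, sig, now) {
  if (Math.floor(now / 1000) > Number(expires)) return false;
  const expected = Buffer.from(sign(key, expires), 'hex');
  const given = Buffer.from(String(sig), 'hex');
  // Compare in constant time; lengths must match before timingSafeEqual.
  return given.length === expected.length &&
    crypto.timingSafeEqual(given, expected);
}

console.log(privateUrl('private/report.pdf', 300, Date.now()));
```

Public files would skip the check entirely, while a route for private files would call `isValid` with the query parameters before streaming the file.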

The site is fully developed with NodeJS.

Because it would be easy to set up (and cheap), I'm thinking of just commissioning a new Ubuntu server with 20 to 30 GB of SSD space and creating a private network to access the files over NFS. This could be a first approach (and surely not the best one).
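As a quick sanity check on those numbers, the growth rate and disk sizes mentioned above give a rough idea of how long such a server would last. This is a back-of-the-envelope sketch that assumes the whole disk is available for files (in practice the operating system takes part of it); `monthsUntilFull` is a hypothetical helper.

```javascript
'use strict';

// Months until a disk of the given size fills up at a constant growth rate.
function monthsUntilFull(diskGb, growthGbPerMonth) {
  return diskGb / growthGbPerMonth;
}

console.log(monthsUntilFull(20, 5)); // 4   (smallest disk, fastest growth)
console.log(monthsUntilFull(30, 4)); // 7.5 (largest disk, slowest growth)
```

So a single server of that size would cover roughly four to seven and a half months before it needs more space.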

What's your opinion? Does anyone have experience with systems similar to this?

I'd appreciate some help!

2 Answers

Have you looked into Amazon S3 or Google Cloud Storage?

They both have APIs to access files.

It's worth noting, similar to wiak's answer, that you can indeed use other storage providers: Azure's Blob storage can be accessed via an API as well. I've been considering writing some apps using Azure Tables, for example, though my concern there is latency, which is less of an issue for large file storage.

They all have advantages and disadvantages. I think DO is a much more cost-effective option for the application VMs; I wish they had SaaS offerings similar to S3/Azure/Google storage as well as RDS/SQL.

I'm considering using Azure blob storage and Azure SQL with DO myself, though my concern is latency for the SQL instance...
