The first question I’d ask is: why is the 5-second mark so important? I ask because you haven’t said much about your use case or why it requires such a quick processing time.
When it comes to image processing speed, CPU is going to be the limiting factor. File size matters too: larger files take more CPU time to process, and because file size varies from one PDF to the next, it’s difficult to predict how long a given batch will take.
If, for example, we knew every single PDF was 5MB and that we could process 250 files per second, we’d know we could quickly spin up 5x 48GB Droplets and offload 250 images to each one. That would be a very special case, though, and it’s not realistic, as I’m sure your PDFs aren’t all identical.
What We Know
As per your test, you’re processing 1,000 images in 60 seconds, or one image every 0.06 seconds.
To put the conversion rate into perspective, the above test works out to:
1,000 images = 60 seconds
500 images = 30 seconds
250 images = 15 seconds
125 images = 7.5 seconds
62.5 images = 3.75 seconds
Images per Second
That means you should be able to do ~83 images in 5 seconds (4.98 seconds, to be exact) on that Droplet with 16 CPU cores, or roughly 16.7 images per second. To process all 1,000 images within your desired 5 seconds, you’d need to offload them to 12 Droplets of the same spec running at the same time (1,000 / 83 ≈ 12).
This, however, doesn’t include the time it’ll take to pull the images down from those servers to either your location or your main Droplet, so you’d need to factor that transfer time in as well.
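To make the arithmetic above easy to rerun with your own numbers, here is a rough Python sketch. The function names and the transfer_s parameter are my own, purely for illustration, and it assumes every Droplet converts at the same rate as your benchmark, which real PDFs of varying size won't guarantee.

```python
import math


def images_per_droplet(bench_images, bench_seconds, budget_s):
    """Whole images one Droplet can finish in budget_s at the benchmarked rate."""
    return math.floor(bench_images * budget_s / bench_seconds)


def droplets_needed(total_images, bench_images, bench_seconds, deadline_s, transfer_s=0):
    """Estimate how many identical Droplets must run in parallel.

    transfer_s is a hypothetical allowance for pulling the results back down;
    it is subtracted from the deadline before dividing up the work.
    """
    budget_s = deadline_s - transfer_s
    if budget_s <= 0:
        raise ValueError("transfer time uses up the whole deadline")
    # Time a single Droplet would need to convert the whole batch by itself.
    single_droplet_seconds = total_images * bench_seconds / bench_images
    return math.ceil(single_droplet_seconds / budget_s)


if __name__ == "__main__":
    # Benchmark from the test above: 1,000 images in 60 seconds.
    print(images_per_droplet(1000, 60, 5))
    print(droplets_needed(1000, 1000, 60, 5))
    print(droplets_needed(1000, 1000, 60, 5, transfer_s=1))
```

Note how quickly the count grows once transfer time eats into the 5 seconds: reserving even one second for the pull down shrinks the conversion window to 4 seconds and pushes the estimate well past 12.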
Alternative to Convert
You can install poppler-utils and use pdftoppm to convert. It writes one image per page, and there’s no option to combine the pages when a PDF has more than one, so you’d need to pull multiple images down as soon as the conversion process is complete.
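As a rough idea of what that could look like, here is a small Python wrapper around pdftoppm. It assumes poppler-utils is already installed (on Debian/Ubuntu, `sudo apt-get install poppler-utils`) and that PNG output at 150 DPI is acceptable; the helper names, the output directory layout, and the DPI default are my own choices, not anything your setup requires.

```python
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, out_prefix, dpi=150):
    """Return the argument list that renders every page of a PDF to PNG."""
    # -png selects PNG output, -r sets the resolution in DPI.
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def convert_pdf(pdf_path, out_dir, dpi=150):
    """Convert one PDF and return the page images it produced, in name order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / Path(pdf_path).stem
    # check=True raises CalledProcessError if pdftoppm exits with an error.
    subprocess.run(build_pdftoppm_command(pdf_path, prefix, dpi), check=True)
    # pdftoppm writes one file per page, named <prefix>-<page number>.png.
    return sorted(out_dir.glob(f"{prefix.name}-*.png"))
```

Calling `convert_pdf("invoice.pdf", "out")` would leave one PNG per page in `out/`, which is the set of files you'd then pull down from the worker.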
That being said, converting images at that speed is going to need multiple servers, and actually getting that speed is going to depend on numerous factors (file size being one of the main ones). The good news is that you can keep costs in line by using the DigitalOcean API to spawn the Droplets for processing.
Connecting to the API, you can deploy Droplets on demand, run your tasks, and then destroy them so you’re not incurring recurring costs by keeping them alive for the entire month.
As far as how you’d connect, it really depends. You can access the API using any programming language you’d like: PHP, Node.js, Ruby, etc.
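Just to illustrate the create-then-destroy flow, here is a minimal Python sketch using only the standard library against the v2 Droplets endpoint. The region, size, and image slugs are placeholders I picked for the example, so check what's available on your account before using them, and the helper names are hypothetical. It also leaves out waiting for the Droplet to boot and shipping the work to it.

```python
import json
import urllib.request

API_BASE = "https://api.digitalocean.com/v2"


def create_droplet_request(token, name, region="nyc3", size="c-16", image="ubuntu-22-04-x64"):
    """Build the POST request that creates a Droplet (slugs are placeholder assumptions)."""
    body = json.dumps({"name": name, "region": region, "size": size, "image": image})
    return urllib.request.Request(
        f"{API_BASE}/droplets",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def destroy_droplet_request(token, droplet_id):
    """Build the DELETE request that destroys a Droplet by its numeric id."""
    return urllib.request.Request(
        f"{API_BASE}/droplets/{droplet_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request):
    """Send a request and decode the JSON reply; returns None for an empty body."""
    with urllib.request.urlopen(request) as response:
        payload = response.read()
        return json.loads(payload) if payload else None
```

In use, you'd call `send(create_droplet_request(token, "worker-1"))`, read the new Droplet's id from `["droplet"]["id"]` in the reply, run your conversions, and then call `send(destroy_droplet_request(token, droplet_id))` so the Droplet stops billing.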
How you’d set something like this up to achieve the desired results depends on your exact requirements, so knowing more would be ideal, as right now these suggestions are only that :-).