@Sunka
Thanks for the link! I must have overlooked that when I was browsing over Facebook last night :-).
I just ran a live snapshot on a 512MB / 1 CPU Droplet with Ubuntu 16.04 64-bit and the only thing that I noticed popping up was a process for /usr/lib/snapd/snapd, which utilized 0.00% CPU and 2.3% RAM; the result was a 2.88GB snapshot. The overall load, however, stayed at 0 with no recorded I/O wait.
I then ran a second and third snapshot while simulating traffic to the Droplet using loader.io to drive around 3,000 requests over 60 seconds to an index.php file which ran a single MySQL query using a prepared statement. The results of the query were then dumped using var_dump( $results ); to simply push the output to the browser.
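For reference, the test script looked roughly like the sketch below; the DSN, credentials, and table name are placeholders for the example rather than the exact values used:

<?php
// Rough sketch of the index.php used for the test; the DSN, credentials,
// and table name are placeholders, not the exact values used.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=testdb;charset=utf8mb4', 'testuser', 'secret', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Single query using a prepared statement.
$stmt = $pdo->prepare('SELECT id, name FROM items WHERE id = ?');
$stmt->execute([1]);
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Dump the result set straight to the browser.
var_dump($results);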
The web server used was PHP’s built-in web server, which was run using:
php -S 0.0.0.0:80 -t .
Overall, the only load placed on the server came from the incoming requests, and it would most likely have been lower with a more optimized web server. With 3,000 incoming requests over a period of 60 seconds, the server load never shot above 0.10, even with the snapshot being taken in the middle of the traffic spike.
One final test was run using the same index.php file, but this time with 8,000 requests over 60 seconds. The server load shot up to ~0.20, though as above, that was easily attributable to the incoming traffic and the number of requests in such a short period of time. The CPU usage of the snapshot process remained at 0.00% while its RAM usage actually dropped to around 1.6%.
After running a total of four snapshots, the only thing I noticed was that the fourth took a little longer to generate. While the first three took only a few minutes, the fourth took an extra few minutes, which could be the backend accounting for the server load (even though the snapshot didn’t generate any load that could be picked up by htop or top), or it could be that the backend was simply processing more requests when I generated the fourth, which slowed it down a bit.
Overall, if we cached the query, ran a real web server (NGINX, Caddy, Apache) with request caching, and then multiplied the workload by a factor of 10 (so we’d be running at least 10 similar queries, around the number a base WordPress installation runs), the server load would still be a result of the incoming traffic and not the snapshot utility, which seems to be pretty well optimized.
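As a rough illustration of the query-caching piece, something as simple as the sketch below would keep MySQL out of the hot path for most requests. This assumes the APCu extension is available; the cache key, TTL, and connection details are arbitrary placeholders for the example:

<?php
// Hedged sketch of caching the query result with APCu; the key name,
// TTL, and connection details are placeholders.
$results = apcu_fetch('items_query', $hit);

if (!$hit) {
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=testdb', 'testuser', 'secret');
    $stmt = $pdo->prepare('SELECT id, name FROM items WHERE id = ?');
    $stmt->execute([1]);
    $results = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // Cache the result for 60 seconds so repeated requests skip MySQL entirely.
    apcu_store('items_query', $results, 60);
}

var_dump($results);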
Additionally, the snapshot utility seems to be intentionally designed not to interfere with the normal operation of the Droplet or the services running on it (PHP’s web server and MySQL never went down during the spikes or during the creation of the snapshots).
–
That being said, this was a simple baseline test (very simple), so I would encourage you to run your own tests using an environment that duplicates your live environment so that you can gather your own testing data.
You can sign up for a free account at loader.io and run up to 10,000 requests per test for free, which should give you a general idea of what to expect. I wouldn’t recommend pounding the VPS with back-to-back sieges of 10,000 requests over 60 seconds, as the VPS may get flagged for the surge of duplicate requests from such a small range of IPs (if the IPs aren’t blocked beforehand), but a few small tests to allow data gathering shouldn’t be an issue.
–
As for side effects…
A potential, and probable, scenario could play out if your application uses a database, whether MySQL, MariaDB, Percona, or a NoSQL variant, or even Redis, Memcached, or another in-memory store.
In such a scenario, my concern would be writes, incomplete writes to be specific. Since the database service, by default, isn’t shut down during the creation of the snapshot, there’s a good chance that only part of whatever is being written at the time the snapshot is taken will actually end up inside the snapshot.
This is an issue because you definitely don’t want to end up having to restore from the snapshot only to find that X or XX+ users created an account but the application wasn’t able to set the password before the snapshot ran.
In such a case, if the username was written but the password wasn’t, and someone with less-than-good intentions runs a dictionary attack on usernames that also tests blank passwords, while your application doesn’t guard against them, the attacker would be able to easily gain access as whatever user came back as a hit.
Of course, in most modern applications, rejecting blank passwords and wrapping writes in a transaction are common practice, so if the insert isn’t 100% complete it isn’t written at all; the risk of a partial write is reduced and the chance of a blank password being accepted is minimal. Still, it is a possible scenario, and a real one at that.
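To illustrate the transaction point, here is a minimal sketch of an atomic account insert; the table and column names are made up for the example. Either both rows land or neither does, so a snapshot (or a restore from one) can never surface a username without its password hash:

<?php
// Minimal sketch: create the account atomically so a half-written account
// never becomes visible. Table and column names are placeholders.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'appuser', 'secret', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

try {
    $pdo->beginTransaction();

    $stmt = $pdo->prepare('INSERT INTO users (username) VALUES (?)');
    $stmt->execute(['newuser']);
    $userId = $pdo->lastInsertId();

    $stmt = $pdo->prepare('INSERT INTO credentials (user_id, password_hash) VALUES (?, ?)');
    $stmt->execute([$userId, password_hash('correct horse battery staple', PASSWORD_DEFAULT)]);

    // Nothing is committed until this point; with a transactional engine
    // such as InnoDB, an uncommitted write would be rolled back on recovery.
    $pdo->commit();
} catch (Exception $e) {
    // Any failure rolls back both inserts, so no partial account exists.
    $pdo->rollBack();
    throw $e;
}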
The downside of any backup or snapshot of a live system is that a program may be in the middle of processing something in memory that still needs to be persisted to disk. This is typically a problem with very active databases. You can mitigate the effects by having a separate backup strategy for those programs in addition to your system backups/snapshots.