By Aparna Prabhu and Anish Singh Walia
Snapshots are one of the most powerful features in modern cloud storage, providing point-in-time recovery, instant rollbacks, and near-zero downtime backups. However, their convenience can lead to misuse, transforming a useful safety net into an unexpected cost sink that strains both billing fairness and backend efficiency.
Most cloud providers price snapshots significantly lower than the primary storage they protect, such as block storage volumes or network shares. This pricing difference often leads users to treat snapshots as cheap, long-term storage rather than a disaster recovery tool, creating inefficiencies and billing distortions over time.
Understanding how copy-on-write snapshots work under the hood is crucial for recognizing and preventing misuse patterns that can impact both cost and performance.
By the end of this tutorial, you will have:
Most modern storage systems, including those built on platforms like VAST, use copy-on-write (CoW) for snapshots.
How Copy-on-Write Works:
This model makes snapshots fast and space-efficient, but also deceptively cheap, leading to misuse patterns that can accumulate significant hidden costs over time.
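To make the mechanism concrete, here is a minimal sketch of a snapshot-aware block store. It is a toy model, not how VAST or any particular platform implements snapshots: the `CowVolume` class and its methods are hypothetical, and it writes new data to a fresh block (redirect style) rather than copying the old block aside, which real systems may do differently. The point it illustrates is that a snapshot copies only metadata, and that blocks stay allocated as long as any snapshot still references them.

```python
class CowVolume:
    """Toy snapshot-aware volume (hypothetical, for illustration only)."""

    def __init__(self):
        self.live = {}        # logical address -> physical block id
        self.snapshots = {}   # snapshot name -> frozen copy of the mapping
        self.blocks = {}      # physical block id -> data
        self._next_id = 0

    def write(self, addr, data):
        # New data lands in a new physical block; the old block is untouched.
        self.blocks[self._next_id] = data
        self.live[addr] = self._next_id
        self._next_id += 1
        self._reclaim()

    def delete(self, addr):
        self.live.pop(addr, None)
        self._reclaim()

    def snapshot(self, name):
        # Only the mapping (metadata) is copied; no data block is duplicated.
        self.snapshots[name] = dict(self.live)

    def delete_snapshot(self, name):
        del self.snapshots[name]
        self._reclaim()

    def _reclaim(self):
        # A block is freed only when neither the live volume nor any
        # snapshot references it.
        referenced = set(self.live.values())
        for mapping in self.snapshots.values():
            referenced.update(mapping.values())
        for block_id in list(self.blocks):
            if block_id not in referenced:
                del self.blocks[block_id]

    def physical_blocks(self):
        return len(self.blocks)


vol = CowVolume()
for addr in range(4):
    vol.write(addr, f"v1-{addr}")
vol.snapshot("before-cleanup")
vol.write(0, "v2-0")   # overwrite one block
vol.delete(1)          # delete another
print("live blocks:", len(vol.live), "physical blocks:", vol.physical_blocks())
vol.delete_snapshot("before-cleanup")
print("physical blocks after snapshot delete:", vol.physical_blocks())
```

In this sketch the live volume shrinks to three blocks while five remain allocated, and the extra two are released only once the snapshot is deleted.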
In practice, users frequently take snapshots and fail to clean them up, either intentionally or unintentionally. Over time, snapshots evolve from a disaster recovery feature into a hidden, low-cost storage tier that quietly accumulates data.
Common Misuse Scenarios:
Snapshots of volumes or file shares are often used to archive static data (logs, ML models, datasets) that should have been offloaded to object storage instead.
Six Primary Misuse Patterns:
High Snapshot Density: Large number of snapshots tied to a single share or volume.
Real-world example: A developer schedules hourly snapshots “just in case,” creating dozens of copies that barely differ within days.
Aging Snapshots: Many snapshots remain active long after their creation.
Real-world example: Teams keep snapshots from past deployments or experiments for months, even after they’re obsolete.
Frequent Access: Snapshots that are read or mounted often, behaving like live data.
Real-world example: Engineers mount snapshots to serve read-only datasets or old environments, effectively using snapshots as production data.
Unlimited Cloning: Users create new shares or volumes from existing snapshots repeatedly.
Real-world example: A single snapshot becomes the source for dozens of derived environments, all referencing the same underlying data blocks.
Lifecycle Avoidance: Instead of deleting data, users preserve it indefinitely through chained snapshots.
Real-world example: Users snapshot a resource before every cleanup, making deletion nearly impossible without manual intervention.
Usage Drift: Backend storage growth that exceeds billable allocations.
Real-world example: Each snapshot preserves old blocks. Even if your share shows 500 GB in billing, the system may be tracking far more data due to unreclaimable blocks from older snapshots.
System metrics and monitoring can reveal when snapshots are being used beyond their intended scope, helping identify these patterns before they become costly problems.
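As a rough illustration of what such monitoring might look like, the sketch below flags four of the patterns above from per-snapshot metadata. Everything here is an assumption made for the example: the `SnapshotInfo` fields, the `flag_misuse` function, and all threshold values are hypothetical and would need tuning against a real platform's metrics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotInfo:
    """Hypothetical per-snapshot metadata a monitoring system might expose."""
    resource_id: str
    created_at: datetime
    reads_last_30d: int
    clones: int


def flag_misuse(snapshots, now, max_per_resource=20, max_age_days=90,
                max_reads=100, max_clones=5):
    """Group snapshots by resource and flag misuse patterns (illustrative thresholds)."""
    per_resource = {}
    for snap in snapshots:
        per_resource.setdefault(snap.resource_id, []).append(snap)

    findings = {}
    for resource_id, snaps in per_resource.items():
        flags = []
        if len(snaps) > max_per_resource:
            flags.append("high_density")
        if any(now - s.created_at > timedelta(days=max_age_days) for s in snaps):
            flags.append("aging")
        if any(s.reads_last_30d > max_reads for s in snaps):
            flags.append("frequent_access")
        if any(s.clones > max_clones for s in snaps):
            flags.append("unlimited_cloning")
        if flags:
            findings[resource_id] = flags
    return findings


now = datetime(2025, 1, 1)
inventory = [
    # 25 hourly snapshots of one share
    *(SnapshotInfo("share-a", now - timedelta(hours=i), 0, 0) for i in range(25)),
    # one old snapshot that has been cloned many times
    SnapshotInfo("share-b", now - timedelta(days=200), 0, 12),
    # one recent, untouched snapshot
    SnapshotInfo("share-c", now - timedelta(days=1), 0, 0),
]
print(flag_misuse(inventory, now))
```

Lifecycle avoidance and usage drift are harder to detect from snapshot metadata alone, since they depend on deletion history and backend block accounting, so they are left out of this sketch.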
Without a cap, users can accumulate hundreds of snapshots per resource. The effects multiply over time:
For example, even if snapshots are billed individually, an excessive number can still delay reclamation of old blocks and inflate backend costs.
Resizing storage upward is usually safe. But allowing resizing down creates an easy loophole:
A user could provision a large share (say, 2 TB), fill it with data, take a snapshot, and then shrink it to 500 GB.
The snapshot still references the original 2 TB of blocks, which the system can't reclaim, but the user now pays for only 500 GB of live storage.
This behavior effectively turns snapshots into free cold storage. Preventing downward resizing ensures allocation and usage remain aligned.
Imagine a user with a 1 TB share who takes 10 snapshots, then resizes down to 200 GB.
In a usage-based model, they pay only for 200 GB even though 1 TB of blocks remains pinned.
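The arithmetic behind this scenario can be sketched as follows. The price of 10 cents per GB is a made-up figure for illustration, 1 TB is taken as 1000 GB, and the example assumes the share was full when the snapshots were taken so that all of it remains pinned. The `compare_bills` helper is hypothetical.

```python
def compare_bills(live_gb, retained_gb, price_cents_per_gb):
    """Return (bill on live size, bill on retained blocks, unbilled gap) in cents."""
    live_bill = live_gb * price_cents_per_gb
    retained_bill = retained_gb * price_cents_per_gb
    return live_bill, retained_bill, retained_bill - live_bill


# A 1 TB share shrunk to 200 GB while snapshots still pin the original blocks.
live, retained, gap = compare_bills(live_gb=200, retained_gb=1000,
                                    price_cents_per_gb=10)
print(f"billed on live size:      ${live / 100:.2f}")
print(f"cost of retained blocks:  ${retained / 100:.2f}")
print(f"unbilled gap:             ${gap / 100:.2f} ({gap / retained:.0%} of retained data)")
```

Integer cents are used instead of floating-point dollars so the totals stay exact.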
Left unchecked, snapshot misuse can strain both billing fairness and backend efficiency.
A balanced approach involves three complementary strategies:
Strategy | Implementation | Benefit |
---|---|---|
Allocation-Based Billing | Bill users based on total physical allocation, not just live share size | Aligns cost with actual resource usage and prevents billing distortions |
Prevent Downward Resizing | Block share shrinking once data has been written and snapshotted | Prevents users from getting free cold storage by resizing down after taking snapshots |
Snapshot Limits | Set reasonable caps on snapshots per resource (e.g., 10-50 snapshots) | Discourages hoarding and enforces good hygiene practices |
Implementation Benefits:
Together, these mechanisms prevent snapshot misuse while keeping the system predictable and fair.
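Two of these strategies, the snapshot cap and the block on downward resizing, amount to simple checks at request time. The sketch below shows one way they might look. The `Share` type, the `PolicyError` exception, and the cap of 50 (the upper end of the range in the table) are illustrative assumptions, not a real provider API.

```python
from dataclasses import dataclass

MAX_SNAPSHOTS_PER_SHARE = 50  # assumed cap, upper end of the 10-50 range


class PolicyError(Exception):
    """Raised when a request violates a snapshot or resize policy."""


@dataclass
class Share:
    size_gb: int
    used_gb: int = 0
    snapshot_count: int = 0


def create_snapshot(share):
    # Enforce the per-resource snapshot cap.
    if share.snapshot_count >= MAX_SNAPSHOTS_PER_SHARE:
        raise PolicyError("snapshot limit reached; delete old snapshots first")
    share.snapshot_count += 1


def resize(share, new_size_gb):
    # Growing is always allowed; shrinking is blocked once data has been
    # written and at least one snapshot exists.
    shrinking = new_size_gb < share.size_gb
    if shrinking and share.used_gb > 0 and share.snapshot_count > 0:
        raise PolicyError("cannot shrink a share that has data and snapshots")
    share.size_gb = new_size_gb


share = Share(size_gb=2000, used_gb=1800)
create_snapshot(share)
try:
    resize(share, 500)
except PolicyError as err:
    print("rejected:", err)
resize(share, 3000)  # upward resize still succeeds
print("new size:", share.size_gb)
```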
For example, if a customer creates multiple snapshots of a large dataset, the allocation-based model ensures they continue paying proportionally for the underlying data retained by snapshots. This discourages storing long-term, read-only data in snapshots instead of object storage.
Allocation-based billing can sometimes feel unintuitive to users: charges may not drop immediately after snapshots are deleted, because the system reclaims space gradually as blocks are dereferenced. It can also increase perceived costs for legitimate heavy snapshot users. However, the transparency and fairness it brings to long-term storage management often outweigh these challenges.
Snapshot misuse occurs when users treat snapshots as a cheap, long-term storage solution rather than a disaster recovery tool. This includes creating excessive snapshots, using them for data archiving, or keeping them active long after they’re needed. Common patterns include high snapshot density (many snapshots per resource), aging snapshots that remain active for months, and using snapshots to store static data that should be in object storage.
Copy-on-write (CoW) snapshots work by freezing metadata that references existing data blocks at the moment of creation, without actually duplicating data. The cost appears later: when live blocks are overwritten or deleted, the snapshot's references keep the old blocks from being reclaimed. This makes snapshots fast and space-efficient initially, but can lead to hidden costs as the underlying data changes and snapshots prevent block reclamation.
Snapshot misuse can create significant billing distortions: users pay only for the active share size while the system tracks much more data due to unreclaimable blocks from older snapshots. For example, a user might resize a 1 TB share down to 200 GB after taking snapshots, but still have 1 TB of blocks pinned by those snapshots, effectively getting free cold storage while paying for only 200 GB.
You can prevent snapshot misuse by implementing the following strategies:
These steps help align costs with actual usage, prevent billing distortions, and encourage good data management practices.
Snapshots are point-in-time copies of data that use copy-on-write technology and are primarily designed for quick recovery and rollbacks. Backups are complete copies of data stored separately, often in different locations. While snapshots are fast and space-efficient, they shouldn’t replace proper backup strategies for long-term data retention, especially for compliance or archival purposes.
Snapshots are indispensable for resilience, but their convenience can invite misuse if not managed thoughtfully. By combining allocation-aware billing, restricting downsizing, and capping snapshot counts, storage platforms can strike the right balance between flexibility and fairness, thereby ensuring snapshots remain what they were always meant to be: a safety net, not a storage tier.