Understanding Database Snapshots

By Kruno Golubic

8 min read · September 21, 2023

In the age of rapid technological evolution, data stands as an uncontested kingpin. Driving businesses, fostering innovations, and influencing decisions, data's role is paramount. As its significance soars, the urgency for efficient and reliable data management practices becomes vital. Traditional backup methods once stood as the gold standard for data protection. However, the tech world constantly evolves, prompting us to reevaluate and adopt more advanced solutions. This is where database snapshots come into play.

What Are Database Snapshots?

Often, when one hears "database snapshots," SQL Server might be the first thing to pop into mind. However, the technique is not specific to SQL Server; many databases, file systems, and storage platforms offer some form of it. So, what exactly is a database snapshot?

Imagine you're looking at a document. Instead of photocopying each page, you simply take a photograph of it. That's essentially what database snapshots do. Rather than duplicating the entire dataset, a database snapshot captures the state of a database at a specific point in time and stores only what is needed to preserve that state as the source changes afterwards. This is loosely similar to how version control systems store changes, and it makes snapshots very space-efficient. Techniques like copy-on-write play a pivotal role in making this possible, not just in SQL Server but across multiple platforms.

But how does this differ from traditional backups? While a full database backup creates a complete replica of your data, a database snapshot tracks only the data that has changed since the snapshot was created. If nothing changes, the snapshot consumes almost no additional storage, whereas a full backup allocates space equivalent to the backed-up data every time.

How Database Snapshots Work

The magic behind snapshots is both intriguing and elegant. When you create a database snapshot, the system doesn't immediately copy the whole database. Instead, it uses a strategy called copy-on-write. When the source database is modified after the snapshot is created, the original version of each affected data page is copied into the snapshot before the page is overwritten. Pages that never change are never copied.

Copy-On-Write vs Redirect-On-Write

At the heart of database snapshots lies a set of intricate techniques that enable them to capture and preserve data changes efficiently. Two of the most prevalent methods employed in this process are Copy-on-write and Redirect-on-write:

Copy-on-write: This mechanism operates by copying the original data blocks that are about to be modified into a snapshot area before they are overwritten. This ensures that the snapshot retains the original data, even as changes occur in the source database.
Redirect-on-write: In contrast, Redirect-on-write works by redirecting the write operation to a new data block while retaining the old block in the snapshot. This approach ensures that the snapshot preserves the original data in its unaltered state.
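
The difference between the two mechanisms can be illustrated with a toy block store. This is only a minimal sketch under simplifying assumptions (a single snapshot, whole-block writes, everything held in memory); the class and method names are hypothetical and do not correspond to any particular database or storage product.

```python
class CopyOnWriteVolume:
    """Toy block store with a single copy-on-write snapshot (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data, overwritten in place
        self.snapshot_area = None    # block index -> original content

    def create_snapshot(self):
        # No data is copied when the snapshot is created.
        self.snapshot_area = {}

    def write(self, index, data):
        # Before the first overwrite of a block, save its original content.
        if self.snapshot_area is not None and index not in self.snapshot_area:
            self.snapshot_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the snapshot area, unchanged ones from live data.
        return self.snapshot_area.get(index, self.blocks[index])


class RedirectOnWriteVolume:
    """Toy block store with a single redirect-on-write snapshot (illustrative only)."""

    def __init__(self, blocks):
        self.storage = list(blocks)               # physical blocks
        self.live_map = list(range(len(blocks)))  # logical index -> physical index
        self.snapshot_map = None

    def create_snapshot(self):
        # Freeze the current mapping; no data is copied.
        self.snapshot_map = list(self.live_map)

    def write(self, index, data):
        shared = (self.snapshot_map is not None
                  and self.live_map[index] == self.snapshot_map[index])
        if shared:
            # Redirect the write to a new physical block; the old block
            # stays untouched and remains visible through the snapshot.
            self.storage.append(data)
            self.live_map[index] = len(self.storage) - 1
        else:
            self.storage[self.live_map[index]] = data

    def read(self, index):
        return self.storage[self.live_map[index]]

    def read_snapshot(self, index):
        return self.storage[self.snapshot_map[index]]


cow = CopyOnWriteVolume(["a", "b", "c"])
cow.create_snapshot()
cow.write(1, "B")

row = RedirectOnWriteVolume(["a", "b", "c"])
row.create_snapshot()
row.write(1, "B")
```

In this sketch, copy-on-write pays an extra copy on the first write to each block, while redirect-on-write avoids that copy but leaves the live data spread across old and new blocks.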

Alternative Snapshot Techniques

While "Copy-on-write" and "Redirect-on-write" are two common methods used in snapshot technology, there are other variations and techniques that can be employed depending on the specific snapshot implementation and the underlying technology. These methods may not always be explicitly named or standardized, but they serve similar purposes. Here are a few additional approaches:

Clone-based snapshots: This method creates a new clone or copy of the data blocks when changes are made, preserving the original data in the snapshot. Cloning can be achieved at the database file system level or through storage virtualization.
Sparse copy: In this technique, only the data that changes from the original dataset is copied into the snapshot. This minimizes the amount of data that needs to be stored while ensuring the snapshot reflects the state of the data at a specific point in time.
Log-based snapshots: Instead of copying data blocks, log-based snapshots capture changes made to the data by recording the transactions or operations that modify the data. This approach is commonly used in database systems to maintain consistent snapshots.
Differential snapshots: These snapshots capture the changes made since the last snapshot was taken, similar to incremental backups. They only store the differences between the current state and the previous snapshot, reducing storage requirements.
Thin provisioning: Thin provisioning is a storage technique that allows for the creation of snapshots without initially consuming additional storage space. Space is allocated as changes are made, making it an efficient use of storage resources.
Snapshot chains: Some snapshot systems organize snapshots into chains, where each snapshot is based on the previous one. This chaining mechanism helps in tracking changes over time and maintaining data consistency.

The choice of snapshot method often depends on the specific requirements of the system, the storage technology in use, and the desired trade-offs between performance, storage efficiency, and ease of management. Different snapshot technologies and storage vendors may implement these methods in various ways to achieve similar objectives.
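
To make the ideas of differential snapshots and snapshot chains more concrete, here is a small sketch that stores one full base copy and then only the differences between consecutive snapshots. It is a toy model over a Python dictionary, not how any specific vendor implements the feature; `SnapshotChain` and its methods are hypothetical names chosen for illustration.

```python
_MISSING = object()   # sentinel: key was absent in the previous snapshot
_DELETED = object()   # sentinel: key was removed since the previous snapshot


class SnapshotChain:
    """Toy differential snapshot chain over a key-value dataset (illustrative only)."""

    def __init__(self, base):
        self.current = dict(base)   # live data that the application modifies
        self.base = dict(base)      # full copy, taken once
        self.deltas = []            # one dict of differences per snapshot
        self._last = dict(base)     # state at the previous snapshot (kept for diffing)

    def take_snapshot(self):
        # Record only what differs from the previous snapshot.
        delta = {}
        for key, value in self.current.items():
            if self._last.get(key, _MISSING) != value:
                delta[key] = value
        for key in self._last:
            if key not in self.current:
                delta[key] = _DELETED
        self.deltas.append(delta)
        self._last = dict(self.current)
        return len(self.deltas) - 1   # snapshot id = position in the chain

    def restore(self, snapshot_id):
        # Rebuild a state by replaying the chain from the base copy.
        state = dict(self.base)
        for delta in self.deltas[: snapshot_id + 1]:
            for key, value in delta.items():
                if value is _DELETED:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state


chain = SnapshotChain({"a": 1, "b": 2})
chain.current["a"] = 10
first = chain.take_snapshot()    # delta holds only the change to "a"
chain.current["c"] = 3
del chain.current["b"]
second = chain.take_snapshot()   # delta holds the new "c" and the deletion of "b"
```

The sketch also shows the trade-off of chaining: restoring a later snapshot requires every earlier link, so losing or corrupting one delta affects all snapshots after it.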

Snapshots in Memgraph

Memgraph does not use copy-on-write or redirect-on-write. Instead, it employs a full snapshot mechanism in which the entire in-memory database is saved to disk during a snapshot operation. Durability comes from combining these snapshots with Write-Ahead Logging (WAL), which records all changes made since the last snapshot and allows for consistent recovery and transaction-level durability.

Here’s how it works:

Full database snapshot: When a snapshot is triggered, Memgraph starts a transaction and reads the entire in-memory database, writing the current state to disk. This is not a "Copy-on-write" or "Redirect-on-write" approach but rather a full copy of the database.
Write-Ahead Logging (WAL): To achieve transaction-level durability, Memgraph uses WAL files. These files log every change made to the database since the last snapshot. If a failure occurs, Memgraph can use the latest snapshot in conjunction with the WAL to restore the database to its last consistent state.
Snapshot process: Memgraph handles snapshot creation in parallel with ongoing transactions. However, changes made after the snapshot process begins are not included in the snapshot. This ensures that the snapshot represents an accurate point-in-time view of the database at the start of the snapshot process.
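
The general pattern of a full snapshot combined with a write-ahead log can be sketched in a few lines. The example below is a single-threaded toy key-value store written for illustration; it is not Memgraph's implementation, it does not create snapshots in parallel with transactions, and the `TinyStore` class and the file names `snapshot.json` and `wal.jsonl` are assumptions made up for this sketch.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store with full snapshots plus a WAL (illustrative only)."""

    def __init__(self, directory):
        # Hypothetical file names, chosen only for this example.
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.wal_path = os.path.join(directory, "wal.jsonl")
        self.data = {}

    def set(self, key, value):
        # Write-ahead: persist the change to the log before applying it.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.data[key] = value

    def create_snapshot(self):
        # Full snapshot: write the entire in-memory state to disk.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as snapshot:
            json.dump(self.data, snapshot)
            snapshot.flush()
            os.fsync(snapshot.fileno())
        os.replace(tmp_path, self.snapshot_path)  # swap in the new snapshot atomically
        # Changes logged before the snapshot are now covered by it.
        open(self.wal_path, "w", encoding="utf-8").close()

    @classmethod
    def recover(cls, directory):
        # Recovery: load the latest snapshot, then replay the WAL on top of it.
        store = cls(directory)
        if os.path.exists(store.snapshot_path):
            with open(store.snapshot_path, encoding="utf-8") as snapshot:
                store.data = json.load(snapshot)
        if os.path.exists(store.wal_path):
            with open(store.wal_path, encoding="utf-8") as wal:
                for line in wal:
                    entry = json.loads(line)
                    store.data[entry["key"]] = entry["value"]
        return store


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.set("a", 1)
    store.create_snapshot()   # snapshot contains {"a": 1}
    store.set("b", 2)         # these two changes exist only in the WAL
    store.set("a", 3)
    recovered = TinyStore.recover(directory)
    print(recovered.data)     # {'a': 3, 'b': 2}
```

The snapshot alone would restore only the state at the time it was taken; replaying the log on top of it is what brings the store back to its last consistent state.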

Database Snapshot Advantages

One of the primary advantages of database snapshots is their speed. Because a copy-on-write or redirect-on-write snapshot copies no data when it is created and only tracks changes made afterwards, it can be created almost instantly, regardless of the size of the database. This efficiency is particularly valuable in dynamic database environments where rapid changes are the norm.

Moreover, snapshots are incredibly space-efficient. Unlike traditional backups that demand storage space equivalent to the entire dataset, snapshots only store the alterations made, making them economical in terms of storage resources. This space-efficiency is particularly beneficial when dealing with large datasets.

Another advantage is reduced performance overhead. The absence of the need to duplicate the entire dataset ensures that the system's performance remains largely unaffected, even as snapshots are created and updated.

Potential Pitfalls and Best Practices With Database Snapshots

Like any technological solution, snapshots have their caveats. Snapshots offer remarkable flexibility and efficiency, but they should not be misinterpreted as a complete backup substitute. They serve best as a supplementary solution.

Some of the potential pitfalls are the following:

Over-reliance on snapshots can lead to unwarranted risks.
Overloading systems with innumerable snapshots without a coherent management strategy can be disastrous, particularly concerning storage.

To harness their potential fully, make sure to:

Balance with backups: It's imperative to maintain a robust, traditional backup strategy alongside snapshots.
Conduct regular reviews: Periodically check and manage your snapshots to ensure they remain useful without consuming unnecessary storage.
Conduct validation: Always test snapshots to ensure data integrity. Perhaps consider setting up a test database for such tasks, to get firsthand experience of their value.
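
As one way to get that firsthand experience, the sketch below uses SQLite from Python's standard library as a stand-in for a real system. It copies a source database into a separate test database with the `sqlite3` backup API, then runs an integrity check and compares row counts. Note that the SQLite backup is a full copy rather than a copy-on-write snapshot, and the `validate_copy` helper and the `users` table are hypothetical; the point is only the shape of a validation step.

```python
import sqlite3


def validate_copy(source, copy, tables):
    """Return True if the copy passes an integrity check and row counts match."""
    status = copy.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        return False
    for table in tables:
        query = f'SELECT COUNT(*) FROM "{table}"'
        source_rows = source.execute(query).fetchone()[0]
        copy_rows = copy.execute(query).fetchone()[0]
        if source_rows != copy_rows:
            return False
    return True


# Source database with some sample data (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO users (name) VALUES (?)", [("Ana",), ("Ivan",)])
source.commit()

# Restore a point-in-time copy into a separate test database.
test_db = sqlite3.connect(":memory:")
source.backup(test_db)

is_valid = validate_copy(source, test_db, ["users"])
```

A real validation would go further, for example by comparing checksums or running application-level queries, but even a basic check like this catches copies that are unreadable or incomplete.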

Understanding ACID vs. Snapshot Isolation

Databases have their own language, and for many, ACID is one of the first words they come to understand. This acronym stands for Atomicity, Consistency, Isolation, and Durability, fundamental principles ensuring database reliability. In this context, many encounter the term "snapshot isolation," a concurrency control mechanism. However, it's crucial to draw a distinction between this and "database snapshots." While both concepts revolve around the idea of capturing a state of data at a specific point in time, their applications are distinctly different.

Snapshot isolation focuses on providing transactions with a consistent view of the database to maintain data integrity during concurrent operations. In contrast, database snapshots pertain to creating a point-in-time, read-only capture of the entire database, primarily for backup or analytical purposes. So, while they might sound alike, they serve unique and vital roles in the intricate dance of database management.
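
To show what "a consistent view for a transaction" means in practice, here is a minimal sketch of snapshot-style reads over a multi-versioned store. It is a deliberate simplification: it models only reads as of a transaction's start time and leaves out write-conflict detection, which real snapshot isolation also requires. The `VersionedStore` and `Transaction` classes are hypothetical.

```python
class VersionedStore:
    """Toy multi-version store; every commit adds new versions (illustrative only)."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), oldest first
        self.clock = 0       # logical commit timestamp

    def commit(self, changes):
        self.clock += 1
        for key, value in changes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def begin(self):
        # A transaction remembers the timestamp at which it started.
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts

    def read(self, key):
        # Return the newest version committed at or before the start timestamp.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.start_ts:
                return value
        return None


store = VersionedStore()
store.commit({"balance": 100})
reader = store.begin()           # sees the database as of this moment
store.commit({"balance": 250})   # a concurrent writer commits later
old_view = reader.read("balance")         # still the value from the reader's start
new_view = store.begin().read("balance")  # a new transaction sees the update
```

Unlike a database snapshot, nothing here is written to disk for backup or analysis; the "snapshot" exists only for the lifetime of the transaction that reads through it.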

Conclusion

In the world of data management, think of database snapshots as smart tools that help keep track of changes efficiently, save storage space, and keep things running smoothly. However, it's essential to recognize that database snapshots are most effective when they complement traditional backup strategies, much like using both a wrench and a screwdriver for different tasks in a workshop. To harness their potential, a solid grasp of snapshot methodologies is key. Like every tool, it's not the existence but the correct usage that determines its value. With a comprehensive understanding and strategic implementation, snapshots can be a game-changer in data management, not just in SQL Server, but across the vast realm of databases. Their ability to efficiently document changes, conserve storage space, and maintain operational performance makes them an asset for tech professionals.