Memgraph Snapshots Explained

By Memgraph
6 min read · September 25, 2024

In our previous blog post, Understanding Database Snapshots, we covered the basics of database snapshots and how they function in general. Now, it’s time to take a closer look at how snapshots work specifically in Memgraph.

In this post, we’ll break down the mechanics of Memgraph’s full snapshot mechanism, how it differs from other databases that use incremental snapshots, and how Memgraph combines snapshots with Write-Ahead Logging (WAL) for efficient recovery and transactional durability.

Whether you’re a seasoned graph database user or just getting started, understanding Memgraph’s snapshot process will help you optimize performance and ensure data durability.

How Do Database Snapshots Work in Memgraph?

Memgraph skips the complexity of incremental snapshots and, in IN_MEMORY_TRANSACTIONAL storage mode, goes straight for full snapshots combined with Write-Ahead Logging (WAL). The idea is simple: instead of tracking changes piecemeal, Memgraph periodically takes a full snapshot of the database's current state and logs everything that happens after that.

When you trigger a snapshot in Memgraph, the whole database—every node, edge, and property—gets written to disk in one go. No "saving changes since the last snapshot," no complicated versioning schemes. Just a full copy of the graph as it stands right then and there.

Memgraph’s WAL works alongside the snapshot system to ensure data durability.

Every time a change occurs in the database, a delta object is created and logged in a write-ahead log (WAL). If the system crashes, Memgraph restores the most recent snapshot by default.

However, if a WAL file has a more recent timestamp than the snapshot, Memgraph will use the WAL to recover the database to its latest state. This ensures that no important data is lost, giving you full transactional durability.
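To make the recovery order concrete, here is a minimal Python sketch of the general "newest snapshot, then newer WAL entries" idea. It is a toy model, not Memgraph's implementation: the `Snapshot` and `Delta` classes, the integer timestamps, and the key/value state are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """One logged change; `timestamp` orders it relative to snapshots."""
    timestamp: int
    key: str
    value: object  # None means "delete this key" in this toy model


@dataclass
class Snapshot:
    """A full copy of the state as of `timestamp`."""
    timestamp: int
    state: dict = field(default_factory=dict)


def recover(snapshots, wal):
    """Rebuild state from the newest snapshot plus any newer WAL deltas."""
    if snapshots:
        latest = max(snapshots, key=lambda s: s.timestamp)
        state, since = dict(latest.state), latest.timestamp
    else:
        state, since = {}, -1  # no snapshot: replay the whole WAL

    for delta in sorted(wal, key=lambda d: d.timestamp):
        if delta.timestamp <= since:
            continue  # already reflected in the snapshot
        if delta.value is None:
            state.pop(delta.key, None)
        else:
            state[delta.key] = delta.value
    return state


# Hypothetical data: two snapshots and three logged changes.
snapshots = [Snapshot(10, {"a": 1}), Snapshot(20, {"a": 1, "b": 2})]
wal = [Delta(15, "b", 2), Delta(25, "c", 3), Delta(30, "a", None)]
recovered = recover(snapshots, wal)
print(recovered)
```

Only the deltas logged after the chosen snapshot are replayed, which is why a full snapshot plus the WAL written since then is enough to reach the latest state.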

Memgraph doesn't block real-time activity for this, either. It divides the snapshot process into batches and runs it in parallel. This means while you're saving your database state, your queries are still running at full throttle. After a restart, Memgraph uses the most recent snapshot and WAL to recover without skipping a beat.
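The batching idea can be sketched in a few lines of Python. This is an illustration of splitting data into batches and writing them with a thread pool, not Memgraph's actual snapshot format or code: the JSON files, the `batch_NNNN.json` naming, and the batch size are assumptions. The sketch also leaves out the part that keeps a snapshot consistent while queries keep modifying the data.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def write_batch(out_dir, index, batch):
    """Serialize one batch to its own file and return the path."""
    path = Path(out_dir) / f"batch_{index:04d}.json"
    path.write_text(json.dumps(batch))
    return path


def snapshot_in_batches(items, out_dir, batch_size=2, workers=4):
    """Write all batches concurrently; return the files in batch order."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_batch, out_dir, i, batch)
            for i, batch in enumerate(batches)
        ]
        return [future.result() for future in futures]


def load_snapshot(paths):
    """Read the batch files back in order and concatenate their contents."""
    items = []
    for path in paths:
        items.extend(json.loads(Path(path).read_text()))
    return items


# Hypothetical data: five nodes, written as batches of two.
nodes = [{"id": i, "label": "Person"} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    files = snapshot_in_batches(nodes, tmp)
    restored = load_snapshot(files)
    file_count = len(files)
print(file_count, restored == nodes)
```

Because each batch is independent, the writer threads never wait on one another, and reading the files back in batch order reproduces the original data.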

Configuring Memgraph Snapshots

You don’t have to wait for something to break to use snapshots. Memgraph lets you schedule them based on time intervals (--storage-snapshot-interval-sec) or trigger them manually with a simple query:

```cypher
CREATE SNAPSHOT;
```

When working with Memgraph, you can also enable the --storage-snapshot-on-exit flag to automatically save your data as a snapshot when Memgraph shuts down. This ensures that your latest data is safely stored even if you forget to manually trigger a snapshot before exiting.
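As a small illustration, the Python helper below assembles the two flags mentioned above into a startup argument list. The helper name, the `memgraph` binary name, the `--flag=value` spelling, and the 300-second interval are assumptions chosen for the example, so check them against the configuration reference for your version.

```python
def snapshot_flags(interval_sec=None, on_exit=False):
    """Return startup flags for periodic and on-exit snapshots."""
    flags = []
    if interval_sec is not None:
        if interval_sec <= 0:
            raise ValueError("interval_sec must be a positive number of seconds")
        flags.append(f"--storage-snapshot-interval-sec={interval_sec}")
    if on_exit:
        flags.append("--storage-snapshot-on-exit=true")
    return flags


# Example values only: snapshot every 300 seconds and again on shutdown.
args = ["memgraph", *snapshot_flags(interval_sec=300, on_exit=True)]
print(" ".join(args))
```

The same list can be handed to a process launcher or appended to a container's command, depending on how you run Memgraph.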

When you run the CREATE SNAPSHOT command in Memgraph, it creates a snapshot file on the disk. However, this snapshot won’t automatically show up in the Cloud interface. The snapshots you see in the Cloud interface are copies of the entire disk (like an AWS EBS volume), which includes any Memgraph snapshots. Since the Cloud service doesn’t have access to Memgraph’s internal commands for security reasons, it can’t automatically create a new disk copy when you use CREATE SNAPSHOT in Memgraph. So, while the command creates a snapshot locally, it doesn’t trigger a new snapshot in the Cloud interface.

Everything gets saved in /var/lib/memgraph/snapshots by default. Remember that snapshots and WAL files are tied to the Memgraph version you're using, so you can’t migrate them between versions. Yet another reason to keep things simple.
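If you want to check from a script which snapshot is the newest, a sketch like the following can help. It only assumes that snapshots are regular files in the snapshot directory and ranks them by modification time. It does not rely on Memgraph's file naming or parse the files, and the demo uses a temporary directory with placeholder files instead of a real installation.

```python
import os
import tempfile
from pathlib import Path

DEFAULT_SNAPSHOT_DIR = Path("/var/lib/memgraph/snapshots")


def newest_snapshot(directory=DEFAULT_SNAPSHOT_DIR):
    """Return the most recently modified file in `directory`, or None."""
    directory = Path(directory)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


# Demo with placeholder files and explicit modification times.
with tempfile.TemporaryDirectory() as tmp:
    for name, mtime in [("older", 1_000), ("newer", 2_000)]:
        path = Path(tmp) / name
        path.write_bytes(b"")
        os.utime(path, (mtime, mtime))
    newest_name = newest_snapshot(tmp).name
    missing_result = newest_snapshot(Path(tmp) / "missing")
print(newest_name, missing_result)
```

A check like this fits into a backup job that copies the latest snapshot elsewhere, keeping in mind that the copy is only usable with the same Memgraph version.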

What Makes Memgraph's Approach Different?

Many databases use copy-on-write or redirect-on-write to avoid making full copies of the data. Those methods are designed to save space and time by only storing changes. It sounds smart on paper, but in practice, it introduces complexity. You have to manage multiple layers of snapshots, deal with potential performance bottlenecks, and you might even end up juggling too many intermediate states.

Memgraph says, "No thanks" to all that. Instead, it creates full snapshots every time. Here’s why that’s beneficial:

Consistency is Key

With a full snapshot, you know exactly what the state of the database was at a given point in time. There’s no guesswork, no juggling logs, deltas, or incremental snapshots. If you need to roll back or restore, you can do it without piecing things together like a puzzle.

No Performance Bottlenecks

While Copy-on-write might sound efficient, it can slow things down when the system has to track every small change. Memgraph, on the other hand, offloads snapshot creation to multiple threads, so the database keeps humming along without missing a beat. Your queries keep running, and your real-time performance stays solid.

Less Overhead

Managing complex snapshot hierarchies or figuring out when to merge changes in Copy-on-write systems can become a headache. Memgraph keeps it simple—full snapshot + WAL, no extra overhead to worry about.

Key Benefits of Memgraph's Full Snapshots

  1. Speed. Parallel processing of snapshots means no performance drop while creating them. Memgraph breaks the database into batches, and multiple threads work on saving those batches to disk simultaneously. Fast, efficient, and no downtime.

  2. Simplicity. Full snapshots are easy to manage. No dealing with incremental snapshots, no merging data from different states. Just a full, point-in-time copy.

  3. Reliability. When combined with WAL, Memgraph ensures that you won’t lose any data even if something goes wrong. Your most recent snapshot and the WAL files written after it are enough to recover everything.

  4. Flexibility. You control when and how often snapshots are created, whether manually or automatically, making it easy to integrate into your backup and recovery strategies.

Conclusion

Memgraph’s snapshot approach is designed for simplicity, speed, and reliability. By taking full snapshots and leveraging Write-Ahead Logging (WAL), Memgraph ensures that your data is always durable and recoverable without the complexity found in other systems. Whether you’re configuring snapshots to run automatically or triggering them manually, Memgraph makes it easy to integrate snapshots into your backup and recovery strategy.

If you run into any issues with snapshots or have specific questions, we’re here to help! You can create a ticket via GitHub, or better yet, join the Memgraph Discord community where you can get advice from both Memgraph experts and experienced users.
