Data durability

Memgraph uses two mechanisms to ensure the durability of stored data and make disaster recovery possible:

write-ahead logging (WAL)
periodic snapshot creation

These mechanisms generate durability files and save them in the respective wal and snapshots folders in the data directory. Data directory stores permanent data on disk.

The default data directory path is /var/lib/memgraph but the path can be changed by modifying the --data-directory configuration flag. To learn how to modify configuration flags, head over to the Configuration page.

With Memgraph Enterprise, the data_directory holds databases directory which splits durability files by database name. The reason for that is the multi-tenant architecture in Memgraph Enterprise, where the durability files for each database are stored under /data_directory/databases/<db_name>. The databases directory will exist even if you’re not using the multi-tenancy feature.

Durability files are deleted when certain events are triggered, for example, exceeding the maximum number of snapshots, defined by the --storage-snapshot-retention-count=3 flag.

To prevent the deletion of durability files, you need to lock the data directory, and enable it again by unlocking the directory.

To manage this behavior, use the following queries:

LOCK DATA DIRECTORY;
UNLOCK DATA DIRECTORY;

To show the status of the data directory, run:

DATA DIRECTORY LOCK STATUS;

To encrypt the data directory, use LUKS as it works with Memgraph out of the box and is undetectable from the application perspective so it shouldn’t break any existing applications.

Durability mechanisms

To configure the durability mechanisms, check their respective configuration flags.

Write-ahead logging

Write-ahead logging (WAL) is a technique applied in providing atomicity and durability to database systems.

In the default IN_MEMORY_TRANSACTIONAL storage mode, Memgraph creates a Delta object each time data is changed. By using Deltas, Memgraph creates write-ahead logs. Each database modification is therefore recorded in a log file before being written to the DB, and in the end the log file contains all steps needed to reconstruct the DB’s most recent state.

Memgraph has WAL enabled by default. To switch it on and off, use the boolean --storage-wal-enabled flag. For other WAL-related flags check the configuration reference guide.

By default, WAL files are located at /var/lib/memgraph/wal.

Snapshots

Snapshots provide a faster way to restore the states of your database. Snapshots are created periodically based on the value defined with the --storage-snapshot-interval configuration flags, as well as upon exit based on the value of the --storage-snapshot-on-exit configuration flag. When a snapshot creation is triggered, the entire data storage is written to the drive. Nodes and relationships are divided into groups called batches.

Snapshot creation can be made faster by using multiple threads. See Parallelized execution for more information.

On startup, the database state is recovered from the most recent snapshot file. Memgraph can read the data and build the indexes on multiple threads, using batches as a parallelization unit: each thread will recover one batch at a time until there are no unhandled batches.

This means the same batch size might not be suitable for every dataset. A smaller dataset might require a smaller batch size to utilize a multi-threaded processor, while bigger datasets might use bigger batches to minimize the synchronization between the worker threads. Therefore, the size of batches and the number of used threads are configurable similarly to other durability-related settings.

The timestamp of the snapshot is compared with the latest update recorded in the WAL file and, if the snapshot is less recent, the state of the DB will be recovered using the WAL file.

Memgraph has snapshot creation enabled by default. You can configure the exact snapshot creation behavior by defining the relevant flags. Alternatively, you can make one directly by running the following query:

CREATE SNAPSHOT;

If another snapshot is already being created or no committed writes to the database have been made since the last snapshot, this query will fail with an error.

By default, snapshot files are saved inside the var/lib/memgraph/snapshots directory.

To query which snapshots currently exist in the data directory, execute:

SHOW SNAPSHOTS;

Periodic snapshots

IN_MEMORY_TRANSACTIONAL mode supports periodic snapshot creation. The interval can be set at startup via the --storage-snapshot-interval flag or at run-time via the database settings:

SET DATABASE SETTING "storage.snapshot.interval" TO "1200";
SET DATABASE SETTING "storage.snapshot.interval" TO "* * 12 * * *";
SET DATABASE SETTING "storage.snapshot.interval" TO "";

Changing the configuration settings depends on the way you are using Memgraph, so please refer to the configuration docs for more information.

If the interval string is an integer, then it’s treated as the execution period in seconds. Interval can also be defined as a CRON expression. The expression must define seconds in addition to the usual five cron fields. By setting the value to an empty string, the background process is paused and any currently active snapshot creation will finish. Please note that defining the interval via a CRON expression is an Enterprise feature.

If the database is started in or migrated into IN_MEMORY_ANALYTICAL mode, the background thread will pause and no snapshots will be created as long as that mode is active. The job will continue with the last defined interval when the storage mode is changed to IN_MEMORY_TRANSACTIONAL storage mode.

The periodic snapshot will be skipped if another snapshot is in progress or no new writes have been committed since the last snapshot. If the periodic snapshot is skipped it will be logged on INFO level.

⚠️

Snapshots and WAL files are presently not compatible between Memgraph versions.

Parallelized execution

Snapshot creation in Memgraph can be optimized using multiple threads, which significantly reduces the time required to create snapshots for large datasets.

This behavior can be controlled using the following flags:

--storage-parallel-snapshot-creation: This flag determines whether snapshot creation is performed in a multi-threaded fashion. By default, it is set to false. To enable parallelized execution, set this flag to true.
--storage-snapshot-thread-count: This flag specifies the number of threads to be used for snapshot creation. By default, Memgraph uses the system’s maximum thread count. You can override this value to fine-tune performance based on your system’s resources.

When parallelized execution is enabled, Memgraph divides the data into batches, where the batch size is defined via --storage-items-per-batch. The optimal batch size and thread count may vary depending on the dataset size and system configuration.

When Parallelization Helps

Parallel execution is especially beneficial when CPU-bound operations dominate the snapshot creation process, such as serialization or compression of in-memory structures. As a general guideline, parallel snapshot creation provides the most significant performance improvement when disk I/O constitutes 25% or less of the total snapshot creation time.

To take full advantage of parallelization, it’s also important to set the --storage-items-per-batch flag appropriately. This value determines how the dataset is split into work units for threads. A good rule of thumb is: Total number of items (vertices + edges) ≈ 4 × number of threads × —storage-items-per-batch This ensures that each thread has enough batches to work on without idling, helping maximize CPU utilization during snapshot creation.

When using multi-threaded snapshot creation with the correct batch size, the disk will once again become the bottleneck. At that point, more threads will not necessarily yield better performance.

Measuring Disk Write Speed on Linux

To determine how fast your disk can handle writes (which influences the I/O bottleneck), you can use the dd command:

dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct

This writes a 1 GB file directly to disk and reports the write speed. After the test, remove the file.

You can also monitor real-time disk utilization during snapshot creation using tools like iostat, iotop, or dstat.

Storage modes

Memgraph has the option to work in IN_MEMORY_ANALYTICAL, IN_MEMORY_TRANSACTIONAL or ON_DISK_TRANSACTIONAL storage modes.

Memgraph always starts in the IN_MEMORY_TRANSACTIONAL mode in which it creates periodic snapshots and write-ahead logging as durability mechanisms, and also enables creating manual snapshots.

In the IN_MEMORY_ANALYTICAL mode, Memgraph offers no periodic snapshots and write-ahead logging. Users can create a snapshot with the CREATE SNAPSHOT; Cypher query. During the process of snapshot creation, other transactions will be prevented from starting until the snapshot creation is completed.

In the ON_DISK_TRANSACTIONAL mode, durability is supported by RocksDB since it keeps its own WAL files. Memgraph persists the metadata used in the implementation of the on-disk storage.

Data types Indexes