Data durability

Data durability

Memgraph uses two mechanisms to ensure the durability of stored data and make disaster recovery possible:

  • write-ahead logging (WAL)
  • periodic snapshot creation

These mechanisms generate durability files and save them in the respective wal and snapshots folders in the data directory. Data directory stores permanent data on disk.

The default data directory path is /var/lib/memgraph but the path can be changed by modifying the --data-directory configuration flag. To learn how to modify configuration flags, head over to the Configuration page.

With Memgraph Enterprise, the data_directory holds databases directory which splits durability files by database name. The reason for that is the multi-tenant architecture in Memgraph Enterprise, where the durability files for each database are stored under /data_directory/databases/<db_name>. The databases directory will exist even if you're not using the multi-tenancy feature.

Durability files are deleted when certain events are triggered, for example, exceeding the maximum number of snapshots, defined by the --storage-snapshot-retention-count=3 flag.

To prevent the deletion of durability files, you need to lock the data directory, and enable it again by unlocking the directory.

To manage this behavior, use the following queries:

LOCK DATA DIRECTORY;
UNLOCK DATA DIRECTORY;

To show the status of the data directory, run:

DATA DIRECTORY LOCK STATUS;

To encrypt the data directory, use LUKS (opens in a new tab) as it works with Memgraph out of the box and is undetectable from the application perspective so it shouldn't break any existing applications.

Durability mechanisms

To configure the durability mechanisms, check their respective configuration flags.

Write-ahead logging

Write-ahead logging (WAL) is a technique applied in providing atomicity and durability to database systems.

In the default IN_MEMORY_TRANSACTIONAL storage mode, Memgraph creates a Delta object each time data is changed. By using Deltas, Memgraph creates write-ahead logs. Each database modification is therefore recorded in a log file before being written to the DB, and in the end the log file contains all steps needed to reconstruct the DB’s most recent state.

Memgraph has WAL enabled by default. To switch it on and off, use the boolean --storage-wal-enabled flag. For other WAL-related flags check the configuration reference guide.

By default, WAL files are located at /var/lib/memgraph/wal.

Snapshots

Snapshots provide a faster way to restore the states of your database. Snapshots are created periodically based on the value defined with the --storage-snapshot-interval configuration flags, as well as upon exit based on the value of the --storage-snapshot-on-exit configuration flag. When a snapshot creation is triggered, the entire data storage is written to the drive. Nodes and relationships are divided into groups called batches.

On startup, the database state is recovered from the most recent snapshot file. Memgraph can read the data and build the indexes on multiple threads, using batches as a parallelization unit: each thread will recover one batch at a time until there are no unhandled batches.

This means the same batch size might not be suitable for every dataset. A smaller dataset might require a smaller batch size to utilize a multi-threaded processor, while bigger datasets might use bigger batches to minimize the synchronization between the worker threads. Therefore, the size of batches and the number of used threads are configurable similarly to other durability-related settings.

The timestamp of the snapshot is compared with the latest update recorded in the WAL file and, if the snapshot is less recent, the state of the DB will be recovered using the WAL file.

Memgraph has snapshot creation enabled by default. You can configure the exact snapshot creation behavior by defining the relevant flags. Alternatively, you can make one directly by running the following query:

CREATE SNAPSHOT;

By default, snapshot files are saved inside the var/lib/memgraph/snapshots directory.

To query which snapshots currently exist in the data directory, execute:

SHOW SNAPSHOTS;

Periodic snapshots

IN_MEMORY_TRANSACTIONAL mode supports periodic snapshot creation. The interval can be set at startup via the --storage-snapshot-interval flag or at run-time via the database settings:

SET DATABASE SETTING "storage.snapshot.interval" TO "1200";
SET DATABASE SETTING "storage.snapshot.interval" TO "* * 12 * * *";
SET DATABASE SETTING "storage.snapshot.interval" TO "";

Changing the configuration settings depends on the way you are using Memgraph, so please refer to the configuration docs for more information.

If the interval string is an integer, then it's treated as the execution period in seconds. Interval can also be defined as a CRON (opens in a new tab) expression. The expression must define seconds in addition to the usual five cron fields. By setting the value to an empty string, the background process is paused and any currently active snapshot creation will finish. Please note that defining the interval via a CRON expression is an Enterprise feature.

If the database is started in or migrated into IN_MEMORY_ANALYTICAL mode, the background thread will pause and no snapshots will be created as long as that mode is active. The job will continue with the last defined interval when the storage mode is changed to IN_MEMORY_TRANSACTIONAL storage mode.

⚠️

Snapshots and WAL files are presently not compatible between Memgraph versions.

Storage modes

Memgraph has the option to work in IN_MEMORY_ANALYTICAL, IN_MEMORY_TRANSACTIONAL or ON_DISK_TRANSACTIONAL storage modes.

Memgraph always starts in the IN_MEMORY_TRANSACTIONAL mode in which it creates periodic snapshots and write-ahead logging as durability mechanisms, and also enables creating manual snapshots.

In the IN_MEMORY_ANALYTICAL mode, Memgraph offers no periodic snapshots and write-ahead logging. Users can create a snapshot with the CREATE SNAPSHOT; Cypher query. During the process of snapshot creation, other transactions will be prevented from starting until the snapshot creation is completed.

In the ON_DISK_TRANSACTIONAL mode, durability is supported by RocksDB since it keeps its own WAL (opens in a new tab) files. Memgraph persists the metadata used in the implementation of the on-disk storage.