Data durability
Memgraph uses two mechanisms to ensure the durability of stored data and make disaster recovery possible:
- write-ahead logging (WAL)
- periodic snapshot creation
These mechanisms generate durability files and save them in the respective wal
and snapshots folders in the data directory. The data directory stores
permanent data on disk.
The default data directory path is /var/lib/memgraph, but the path can be
changed by modifying the --data-directory configuration flag. To learn how to
modify configuration flags, head over to the Configuration page.
With Memgraph Enterprise, the data_directory holds a databases directory, which
splits durability files by database name. The reason for that is the
multi-tenant architecture in Memgraph Enterprise, where the durability files for
each database are stored under /data_directory/databases/<db_name>. The
databases directory will exist even if you’re not using the multi-tenancy
feature.
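As a rough illustration of the layout described above, the following Python sketch builds the paths where durability files would be expected. The helper name durability_dirs and the database name tenant_a are hypothetical, and placing snapshots and wal folders directly under each database directory is an assumption drawn from the description, not a documented guarantee.

```python
from pathlib import PurePosixPath

DEFAULT_DATA_DIRECTORY = "/var/lib/memgraph"  # default path named in the text


def durability_dirs(data_directory=DEFAULT_DATA_DIRECTORY, db_name=None):
    """Return the expected snapshots and wal folders (hypothetical helper).

    With db_name=None the folders sit directly in the data directory.
    With a db_name, the Enterprise layout <data_directory>/databases/<db_name>
    is used; the per-database subfolder names are an assumption.
    """
    root = PurePosixPath(data_directory)
    if db_name is not None:
        root = root / "databases" / db_name
    return {"snapshots": str(root / "snapshots"), "wal": str(root / "wal")}


print(durability_dirs())
print(durability_dirs(db_name="tenant_a"))  # tenant_a is a made-up name
```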
Durability files are deleted when certain events are triggered, for example,
exceeding the maximum number of snapshots, defined by the
--storage-snapshot-retention-count=3 flag.
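The retention rule can be pictured with a small Python sketch. It is not Memgraph's implementation: the function name snapshots_to_delete is hypothetical, and the file names are invented and assumed to sort chronologically.

```python
def snapshots_to_delete(snapshot_names, retention_count=3):
    """Return the snapshots that exceed the retention count, oldest first.

    Assumes names sort chronologically (for example, a timestamp prefix),
    which is an illustrative convention, not Memgraph's real file naming.
    """
    if retention_count < 1:
        raise ValueError("retention_count must be at least 1")
    ordered = sorted(snapshot_names)
    # Keep the newest `retention_count` entries, drop everything before them.
    return ordered[:-retention_count]


existing = ["20240101_snap", "20240102_snap", "20240103_snap",
            "20240104_snap", "20240105_snap"]
print(snapshots_to_delete(existing, retention_count=3))
```

With five snapshots and a retention count of 3, the two oldest entries are returned as deletion candidates; with three or fewer snapshots, nothing is returned.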
To prevent the deletion of durability files, lock the data directory. Unlocking
the directory enables deletion again.
To manage this behavior, use the following queries:
LOCK DATA DIRECTORY;
UNLOCK DATA DIRECTORY;
To show the status of the data directory, run:
DATA DIRECTORY LOCK STATUS;
To encrypt the data directory, use LUKS. It works with Memgraph out of the box and is transparent from the application's perspective, so it shouldn’t break any existing applications.
Durability mechanisms
To configure the durability mechanisms, check their respective configuration flags.
Write-ahead logging
Write-ahead logging (WAL) is a technique used to provide atomicity and durability in database systems.
In the default IN_MEMORY_TRANSACTIONAL storage mode, Memgraph creates a Delta
object each time data is changed. By using Deltas, Memgraph creates
write-ahead logs. Each database modification is therefore recorded in a log file
before being written to the DB, and in the end the log file contains all steps
needed to reconstruct the DB’s most recent state.
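To make the idea concrete, here is a minimal Python sketch of write-ahead logging on a toy key-value store. It is a generic illustration of the technique, not Memgraph's Delta-based implementation: the TinyStore class, the JSON record format, and the in-memory list standing in for the log file are all assumptions made for the example.

```python
import json


class TinyStore:
    """Toy key-value store with a write-ahead log (illustrative only)."""

    def __init__(self):
        self.log = []    # stands in for the WAL file on disk
        self.state = {}  # stands in for the in-memory database

    def set(self, key, value):
        # 1. Record the change in the log first ...
        self.log.append(json.dumps({"op": "set", "key": key, "value": value}))
        # 2. ... and only then apply it to the state.
        self.state[key] = value

    def delete(self, key):
        self.log.append(json.dumps({"op": "delete", "key": key}))
        self.state.pop(key, None)


def replay(log):
    """Rebuild the most recent state by applying every logged step in order."""
    state = {}
    for line in log:
        record = json.loads(line)
        if record["op"] == "set":
            state[record["key"]] = record["value"]
        elif record["op"] == "delete":
            state.pop(record["key"], None)
    return state


store = TinyStore()
store.set("a", 1)
store.set("b", 2)
store.delete("a")
assert replay(store.log) == store.state  # both are {"b": 2}
```

Because every change reaches the log before the state, replaying the log from the start always reproduces the latest state, which is the property recovery relies on.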
Memgraph has WAL enabled by default. To switch it on and off, use the boolean
--storage-wal-enabled flag. For other WAL-related flags check the configuration
reference guide.
By default, WAL files are located at /var/lib/memgraph/wal.
Snapshots
Snapshots provide a faster way to restore the state of your database. Snapshots
are created periodically based on the value defined with the
--storage-snapshot-interval configuration flag, as well as upon exit based
on the value of the --storage-snapshot-on-exit configuration flag. When
snapshot creation is triggered, the entire data storage is written to the drive.
Nodes and relationships are divided into groups called batches.
On startup, the database state is recovered from the most recent snapshot file. Memgraph can read the data and build the indexes on multiple threads, using batches as a parallelization unit: each thread will recover one batch at a time until there are no unhandled batches.
This means the same batch size might not be suitable for every dataset. A smaller dataset might require a smaller batch size to utilize a multi-threaded processor, while bigger datasets might use bigger batches to minimize the synchronization between the worker threads. Therefore, the size of batches and the number of used threads are configurable similarly to other durability-related settings.
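The trade-off between batch size and thread count can be sketched in Python as follows. This only models how work is handed out, one batch per worker at a time. The function names and the dictionary-based "nodes" are hypothetical, and the recover_batch body is a placeholder for the real work of loading data and building indexes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, batch_size):
    """Divide items into consecutive groups of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def recover_batch(batch):
    # Placeholder for real recovery work on one batch.
    return {item["id"]: item for item in batch}


def recover(items, batch_size, thread_count):
    """Recover all items, letting each worker pick up one batch at a time."""
    batches = split_into_batches(items, batch_size)
    recovered = {}
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # Idle workers take the next unhandled batch until none are left.
        for partial in pool.map(recover_batch, batches):
            recovered.update(partial)
    return recovered


nodes = [{"id": i} for i in range(10)]
print(len(split_into_batches(nodes, 4)))  # 3 batches: sizes 4, 4, 2
print(len(recover(nodes, batch_size=4, thread_count=2)))
```

With 10 items and a batch size of 100, there would be a single batch and only one worker would have anything to do, which is why smaller datasets may need smaller batches.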
The timestamp of the snapshot is compared with the latest update recorded in the WAL file and, if the snapshot is less recent, the state of the DB will be recovered using the WAL file.
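One plausible reading of this rule is sketched below in Python. The record shapes, the integer timestamps, and the function names are assumptions made for illustration; Memgraph's actual recovery logic is not reproduced here.

```python
def needs_wal_replay(snapshot_timestamp, latest_wal_timestamp):
    """True when the WAL holds updates newer than the snapshot."""
    return snapshot_timestamp < latest_wal_timestamp


def recover_state(snapshot, wal_records):
    """Start from the snapshot and apply only the WAL records that are newer.

    snapshot:    {"timestamp": int, "state": dict}          (assumed shape)
    wal_records: [{"timestamp": int, "key": ..., "value": ...}, ...]
    """
    state = dict(snapshot["state"])
    if not wal_records:
        return state
    latest = max(record["timestamp"] for record in wal_records)
    if needs_wal_replay(snapshot["timestamp"], latest):
        for record in sorted(wal_records, key=lambda r: r["timestamp"]):
            if record["timestamp"] > snapshot["timestamp"]:
                state[record["key"]] = record["value"]
    return state


snapshot = {"timestamp": 10, "state": {"a": 1}}
wal = [
    {"timestamp": 8, "key": "a", "value": 0},   # older than the snapshot
    {"timestamp": 12, "key": "b", "value": 2},  # newer than the snapshot
]
print(recover_state(snapshot, wal))
```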
Memgraph has snapshot creation enabled by default. You can configure the exact snapshot creation behavior by defining the relevant flags. Alternatively, you can make one directly by running the following query:
CREATE SNAPSHOT;
By default, snapshot files are saved inside the /var/lib/memgraph/snapshots
directory.
To query which snapshots currently exist in the data directory, execute:
SHOW SNAPSHOTS;
Periodic snapshots
IN_MEMORY_TRANSACTIONAL mode supports periodic snapshot creation. The interval
can be set at startup via the --storage-snapshot-interval flag or at run-time
via the database settings:
SET DATABASE SETTING "storage.snapshot.interval" TO "1200";
SET DATABASE SETTING "storage.snapshot.interval" TO "* * 12 * * *";
SET DATABASE SETTING "storage.snapshot.interval" TO "";
Changing the configuration settings depends on the way you are using Memgraph, so please refer to the configuration docs for more information.
If the interval string is an integer, it’s treated as the execution period in seconds. The interval can also be defined as a CRON expression, which must define seconds in addition to the usual five cron fields. Setting the value to an empty string pauses the background process; any snapshot creation already in progress will finish. Please note that defining the interval via a CRON expression is an Enterprise feature.
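The three kinds of interval value can be told apart with a short Python sketch. The function classify_interval and its return labels are hypothetical, and the check only counts cron fields; it does not validate their contents the way a real parser would.

```python
def classify_interval(value):
    """Classify an interval string as paused, a period in seconds, or cron.

    Illustrative only: a six-field string is accepted as cron without
    checking whether each field is a valid cron token.
    """
    text = value.strip()
    if text == "":
        return ("paused", None)
    if text.isascii() and text.isdigit():
        return ("period_seconds", int(text))
    fields = text.split()
    if len(fields) == 6:  # seconds + the usual five cron fields
        return ("cron", fields)
    raise ValueError(f"unrecognized interval: {value!r}")


for setting in ["1200", "* * 12 * * *", ""]:
    print(repr(setting), "->", classify_interval(setting)[0])
```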
If the database is started in or migrated into IN_MEMORY_ANALYTICAL mode, the
background thread will pause and no snapshots will be created as long as that
mode is active. The job will continue with the last defined interval when the
storage mode is changed to IN_MEMORY_TRANSACTIONAL.
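The pause-and-resume behavior can be modeled as a small state holder in Python. The SnapshotJob class is hypothetical and covers only the two in-memory modes discussed here; it shows that the interval is retained while the job is paused, not how Memgraph schedules the work.

```python
class SnapshotJob:
    """Toy model of the periodic snapshot job (illustrative only)."""

    def __init__(self, interval, storage_mode="IN_MEMORY_TRANSACTIONAL"):
        self.interval = interval          # last defined interval is kept
        self.storage_mode = storage_mode

    def set_storage_mode(self, mode):
        # Switching modes never discards the configured interval.
        self.storage_mode = mode

    @property
    def active(self):
        # Paused in analytical mode or when the interval is an empty string.
        return self.storage_mode != "IN_MEMORY_ANALYTICAL" and self.interval != ""


job = SnapshotJob(interval="1200")
job.set_storage_mode("IN_MEMORY_ANALYTICAL")
print(job.active, job.interval)   # paused, interval still remembered
job.set_storage_mode("IN_MEMORY_TRANSACTIONAL")
print(job.active, job.interval)   # resumed with the same interval
```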
Snapshots and WAL files are presently not compatible between Memgraph versions.
Storage modes
Memgraph has the option to work in IN_MEMORY_ANALYTICAL,
IN_MEMORY_TRANSACTIONAL or ON_DISK_TRANSACTIONAL storage modes.
By default, Memgraph starts in the IN_MEMORY_TRANSACTIONAL mode, in which it
uses periodic snapshots and write-ahead logging as durability mechanisms, and
also allows creating manual snapshots.
In the IN_MEMORY_ANALYTICAL mode, Memgraph offers neither periodic snapshots
nor write-ahead logging. Users can create a snapshot with the CREATE SNAPSHOT;
Cypher query. During the process of snapshot creation, other transactions will
be prevented from starting until the snapshot creation is completed.
In the ON_DISK_TRANSACTIONAL mode, durability is supported by RocksDB since it
keeps its own WAL files. Memgraph persists the metadata used in the
implementation of the on-disk storage.