Manage Memgraph with Docker
Learn the best practices for managing your Memgraph instance in the Docker environment.
Install and run Memgraph with Docker
Memgraph offers two images you can run with Docker, depending on your needs:
memgraph/memgraph-mage
- use this image if you require MAGE library which consist of graph algorithms and util procedures.memgraph/memgraph
- use this image if you just require Memgraph database without MAGE library. This image is more lightweight but it does not include commonly used graph algorithms.
In this guide, we’ll proceed with memgraph/memgraph-mage
images for all examples,
but if you’d like to use memgraph/memgraph
image, just replace it in the code
snippets.
The simplest way of running Memgraph in the Docker environment is using the following command:
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage
Once you start, you’ll see Memgraph logs in the terminal. To avoid logs showing up
in the terminal, you can run the container in detach mode (-d
flag).
The above command starts Memgraph in a Docker container, and it exposes ports
7687
and 7444
. The port 7687
is used to connect to the database and query it via the Bolt
protocol. The port 7444
is optional, but it’s a good practice to use it, because
that’s where the logs are exposed. Memgraph Lab, a visual user interface, uses
that port to note all logs in the UI.
To connect to Memgraph, you can use CLI mgconsole, preferred client or visual user interface Memgraph Lab.
System configuration
Before running Memgraph, please check the system configuration guidelines, especially the
vm.max_map_count
parameter setting.
Configure Memgraph
Memgraph has its configuration settings persisted in the memgraph.conf
file
located in the /etc/memgraph
folder. Some of the most commonly configured
settings are:
--log-level=WARNING
--query-execution-timeout-sec=600
--storage-snapshot-interval-sec=300
--storage-snapshot-retention-count=3
--storage-mode=IN_MEMORY_TRANSACTIONAL
Of course, other settings are as important, so make sure you get familiar with them as well.
Add the appropriate flags at the end of the command to configure Memgraph within the docker run
command. For example, to change the log level to TRACE
and to
remove query execution timeout, run the following command:
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage --log-level=TRACE --query-execution-timeout-sec=0
The set flags override the default settings in the memgraph.conf
file. It is
also possible to provide an additional configuration file by providing a path to it
via --file-flag
or MEMGRAPH_CONFIG
environment variable. The set values in
that file will override values in the default memgraph.conf
file. Still, flags
in the docker run
command will override them both, so be careful when
setting the configuration. To ensure all is set correctly, run SHOW CONFIG
once the database is up and running.
Secure the database
In the community version of Memgraph, you can create a user and its password for security.
To achieve that, set MEMGRAPH_USER
and MEMGRAPH_PASSWORD
environment variables. When you set values for those variables, a user will be
created if it does not exist, and no one except from that user will be able to
access the database. Here is the command to start Memgraph with the user environment variables set:
docker run -p 7687:7687 -p 7444:7444 -e MEMGRAPH_USER="user" -e MEMGRAPH_PASSWORD="pass" memgraph/memgraph-mage
To authenticate, provide the correct username and password. For example, if you want to run Memgraph’s CLI mgconsole, run the following command:
# Run docker ps to get CONTAINER_ID
docker exec -it <CONTAINER_ID> mgconsole --username user --password pass
The other option is to start Memgraph with init flags, and you can read more about it in the configuration guide.
In the Enterprise version of Memgraph, you can achieve a higher level of security with the role-based access control (RBAC) and fine-grained access control. Within fine-grained access control Memgraph offers label-based access control.
To set it up, first enable Memgraph Enterprise, and then run the necessary Cypher queries to set up privileges properly.
You can add an additional layer of security by enabling SSL encryption.
Persist data with Docker volumes
Docker volumes are used for data persistence across container lifecycles, data sharing between containers, efficient data storage, backup and migration and for easy access from the host system.
Whenever you restart your Docker container, the container’s filesystem will persist your data, logs and configuration. Still, data will not be persisted across container deletion and recreation. To ensure you have your data even if you decide to spin up a new container, run the container with volumes.
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. They are completely managed by Docker and are stored outside of the default filesystem. The data in volumes is preserved across container restarts and can be shared between multiple containers.
Although bind mounts, which allow you to attach specific paths from your host machine to paths in the container, are also used for data persistence, they are not a recommended option.
The best practice to ensure volume data persistence is to use named volumes and to do regular backups.
Here are the locations of the different types of data that Memgraph uses:
- Configuration files:
/etc/memgraph
- Logs:
/var/log/memgraph
- User-related data:
/usr/lib/memgraph
- Graph data:
/var/lib/memgraph
Here is the command to run Memgraph with volumes for data persistency:
docker run -p 7687:7687 -p 7444:7444 -v mg_lib:/var/lib/memgraph memgraph/memgraph-mage
Commonly, you want to ensure you have the same Memgraph configuration wherever you start it. Then you can add volume for the configuration persistency as well:
docker run -p 7687:7687 -p 7444:7444 -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph memgraph/memgraph-mage
If you’d like an easy access to logs on your machine, you can add volume for logs as well:
docker run -p 7687:7687 -p 7444:7444 -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -v mg_log:/var/log/memgraph memgraph/memgraph-mage
Named volumes are Docker-managed volumes that store data independently of containers. They are not directly mapped to a specific host directory that you choose because, in this case, Docker manages where the data is stored on the host. However, you can still access this data from the host.
As opposed to bind mounts, named volumes provide better encapsulation and separation of the environment. Having named volumes makes it easier to back up or migrate data. Since they’re managed by Docker, there is an additional layer of abstraction and security. Named volumes are more suitable for production deployments, where security, data integrity, and portability are critical.
If you want to start a new Memgraph instance with the data from another Docker
container which has mg_lib
volume attached to it, you can run the following
command:
docker run -p 7687:7687 -p 7444:7444 -v mg_lib:/var/lib/memgraph memgraph/memgraph-mage
The new container will be started with the contents from the mg_lib
volume,
meaning it will restore the database from the existing snapshots. That’s the
easiest way to transfer your data to another instance.
Additionally, here are some useful facts about Docker volumes:
- Two containers can’t use the same volume at the same time: Be careful when you run a new Memgraph instance to change the name of the volume if you plan on running two Memgraph instances at the same time with different data.
- Two containers can use the same volume if they’re not running at the same time: you can run a new Memgraph instance with an already used volume if you first stopped a previously running instance. That is useful when you want to reuse the same data for two instances.
- Using named volumes is the easiest way to migrate data between different containers.
- With named volumes, you can save files locally by inspecting the volume and saving the data to the local machine. With Docker Desktop, these actions are straightforward. Docker volumes can be inspected even when the container is down.
Backup data
All your data is stored in the /var/lib/memgraph
data folder. Although being
in-memory, Memgraph creates snapshots and WAL files to persist your data.
Because of that, it is a good practice to have a named volume on the
/var/lib/memgraph
folder. That volume can be attached to a new instance, which
will then start with your data being loaded.
Still, it is a good practice to do a backup regularly.
Copy the data persistence folder
To create a backup, you need to copy the contents of /var/lib/memgraph
folder.
If you previously set up the volume on /var/lib/memgraph
data folder, then
just save the content from that volume on a safe location.
If you didn’t run Memgraph with set up volumes, then copy the contents of
/var/lib/memgraph
folder to a safe location. To do that, use docker cp
command. For example, to copy the whole data folder from Memgraph onto the local
file system, first run the following query in Memgraph to lock the data
directory to avoid changes happening during the backup process:
LOCK DATA DIRECTORY;
After that, copy the data folder onto your local file system:
docker cp <container_id>:/var/lib/memgraph /path/to/my/local-folder
In the end, unlock the data directory by running the following query in Memgraph:
UNLOCK DATA DIRECTORY;
By default, Memgraph creates snapshots every 300 seconds and retains three latest snapshots, so it can easily recover in case of corruption. It is a good practice to have more than one snapshot. If something caused the database to crash, Memgraph will still be able to recover your data from the write-ahead logs. On exit, Memgraph will create the latest snapshot. This can be configured, based on your dataset size and use case. If your data is not being frequently updated, you can disable periodic snapshot creation and create snapshots manually when you need to.
Dump database
Another approach is to dump the database into a file. That is useful if you want to move the data between different Memgraph versions that might have incompatible data formats are used for snapshots or WAL files.
Here are the steps to dump the database with Memgraph running in a Docker container:
docker exec -it -u 0 <container_id> bash
echo "DUMP DATABASE;" | mgconsole --output-format=cypherl > data.cypherl
docker cp <container_id>:data.cypherl data.cypherl
The whole graph is dumped into a CYPHERL file, which consists of Cypher queries used to create a database.
It is a better practice to copy the data persistency folder when possible over dumping the database since both the backup and restore processes are significantly faster.
Restore data
Depending on the backup method you use, you can restore data differently.
If you set up a volume on var/lib/memgraph
directory, then you can attach it
to the new container. If you didn’t set up a volume on var/lib/memgraph
directory, then you need to copy the data folder’s contents in the new
container in var/lib/memgraph
directory.
The Memgraph user should own the backup data folder, so make sure the folder is accessible for read and write operations by the Memgraph user.
If you have dumped the database, you can import the data back into Memgraph using the mgconsole CLI tool or the client library. Make sure you read the best practices for such import.
Upgrading Memgraph
Between different versions of Memgraph, we might break the compatibility of the configurations, snapshots, and WAL files. That is done very rarely, but it is helpful to keep an eye on the release notes to see if there are any breaking changes.
Set up a cluster
To create a cluster, replicate data across several instances. Setting up replication means running a couple of Memgraph instances, where one of them is the MAIN instance, and others are either SYNC or ASYNC replicas.
Here is an example of setting up a replication cluster with three instances:
Run the instance on port 7687 (this instance will be MAIN):
docker run -p 7687:7687 -p 7444:7444 -p 10000:10000 memgraph/memgraph-mage --replication-restore-state-on-startup=true
Run another instance on port 7688 (this instance will be REPLICA):
docker run -p 7688:7687 -p 7445:7444 -p 10000:10000 memgraph/memgraph-mage --replication-restore-state-on-startup=true
Run the last instance on port 7689 (this instance will be REPLICA):
docker run -p 7689:7687 -p 7446:7444 -p 10000:10000 memgraph/memgraph-mage --replication-restore-state-on-startup=true
All started instances are MAIN upon starting. To set up a cluster, two instances must be demoted to REPLICA roles because only one instance can be MAIN. To do that, run the following query from the second and third instance (REPLICA instances):
SET REPLICATION ROLE TO REPLICA WITH PORT 10000;
To finish setting up the replication cluster, you need to register REPLICA instances from the MAIN instance with the correct IP addresses or DNS of REPLICA instances:
REGISTER REPLICA REP1 SYNC TO "<IP_ADDRESS_REP1>";
REGISTER REPLICA REP2 ASYNC TO "<IP_ADDRESS_REP2>";
If you have trouble connecting, check your firewall settings.
That’s it, replication cluster with one MAIN, one SYNC REPLICA and one ASYNC REPLICA instance is set up. To learn more about the replication Memgraph, refer to our replication docs.
Having the replication cluster set up is great if you need to replicate data, add load balancing or improve availability. Still, to achieve high availability, you need to manage automatic failover. On the other hand, Memgraph Enterprise has a high availability feature included in the offering to ease the management of Memgraph cluster. In such case, the cluster consists of MAIN instance, REPLICA instances and COORDINATOR instances, which, backed up by Raft protocol, manage the cluster state.
Logging
At any point in your Memgraph instance lifecycle, you might need to check the
logs either to debug an issue or to monitor the performance. Memgraph logs are
stored in the /var/log/memgraph
folder if the default location is not changed
by the --log-file
flag. They are typically stored in the format
memgraph_year-month-day.log
.
You can control the log level as described on the logs
page. If you are setting up the production
environment, you should consider setting the log level to INFO
or WARNING
to
avoid the log files growing too large.
If you are experiencing some issues or you have trouble setting up the Memgraph
instance, consider setting the log level to TRACE
to get more information about
the issue.
If you created a volume on /var/log/memgraph
folder, you can inspect it to access logs.
The best way to monitor logs is to attach the logs directly to your terminal as you debug the issue. You can do that by running the following command:
docker logs -f <CONTAINER>
Because of flag -f
or --follow
, the above command will continue streaming the
new output from the container’s STDOUT
and STDERR
.
More on the docker logs
command can be found on the Docker
documentation.
Where to next?
Docker is a powerful tool, and it is usually a good starting point. If your application involves multiple services besides a database, you might want to create a multi-container application with Docker Compose. To learn more about how to manage Memgraph in such an environment, follow our Docker Compose guide.
To discuss Docker and similar topics, join our Discord community.
Schedule a 30-min session with our engineers to discuss how Memgraph fits with your architecture. Our engineers are highly experienced in helping companies of all sizes to integrate and get the most out of Memgraph in their projects. Talk to us about data modeling, optimizing queries, defining infrastructure requirements or migrating from your existing graph database. No nonsense or sales pitch, just tech.