Deploy Memgraph on an AWS EC2 instance

This guide shows how to deploy Memgraph on an AWS EC2 instance. It covers only the steps that differ from the general deployment guides for native Linux and Docker deployments.

This guide assumes you have an AWS account and are familiar with the AWS console. If you are not, you can follow the AWS EC2 documentation.

Creating a new EC2 instance

The first step is to launch the EC2 instance on which you will deploy Memgraph. The guidelines below cover the choices that matter for Memgraph.

Picking the OS

When creating the EC2 instance, you need to pick the OS you wish to have on the instance. If you are going to use the Memgraph native installation, you need to pick a Linux distribution that Memgraph supports.

Memgraph supports multiple Linux distributions. Packages for the supported distributions and versions are available on the Memgraph download page.

Running Memgraph natively brings some speed improvements compared to running it in Docker. However, deploying Memgraph with Docker is more straightforward because the image ships with the Memgraph MAGE algorithms built in. Memgraph MAGE contains graph algorithms and utility modules written in C++, Python and Rust. If you decide to run Memgraph natively, you need to build MAGE from source, which requires manual work. For native deployment, check the guide on how to build Memgraph MAGE algorithms from source.

If you are trying out Memgraph for the first time and running your own benchmarks against it, Docker is the recommended way to run Memgraph because it gets you to a working setup faster.

If you are going to use the Memgraph Docker image, pick the Linux distribution you are most familiar with.

Memgraph in Docker can be deployed on both x86 and ARM architectures. All native distributions work on x86. Some of the native distributions also work on ARM (Debian, Ubuntu), while others do not (CentOS, Fedora). Check the direct download links for detailed information.

Picking the instance type

When creating the EC2 instance, you need to pick the instance type. If you run Memgraph in the IN_MEMORY_TRANSACTIONAL (default) or IN_MEMORY_ANALYTICAL storage mode, all data is stored in RAM, so start by calculating how much memory you will need. There is also an easy-to-use calculator for approximating memory usage. A good rule of thumb is for your server to have double the memory that your storage was calculated to need. You can go with less if you do not have a demanding workload. An example of a demanding workload is a query that traverses and returns half of the graph.
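As a rough sketch of the rule of thumb above, the following shell function doubles an estimated storage size to get a RAM target. The function name and the 40 GB input are illustrative assumptions, not part of any Memgraph tooling.

```shell
# Rule of thumb from this guide: provision about twice the RAM
# that the stored graph is estimated to need.
# recommended_ram_gb is a hypothetical helper name.
recommended_ram_gb() {
    # $1 = estimated storage size in GB
    echo $(( $1 * 2 ))
}

# Example: a graph estimated at 40 GB
recommended_ram_gb 40   # prints 80
```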

AWS has specialized instances for memory-optimized workloads which are designed for in-memory databases, data analytics, and other memory-intensive applications. They provide a good cost per GB of RAM and performance for Memgraph. Once you know how much memory you need, you can pick the instance from the memory-optimized instance pools based on your data needs and budget.

Instances vary in supported architecture, number of CPU cores, network bandwidth, block storage, and so on. All hardware specs typically scale with the instance size. Memgraph is not demanding on the rest of the hardware specs as long as there is sufficient RAM. The basic R5 or X1 instance families are good starting points for Memgraph.
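Once you have a RAM target, choosing a size within a family is a simple lookup. The sketch below picks the smallest R5 size that covers a given amount of memory. The helper name is hypothetical, and the memory figures are the published R5 sizes at the time of writing, so check them against the current AWS instance type listing before relying on them.

```shell
# Pick the smallest R5 size whose memory covers the requirement.
# Each entry is size:memory_in_GiB for the R5 family.
# smallest_r5_for is a hypothetical helper name.
smallest_r5_for() {
    needed_gib=$1
    for entry in large:16 xlarge:32 2xlarge:64 4xlarge:128 \
                 8xlarge:256 12xlarge:384 16xlarge:512 24xlarge:768; do
        size=${entry%%:*}   # text before the colon
        mem=${entry##*:}    # text after the colon
        if [ "$mem" -ge "$needed_gib" ]; then
            echo "r5.$size"
            return 0
        fi
    done
    echo "no single R5 size offers ${needed_gib} GiB; look at the X1 family" >&2
    return 1
}

# Example: 80 GiB of RAM needed
smallest_r5_for 80   # prints r5.4xlarge
```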

If you are running Memgraph in ON_DISK_TRANSACTIONAL storage mode, you need to consider the instances optimized for storage.

System configuration

Before running Memgraph, please check the system configuration guidelines, especially the vm.max_map_count parameter setting.
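A minimal sketch of checking this parameter on the instance is shown below. The helper names are hypothetical, and the target value is deliberately left as a placeholder: take it from the system configuration guidelines for your RAM size.

```shell
# Read the current limit on memory map areas (Linux only).
current_max_map_count() {
    cat /proc/sys/vm/max_map_count
}

# Succeeds when the current value ($1) is below the target ($2),
# meaning the limit should be raised.
needs_raise() {
    [ "$1" -lt "$2" ]
}

# Usage, with TARGET set to the value the guidelines give for your instance:
#   TARGET=<value>
#   if needs_raise "$(current_max_map_count)" "$TARGET"; then
#       sudo sysctl -w vm.max_map_count="$TARGET"
#   fi
```

A value set with sysctl -w lasts until the next reboot. To keep it, the same key can be added to /etc/sysctl.conf or to a file under /etc/sysctl.d/.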

Set up the network

When creating the EC2 instance, you need to set up network access and the security group. By default, Memgraph uses port 7687 for the Bolt protocol. Open this port for inbound TCP traffic so that clients can connect to Memgraph.

If you change the default Bolt port, make sure to update the security group accordingly.

If you deploy Memgraph for replication or high availability, the ports for replication and cluster management must also be open. In a replication setup, each instance needs port 10000 open for replication. In a high availability setup, each data instance needs port 10000 open for management and port 20000 open for replication, and each coordinator instance needs port 12000 open.

All instances need to have open port 7687 for the Bolt protocol.
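The port rules above can be sketched with the AWS CLI as follows. The wrapper function names are hypothetical, and the security group ID and CIDR range in the usage comment are placeholders. The port numbers are the ones listed in this guide, so adjust them if your Memgraph configuration uses different ones.

```shell
# Open one inbound TCP port in a security group for a source CIDR range.
# $1 = security group ID, $2 = port, $3 = source CIDR
# open_tcp_port is a hypothetical wrapper around the AWS CLI.
open_tcp_port() {
    aws ec2 authorize-security-group-ingress \
        --group-id "$1" \
        --protocol tcp \
        --port "$2" \
        --cidr "$3"
}

# Open the ports this guide lists for a high availability data instance:
# 7687 (Bolt), 10000 (management), 20000 (replication).
# $1 = security group ID, $2 = source CIDR
open_data_instance_ports() {
    for port in 7687 10000 20000; do
        open_tcp_port "$1" "$port" "$2" || return 1
    done
}

# Usage (placeholder group ID and CIDR):
#   open_data_instance_ports sg-0123456789abcdef0 10.0.0.0/16
```

Restricting the CIDR range to your VPC or to known client addresses, rather than 0.0.0.0/0, keeps the replication and management ports off the public internet.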

Set up the storage

When creating the EC2 instance, you need to set up the storage. By default, Memgraph keeps all data in RAM, but to persist data between restarts it uses disk storage for snapshots, configuration, and so on.

It is recommended to use EBS volumes for storage. EBS volumes are durable and scalable, and they can be attached to the EC2 instance. They persist even if the EC2 instance is stopped and restarted. Local storage can be used, but it is not recommended for production use cases since data loss can occur if the instance is stopped or terminated.

Faster storage (SSDs) can lead to speedier snapshot creation and recovery times, which can be important at larger scales (billion-sized graphs). Still, it is not critical for Memgraph's operating performance. Magnetic storage can also be used at smaller scales.

The storage size depends on the amount of data you are going to store in your Memgraph instance and the number of active snapshots you want to keep alive. Memgraph will periodically create snapshots of the data and store them on disk. As the new snapshot is created, the oldest one is deleted. The number of snapshots you want to keep alive is configurable in the Memgraph configuration file.

The recommendation is to provision twice as much disk storage as the amount of data you will store in Memgraph. If you are going to store 100GB of data in Memgraph, you should have at least 200GB of disk storage.
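A minimal sketch of applying this recommendation on a running instance is shown below: it computes the disk target and compares it with the total size of the filesystem that holds a given path. The helper names are hypothetical, and the check treats 1 GB as 1024 * 1024 KB.

```shell
# Rule from this guide: disk capacity of at least twice the data size.
# $1 = data size in GB
recommended_disk_gb() {
    echo $(( $1 * 2 ))
}

# Succeeds when the filesystem holding path $1 has a total size
# of at least $2 GB. df -Pk reports sizes in 1024-byte blocks.
disk_is_large_enough() {
    total_kb=$(df -Pk "$1" | awk 'NR==2 {print $2}')
    [ "$total_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example: 100 GB of data
recommended_disk_gb 100   # prints 200
```

For a native installation, the path to check would typically be the Memgraph data directory, for example disk_is_large_enough /var/lib/memgraph 200.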

Installing Memgraph

Depending on the way you want to deploy Memgraph, native or via Docker, you need to follow the steps below:

If you will use the Memgraph Docker image, you need to have Docker installed on your EC2 instance.

After Docker is installed, you can pull the Memgraph image and run it:

docker run -p 7687:7687 memgraph/memgraph:latest

This will run Memgraph on the default port 7687. You should be able to connect to it via Memgraph Lab or via client libraries using the Bolt protocol. If you are experiencing issues while connecting to Memgraph remotely, make sure that the port 7687 is open in the security group of your EC2 instance.
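For a longer-lived deployment, a variant of the command above might run the container in the background and keep its data on a named volume so that snapshots survive container restarts. This is a sketch under assumptions: the wrapper function, the container name, and the volume name mg_lib are placeholders you can change.

```shell
# Start Memgraph detached, restart it automatically, publish the Bolt port,
# and keep the data directory on a named Docker volume.
# run_memgraph is a hypothetical wrapper; names are placeholders.
run_memgraph() {
    docker run -d \
        --name memgraph \
        --restart unless-stopped \
        -p 7687:7687 \
        -v mg_lib:/var/lib/memgraph \
        memgraph/memgraph:latest
}

# Usage:
#   run_memgraph
#   docker logs memgraph    # inspect the startup output
```

A named volume lives under Docker's data directory, so it is only as durable as the disk that directory sits on.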

For more information on how to install Memgraph via Docker, you can follow the getting started guide.

Manage Memgraph deployment

After Memgraph is installed and running on your EC2 instance, managing it on AWS is identical to managing any native Linux or Docker deployment of Memgraph.

Depending on what you are using, you can follow the Linux or Docker deployment guide for more information on how to manage the Memgraph deployment.

Where to next?

Memgraph also supports deployment in the Kubernetes environment. If you are interested in deploying Memgraph in Kubernetes, you can follow the Memgraph Kubernetes installation guide.

To discuss AWS deployment and similar topics, join our Discord community.
