Deploy Memgraph on an AWS EC2 instance
This guide shows how to deploy Memgraph on an AWS EC2 instance. It covers only the parts that differ from the general deployment guides for native Linux and Docker deployments.
This guide assumes you have an AWS account and are familiar with the AWS console. If you are not, you can follow the AWS EC2 documentation.
Creating a new EC2 instance
The first step is to have an EC2 instance running on AWS, where you will deploy Memgraph. Here are some guidelines for launching a new EC2 instance for Memgraph.
Picking the OS
When creating the EC2 instance, you need to pick the OS you wish to have on the instance. If you are going to use the Memgraph native installation, you need to pick a Linux distribution that Memgraph supports.
Memgraph supports multiple Linux distributions. Packages for the supported distributions and versions can be downloaded from the Memgraph download page.
Running Memgraph natively will bring some speed improvements compared to the Docker version of Memgraph.
If you are going to use the Memgraph Docker image, pick the Linux distribution you are most familiar with.
Memgraph in Docker can be deployed on both x86 and ARM architectures. All native distributions work on the x86 architecture. Some of the native distributions also work on the ARM architecture (Debian, Ubuntu), while others do not (CentOS, Fedora). Check the direct download links for detailed information.
Picking the instance type
When creating the EC2 instance, you need to pick the instance type. If you run Memgraph in the IN_MEMORY_TRANSACTIONAL (default) or IN_MEMORY_ANALYTICAL storage mode, all data is stored in RAM, so first calculate how much memory you will need. There is also an easy-to-use calculator for approximating memory usage. A good rule of thumb is for your server to have double the memory that your data was calculated to need. You can go with less if your workload is not demanding. An example of a demanding workload is a query that traverses half of the graph and returns half of the graph.
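The rule of thumb above can be written as a small helper. This is only a sketch: the function name and the `headroom_factor` parameter are hypothetical, and the input is assumed to be the memory estimate you already obtained (for example, from the calculator).

```python
def recommended_ram_gb(estimated_data_gb: float, headroom_factor: float = 2.0) -> float:
    """Return the RAM to provision for an in-memory storage mode.

    estimated_data_gb: estimated memory needed to hold the graph.
    headroom_factor: 2.0 follows the "double the memory" rule of thumb;
        a lower value may be acceptable for workloads that are not demanding.
    """
    if estimated_data_gb < 0:
        raise ValueError("estimated_data_gb must be non-negative")
    if headroom_factor < 1.0:
        raise ValueError("headroom_factor below 1.0 would not fit the data itself")
    return estimated_data_gb * headroom_factor
```

For example, an estimate of 48 GB of graph data gives `recommended_ram_gb(48)`, which is 96 GB of RAM.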
AWS has specialized instances for memory-optimized workloads which are designed for in-memory databases, data analytics, and other memory-intensive applications. They provide a good cost per GB of RAM and performance for Memgraph. Once you know how much memory you need, you can pick the instance from the memory-optimized instance pools based on your data needs and budget.
Instances vary in supported architecture, number of CPU cores, network bandwidth, block storage, and so on. All hardware specs typically scale with the instance size. Memgraph is not demanding on the rest of the hardware specs as long as there is sufficient RAM. The basic R5 or X1 instance families are good starting points for Memgraph.
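Once the required memory is known, choosing an instance size comes down to finding the smallest one that fits. The sketch below assumes a hand-written table of R5 memory sizes in GiB. Treat the table as illustrative and verify it against the current AWS instance listing. The function name is hypothetical.

```python
from typing import Dict, Optional

# Memory per R5 instance size in GiB (illustrative; verify against the AWS listing).
R5_MEMORY_GIB: Dict[str, int] = {
    "r5.large": 16,
    "r5.xlarge": 32,
    "r5.2xlarge": 64,
    "r5.4xlarge": 128,
    "r5.8xlarge": 256,
    "r5.12xlarge": 384,
    "r5.16xlarge": 512,
    "r5.24xlarge": 768,
}


def smallest_fitting_instance(
    required_ram_gib: float, catalog: Dict[str, int] = R5_MEMORY_GIB
) -> Optional[str]:
    """Return the instance with the least memory that still fits, or None."""
    candidates = [
        (memory, name) for name, memory in catalog.items() if memory >= required_ram_gib
    ]
    if not candidates:
        return None  # Nothing in this family is large enough.
    return min(candidates)[1]
```

With a requirement of 96 GiB, the helper picks `r5.4xlarge` (128 GiB), since `r5.2xlarge` (64 GiB) is too small.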
If you are running Memgraph in the ON_DISK_TRANSACTIONAL storage mode, you need to consider the instances optimized for storage.
Set up the network
When creating the EC2 instance, you need to set up the network access and security group. By default, Memgraph uses port 7687 for the Bolt protocol. You need to open this port for TCP traffic to allow connections to Memgraph.
If you change the default Bolt port, make sure to update the security group accordingly.
If you are deploying Memgraph for replication or high availability, the ports for replication and cluster management must also be open. In a replication configuration, each instance needs port 10000 open for replication. In a high availability configuration, each data instance needs port 10000 open for management and port 20000 open for replication, and each coordinator instance needs port 12000 open.
All instances need to have open port 7687 for the Bolt protocol.
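The port requirements above can be summarized as a lookup that returns the ports to allow in the security group for a given instance. This is a minimal sketch: the role names (`standalone`, `replication`, `ha_data`, `ha_coordinator`) and the function name are hypothetical labels, and the port numbers are the defaults mentioned in this guide.

```python
from typing import Dict, List

DEFAULT_BOLT_PORT = 7687

# Extra ports per role, on top of the Bolt port that every instance needs.
ROLE_PORTS: Dict[str, List[int]] = {
    "standalone": [],
    "replication": [10000],        # replication
    "ha_data": [10000, 20000],     # management, replication
    "ha_coordinator": [12000],     # coordinator
}


def ports_to_open(role: str, bolt_port: int = DEFAULT_BOLT_PORT) -> List[int]:
    """Return the sorted list of TCP ports to open for an instance role."""
    if role not in ROLE_PORTS:
        raise ValueError(f"unknown role: {role!r}")
    return sorted([bolt_port] + ROLE_PORTS[role])
```

For example, `ports_to_open("ha_data")` returns `[7687, 10000, 20000]`. If you changed the default Bolt port, pass the new value as `bolt_port`.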
Set up the storage
When creating the EC2 instance, you need to set up the storage. By default, Memgraph keeps all data in RAM, but for persistence between restarts it uses disk storage to store snapshots, configuration, and so on.
It is recommended to use EBS volumes for storage. EBS volumes are durable, scalable, and can be attached to the EC2 instance. They persist even if the EC2 instance is stopped and restarted. Local storage can be used, but it is not recommended for production use cases since data loss can occur if the instance is stopped or terminated.
Faster storage (SSDs) can lead to speedier snapshot creation and recovery times, which can be important on bigger scales (billion-sized graphs). Still, it is not critical for the Memgraph operating performance. Magnetic storage can also be used on smaller scales.
The storage size depends on the amount of data you are going to store in your Memgraph instance and the number of active snapshots you want to keep alive. Memgraph will periodically create snapshots of the data and store them on disk. As the new snapshot is created, the oldest one is deleted. The number of snapshots you want to keep alive is configurable in the Memgraph configuration file.
The recommendation is to provision twice as much disk storage as the data you will store in Memgraph. For example, if you are going to store 100 GB of data in Memgraph, you should have at least 200 GB of disk storage.
Installing Memgraph
Depending on the way you want to deploy Memgraph, native or via Docker, you need to follow the steps below:
If you will use the Memgraph Docker image, you need to have Docker installed on your EC2 instance.
After Docker is installed, you can pull the Memgraph image and run it:
docker run -p 7687:7687 memgraph/memgraph:latest
This will run Memgraph on the default port 7687. You should be able to connect to it via Memgraph Lab or via client libraries using the Bolt protocol. If you are experiencing issues while connecting to Memgraph remotely, make sure that port 7687 is open in the security group of your EC2 instance.
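To tell a closed security group apart from a client-side problem, you can first check whether the port accepts TCP connections at all. The sketch below uses only the Python standard library. The function name is hypothetical, and the host you pass in is assumed to be the public address of your EC2 instance. A successful check shows only that the port is reachable, not that Bolt authentication or queries will work.

```python
import socket


def bolt_port_reachable(host: str, port: int = 7687, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If the function returns False from your machine while Memgraph is running on the instance, the security group rules or the published Docker port are the first things to review.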
For more information on how to install Memgraph via Docker, you can follow the getting started guide.
Manage Memgraph deployment
After Memgraph is installed and running on your EC2 instance, managing it on AWS is identical to managing any other native Linux or Docker deployment of Memgraph.
Depending on which one you are using, follow the Linux or Docker deployment guide for more information on how to manage the Memgraph deployment.
Where to next?
Memgraph also supports deployment in the Kubernetes environment. If you are interested in deploying Memgraph in Kubernetes, you can follow the Memgraph Kubernetes installation guide.
To discuss AWS deployment and similar topics, join our Discord community.