General suggestions
This section provides guidance for getting started with Memgraph, regardless of the specific workload you plan to test. It’s ideal for those evaluating Memgraph for the first time or working with a simple dataset. Once you determine the workload that best fits your data and use case, you can explore the more targeted guides in the Memgraph in Production series for tailored recommendations.
What is covered?
The general suggestions cover the following key areas:
-
Hardware requirements for running Memgraph
Learn how to select the best machine for Memgraph based on your resources, whether on-prem, in a data center, or via cloud offerings (e.g., AWS, GCP, Azure). -
Hardware sizing
Since Memgraph is an in-memory database, estimating RAM requirements is crucial. This section helps you allocate memory based on dataset size and expected workloads. -
Hardware configuration
Optimize your host machine configuration for Memgraph to run smoothly, including key parameters for effective operation. -
Networking configuration
Learn the best ways to configure networking to enable Memgraph’s interaction with the outside world and ensure smooth communication with external systems or users. -
Deployment options
Understand the tradeoffs between running Memgraph natively on a host machine or in a containerized environment (Docker, K8s) to choose the best deployment method. -
Choosing the right Memgraph flag set
Memgraph offers a variety of configuration flags for performance, persistence, and other features. This section guides you on setting the right flags based on your use case. -
Choosing the right Memgraph storage mode
Memgraph currently supports two in-memory storage modes - IN_MEMORY_TRANSACTIONAL and IN_MEMORY_ANALYTICAL - as well as one disk-based storage mode: ON_DISK_TRANSACTIONAL. In this section, you’ll find guidance on choosing the most suitable storage mode based on your specific use case. -
Backup considerations
Learn about how to preserve your data in Memgraph to prevent any data loss. -
Importing mechanisms
Discover the best methods for importing your dataset into Memgraph, including Cypher queries, bulk loading, and integrations with other data sources. -
Enterprise features you might require
Memgraph offers a suite of enterprise-grade features that enhance scalability, security, and manageability. Key features include role-based access control (RBAC), advanced monitoring tools high availability cluster setups, multitenancy, and more. These features ensure that your data is secure, available, and that Memgraph can scale to meet the demands of enterprise workloads. -
Queries that best suit your workload
The type of queries you use can significantly affect performance, especially as the dataset grows or the workload complexity increases. For general use cases, simple Cypher queries are sufficient, but as your workload scales, more advanced query optimization techniques are necessary. Also, a different set of queries is needed based on the use case, which will be covered in the specific use case sections.
Hardware requirements
To get started with Memgraph, we recommend checking the system requirements listed in our installation guide. These requirements are sufficient for setting up Memgraph on your own servers.
If you’re deploying Memgraph using a cloud provider, visit the deployment section for guidance on choosing the appropriate instance type for your specific cloud environment.
Hardware sizing
The most critical factor in provisioning a server for Memgraph is RAM. Memgraph operates primarily in in-memory mode, meaning the entire dataset is loaded into RAM for optimal performance. Properly sizing your RAM is essential to ensuring your system runs efficiently and can scale with your workload.
For a deeper dive into how memory usage is calculated, refer to our memory storage documentation. Below is a simplified, rule-of-thumb approach to help you estimate your memory needs.
Memory components
Memgraph’s memory usage consists of five main parts:
-
Node memory
Each node requires approximately 128 bytes. -
Relationship memory
Each relationship requires approximately 120 bytes. -
Property storage
Properties are stored in buffered arrays. The memory needed depends on property types and counts. -
Indices
Additional overhead is introduced by indexing node labels, edge types, and properties. -
Query execution memory
Temporary memory required to execute queries. This varies depending on the complexity of queries.
Items 1–4 are referred to as memory at rest, while query execution memory is often called compute memory.
RAM sizing guidelines
Use the following steps to estimate the RAM required for your workload:
Estimate base graph storage
For datasets with minimal properties (3–5 small properties), use this formula:
128 * N + 120 * R
Where:
N
= number of nodesR
= number of relationships
The result gives you the dataset size in bytes. The size of properties is omitted as it will not impact the sizing.
Don’t know how many nodes or relationships you have?
If your data is in a non-graph format (e.g., CSV), estimating the number of nodes and relationships isn’t always straightforward.
For example, one line in a CSV file could represent a node, a relationship, or a mix of both. In some cases, nodes and relationships are
split into separate CSV files.
In such cases, we recommend importing a sample of the dataset (around 10%) into Memgraph and using the memory usage for that portion to extrapolate the estimated requirements for the full dataset. This empirical method provides a more accurate and context-aware estimate based on how your data is actually represented as a graph.
Estimate property storage
If your dataset includes many or complex properties, refer to the memory storage documentation for detailed calculations. Add this result to the base graph size from Step 1.
Add index overhead
- If you use fewer than 10 indices, this can be skipped.
- If you use many indices (e.g., 50+), add ~20% memory overhead to the total from Steps 1 and 2.
Add query execution memory
- For basic querying without graph algorithms: 1.5× multiplier
- For analytical workloads with algorithms (e.g., PageRank, Community Detection, Betweenness Centrality): 2× multiplier
Final recommendation
Once you’ve completed these steps:
- Round up your estimate to match available server configurations.
- Example: If your estimate is 96GB, choose a server with 128GB RAM for safe headroom.
Still having problems with estimating the size of your instance? Try out our official storage calculator, or contact us on Discord!
Hardware configuration
One of the most important system settings when running Memgraph is configuring the kernel parameter vm.max_map_count
.
This setting ensures that the system can allocate enough virtual memory areas, which is critical for avoiding memory-related
issues or unexpected crashes during Memgraph operations.
You can find detailed setup instructions and context in our system configuration documentation.
If you’re deploying Memgraph on Kubernetes, our Helm charts include an init container that automatically sets vm.max_map_count
during startup. However, this container requires root privileges to execute. If you’re running in a restricted environment
or prefer not to use privileged containers, you’ll need to manually configure this parameter on the host machine.
Properly configuring vm.max_map_count
is a one-time setup but essential for a stable and performant Memgraph deployment.
For system-specific configuration steps and installation guidelines, refer to the Install Memgraph guide.
Networking configuration
To ensure Memgraph functions properly in your environment, make sure the following ports are open and accessible on your server:
- 7687 – Used for the Bolt protocol, which handles all client-to-database communication.
- 3000 – Required if you’re using Memgraph Lab as a visual interface.
- 7444 – Needed for streaming logs from Memgraph to Memgraph Lab.
- 9091 – Used for system metrics, an Enterprise-only feature. If you’re using Prometheus or some other system for tracking system metrics, this port needs to be enabled.
In addition to enabling these ports, be sure to:
- Review and adjust firewall settings to allow traffic on the listed ports.
- On Red Hat-based systems, even with the firewall properly configured, you might need to disable SELinux, as it can block Memgraph’s access to system resources.
Deployment options
Memgraph can be deployed in two main ways: natively as a .deb
or .rpm
package on various Linux distributions,
or containerized using Docker on any operating system. While native installation can offer up to 10% better performance,
we generally recommend using the containerized version—whether via Docker, Kubernetes manifests, or Helm charts.
When deploying via Helm, users can choose between:
- The Memgraph Standalone Helm chart – ideal if you’re working with a single instance setup.
- The Memgraph High Availability Helm chart – designed for production deployments that require redundancy and failover with multiple instances.
The containerized deployment comes pre-packaged with all MAGE algorithms and utility tools, eliminating the need for additional setup. In contrast, native installations require users to compile C++ modules and manually install Python packages to enable the full range of MAGE functionalities—an often complex and error-prone process for most users.
For most use cases, especially during prototyping and production readiness, the containerized image provides the best balance between performance, simplicity, and feature completeness.
More details and installation instructions can be found in the Memgraph deployment guide.
Choosing the right Memgraph flag set
Memgraph offers a variety of configuration flags to tailor its behavior based on your environment and use case. Here are the most important flags to consider:
Logging configuration
-
--log-level
This flag controls the granularity of Memgraph’s logs. In production environments, setting it toINFO
orDEBUG
is typically sufficient. For diagnostics or experimentation, usingTRACE
provides highly detailed logs that can help in troubleshooting—but keep in mind this will significantly increase disk usage. Memgraph does not automatically delete old logs, so you’ll need to manage log retention manually. This flag can be adjusted at runtime using theSET DATABASE SETTING 'log.level' TO 'value'
command. -
--also-log-to-stderr=true
Enables logging to standard error in addition to log files.
Optionally, users can manually stream the logs to another system such as Splunk, or use a containerized setup where logs are collected from standard output streams.
Data recovery configuration
--storage-parallel-schema-recovery=true
and--storage-recovery-thread-count=x
These flags enable parallel recovery of the schema during startup. Setx
to the number of cores you want to dedicate for recovery. This significantly speeds up the time it takes for Memgraph to become operational after a restart.
Memory limit configuration
--memory-limit=x
(in MiB)
This flag defines the maximum memory Memgraph can use. By default, it’s set to 90% of available system memory, but if you’re running other processes on the same machine (like Memgraph Lab or others), it’s advisable to manually lower this limit. On Kubernetes, node memory is often shared with system components like kubelets, so the flag should be set to a value lower than the actual available memory on the node.
Query timeout configuration
-
--query-execution-timeout-sec=0
This flag disables query timeouts entirely. By default, Memgraph limits query execution time to 600 seconds. If you’re performing large data imports—such as usingLOAD CSV
or themigrate()
function—it can be helpful to temporarily set this flag to0
to avoid interruptions.Once the import is complete, you can restart Memgraph without the flag if your regular workloads don’t require extended execution times (e.g., during daily batch jobs).
-
Runtime alternative (Cypher)
You can also configure the query timeout dynamically at runtime using:SET DATABASE SETTING "query.timeout" TO "0";
This is particularly useful in environments where you don’t want to restart Memgraph but still need to adjust timeout behavior for specific workloads.
For more information on how to configure Memgraph, as well as what are all the flags that Memgraph can be configured on, please check out the configuration documentation page.
Choosing the right Memgraph storage mode
Memgraph currently supports two fully-featured and production-ready storage modes:
IN_MEMORY_TRANSACTIONAL
IN_MEMORY_ANALYTICAL
A third mode, ON_DISK_TRANSACTIONAL
, is still experimental and lacks many production-grade features.
While it’s part of Memgraph’s long-term roadmap, it’s not recommended for production use at this stage.
For all current production deployments, users are strongly encouraged to choose either the in-memory transactional or in-memory analytical mode, based on the nature of their workload.
You can set the storage mode in two ways:
- Via a Cypher query:
STORAGE MODE IN_MEMORY_TRANSACTIONAL; -- or -- STORAGE MODE IN_MEMORY_ANALYTICAL;
- Or by using a configuration flag at startup:
--storage-mode=IN_MEMORY_TRANSACTIONAL --storage-mode=IN_MEMORY_ANALYTICAL
Which mode should you choose?
-
✅ Transactional Mode (
IN_MEMORY_TRANSACTIONAL
)
Ideal for mission-critical workloads requiring ACID guarantees, replication, and high availability. This mode supports safe concurrent access, durability, and rollback on failure.
Keep in mind that write-write conflicts can occur and may need to be retried on the driver side. -
🚀 Analytical Mode (
IN_MEMORY_ANALYTICAL
)
Suited for read-only workloads or on-demand graph analytics where you need multithreaded ingestion at scale. This mode supports extremely high write throughput, even reaching millions of writes per second, thanks to parallelized graph construction.
However, it does not support aborting or rolling back transactions, as it does not track changes via delta objects. Once a write is made, it becomes part of the current state.
Rule of thumb:
Use analytical mode for high-speed, read-heavy, or bulk-ingest scenarios.
Use transactional mode for anything requiring reliability, consistency, or fault tolerance.
For more information about the implications of Memgraph storage mode offerings, please check out the storage mode documentation.
Backup considerations
Ensuring data durability and having a solid backup strategy is essential for any production deployment. Memgraph provides built-in mechanisms for creating snapshots and write-ahead logs (WALs) to help you recover from failures or restarts.
By default, Memgraph retains the latest 3 snapshots, but this can be adjusted using the --storage-snapshot-retention-count=x
flag. Older snapshots beyond this limit are automatically deleted to manage storage space.
During recovery, Memgraph first restores the latest snapshot, followed by a replay of the WALs:
- Snapshot recovery can be parallelized for faster startup using:
--storage-parallel-schema-recovery=true
--storage-recovery-thread-count=x
(wherex
is the number of threads used)
- WALs are recovered sequentially, and this step cannot yet be parallelized.
🔒 Memgraph does not yet support automatic backups to third-party storage solutions like AWS S3 or GCP buckets. For now, users are encouraged to implement manual redundancy by syncing snapshots to external locations using tools such as rclone, which is already in use by several Memgraph customers for this purpose.
For more detailed information, refer to the following docs:
Importing mechanisms
Memgraph supports a variety of data importing mechanisms to help you efficiently bring your data into the graph. Whether you’re working with CSV files, JSON streams, Kafka topics, or external data sources, choosing the right import strategy is key to a smooth migration.
We strongly encourage users to review the best practices for data migration, which cover recommendations and tips to ensure a reliable and performant data import process tailored to Memgraph.
Enterprise features you might require
Memgraph provides a rich set of enterprise-grade features designed to support production workloads at scale. These include:
- High Availability (HA) and automatic failover for fault tolerance
- System metrics monitoring for observability and performance tracking
- Role-based and fine-grained access control for secure, multi-user environments
- Multi-tenancy for isolating and managing separate workloads within the same infrastructure
- Advanced security features to meet compliance and operational requirements
There are additional enterprise capabilities as well, tailored for advanced performance, scalability, and security needs. We recommend exploring the other chapters in the “Memgraph in Production” guide series, as they highlight how these features can be aligned with your specific use cases.
Memgraph Enterprise License is enabled by issuing the following queries:
SET DATABASE SETTING 'organization.name' TO 'Organization';
SET DATABASE SETTING 'enterprise.license' TO 'License';
However, the recommended way of specifying the Enterprise License would be through environment variables through Docker or making it visible to the process (if you’re deploying Memgraph natively).
MEMGRAPH_ORGANIZATION_NAME=Organization
MEMGRAPH_ENTERPRISE_LICENSE=License
The reason for that is that environment variables always override any system settings that are set via queries.
Additionally, please check out how to set up the Memgraph Lab Enterprise license.
For more information about what Enterprise features are included with Memgraph Enterprise License, please check out the section on Memgraph Enterprise enablement.
Queries that best suit your workload
Memgraph fully supports the Cypher query language, making it easy to express complex graph patterns. In addition to Cypher, Memgraph has built-in path traversal capabilities at the core of the database, enabling lightning-fast traversals optimized for performance-critical use cases. You can learn more about these in our Deep path traversal guide.
For advanced analytics and utility functions, Memgraph also ships with the MAGE library, which includes a wide range of pre-built graph algorithms and procedures for tasks like community detection, centrality scoring, node similarity, and more.
We encourage users to explore the other guides in the Memgraph in Production series, where you’ll find detailed examples and recommendations on which types of queries are most effective based on your specific workload and use case.