Memgraph in high-throughput workloads

👉 Start here first
Before diving into this guide, we recommend starting with the General suggestions page. It provides foundational, use-case-agnostic advice for deploying Memgraph in production.

This guide builds on that foundation, offering additional recommendations tailored to specific workloads. Where guidance overlaps, treat the information here as more specific: it complements the general advice and takes precedence for high-throughput use cases.

Is this guide for you?

This guide is for you if you’re working with high-throughput graph workloads where performance, consistency, and scale are critical. You’ll benefit from this content if:

  • ⚡ You’re handling more than a thousand writes per second, and your graph data is constantly changing at high velocity.
  • 🔍 You want your read performance to remain consistent, even as new data is continuously ingested.
  • 🔁 You’re dealing with high volumes of concurrent reads and writes, and need a database that can handle both without performance degradation.
  • 🌊 Your data is flowing in from real-time streaming systems like Kafka, and you need a database that can keep up.

If this sounds like your use case, this guide will walk you through how to configure and scale Memgraph for reliable, high-throughput performance in production.

Why choose Memgraph for high-throughput use cases?

When your workload involves thousands of writes per second and concurrent access to ever-changing graph data, Memgraph provides the performance and architecture needed to keep up—without compromise.

Here’s why Memgraph is a great fit for high-throughput use cases:

  • In-memory storage engine: Memgraph operates entirely in-memory, eliminating the need to write to disk on every transaction. This allows it to scale write throughput far beyond traditional disk-based databases. Unlike systems that rely on LRU or OS-level caching, where cache invalidation can degrade read performance during heavy writes, Memgraph offers predictable read latency even under constant data changes.

    While many graph databases max out around 1,000 writes per second, Memgraph can handle up to 50x more, making it ideal for high-velocity, write-intensive workloads.

  • Non-blocking reads and writes with MVCC: Built on multi-version concurrency control (MVCC), Memgraph ensures that writes don’t block reads and reads don’t block writes, allowing each to scale independently.

  • Fine-grained locking: Locking happens at the node and relationship level, enabling highly concurrent writes and minimizing contention across threads.

  • Lock-free skiplist storage: Memgraph uses lock-free, concurrent skip list structures for storing nodes, relationships, and indices, leading to faster data access and minimal coordination overhead between threads.

  • Snapshot isolation by default: Unlike many databases that rely on read-committed isolation (which can return inconsistent data), Memgraph provides snapshot isolation, ensuring data accuracy and consistency in real-time queries.

  • Inter-query parallelization: Each read and write query is handled on its own CPU core, so query throughput scales with the number of cores available on a single machine.

  • Horizontal read scaling with high availability: Memgraph supports replication and high availability, allowing you to distribute read traffic across multiple replicas. These replicas can also power secondary workloads like GraphRAG, analytics, or ML pipelines, without affecting the performance of the main write-intensive instance.

What is covered?

The suggestions for high-throughput workloads complement several key sections of the general suggestions guide. The following sections provide additional context and best practices for performance, stability, and scalability in high-throughput systems:

Choosing the right Memgraph flag set

When streaming data from systems like Kafka, the incoming payload is often standardized, meaning that even when a node or relationship is updated, some property values might remain unchanged.

By default, Memgraph runs with the flag --storage-delta-on-identical-property-update=true, which creates a delta record for every property update on a node or relationship, even if the new value is identical to the existing one. This can introduce unnecessary write overhead.

To optimize for higher throughput in scenarios where most incoming updates do not change all property values, it’s recommended to set:

--storage-delta-on-identical-property-update=false

With this setting, Memgraph will only create delta records for properties that have actually changed, reducing internal write operations and improving overall system throughput—especially important in high-velocity streaming use cases.
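
For example, with the official Docker image the flag can be passed on the command line (a minimal sketch; adjust the port mapping and image tag to your deployment):

docker run -p 7687:7687 memgraph/memgraph --storage-delta-on-identical-property-update=false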

Choosing the right Memgraph storage mode

High-throughput scenarios in Memgraph can run effectively on both IN_MEMORY_TRANSACTIONAL and IN_MEMORY_ANALYTICAL storage modes, depending on your specific workload requirements.

If your workload meets the following conditions:

  • You are updating properties on existing nodes and relationships,
  • You are appending new nodes and relationships to the graph,
  • You are not performing deletes,

then it may be worth switching to IN_MEMORY_ANALYTICAL mode.
This mode allows multithreaded writes, parallelizing ingestion across CPU cores for substantially higher write throughput.

However, keep in mind:

  • If you require replication, high availability, or ACID guarantees, you must use IN_MEMORY_TRANSACTIONAL mode.
  • IN_MEMORY_ANALYTICAL is optimized for bulk ingestion and read-only analytics, but it does not support transactional rollback, as it doesn’t create delta objects during writes.
    Additionally, WALs (write-ahead logs) are not generated in this mode, meaning recovery relies solely on snapshot creation.

Learn more about storage modes in our documentation.
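
As a minimal sketch of switching modes at runtime, assuming the Neo4j Python driver and a local Memgraph instance without authentication (the STORAGE MODE command runs like any other query):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # Switch to analytical mode before a bulk ingestion phase...
    session.run("STORAGE MODE IN_MEMORY_ANALYTICAL").consume()
    # ...ingest data here...
    # ...and switch back once ingestion is done.
    session.run("STORAGE MODE IN_MEMORY_TRANSACTIONAL").consume()

driver.close()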

Importing mechanisms

Memgraph natively supports Apache Kafka and Apache Pulsar consumers to directly ingest data streams. However, depending on the system architecture and flexibility requirements, some users prefer building custom Kafka consumers that programmatically push data into Memgraph.
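
For example, a custom consumer can be a short Python script. The sketch below assumes the kafka-python and neo4j packages, a hypothetical graph-events topic, and JSON-encoded messages that carry an id field:

import json

from kafka import KafkaConsumer
from neo4j import GraphDatabase

# Hypothetical topic and local endpoints; adjust to your deployment.
consumer = KafkaConsumer(
    "graph-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    for message in consumer:
        row = message.value
        # Idempotent upsert: replaying the same event creates no duplicates.
        session.run("MERGE (n:Event {id: $row.id}) SET n += $row", row=row)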

Handling write-write conflicts

If you are ingesting data from multiple topics or multiple producers, it’s important to handle potential write-write conflicts—especially when running in IN_MEMORY_TRANSACTIONAL mode.
You can learn more about these scenarios in our conflicting transactions documentation.

To safely manage these conflicts:

  • Use a driver that supports managed transactions with automatic retries. For example, Neo4j drivers offer an execute_write() method that automatically retries transient errors (errors that can be safely retried without client-side intervention).
  • Wrap your write operations inside retryable transactions if there are multiple concurrent writers (see the sketch after this list).
  • If you only have a single writer, you generally don’t need to worry about transient errors or retries.
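
A minimal sketch of this pattern with the Neo4j Python driver (the Event label and id key are illustrative assumptions):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

def upsert_event(tx, row):
    # One small, idempotent write per transaction keeps retries cheap.
    tx.run("MERGE (n:Event {id: $row.id}) SET n += $row", row=row)

with driver.session() as session:
    # execute_write reruns the function automatically when the database
    # reports a transient error, such as a write-write conflict.
    session.execute_write(upsert_event, {"id": "42", "name": "Alice"})

driver.close()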

Idempotency concept

When working with streams, unexpected errors may require replaying the stream. To keep data consistent across replays, make your ingestion logic idempotent.
We strongly recommend using MERGE clauses instead of CREATE when inserting data into Memgraph. To be effective, MERGE should match on a stable key from the payload (such as an id): replaying the same event then matches the existing node or relationship instead of creating a duplicate, keeping your graph clean no matter how many times the stream is replayed.

Enterprise features you might require

For production-grade high-throughput deployments, you may need advanced capabilities to ensure availability, data management, and tenant isolation. Memgraph offers several enterprise features designed to support these needs:

  • Replication, high availability, and automatic failover
    If you require your system to be available at all times, Memgraph supports clustering and high availability, allowing you to minimize downtime and recover automatically from failures.

  • Node and relationship TTL (time-to-live)
    In high-ingestion environments, you may need to automatically clean up stale data after a certain retention period. Memgraph supports time-to-live (TTL) mechanisms for both nodes and relationships, ensuring your graph remains manageable and efficient over time.

  • Multi-tenancy
    Some workloads require a separate graph database per customer to ensure strict data isolation and security. Memgraph supports multi-tenancy, enabling you to manage multiple independent graphs within a single Memgraph instance.

Queries that best suit your workload

When ingesting data from streams, it’s best to keep your Cypher queries simple, idempotent, and efficient. A typical ingestion query, merging on a unique key from the payload, looks like:

MERGE (n:Label {id: $row.id}) SET n += $row;

This approach ensures idempotency (safe reprocessing of the same events) and minimizes query execution time by keeping the transaction lightweight.
Keep in mind that adding complex business logic or customization to the ingestion queries will increase query latency, so it’s always a good practice to profile your queries using Memgraph’s PROFILE tool to understand and optimize performance.


Dynamic labels and edge types

Memgraph also supports creating nodes and relationships with dynamic labels and edge types, where the label or edge type comes from the incoming payload rather than being hardcoded in the query.

However, dynamic creation is only supported with CREATE operations; matching or merging on dynamically created labels and edge types is not supported.

If your payload contains dynamic labels or edge types and you still need idempotency, you have two options:

  • Programmatically construct your Cypher query strings based on the payload to ensure correct label/type usage before sending the query to Memgraph (sketched below).
  • Optionally use the merge procedure from MAGE.

⚠️ Note: While MAGE procedures are written in C++ and highly optimized, they still introduce slightly more overhead compared to pure Cypher, as they are executed as external modules. We recommend favoring pure Cypher when possible for the highest performance.
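
A sketch of the first option, assuming JSON events carrying a label field and a known set of expected labels. Validating against an allow-list also guards against query injection, since labels in MERGE cannot be supplied as parameters:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

ALLOWED_LABELS = {"Person", "Company", "Device"}  # assumed label set

def build_upsert(label: str) -> str:
    if label not in ALLOWED_LABELS:
        raise ValueError(f"Unexpected label: {label}")
    # Only the validated label is interpolated into the query text;
    # property values still travel as regular parameters.
    return f"MERGE (n:{label} {{id: $row.id}}) SET n += $row"

row = {"id": "42", "label": "Person", "name": "Alice"}
with driver.session() as session:
    session.run(build_upsert(row["label"]), row=row)

driver.close()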

Using convert.str2object for parsing nested properties

When working with streamed data, sometimes your incoming payload contains serialized JSON strings that need to be transformed into property maps inside your graph. Memgraph provides the convert.str2object function to easily handle this scenario.

Example usage:

WITH convert.str2object('{"name":"Alice", "age":30}') AS props
MERGE (n:Person {name: props.name})
SET n += props;

This function parses a JSON-formatted string into a Cypher map, making it very useful for flexible ingestion pipelines where the payload structure might vary slightly or be semi-structured.