Improve Query Execution Performance

September 7, 2023
Vlasta Pavicic

Memgraph is ingeniously designed for optimal in-memory performance. While it ensures robust ACID compliance and accommodates diverse graph structures and queries, there's always potential to tap into even faster performance. To help you leverage this potential, we've compiled a set of tips and tricks to further boost query execution in Memgraph.

Faster import

Memgraph offers a versatile suite of data import options. Whether you're importing from JSON files, constructing graph objects via CYPHERL file queries, connecting to a data stream, or integrating an existing NetworkX or DGL graph, Memgraph has you covered. However, for optimal speed, importing from CSV files using the LOAD CSV query is recommended.
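
As a minimal sketch, assuming a local people.csv file with id and name columns (the path, label, and property names are illustrative), a basic import looks like this:

// Each row of the file is bound to the row variable, so column values
// can be used directly when creating nodes.
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row
CREATE (p:Person {id: row.id, name: row.name});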

To further enhance the speed of CSV data imports, consider leveraging these three Memgraph features:

  1. Creating indexes (not so much a consideration as a must)
  2. Importing from a compressed CSV file
  3. Switching to analytical storage mode

Create indexes

The LOAD CSV clause in Memgraph is a powerful tool for data import. You can significantly boost its efficiency by setting up indexes on nodes or their properties before you create relationships.

CREATE INDEX ON :Node(id);

It's particularly important to note that if you're merging data with the LOAD CSV clause, establishing indexes before running the clause ensures you're optimizing the data integration process, resulting in smoother and faster imports.
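
As a sketch of this pattern, assuming a relationships.csv file with source_id and target_id columns (the file, labels, and properties are illustrative), the index goes in first and only then does the import run:

// Create the index first so MERGE can look nodes up through it instead of scanning.
CREATE INDEX ON :Node(id);
LOAD CSV FROM "/path/to/relationships.csv" WITH HEADER AS row
MERGE (a:Node {id: row.source_id})
MERGE (b:Node {id: row.target_id})
MERGE (a)-[:CONNECTED_TO]->(b);

Without the index, every MERGE would have to scan all :Node nodes to check whether a matching node already exists; with it, each lookup goes through the index.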

Import from a compressed CSV file

The LOAD CSV clause in Memgraph can also handle CSV files compressed with the popular gzip or bzip2 algorithms. Using compressed files has several advantages.

Firstly, they occupy significantly less storage space, ensuring that your disk resources are used wisely. Secondly, since compressed files are smaller in size, they can be transferred or fetched more rapidly from different storage locations or over networks. This reduced transfer time, combined with Memgraph's efficient handling of compressed files, ensures a more streamlined and quicker data loading process.

So, by simply compressing your CSV files with gzip or bzip2 before importing, you can potentially shave off a considerable amount of time from the overall data integration process, making your operations more efficient.
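
Based on the behavior described above, the same LOAD CSV query can simply point at the compressed file (the path, label, and properties below are illustrative):

// The clause reads the gzip-compressed file directly; no extra syntax is needed.
LOAD CSV FROM "/path/to/people.csv.gz" WITH HEADER AS row
CREATE (p:Person {id: row.id, name: row.name});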

Switch to analytical storage mode

Memgraph offers flexibility in how data is stored and accessed by supporting three distinct storage memory modes:

  • In-memory transactional
  • In-memory analytical
  • On-disk transactional

In-memory analytical mode is optimized for speed and memory efficiency. Memgraph forgoes ACID guarantees in this mode, except for those provided by manually created snapshots. The result? A significant boost in import speed—up to six times faster—while consuming six times less memory. It's ideal for swiftly importing large datasets.

Post-import, you have the choice to either revert to transactional mode to re-enable ACID guarantees or continue using the analytical mode. If opting for the latter, you can run analytical queries without modifying the data. However, it's crucial to note a couple of things: automated backups aren't generated, and transactions may observe changes from other ongoing transactions, which could lead to data corruption and inconsistency.

Switch between the modes during a session by executing the following query:

STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
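
Put together, a bulk-import session might look roughly like this (the LOAD CSV query and file path are illustrative):

// Switch to analytical mode for a faster, leaner bulk import.
STORAGE MODE IN_MEMORY_ANALYTICAL;
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row
CREATE (p:Person {id: row.id, name: row.name});
// Switch back once the import is done to restore ACID guarantees.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;

You can check which mode is currently active at any point with SHOW STORAGE INFO;.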

Always ensure you select the most suitable storage mode for your specific needs and use case.

[Image: storage modes]

Analyze graph

When you've indexed multiple properties, the database doesn't inherently know which index is the most efficient for a given query. By default, it tends to choose the index associated with fewer nodes. This heuristic, while efficient in many situations, can sometimes lead the database to pick a non-optimal index based solely on node count.

Similarly, sub-optimal query execution can occur when running MERGE queries on a supernode. The best practice here is to initiate the MERGE from the node with the lower degree relative to the connecting node, but the database has no knowledge of the nodes' degrees, so it can't apply this best practice on its own.

To enhance the database's awareness of the graph structure, consider executing the ANALYZE GRAPH; query before running queries that intersect with multiple property indexes or involve supernodes. This action prompts the database to review and compute specific graph properties, enabling it to make more informed decisions regarding indexes or MERGE transaction optimization.

Here's a practical illustration: A proper comprehension of index structure allows the database to pinpoint optimal indexes, streamlining the match-finding process through reduced iterations. While analyzing the graph might consume some initial time and memory resources, it's a worthy investment. Remember, a graph requires a single thorough analysis to bolster numerous subsequent queries.
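
In practice, this is a single query, run once before the workload it should inform; the indexes and the follow-up query below are illustrative:

CREATE INDEX ON :Node(id);
CREATE INDEX ON :Node(name);
// Compute and store graph statistics so the planner can choose between the two indexes.
ANALYZE GRAPH;
// This query can now be matched against the more selective of the two indexes.
MATCH (n:Node) WHERE n.id = 42 AND n.name = "Alice" RETURN n;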


Inspect query plans

Before a Cypher query is executed, it undergoes a transformation into a format suited for query execution. This transformation results in what we refer to as a query execution plan. Think of this plan as a conduit or pipeline: a sequence of logical operators that process data step by step, moving it from one operator to the next. To peek under the hood and understand how your query will be executed, prefix your query with the EXPLAIN clause. This lets you view the generated plan, providing a clearer understanding of the query execution pathway. For instance:

EXPLAIN MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *;

[Image: EXPLAIN query plan output]

The produced plans are read bottom-up. In the example above, the database utilizes two primary logical operators: Expand and Filter. The Expand operator identifies an edge between two nodes, specifically m and n, which were previously singled out using the ScanAllByLabel logical operator. The Filter operator is responsible for sifting through the matched nodes based on the condition WHERE n.prop = 42. Therefore, the query execution unfolds in the following stages:

  • ScanAllByLabel: Two nodes labeled :Node are identified through two separate scans.
  • Expand: The engine seeks a path composed of these two nodes and an interconnecting edge.
  • Filter: If such a path is discovered, it's further scrutinized based on one node’s property.
  • Produce: Paths that pass the filtering criteria have their constituent nodes added to the query results.

To get a more granular analysis of query execution, use the PROFILE clause, which provides a detailed performance report for the query plan. Each logical operator is detailed with:

  • OPERATOR: The operator's designation, mirroring its name in an EXPLAIN query result.
  • ACTUAL HITS: The frequency with which a specific logical operator was used.
  • RELATIVE TIME: The time dedicated to a specific logical operator, proportionate to the entire plan's execution duration.
  • ABSOLUTE TIME: The exact duration spent processing a specific logical operator.

For the previously mentioned example, the profiling data would appear as follows:

PROFILE MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *;

[Image: PROFILE query plan output]

By examining the query execution plan and its behavior, you can refine the query to utilize more efficient operators, thereby enhancing the overall performance of the query execution.

In retrospect

From understanding the intricacies of storage memory modes to leveraging the power of indexed properties and compressed CSVs, we've delved deep into techniques that optimize data retrieval and manipulation. With tools like the EXPLAIN and PROFILE commands, you can gain invaluable insights into the inner workings of your queries, ensuring that data is not just accessed, but accessed smartly and efficiently. As you navigate your data adventures with Memgraph, keep these insights in mind, experiment with the tools at hand, and always strive for optimization. The true power of a graph database is realized when it's not just used, but used optimally. Happy graphing!
