Improve Query Execution Performance
Memgraph is designed for optimal in-memory performance. While it ensures robust ACID compliance and accommodates diverse graph structures and queries, there is always potential for even faster performance. To help you leverage that potential, we've compiled a set of tips and tricks to further boost query execution in Memgraph.
Faster import
Memgraph offers a versatile suite of data import options. Whether you're importing from JSON files, constructing graph objects via CYPHERL file queries, connecting to a data stream, or integrating an existing NetworkX or DGL graph, Memgraph has you covered. However, for optimal speed, importing from CSV files using the LOAD CSV query is recommended.
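As a quick sketch, a minimal LOAD CSV import might look like this (the file path, label, and column names are hypothetical):

```cypher
// Import nodes from a CSV file with a header row;
// /data/people.csv and its columns are illustrative.
LOAD CSV FROM "/data/people.csv" WITH HEADER AS row
CREATE (:Person {id: row.id, name: row.name});
```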
To further enhance the speed of CSV data imports, consider leveraging these three Memgraph features:
- Creating indexes (not merely a recommendation, but a must)
- Importing from a compressed CSV file
- Switching to analytical storage mode
Create indexes
The LOAD CSV clause in Memgraph is a powerful tool for data import. You can significantly boost its efficiency by setting up indexes on nodes or their properties before you create relationships.
CREATE INDEX ON :Node(id);
It's particularly important to note that if you're merging data with the LOAD CSV clause, establishing indexes before running the clause ensures you're optimizing the data integration process, resulting in smoother and faster imports.
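To sketch the recommended order (file path, labels, and property names are illustrative): create the index first, then let the relationship import look up endpoints through it.

```cypher
// 1. Create the index before importing relationships.
CREATE INDEX ON :Person(id);

// 2. The endpoint lookups in MATCH can now use the index
//    instead of scanning all :Person nodes for every row.
LOAD CSV FROM "/data/friendships.csv" WITH HEADER AS row
MATCH (a:Person {id: row.from_id}), (b:Person {id: row.to_id})
CREATE (a)-[:FRIENDS_WITH]->(b);
```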
Import from a compressed CSV file
The LOAD CSV clause in Memgraph is designed to handle CSV files compressed with popular algorithms such as gzip or bzip2. The use of compressed files has several advantages.
Firstly, they occupy significantly less storage space, ensuring that your disk resources are used wisely. Secondly, since compressed files are smaller in size, they can be transferred or fetched more rapidly from different storage locations or over networks. This reduced transfer time, combined with Memgraph's efficient handling of compressed files, ensures a more streamlined and quicker data loading process.
So, by simply compressing your CSV files with gzip or bzip2 before importing, you can potentially shave off a considerable amount of time from the overall data integration process, making your operations more efficient.
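From the query's point of view, nothing changes: you simply point LOAD CSV at the compressed file (the path below is hypothetical).

```cypher
// The query is identical to an uncompressed import;
// Memgraph handles the gzip-compressed file directly.
LOAD CSV FROM "/data/people.csv.gz" WITH HEADER AS row
CREATE (:Person {id: row.id, name: row.name});
```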
Switch to analytical storage mode
Memgraph offers flexibility in how data is stored and accessed by supporting three distinct storage memory modes:
- In-memory transactional
- In-memory analytical
- On-disk transactional
In-memory analytical mode is optimized for speed and memory efficiency. Memgraph forgoes ACID guarantees in this mode, except for those provided by manually created snapshots. The result? A significant boost in import speed—up to six times faster—while consuming six times less memory. It's ideal for swiftly importing large datasets.
Post-import, you have the choice to either revert to transactional mode to re-enable ACID guarantees or continue using the analytical mode. If you opt for the latter, you can run analytical queries without modifying the data. However, it's crucial to note a couple of things: automated backups aren't generated, and concurrent transactions may observe each other's changes, which could lead to data corruption and inconsistency.
Switch between the modes during a session by executing the following query:
STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
Always ensure you select the most suitable storage mode for your specific needs and use case.
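Putting it together, a typical bulk-import workflow might be sketched as follows (the snapshot step is optional and assumes you want a manual persistence point while in analytical mode):

```cypher
// Switch to analytical mode for a fast, memory-light bulk import.
STORAGE MODE IN_MEMORY_ANALYTICAL;

// ... run the LOAD CSV import here ...

// Optionally persist the imported data with a manual snapshot.
CREATE SNAPSHOT;

// Revert to transactional mode to restore ACID guarantees.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```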
Analyze graph
When you've indexed multiple properties, the database doesn't innately know which index is the most efficient for a given query. By default, it tends to choose the index associated with fewer nodes. This methodology, while efficient in certain situations, can sometimes lead the database to pick a non-optimal index based solely on node count.
Similarly, sub-optimal query execution can occur when running MERGE queries on a supernode. The best practice is to initiate the MERGE from the node with the lower degree relative to the connecting node, but the database has no knowledge of the nodes' degrees, so it can't apply this best practice on its own.
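To illustrate the best practice (the labels and properties below are hypothetical): write the pattern so it starts from the low-degree node rather than the supernode.

```cypher
// Preferred: anchor the MERGE on the low-degree node (one person)
// rather than on the supernode (a city with millions of residents).
MERGE (p:Person {id: 42})-[:LIVES_IN]->(c:City {name: "London"});
```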
To enhance the database's awareness of the graph structure, consider executing the ANALYZE GRAPH; query before running queries that intersect with multiple property indexes or involve supernodes. This prompts the database to review and compute specific graph properties, enabling it to make more informed decisions regarding indexes or MERGE transaction optimization.
Here's a practical illustration: A proper comprehension of index structure allows the database to pinpoint optimal indexes, streamlining the match-finding process through reduced iterations. While analyzing the graph might consume some initial time and memory resources, it's a worthy investment. Remember, a graph requires a single thorough analysis to bolster numerous subsequent queries.
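A sketch of the relevant queries (the :Node label is illustrative, and the variants shown assume the current Memgraph syntax for scoping and clearing statistics):

```cypher
// Compute graph statistics for all indexed labels and properties.
ANALYZE GRAPH;

// Restrict the analysis to specific labels.
ANALYZE GRAPH ON LABELS :Node;

// Discard previously computed statistics.
ANALYZE GRAPH DELETE STATISTICS;
```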
Inspect query plans
Before a Cypher query is executed, it undergoes a transformation into a format suited for query execution. This transformation results in what we refer to as a query execution plan. Think of the plan as a conduit or pipeline: a sequence of logical operators that process data step by step, moving it from one operator to the next. To peek under the hood and understand how your query will be executed, prefix your query with the EXPLAIN clause. This lets you view the generated plan, providing a clearer understanding of the query execution pathway.
For instance:
EXPLAIN MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *;
The produced plans are read bottom-up. In the example above, the database utilizes two primary logical operators: Expand and Filter. The Expand operator identifies an edge between two nodes, specifically m and n, which were previously singled out using the ScanAllByLabel logical operator. The Filter operator is responsible for sifting through the matched nodes based on the condition WHERE n.prop = 42. Therefore, the query execution unfolds in the following stages:
- ScanAllByLabel: Two nodes labeled :Node are identified through dual scans.
- Expand: The engine seeks a path composed of these two nodes and an interconnecting edge.
- Filter: If such a path is discovered, it's further scrutinized based on one node’s property.
- Produce: Paths that pass the filtering criteria have their constituent nodes added to the query results.
To get a more granular analysis of query execution, use the PROFILE clause, which provides a detailed performance report for the query plan. Each logical operator is detailed with:
- OPERATOR: The operator's designation, mirroring its name in an EXPLAIN query result.
- ACTUAL HITS: The frequency with which a specific logical operator was used.
- RELATIVE TIME: The time dedicated to a specific logical operator, proportionate to the entire plan's execution duration.
- ABSOLUTE TIME: The exact duration spent processing a specific logical operator.
For the previously mentioned example, the profiling data would appear as follows:
PROFILE MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *;
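An illustrative report has the shape below; the operator rows mirror the EXPLAIN plan, while the hit counts and timings are hypothetical numbers shown only to demonstrate the four columns (actual values depend on your data and Memgraph version).

```
+------------------+-------------+---------------+---------------+
| OPERATOR         | ACTUAL HITS | RELATIVE TIME | ABSOLUTE TIME |
+------------------+-------------+---------------+---------------+
| * Produce        | 1           |  ~ 7 %        |  0.02 ms      |
| * Filter         | 1           | ~ 12 %        |  0.04 ms      |
| * Expand         | 1           | ~ 25 %        |  0.08 ms      |
| * ScanAllByLabel | 2           | ~ 40 %        |  0.13 ms      |
| * Once           | 2           | ~ 16 %        |  0.05 ms      |
+------------------+-------------+---------------+---------------+
```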
By examining the query execution plan and its behavior, you can refine the query to utilize more efficient operators, thereby enhancing the overall performance of the query execution.
In retrospect
From understanding the intricacies of storage memory modes to leveraging the power of indexed properties and compressed CSVs, we've delved deep into techniques that optimize data retrieval and manipulation. With tools like the EXPLAIN and PROFILE clauses, you can gain invaluable insights into the inner workings of your queries, ensuring that data is not just accessed, but accessed smartly and efficiently. As you navigate your data adventures with Memgraph, keep these insights in mind, experiment with the tools at hand, and always strive for optimization. The true power of a graph database is realized when it's not just used, but used optimally. Happy graphing!