Query Optimization in Memgraph: Best Practices and Common Mistakes

By Matea Pesic

14 min readMay 22, 2024

Whether you’re trying to visualize your data or you’re running complex algorithms, querying is key to accessing information from your database. Yet, even experienced professionals can stumble over common pitfalls—from syntax errors to performance issues.

In this blog post, we’ll explore the most common querying challenges and provide practical tips to avoid typical errors. You'll have the tools to refine your querying strategies, optimize your techniques, and unlock valuable insights from your data.

Query Execution

Querying Memgraph using Cypher is a common operation that needs to be properly optimized. There are some tips and tricks for executing queries as fast as possible with the fewest resources.

Profiling for Performance

Memgraph executes queries based on the generated query plan. A query plan is a set of operations that need to be executed, and changing a single operator in a query plan can significantly change data pipeline and query performance.

To lower the data pipeline, you need to optimize the query plan. You can start by indexing your data properly.

Effective Indexing Strategies

Creating a label and label-property index can significantly lower the time needed to find a specific node and reduce the data in the pipeline.

It is important to note that indexing in a database depends on the actual dataset stored there. The general rule is that properties with high cardinality should be used as indexes. The following query creates an index on a property of a particular label:

CREATE  INDEX  ON  :Person(prop);

Besides creating indexes to decrease the data pipeline and improve search, features such as index hinting and analyze graph can help do that automatically.

Query Parameterization

Multiple query plans will be generated for each query, but only a single strategy is optimal. Query planning time can be significant for complex queries because the query planner will try to generate multiple versions of the query plan and choose the optimal one.

After the optimal query plan is generated, it is cached and reused for the same query. To benefit from query plan caching, use query parameters as much as possible.

Here is an example of a query without query parameters:

CREATE  (n:Node  {id:  123});
CREATE  (n:Node  {id:  154});
CREATE  (n:Node  {id:  322});

Here is an example of a query with query parameters:

CREATE  (n:Node  {id:  $id});
CREATE  (n:Node  {id:  $id});
CREATE  (n:Node  {id:  $id});

Memgraph’s query engine will try to cache the query plan without the parameters. Still, to ensure the best performance, use the query parameters whenever you can.

Reduce Roundtrip

When using Memgraph Lab, the total time it takes to process queries is known as the Lab full round trip. This time includes Memgraph's execution time, which covers query parsing, planning, and execution. To avoid returning duplicate nodes and reduce the full round trip time, you can use projection in the return clause.

Let’s show how that works on a couple of examples on the Game of Thrones deaths dataset from Memgraph Lab.

Here is an example query that returns the whole path:

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
RETURN  path;

When run in Memgraph Lab, the full roundtrip is 82 ms.

Project to Avoid Duplicates

To reduce roundtrip time, you can use projection to prevent returning duplicate nodes. Here is an example of a query that uses a projection.

 
MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
WITH  project(path)  AS  subgraph
RETURN  subgraph;

When run in Memgraph Lab, the full roundtrip is down to 49 ms.

Return a Single Value

Here is a query that returns only the number of paths:

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
RETURN  count(path);

When run in Memgraph Lab, the full roundtrip is down to 30 ms.

Similarly, size() and sum() functions can be used to speed up the full roundtrip:

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
WITH  size(nodes(path))  as  nodes,  size(relationships(path))  AS  relationships
RETURN  sum(nodes), sum(relationships);

When run in Memgraph Lab, the full roundtrip is down to 33 ms.

Here is an example of using both project() and size():

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)WITH  project(path)  as  subgraph
RETURN  size(subgraph.nodes), size(subgraph.edges);

When run in Memgraph Lab, the full roundtrip is down to 31 ms.

Return Only a Property

Here is a query that extracts nodes’ internal IDs from the path:

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
WITH  extract(n  IN  nodes(path)  |  ID(n))  AS  nodes_ids
RETURN  nodes_ids;

When run in Memgraph Lab, the full roundtrip is down to 41 ms.

Similarly, the following query returns only name nodes’ property from the path:

MATCH  path=(:Character  {  name:  "Jon  Snow"  })-[:KILLED  *bfs]->(:Character)
WITH  extract(n  IN  nodes(path)  |  n  {.name})  AS  nodes_ids
RETURN  nodes_ids;

When run in Memgraph Lab, the full roundtrip is down to 31 ms.

These are some tips that can help reduce roundtrip time. These optimization strategies have a larger impact on larger datasets and should be utilized whenever possible to achieve the best performance.

Optimizing Path Traversals

In contrast to other graph databases, Memgraph deep path traversals efficiently handle complex graph queries, as these algorithms have been built into Memgraph’s core. This eliminates the need for the overhead of business logic on the application side. There are four built-in deep path traversal algorithms: depth-first search (DFS), breadth-first search (BFS), weighted shortest path, and all shortest paths.

For example, let’s take the following unoptimized query that traverses the European transportation network. Using the BFS algorithm, we’ll find all of the cities London is connected to with the shortest road path available. In other words, we'll find all shortest paths from the starting node with the name property being London to all of the nodes in the dataset.

MATCH  path=(:City  {name:  'London'})-[*BFS]->(:City)
RETURN  path;

This query returns nodes representing cities London is connected to with the transportation network.

Since we didn’t provide any restrictions and filtering, the algorithm scans and traverses through the entire dataset, which can lead to slower performance on larger-scale datasets. There are a few ways we can optimize the traversal.

Creating Indexes

Creating indexes on relevant properties can drastically speed up traversal queries by shortening the database scanning time.

Filtering by Relationship Type

Unlike other graph databases, Memgraph supports inline filtering, enabling efficient traversal through graph structures. This approach allows for precise control over how relationships are traversed, including filtering by type and the direction of the relationship, avoiding subsequent filtering using the WHERE clause.

Let’s take the same example from above, but this time we are limiting traversal across roads only, eliminating other types of transportation. In other words, we’ll provide a relationship type filter to the previously used query and limit it only to traverse through the relationship type ROAD.

MATCH  path=(:City  {name:  'London'})-[r:ROAD  *BFS]->(:City)
RETURN  path;

Memgraph eliminates traversing through unnecessary relationships, shortening the execution time.

Filtering by Property Value

Traversal algorithms allow an expression filter that determines if an expansion is allowed over a certain relationship or node property value.

I’ve taken the same example from above, but this time we’ve limited the traversal through the European roads only. We want to apply filters to the relationship property continent and set the value to exactly Europe.

MATCH  path=(:City  {name:  'London'})-[r:ROAD  *BFS  (r,  n  |  r.continent  =  'Europe')]->(:City)
RETURN  path;

This way, Memgraph eliminates traversing through unnecessary relationships and property values, shortening the execution time even more.

Constraining Path Length

By constraining the path length, the algorithm won’t do unnecessary scanning and return results after finding results with the maximum number of hops.

The following query will only return the results if the path is equal to or shorter than two hops:

MATCH  path=(:City  {name:  'London'})-[r:ROAD  *BFS  ..2  (r,  n  |  r.continent  =  'Europe')]->(:City)
RETURN  path;

Knowing the schema of your dataset and filtering and limiting the desired results can help you achieve a much more optimized use of the traversal algorithms.

Avoiding Inefficient Queries

Cyphermorphism

If you’re looking for a longer path in your dataset, try to keep the query as compact as possible. When you’re matching a longer path (chain), a query similar to this one:

EXPLAIN  MATCH  (n)-[r1]-(m)-[r2]-(l)
RETURN  *;

The query plan applies the EdgeUniquenessFilter, meaning that relationships r1 and r2 must be unique. The query engine treats the following triplet syntax (start node, edge, end node) in the same way:

EXPLAIN  MATCH  (n)-[r1]-(m),  (m)-[r2]-(l)
RETURN  *;

The EdgeUniquenessFilter was applied because of graph-matching morphism called cyphermorphism, which ensures that all relationships matched by the same clause must be different.

If the above query was split into two MATCH clauses:

EXPLAIN  MATCH  (n)-[r1]-(m)
MATCH  (m)-[r2]-(l)
RETURN  *;

The query plan for this query does not use the EdgeUniquenessFilter operator anymore, leading to unexpected results, duplicated data in the data pipeline, and, consequently, poor performance.

The EdgeUniquenessFilter operator was not applied, because during query parsing, multiple MATCH, MERGE, OPTIONAL MATCH, WITH and UNION statements split the query into multiple logical query parts. They are considered independent, so no uniqueness filter is applied.

Workarounds for Fast Filtering

For simple filtering queries where label and label-property indexes are set, those indexes will be used as expected. Here is an example of such a query:

MATCH  (p:Person)
WHERE  p.age  =  20
RETURN  p;

For more advanced filtering with boolean operators, profile the query and check if indexes are being adequately used. The following query will not use the label-property index as expected because it uses the OR boolean operator:

MATCH  (p:Person)
WHERE  p.age  =  20  OR  p.age  =  25
RETURN  p;

Such queries haven’t been optimized to use the union of indexes from its two branches.

As a workaround, use IN so that the ScanAllByLabelPropertyValue operator is used:

MATCH  (p:Person)
WHERE  p.age  IN  [20,  25]
RETURN  p;

A similar thing happens if you try using the range() function:

MATCH  (p:Person)
WHERE  p.age  IN  range(20,  25)
RETURN  p;

In this case, only the ScanAllByLabel operator will be used, and the execution won’t be as fast as expected. To use the ScanAllByLabelPropertyValue operator and to achieve better execution time, do the following workaround:

UNWIND  range(20,  25)  AS  person_age
MATCH  (p:Person)
WHERE  p.age  =  person_age,
RETURN  p;

Conclusion

Mastering querying techniques is a step-by-step process that will allow you to handle data more efficiently. This article has explored common pitfalls and provided strategies for overcoming them, including optimizing execution, leveraging indexing, reducing roundtrip times and techniques for optimizing path traversals. By implementing these strategies, you ensure smoother database operations as well as less memory consumption and shorter query execution time.