Indexes

When running queries, you want to get results as soon as possible. In the worst-case scenario, during query execution, all nodes need to be checked to find a match.

This is how the query plan looks when there is no index on the data:

memgraph> EXPLAIN MATCH (n:Person {prop: 1}) RETURN n;
+-----------------------------------+
| QUERY PLAN                        |
+-----------------------------------+
| " * Produce {n}"                  |
| " * Filter (n :Person), {n.prop}" |
| " * ScanAllByLabel (n :Person)"   |
| " * Once"                         |
+-----------------------------------+

Notice the ScanAllByLabel operations. By creating indexes, query execution can be much faster because indexes partition data with a key. When a query is executed, the engine first checks if there is an index, and instead of explicitly checking every node, it can just check the indexed ones, making retrieving indexed data more efficient.

The following query creates an index on a property of a certain label:

CREATE INDEX ON :Person(prop);

The query plan of a query matching that specific node and property:

memgraph> EXPLAIN MATCH (n:Person {prop: 1}) RETURN n;
+-----------------------------------------------------+
| QUERY PLAN                                          |
+-----------------------------------------------------+
| " * Produce {n}"                                    |
| " * ScanAllByLabelProperties (n :Person {prop})"    |
| " * Once"                                           |
+-----------------------------------------------------+

The ScanAllByLabel operations have been replaced by ScanAllByLabelProperties, a more efficient operation.

But there are also some downsides to indexing:

each index requires extra storage (memory)
indexes slow down write operations to the database.

It is important to choose the right data to create indexes on, as indexing all of the content will not improve the database speed.

The structures in the index are dynamically updated on modifications or insertions of new nodes, slowing down the write operations.

Indexing won’t bring any improvements on properties that are mostly of the same value, as they have no proper distinguishers.

For the same reason, indexing certain data types will not bring any significant performance gain. For example, for properties with boolean values, the time will be cut in half.

In addition to the indexes mentioned below, Memgraph also supports vector and text indexes. Due to their fundamentally different nature, these indexes have dedicated documentation pages. To learn how to set them up and use them effectively, refer to the vector search documentation and text search documentation.

Create an index

Indexes are not created automatically.

You can explicitly create indexes on a data with a specific label or label-property combination using the CREATE INDEX ON syntax or CREATE EDGE INDEX ON syntax in the case of edge-type indexes.

Label index

To optimize queries that fetch nodes by label, you need to create a label index:

CREATE INDEX ON :Person;

Creating an index will optimize the following type of queries:

MATCH (n:Person) RETURN n;

To drop the label index use:

DROP INDEX ON :Label;

Label-property index

To optimize queries that fetch nodes with a certain label and property combination, you need to create a label-property index. For the best performance, create an index on properties containing unique integer values.

⚠️

Creating a label-property index will not create a label index!

For example, to index nodes that are labeled as :Person and have a property named age:

CREATE INDEX ON :Person(age);

Creating an index will optimize the queries that need to match a specific label and property combination:

MATCH (n :Person {age: 42}) RETURN n;

Index will also optimize queries that filter labels and properties with the WHERE clause:

MATCH (n) WHERE n:Person AND n.age = 42 RETURN n;

Be aware that since the filter inside WHERE can contain any kind of an expression, the expression can be so complicated that the index doesn’t get used. If there is any suspicion that an index isn’t used, we recommend writing labels and properties inside the MATCH pattern.

Label-property indices can also be created directly on properties that are nested within map properties. If your schema is already structured using nested JSON-like documents, you can get optimal query index performance without needing to flatten your data using the . notation:

CREATE INDEX ON :Project(delivery.status.due_date)

⚠️

This will create an index only on delivery.status.due_date. It does not create an index on delivery, nor on delivery.status.

When querying against nested properties, you must use a WHERE clause. If you use inline property matching, you will be comparing against the entire map structure, not just the specified property. The following query returns an empty result set because the inline property does not contain the milestone.

CREATE INDEX ON :Project(delivery.status.due_date);
CREATE (:Project {delivery: {status: {due_date: date('2025-06-04'), milestone: 'v3.14'}}});
 
MATCH (proj:Project {delivery: {status: {due_date: date('2025-06-04')}}}) RETURN *;

Adding the milestone property returns the correct result, but as the delivery property is not directly indexed - only the child due_date is - the index will still not be used.

MATCH (proj:Project {delivery: {milestone: 'v3.14', status: {due_date: date('2025-06-04')}}}) RETURN *;

+-----------------------------------------------------------------------------+
| proj                                                                        |
+-----------------------------------------------------------------------------+
| (:Project {delivery: {milestone: "v3.14", status: {due_date: 2025-06-04}}}) |
+-----------------------------------------------------------------------------+

Instead, you should use a WHERE clause to specify just the indexed property, which is the only scenario in which a nested index can be used:

MATCH (proj:Project) WHERE x.delivery.status.due_date = date('2025-06-04') RETURN *;

+-----------------------------------------------------------------------------+
| proj                                                                        |
+-----------------------------------------------------------------------------+
| (:Project {delivery: {milestone: "v3.14", status: {due_date: 2025-06-04}}}) |
+-----------------------------------------------------------------------------+

To drop the label-property index use:

DROP INDEX ON :Label(property);

Label-property composite index

An index on a label and two or more properties is called a composite index. Composite indices optimize queries where multiple properties are commonly used together in filtering clauses, or when uniqueness exists in a combination of node properties, rather than in the individual properties.

To create a composite index on nodes labelled :Person and having properties name and occupation:

CREATE INDEX ON :Person(name, occupation);

Such an index can be used to optimise queries which filter against the index’s label and some or all of the index’s properties:

MATCH (p:Person) WHERE p.name = "Alice Smith" AND p.occupation = "DevOps Engineer" RETURN *;

Without a composite index, you would need to either index on name and filter on occupation or index on occupation and filter on name. Having a composite index means queries such as these execute more efficiently because no filtering is required after the index lookup.

There are no restrictions on the number of properties on which a composite index can be created, but be aware that an index entry will be created for every node having any of these properties.

In the following query, an index entry will be created for OpenMind, despite it having only two of the properties of the composite index (name and sector). It should be clear that composite indices are not efficient when they are created on sparsely populated properties:

CREATE INDEX ON :Company(name, size, location, sector, founded, turnover);
CREATE (:Company {name: 'OpenMind', sector: 'AI'});

The order of properties within the index matters. The two indices below are distinct, despite them both being for the same label and properties, and this distinction has important performance implications.

CREATE INDEX ON :Person(name, age, occupation);
CREATE INDEX ON :Person(age, occupation, name);

This order follows the “leftmost prefix rule”, where the leftmost prefix is the subset of index properties starting from the first property.

A composite index on properties (a, b) can efficiently support queries on property a, or on properties a and b together. However, it cannot efficiently support queries on property b alone. Similarly, an index on properties (b, a) would help with queries on b, or b and a together, but not on a alone.

In the case of our index on (name, age, occupation), we can use the index for queries matching all three properties.

MATCH (p:Person)
WHERE p.name = 'Alice Smith' AND p.age = 42 AND p.occupation = 'DevOps Engineer'
RETURN *;

But we can also use the same index if we are filtering only against name and age, or even just name:

MATCH (p:Person) WHERE p.name = 'Alice' AND p.age = 42 RETURN *;
MATCH (p:Person) WHERE p.name = 'Alice' RETURN *;

However, the index cannot be used if we are only filtering against occupation or age.

Formally, for a composite index on :L(p0, p1, p2, ...) to be used, a query must match against :L and any of the leftmost prefixes of properties (such as: p0; or p0, p1; or p0, p1, p2). This means an index cannot be used in a query if the query does not match a leftmost prefix of properties.

The most effective use of an index is in a query which matches only properties from the leftmost prefix of the composite index, as this will need no further filtering. If the query matches a leftmost prefix as well as other non-contiguous properties, the index can be used, but additional filtering is needed.

The following table summarises the cases where a composite index on properties :L(p0, p1, p2) will and won’t be used, and whether additional filtering is required, based on which properties the query matches against:

p0	p1	p2	Example	Uses index	Needs filter
❌	❌	❌	`MATCH (x:L)`	❌	✅
❌	❌	✅	`MATCH (x:L) WHERE x.p2 = 1`	❌	✅
❌	✅	❌	`MATCH (x:L) WHERE x.p1 = 1`	❌	✅
❌	✅	✅	`MATCH (x:L) WHERE x.p1 = 1 AND x.p2 = 1`	❌	✅
✅	❌	❌	`MATCH (x:L) WHERE x.p0 = 1`	✅	❌
✅	❌	✅	`MATCH (x:L) WHERE x.p0 = 1 AND x.p2 = 1`	✅	✅
✅	✅	❌	`MATCH (x:L) WHERE x.p0 = 1 AND x.p1 = 1`	✅	❌
✅	✅	✅	`MATCH (x:L) WHERE x.p0 = 1 AND x.p1 = 1 AND x.p2 = 1`	✅	❌

The best cases here - those where the index can be used with no additional filtering - are those exactly matching on one of the three leftmost prefixes: p0; p0 and p1; or p0, p1, and p2.

Because of the leftmost prefix rule, a composite index can subsume a label+property index. You should be aware, however, that even though an index on :L(a, b, c) can replace an index on :L(a), it may not perform as efficiently as the single property index. If queries frequently filter on a alone, it may be worth adding an additional :L(a) index.

When deciding the order of properties within a composite index, you should generally put the property with the highest cardinality (most unique values) first. Alternatively, if queries on property a alone are more common than those on b alone, prefer (a, b) over (b, a).

Composite indices may also be created using nested properties. You cannot create a nested composite index for both a property and its parent within the same index. The following is invalid because it contains both name and name.surname:

CREATE INDEX ON :Person(name, name.surname) // INVALID

However, it is perfectly valid to create nested composite indices on multiple properties underneath the same parent property, provided the parent isn’t also specified alone:

CREATE INDEX ON :Person(name.firstname, name.surname)`

To drop the label-property composite index use:

DROP INDEX ON :Label(property1, property2);

Edge-type index

To optimize queries that fetch only the edges by specific edge-types, you need to create an edge-type index. Creating an edge-type index requires the --storage-properties-on-edges flag to be set to true.

CREATE EDGE INDEX ON :EDGE_TYPE;

Creating an edge-type index will optimize the following type of queries:

MATCH ()-[r:EDGE_TYPE]->() RETURN r;

If you need to access nodes of found edges, you can use the startNode(r) and endNode(r) functions.

Named parameters are not supported for edge-type indexes.

To drop the edge-type index use:

DROP EDGE INDEX ON :EDGE_TYPE;

Edge-type property index

To optimize queries that fetch only the edges by specific edge types and properties, you need to create an edge-type property index. Creating an edge-type property index requires the --storage-properties-on-edges flag to be set to true.

CREATE EDGE INDEX ON :EDGE_TYPE(property_name);

Creating an edge-type property index will optimize the following type of queries:

MATCH ()-[r:EDGE_TYPE {property_name: value}]->() RETURN r;

If you need to access nodes of found edges, you can use the startNode(r) and endNode(r) functions.

Named parameters are not supported for edge-type property indexes.

To drop the edge-type property index use:

DROP EDGE INDEX ON :EDGE_TYPE(property_name);

Global edge property index

Since edges can have only one edge type, Memgraph provides a way to index all the edges that contain a certain property via the global edge index. The syntax to index all the edges that have a certain property is the following:

CREATE GLOBAL EDGE INDEX ON :(property_name);

Creating a global edge property index will optimize the following type of queries:

MATCH ()-[r {property_name: value}]->() RETURN r;

If you need to access nodes of found edges, you can use the startNode(r) and endNode(r) functions.

Named parameters are not supported for edge-type property indexes.

To drop the global edge property index use:

DROP GLOBAL EDGE INDEX ON :(property_name);

Point index

Point index can be utilized for more performant spatial queries which use WHERE point.distance() or WHERE point.withinbbox().

To create a point index run the following command:

CREATE POINT INDEX ON :Label(property);

Here is an example of a query using point.distance():

MATCH (node:Label) WHERE point.distance(point({x:1, y:1}), node.property) < 1
RETURN node;

Here are the examples of queries using point.withinbbox():

MATCH (node:Label) WHERE NOT point.withinbbox(node.property, point({x:1, y:1}), point({x:2, y:2}))
RETURN node;

MATCH (node:Label) WHERE point.withinbbox(node.property, point({x:1, y:1}), point({x:2, y:2})) = false
RETURN node;

In Memgraph v2.21 point index couldn’t be utilized with the point.withinbbox() function. It could only be utilized with the point.distance function and to use it, the comparison point needed to exist as a variable first. For example, instead of:

MATCH (node:Label) WHERE point.distance(point({x:1, y:1}), node.property) < 1
RETURN node;

you needed to use WITH statement:

WITH point({x:1, y:1}) AS p
MATCH (node:Label) WHERE point.distance(p, node.property) < 1
RETURN node;

From Memgraph v2.22, it is no longer necessary to use the WITH statement for index to be utilized.

To drop the point index use:

DROP POINT INDEX ON :Label(property);

Analyze graph

When multiple label-property indexes exist, the database can sometimes select a non-optimal index due to the data’s distribution.

The ANALYZE GRAPH; query calculates the distribution of property values so the database can select a more optimal label-property index with the smallest average property value size. The query is run only once after all indexes have been created and data inserted in the database.

You can modify indexes using the schema.assert() procedure.

Speed comparison

Below is a comparison of the same query run without an index and with an index. The query without an index took 0.015 seconds to execute, and the query with an index 0.006 seconds.

memgraph> SHOW INDEX INFO;
Empty set (0.001 sec)

memgraph> MATCH (n:Person) WHERE n.name =~ ".*an$" RETURN n.name;
+-------------+
| n.name      |
+-------------+
| "Lillian"   |
| "Logan"     |
| "Susan"     |
| "Sebastian" |
+-------------+
4 rows in set (0.021 sec)

memgraph> CREATE INDEX ON :Person(name);
Empty set (0.015 sec)

memgraph> MATCH (n:Person) WHERE n.name =~ ".*an$" RETURN n.name;
+-------------+
| n.name      |
+-------------+
| "Lillian"   |
| "Logan"     |
| "Susan"     |
| "Sebastian" |
+-------------+
4 rows in set (0.006 sec)

Show created indexes

To see all the information on the label, label-property, edge-type, edge-type property and point indexes, run the following query:

SHOW INDEX INFO;

The query displays a table holding the information on the indexes created, ordered by index type, label, property and count.

To retrieve information about vector indexes, run the following query:

SHOW VECTOR INDEX INFO;

Delete an index

Created indexes can be deleted using the following syntax:

DROP INDEX ON :Label;

DROP INDEX ON :Label(property);

DROP INDEX ON :Label(property1, property2);

DROP EDGE INDEX ON :EDGE_TYPE;

DROP EDGE INDEX ON :EDGE_TYPE(property_name);

DROP GLOBAL EDGE INDEX ON :(property_name);

DROP POINT INDEX ON :Label(property);

These queries instruct all active transactions to abort as soon as possible. Once all transactions have finished, the index will be deleted.

Delete all node indexes

⚠️

The schema.assert() procedure will not drop edge and point indexes. Our plan is to update it, and you can track the progress on our GitHub.

To delete all indexes, use the schema.assert() procedure with the following parameters:

indices_map = {}
unique_constraints = map of key-value pairs of all uniqueness constraints in the database
existence_constraints = map of key-value pairs of all existence constraints in the database
drop_existing = true

Here is an example of indexes and constraints set in the database:

CREATE CONSTRAINT ON (n:Person) ASSERT EXISTS (n.name);
CREATE CONSTRAINT ON (n:Employee) ASSERT n.id IS UNIQUE;
CREATE CONSTRAINT ON (n:Employee) ASSERT n.email IS UNIQUE;
CREATE CONSTRAINT ON (n:Employee) ASSERT n.name, n.surname IS UNIQUE;
CREATE INDEX ON :Student(id);
CREATE INDEX ON :Student;

There are three uniqueness and one existence constraint. Additionally, there are two indexes - one label and one label-property index. To delete all indexes, run:

CALL schema.assert({}, {}, {}, true)
YIELD action, key, keys, label, unique
RETURN action, key, keys, label, unique;

The above query removes all existing indexes because the empty indices_map indicates that no indexes should be asserted as existing, while the drop_existing set to true specifies that all existing indexes should be dropped.

Primarily, the assert() procedure is used to define a schema, but it’s also useful if you need to delete all constraints or delete all node indexes and constraints.

Automatic index creation

💡

Automatic index creation can only be used if the database has IN_MEMORY_TRANSACTIONAL mode enabled.

Using the storage-automatic-label-index-creation-enabled and storage-automatic-edge-type-index-creation-enabled flags, it is possible to create label and edge-type indices automatically. Every time the database encounters a label or edge-type that is currently not indexed, it will create an index for that construct.

Recovery

Existence and unique constraints, and indexes can be recovered in parallel. To enable this behavior, set the storage-parallel-schema-recovery configuration flag to true.

Query optimization

Analyze graph

The ANALYZE GRAPH will check and calculate certain properties of a graph so that the database can choose a more optimal index or MERGE transaction.

Before the introduction of the ANALYZE GRAPH query, the database would choose an index solely based on the number of indexed nodes. But if the number of nodes is the only condition, in some cases the database would choose a non-optimal index. Once the ANALYZE GRAPH is run, Memgraph analyzes the distribution of property values and can select a more optimal label-property index, the one with the smallest average property value size.

The average property value’s group size directly represents the database’s expected number of hits which can be used to estimate the query’s cost. When the average group size is the same, the chi-squared statistic is used to measure how close the distribution of property-value group size is to the uniform distribution. The index with a distribution closest to the uniform distribution is selected.

$\chi^2 = \sum_{i}\frac{(E_i-O_i)^2}{E_i}$

Upon running the ANALYZE GRAPH query, Memgraph also check the node degree of every indexed nodes and calculates the average degree. By having these values, Memgraph can make a more optimal MERGE expansion and improve performance. It’s always better to perform a MERGE by expanding from the node that has a lower degree than the connecting node.

The ANALYZE GRAPH; command should be run only once after all indexes have been created and nodes inserted in the database. In rare situations when one property is set on many more nodes than another property, choosing an index based on average group size and uniform distribution would be misleading. That’s why the database always selects the label-property index with >= 10x fewer nodes than the other label-property index.

Calculate the statistic

Run the following query to calculate the statistics:

ANALYZE GRAPH;

The query will iterate over all label and label-property indexes in the database and calculate the average group size, chi-squared statistic and avg degree for each one, then return the following output:

label	property	num estimation nodes	num groups	avg group size	chi-squared value	avg degree
index’s label	index’s property	number of nodes used for estimation	number of distinct values the property contains	average group size of property’s values	value of the chi-squared statistic	average degree of the indexed nodes

Once the necessary information is obtained, Memgraph can choose the optimal index and MERGE expansion. If you don’t want to run the analysis on all labels, you can specify which labels to use by adding the labels to the query:

ANALYZE GRAPH ON LABELS :Label1, :Label2;

Delete statistic

Information about the graph is persistent between instance reruns as is recovered as all the other data, using snapshots and WAL files. If you want the database to ignore information about the average group size, the chi-squared statistic and the average degree, the existing statistic can be deleted by running:

ANALYZE GRAPH DELETE STATISTICS;

The results will contain all label-property indexes that were successfully deleted:

label	property
index’s label	index’s property

Specific labels can be specified with the construct ON LABELS:

ANALYZE GRAPH ON LABELS :Label1 DELETE STATISTICS;

Index hinting

When executing a query, Memgraph query planner needs to decide where in the query it should start matching. To get the optimal match, it checks the MATCH clause conditions and finds the index that’s likely to be the best choice, if there are multiple indexes to choose from.

However, the selected index might not always be the best one. Sometimes, there are multiple candidate indexes, and the query planner picks the suboptimal one from a performance point of view.

With index hinting, you can instruct the planner to use specific index(es) (if possible) in the query that follows. Here is the syntax for such query:

USING INDEX :Label, :Label2 ...;

USING INDEX :Label(property) ...;

USING INDEX :Label(property1, property2) ...;

It is also possible to specify multiple hints separated with comma. In that case, the planner will apply the first hint that is applicable for a given match.

An example of selecting an index with USING INDEX:

USING INDEX :Person(name)
MATCH (n:Person {name: 'John', gender: 'male'})
RETURN n;

⚠️

Overriding planner behavior with index hints should be used with caution, and only by experienced developers and/or database administrators, as poor index choice may cause queries to perform poorly.

Underlying implementation

The central part of Memgraph’s index data structure is a highly concurrent skip list. Skip lists are probabilistic data structures that allow fast search within an ordered sequence of elements. The structure itself is built in layers, where the bottom layer is an ordinary linked list that preserves the order. Each higher level can be imagined as a highway for layers below.

The implementation details behind skip list operations are well documented in the literature and are out of the scope of this document. Nevertheless, we believe that it is important for more advanced users to understand the following implications of this data structure (n denotes the current number of elements in a skip list):

The average insertion time is O(log(n))
The average deletion time is O(log(n))
The average search time is O(log(n))
The average memory consumption is O(n)

When it comes to label-property indexes, Memgraph stores a list of specific properties that are used in label-property indexes. This list is ordered to make the search faster. All property types can be ordered. First, they are ordered based on the type, and secondly for values within that type.

In the case of composite indices, the list is lexicographically ordered, which means that the pair $(a_1, a_2)$ is ordered before $(b_1, b_2)$ if either $a_1 < b_1$ , or $a_1 = b_1$ and $a_2 < b_2$ .

Data durability Storage access

Indexes

Create an index

Label index

Label-property index

Label-property composite index

Edge-type index

Edge-type property index

Global edge property index

Point index

Analyze graph

Schema-related procedures

Speed comparison

Show created indexes

Delete an index

Delete all node indexes

Automatic index creation

Recovery

Query optimization

Analyze graph

Calculate the statistic

Delete statistic

Index hinting

Underlying implementation