Indexing

Introduction

A database index is a data structure used to improve the speed of data retrieval within a database at the cost of additional writes and storage space for maintaining the index data structure.

Armed with deep understanding of their data model and use-case, users can decide which data to index and, by doing so, significantly improve their data retrieval efficiency

Index Types

At Memgraph, we support two types of indexes:

Label indexing is enabled by default in Memgraph, i.e., Memgraph automatically indexes labeled data. By doing so we optimize queries which fetch nodes by label:

MATCH (n: Label) ... RETURN n;

Indexes can also be created on data with a specific combination of label and property, hence the name label-property index. This operation needs to be specified by the user and should be used with a specific data model and use-case in mind.

For example, suppose we are storing information about certain people in our database and we are often interested in retrieving their age. In that case, it might be beneficial to create an index on nodes labeled as :Person which have a property named age. We can do so by using the following language construct:

CREATE INDEX ON :Person(age);

After the creation of that index, those queries will be more efficient due to the fact that Memgraph's query engine will not have to fetch each :Person node and check whether the property exists. Moreover, even if all nodes labeled as :Person had an age property, creating such index might still prove to be beneficial. The main reason is that entries within that index are kept sorted by property value. Queries such as the following are therefore more efficient:

MATCH (n :Person {age: 42}) RETURN n;

Index based retrieval can also be invoked on queries with WHERE statements. For instance, the following query will have the same effect as the previous one:

MATCH (n) WHERE n:Person AND n.age = 42 RETURN n;

Naturally, indexes will also be used when filtering based on less than or greater than comparisons. For example, filtering all minors (persons under 18 years of age under Croatian law) using the following query will use index based retrieval:

MATCH (n) WHERE n:PERSON and n.age < 18 RETURN n;

Bear in mind that WHERE filters could contain arbitrarily complex expressions and index based retrieval might not be used. Nevertheless, we are continually improving our index usage recognition algorithms.

Information about available indexes can be retrieved by using the following syntax:

RETURN indexInfo();

The result of this query will be a list of all labels and label-property pairs that Memgraph currently indexes.

Created indexes can also be deleted by using the following syntax:

DROP INDEX ON :Label(property)

Dropping an index will instruct all active transactions to abort as soon as possible, and it will wait for them to finish. Once all transaction have finished, it will drop the index.

Uniqueness constraint

Label-property index can also enforce uniqueness constraint and thus ensure that all values in the index have a unique value.

This can be done with the following language construct:

CREATE UNIQUE INDEX ON :Person(email);

If you're absolutely sure that the uniqueness needs to be ensured on the database level, you have to be aware that using uniqueness constraint might reduce write performance significantly. That being said, we are planning to resolve this issue in some future release.

Underlying Implementation

The central part of our index data structure is a highly-concurrent skip list. Skip lists are probabilistic data structures that allow fast search within an ordered sequence of elements. The structure itself is built in layers where the bottom layer is an ordinary linked list that preserves the order. Each higher level can be imagined as a highway for layers below.

The implementation details behind skip list operations are well documented in the literature and are out of scope for this article. Nevertheless, we believe that it is important for more advanced users to understand the following implications of this data structure (n denotes the current number of elements in a skip list):


Have a Question?

Reach out to us over Slack or email, we're always happy to help with code or other questions you might have.