Vector search

Vector search, also known as vector similarity search or nearest neighbor search, is a technique for finding the most similar items in a collection of data based on their vector representations. Vector search is currently an experimental feature.

Vector indices in Memgraph operate at the READ_UNCOMMITTED isolation level, regardless of the isolation level the rest of the database runs at. Only vector index operations use this relaxed isolation level, so the transactional guarantees and ACID properties of the database remain intact for all other operations.

Run Memgraph with vector search feature

To try out the vector search feature, you need to enable and configure it. To enable the feature, set the --experimental-enabled flag to vector-search. To configure the feature, set the --experimental-config flag.

Changing the configuration settings depends on the way you are using Memgraph, so please refer to the configuration docs for more information.

Here is an example configuration:

--experimental-config=
'
{
  "vector-search": {
    "index_name": {
      "label": "Node",
      "property": "vector",
      "dimension": 2,
      "capacity": 1000,
      "metric": "cos",
      "resize_coefficient": 2
    }
  }
}
'

Keep in mind that both the --experimental-enabled and --experimental-config flags are required, and that the following configuration fields are mandatory: label, property, dimension and capacity.

Configuration parameters:

index_name: string ➡ The name of the vector index. It is the key of the index entry in the configuration and the name used when searching.
label: string ➡ The name of the label on which the vector index is built.
property: string ➡ The name of the property on which the vector index is built.
dimension: int ➡ The dimension of vectors in the index.
capacity: int ➡ The minimum capacity of the vector index. The index prefers powers of two, so the value is adjusted internally for optimal performance, but the resulting capacity is always at least the given value.
metric: string ➡ The metric used for the vector search. The default value is l2sq (squared Euclidean distance).
resize_coefficient: int ➡ The factor by which the capacity is multiplied when the index gets full to determine the new capacity, if possible. If the index cannot be resized due to insufficient memory, an exception is thrown. The default value is 2.
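Because the flag value is plain JSON, it can be generated instead of written by hand. The following Python sketch builds the --experimental-config argument for a single index and checks the mandatory fields listed above. The helper name build_vector_search_flag is hypothetical and not part of Memgraph; the validation covers only what this page states.

```python
import json

MANDATORY_FIELDS = ("label", "property", "dimension", "capacity")


def build_vector_search_flag(index_name, **fields):
    """Return the --experimental-config flag for one vector index.

    Hypothetical helper, not part of Memgraph: it only mirrors the
    configuration layout and the mandatory fields documented above.
    """
    missing = [name for name in MANDATORY_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing mandatory fields: {', '.join(missing)}")
    config = {"vector-search": {index_name: fields}}
    # Compact JSON keeps the value on a single line for the command line.
    return "--experimental-config=" + json.dumps(config, separators=(",", ":"))


flag = build_vector_search_flag(
    "index_name",
    label="Node",
    property="vector",
    dimension=2,
    capacity=1000,
    metric="cos",
    resize_coefficient=2,
)
print(flag)
```

When the value is passed through a shell, the JSON part still needs to be wrapped in single quotes so that the double quotes survive.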

Usage

To run vector search, call the procedures of the vector_search query module.

Unlike other index types, the query planner currently does not utilize vector indices.

Show vector indices

To retrieve information about vector indices, use the vector_search.show_index_info() procedure.

Output:

index_name: string ➡ The name of the vector index.
label: string ➡ The name of the label on which the vector index is built.
property: string ➡ The name of the property on which the vector index is built.
dimension: int ➡ The dimension of vectors in the index.
capacity: int ➡ The capacity of the vector index.
size: int ➡ The number of entries in the vector index.

Usage:

CALL vector_search.show_index_info() YIELD * RETURN *;

Query vector index

Use the vector_search.search() procedure to search for similar vectors within a vector index. This procedure allows you to find the closest vectors to a query vector based on the selected similarity metric.

Input:

index_name: string ➡ The vector index to search.
limit: int ➡ The number of nearest neighbors to return.
search_query: List[float] ➡ The vector to query in the index.

Output:

distance: double ➡ The distance from the node to the query.
node: Vertex ➡ A node in the vector index matching the given query.
similarity: double ➡ The similarity of the node and the query.

Usage:

CALL vector_search.search("index_name", 1, [2.0, 2.0]) YIELD * RETURN *;
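From application code, the same call can be issued with query parameters instead of literals. The sketch below is an assumption-laden illustration: search_vector_index is a hypothetical helper, session stands for any Bolt session-like object exposing run(query, parameters), and it assumes that query parameters are accepted as procedure arguments.

```python
SEARCH_QUERY = (
    "CALL vector_search.search($index_name, $limit, $search_query) "
    "YIELD distance, node, similarity "
    "RETURN distance, node, similarity;"
)


def search_vector_index(session, index_name, limit, search_query):
    """Run vector_search.search() through a Bolt session-like object.

    `session` is assumed to expose run(query, parameters); this helper is
    hypothetical and only wraps the procedure call documented above.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    parameters = {
        "index_name": index_name,
        "limit": int(limit),
        # The procedure expects List[float], so coerce integers explicitly.
        "search_query": [float(x) for x in search_query],
    }
    return session.run(SEARCH_QUERY, parameters)
```

Coercing the vector to floats up front keeps the argument aligned with the List[float] input type even when the caller passes integers such as [2, 2].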

Similarity metrics

The following table lists the supported similarity metrics for vector search. These metrics determine how the similarity between vectors is calculated. The default metric is l2sq (squared Euclidean distance).

Metric	Description
`ip`	Inner product (dot product)
`cos`	Cosine similarity
`l2sq`	Squared Euclidean distance
`pearson`	Pearson correlation coefficient
`haversine`	Haversine distance (suitable for geographic data)
`divergence`	A divergence-based metric
`hamming`	Hamming distance
`tanimoto`	Tanimoto coefficient
`sorensen`	Sørensen-Dice coefficient
`jaccard`	Jaccard index
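To make a few of these metrics concrete, here is a minimal Python sketch of their textbook definitions. It is illustrative only: the function names are made up for this example, and how Memgraph turns each metric into the reported distance and similarity values is not specified here.

```python
import math


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    # Cosine similarity is undefined for a zero vector; returning 0.0 is a
    # convention chosen for this sketch.
    return inner_product(a, b) / norm if norm else 0.0


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions at which the two sequences differ.
    return sum(x != y for x, y in zip(a, b))


def jaccard_index(a, b):
    # Set-based form: size of the intersection over size of the union.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


print(inner_product([1, 2], [2, 2]))        # 6
print(squared_euclidean([1, 2], [2, 2]))    # 1
print(hamming([1, 0, 1], [1, 1, 1]))        # 1
print(jaccard_index([1, 2, 3], [2, 3, 4]))  # 0.5
```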

Scalar type

Properties are stored as 64-bit values in the property store and as 32-bit values in the vector index. The scalar type defines the data type of each vector element. The default scalar type is f32.
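The practical effect of storing 64-bit property values as 32-bit index entries is a small rounding error for values that f32 cannot represent exactly. The following Python sketch shows this with the standard struct module; to_f32 is a hypothetical helper for illustration and says nothing about how Memgraph performs the conversion internally.

```python
import struct


def to_f32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = 0.1              # 64-bit value, as kept in the property store
indexed = to_f32(stored)  # 32-bit value, as an f32 index would hold it

print(stored == indexed)      # False: 0.1 is not exactly representable in f32
print(abs(stored - indexed))  # rounding error, about 1.5e-09
print(to_f32(2.0) == 2.0)     # True: small integers survive unchanged
```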

Example

Here is a simple example of vector search usage.

Run Memgraph and Lab

First, run Memgraph MAGE with vector search enabled and configured, using the following Docker command:

docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage:latest --experimental-enabled=vector-search --experimental-config='{"vector-search": {"index_name": {"label": "Node","property": "vector","dimension": 2,"capacity": 1000, "metric": "cos","resize_coefficient": 2}}}'

Then, run Memgraph Lab, a visual user interface:

docker run -p 3000:3000 memgraph/lab:latest

You can also run Memgraph's CLI, mgconsole, directly from the Memgraph MAGE Docker container.

Check vector index

After Memgraph MAGE and Lab have started, head over to the Query execution tab in Memgraph Lab and run the following query to inspect the vector index:

CALL vector_search.show_index_info() YIELD * RETURN *;

The above query results in:

+--------------+--------------+--------------+--------------+--------------+--------------+
| capacity     | dimension    | index_name   | label        | property     | size         |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 2048         | 2            | "index_name" | "Node"       | "vector"     | 0            |
+--------------+--------------+--------------+--------------+--------------+--------------+

Create a node

The next step is to create a node with the Node label and the vector property, so that it is added to the vector index.

CREATE (n:Node {vector: [2, 2]});

To confirm that the above node has been indexed, let's check the vector index info again:

CALL vector_search.show_index_info() YIELD * RETURN *;

The above query results in:

+--------------+--------------+--------------+--------------+--------------+--------------+
| capacity     | dimension    | index_name   | label        | property     | size         |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 2048         | 2            | "index_name" | "Node"       | "vector"     | 1            |
+--------------+--------------+--------------+--------------+--------------+--------------+

We can see the size of the index changed, due to one new node.

Query vector index

Let's search for similar nodes. Here is an example query to do that:

CALL vector_search.search("index_name", 1, [2.0, 2.0]) YIELD * RETURN *;

We expect to get the most similar node to vector [2.0, 2.0]:

+--------------------------+--------------------------+--------------------------+
| distance                 | node                     | similarity               |
+--------------------------+--------------------------+--------------------------+
| -1.19209e-07             | (:Node {vector: [2, 2]}) | 1                        |
+--------------------------+--------------------------+--------------------------+

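The distance is effectively 0 because the two vectors being compared with cosine similarity are identical, which gives a similarity of 1. The reported value of -1.19209e-07 is a rounding artifact of 32-bit floating-point arithmetic, not a meaningful negative distance.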
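The magnitude of that tiny value is not arbitrary. As a quick check, the Python snippet below prints the machine epsilon of a 32-bit float, which matches the reported distance to the printed precision. Treating the result as a single f32 rounding step is an interpretation, not something the output itself confirms.

```python
# Machine epsilon of a 32-bit float: the gap between 1.0 and the next
# representable value, 2**-23.
F32_EPSILON = 2.0 ** -23

print(f"{F32_EPSILON:.6g}")  # 1.19209e-07
```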

Add more nodes and expand search

Let's add a few more nodes:

CREATE (n:Node {vector: [1, 2]});
CREATE (n:Node {vector: [1, 1]});
CREATE (n:Node {vector: [0, 1]});
CREATE (n:Node {vector: [0, 0]});

Let's see the status of the index now:

CALL vector_search.show_index_info() YIELD * RETURN *;

The size is now 5, due to 4 additional nodes:

+--------------+--------------+--------------+--------------+--------------+--------------+
| capacity     | dimension    | index_name   | label        | property     | size         |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 2048         | 2            | "index_name" | "Node"       | "vector"     | 5            |
+--------------+--------------+--------------+--------------+--------------+--------------+

Let's search again, this time for the five nodes most similar to the vector [2.0, 2.0] (to compare it with all the nodes we have):

CALL vector_search.search("index_name", 5, [2.0, 2.0]) YIELD * RETURN *;

Notice how we changed the limit to get the top five nearest neighbors. Here are the results:

+--------------------------+--------------------------+--------------------------+
| distance                 | node                     | similarity               |
+--------------------------+--------------------------+--------------------------+
| -1.19209e-07             | (:Node {vector: [1, 1]}) | 1                        |
| -1.19209e-07             | (:Node {vector: [2, 2]}) | 1                        |
| 0.0513167                | (:Node {vector: [1, 2]}) | 0.948683                 |
| 0.292893                 | (:Node {vector: [0, 1]}) | 0.707107                 |
| 1                        | (:Node {vector: [0, 0]}) | 0                        |
+--------------------------+--------------------------+--------------------------+

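Because cosine similarity depends only on the direction of the vectors and not on their magnitude, two nodes have the same similarity to the query vector: [1, 1] and [2, 2]. The zero vector [0, 0] has no direction, and here it is reported with a similarity of 0.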
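The numbers in the table can be reproduced outside the database with an exact, brute-force search. The Python sketch below is not how the vector index works internally; it simply recomputes cosine similarity for the five example vectors, assuming that distance is 1 minus similarity (which is consistent with the table) and that the zero vector is assigned a similarity of 0.

```python
import math

NODES = [[2, 2], [1, 2], [1, 1], [0, 1], [0, 0]]
QUERY = [2.0, 2.0]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    # Undefined for the zero vector; 0.0 matches the table above but is an
    # assumption about how that case is handled.
    return dot / norm if norm else 0.0


def brute_force_search(nodes, query, limit):
    """Exact nearest neighbours by cosine distance (1 - similarity)."""
    scored = [(1.0 - cosine_similarity(v, query), v) for v in nodes]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


for distance, vector in brute_force_search(NODES, QUERY, 5):
    print(f"{vector}  distance={distance:.6f}  similarity={1 - distance:.6f}")
```

Since this sketch computes in 64-bit floats, the values for [1, 1] and [2, 2] may differ from the table in the last digits, and the order of those two tied results is not significant.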