Vector search

Vector search, also known as vector similarity search or nearest-neighbor search, is a technique for finding the most similar items in a collection of data based on their vector representations. Memgraph implements a READ_UNCOMMITTED isolation level specifically for vector indices: while the main database can operate at any isolation level, vector index operations always use READ_UNCOMMITTED. This relaxed isolation applies only to the vector index, so the database’s ACID properties remain intact for all other operations. Memgraph supports two kinds of vector indexes:

  • Single-store vector index on nodes. The vector value is stored only in the vector index backend (USearch); the property store keeps only a reference (vector index ID). This avoids duplicating vector data between the property store and the index.
  • Vector index on edges. This uses a different storage model: the vector remains in the edge property store and is also indexed in USearch. Creating, querying, and dropping edge vector indexes is done with separate syntax and procedures from the single-store (node) index.
⚠️

To configure vector search as described in the example, please use the latest Memgraph version.

Vector search is commonly used as a retrieval technique in RAG systems to find entities based on semantic similarity rather than exact matches.

Figure: vector search workflow

Create vector index

To run vector search, first create a vector index. The syntax and storage behavior differ for nodes and edges.

Single-store vector index

Single-store vector indices are created with the CREATE VECTOR INDEX command. You need to:

  1. Provide the index name
  2. Specify the label and property it applies to
  3. Define the vector index configuration

Example:

CREATE VECTOR INDEX vector_index_name ON :Label(embedding) WITH CONFIG {"dimension": 256, "capacity": 1000};

dimension and capacity are mandatory.

Vector index on edges

To create a vector index on edges, use:

CREATE VECTOR EDGE INDEX vector_index_name ON :EDGE_TYPE(embedding) WITH CONFIG {"dimension": 256, "capacity": 1000};

Configuration parameters

The following options apply to both single-store vector indexes (nodes) and vector indexes on edges:

  • dimension: int ➡ The dimension of vectors in the index.
  • capacity: int ➡ The minimum capacity of the vector index. The actual capacity is adjusted internally for optimal performance (preferring powers of two) and will be at least the given value.
  • metric: string (default=l2sq) ➡ The similarity metric used for the vector search; l2sq is the squared Euclidean distance.
  • resize_coefficient: int (default=2) ➡ When the index reaches its capacity, it grows by multiplying the current capacity by this coefficient, provided sufficient memory is available. If resizing fails due to memory limitations, an exception is thrown.
  • scalar_kind: string (default=f32) ➡ The scalar kind used to store each vector component. Smaller types reduce memory usage but may decrease precision.
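The interplay of capacity and resize_coefficient can be sketched with a toy model in Python. This is an illustrative simplification, not Memgraph's actual algorithm; the real index also rounds the capacity internally for optimal performance:

```python
def grow_capacity(capacity: int, size: int, resize_coefficient: int = 2) -> int:
    """Toy model: while the index cannot hold all entries, multiply the
    capacity by the resize coefficient (a simplification of the real logic)."""
    while size > capacity:
        capacity *= resize_coefficient
    return capacity

# With the default coefficient of 2, a full index doubles until it fits:
print(grow_capacity(1000, 3500))  # 4000 (1000 -> 2000 -> 4000)
```

In the real index, each resize step also requires enough free memory; otherwise an exception is thrown instead of growing further.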

To run vector search, call the vector_search query module: use vector_search.search() for a vector index on nodes and vector_search.search_edges() for a vector index on edges.

Unlike other index types, vector indices are currently not utilized by the query planner.

Show vector indices

To retrieve information about vector indices, use the vector_search.show_index_info() procedure. The same information can also be retrieved with the SHOW VECTOR INDEX INFO query.

Output:

  • index_name: string ➡ The name of the vector index.
  • label: string ➡ The name of the label (or edge type) on which the vector index is defined.
  • property: string ➡ The name of the property on which the vector index is indexed.
  • dimension: int ➡ The dimension of vectors in the index.
  • capacity: int ➡ The capacity of the vector index.
  • metric: string ➡ The similarity metric used for vector search.
  • size: int ➡ The number of entries in the vector index.
  • scalar_kind: string ➡ The scalar kind used for each vector element.
  • index_type: string ➡ The type of the index. For a single-store vector index on nodes, the output is label+property_vector; for an index on edges, it is edge-type+property_vector.

Usage:

CALL vector_search.show_index_info() YIELD * RETURN *;

or

SHOW VECTOR INDEX INFO;

Query vector index

Use vector_search.search() for a vector index on nodes and vector_search.search_edges() for a vector index on edges. These procedures return the closest vectors to a query vector based on the index’s similarity metric.

Input:

  • index_name: string ➡ The vector index to search.
  • limit: int ➡ The number of nearest neighbors to return.
  • search_query: List[float|int] ➡ The vector to query in the index. Providing a different type will result in an exception.

Output:

Vector index on nodes:

  • distance: double ➡ The distance from the node to the query.
  • node: Vertex ➡ A node in the vector index matching the given query.
  • similarity: double ➡ The similarity of the node and the query.

Vector index on edges:

  • distance: double ➡ The distance from the edge to the query.
  • edges: Relationship ➡ An edge in the vector index matching the given query.
  • similarity: double ➡ The similarity of the edge and the query.

Usage:

Vector index on nodes:

CALL vector_search.search("index_name", 1, [2.0, 2.0]) YIELD * RETURN *;

Vector index on edges:

CALL vector_search.search_edges("index_name", 1, [2.0, 2.0]) YIELD * RETURN *;

Similarity metrics

The following table lists the supported similarity metrics for vector search. These metrics determine how similarity between vectors is calculated. The default metric is l2sq (squared Euclidean distance).

| Metric     | Description                                       |
|------------|---------------------------------------------------|
| ip         | Inner product (dot product)                       |
| cos        | Cosine similarity                                 |
| l2sq       | Squared Euclidean distance                        |
| pearson    | Pearson correlation coefficient                   |
| haversine  | Haversine distance (suitable for geographic data) |
| divergence | A divergence-based metric                         |
| hamming    | Hamming distance                                  |
| tanimoto   | Tanimoto coefficient                              |
| sorensen   | Sørensen–Dice coefficient                         |
| jaccard    | Jaccard index                                     |
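For reference, the most common metrics can be expressed in plain Python. These are illustrative formulas only; the index computes distances internally via USearch:

```python
import math

def l2sq(a, b):
    """Squared Euclidean distance (the default metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip(a, b):
    """Inner (dot) product."""
    return sum(x * y for x, y in zip(a, b))

def cos_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity."""
    denom = math.sqrt(ip(a, a)) * math.sqrt(ip(b, b))
    return 1.0 - ip(a, b) / denom

print(l2sq([2.0, 2.0], [1.0, 2.0]))          # 1.0
print(cos_distance([2.0, 2.0], [1.0, 1.0]))  # ~0.0 (same direction)
```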

Cosine similarity

You can calculate cosine similarity directly in queries using the vector_search.cosine_similarity() function. This is useful when you need to compute similarity between vectors without creating a vector index.

Usage:

RETURN vector_search.cosine_similarity([1.0, 2.0], [1.0, 3.0]) AS similarity;
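The value returned for the vectors above can be checked by hand with a plain-Python computation of the same formula:

```python
import math

a, b = [1.0, 2.0], [1.0, 3.0]
dot = sum(x * y for x, y in zip(a, b))  # 1*1 + 2*3 = 7
norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
similarity = dot / norms                # 7 / sqrt(50)
print(round(similarity, 4))             # 0.9899
```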

Scalar kind

The scalar_kind setting determines the data type used for each vector element in the index (USearch). By default, the scalar kind is set to f32 (32-bit floating point), which provides a good balance between precision and memory usage. Alternative options, such as f16 for lower memory usage, allow you to fine-tune this tradeoff based on your specific needs.

| Scalar | Description                                         |
|--------|-----------------------------------------------------|
| b1x8   | Binary format (1 bit per element, stored in 8-bit chunks). |
| u40    | Unsigned 40-bit integer.                            |
| uuid   | Universally unique identifier (UUID).               |
| bf16   | 16-bit floating point (bfloat16).                   |
| f64    | 64-bit floating point (double).                     |
| f32    | 32-bit floating point (float).                      |
| f16    | 16-bit floating point.                              |
| f8     | 8-bit floating point.                               |
| u64    | 64-bit unsigned integer.                            |
| u32    | 32-bit unsigned integer.                            |
| u16    | 16-bit unsigned integer.                            |
| u8     | 8-bit unsigned integer.                             |
| i64    | 64-bit signed integer.                              |
| i32    | 32-bit signed integer.                              |
| i16    | 16-bit signed integer.                              |
| i8     | 8-bit signed integer.                               |
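The precision cost of a smaller scalar kind can be demonstrated with Python's struct module, whose IEEE 754 half-precision format character ('e') corresponds to f16:

```python
import struct

def to_f16(value: float) -> float:
    """Round-trip a Python float (f64) through 16-bit half-precision storage."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

# f16 keeps roughly 3 decimal digits, so components are stored approximately:
print(to_f16(0.1))  # 0.0999755859375
print(to_f16(0.5))  # 0.5 (exactly representable)
```

This is why f16 halves the memory per component compared to the default f32 at the cost of precision in each stored value.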

Drop vector index

Vector indices are dropped with the DROP VECTOR INDEX command. You need to give the name of the index to be deleted.

DROP VECTOR INDEX vector_index_name;
⚠️

Single-store vector index (nodes only): When you drop a single-store vector index, Memgraph must move all vector data from USearch back into the property store. Every affected node’s property is rewritten from a vector index ID to the full vector. This can be slow (one write per indexed node) and memory costly (vectors are stored in the property store as 64-bit values, increasing property store size). The same effect occurs when you remove a label from a node that had a vector index on that label: if no other vector index references that property, the vector is restored from USearch to the property store. Plan accordingly when dropping indexes or changing labels on large datasets.
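A back-of-the-envelope estimate of the property-store growth when dropping a single-store index can help with planning. The 8-bytes-per-component figure follows from the 64-bit values mentioned above; this is a hypothetical lower bound that ignores per-property bookkeeping, so the real cost is somewhat higher:

```python
def restored_vector_bytes(num_nodes: int, dimension: int) -> int:
    """Rough lower bound on extra property-store bytes after DROP VECTOR INDEX:
    each vector component is written back as a 64-bit (8-byte) value."""
    return num_nodes * dimension * 8

# 1M nodes with 256-dimensional embeddings -> ~2 GB of vector data rewritten.
print(restored_vector_bytes(1_000_000, 256))  # 2048000000
```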

Example

Here is a simple example of vector search usage.

Run Memgraph and Lab

First, run Memgraph MAGE with vector search enabled, using the following Docker command:

docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage:latest

Then, run Memgraph Lab, a visual user interface:

docker run -p 3000:3000 memgraph/lab:latest 

You can also run Memgraph’s CLI, mgclient, directly from the Memgraph MAGE Docker container.

Create vector index

After Memgraph MAGE and Lab have been started, head over to the Query execution tab in Memgraph Lab and run the following query to create a vector index on nodes:

CREATE VECTOR INDEX index_name ON :Node(vector) WITH CONFIG {"dimension": 2, "capacity": 1000, "metric": "cos", "resize_coefficient": 2, "scalar_kind": "f16"};

Here metric and scalar_kind use values from the similarity metrics and scalar kind tables.

Then, run the following query to inspect the vector index:

CALL vector_search.show_index_info() YIELD * RETURN *;

We can get the same information with the following command:

SHOW VECTOR INDEX INFO;

The above query results in:

+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| capacity | dimension | index_name  | label  | property | size | scalar_kind | index_type              |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| 2048     | 2         | "index_name"| "Node" | "vector" | 0    | "f16"       | "label+property_vector" |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+

Create a node

The next step is to create a node with the Node label and a vector property, so it’s properly added to the vector index.

CREATE (n:Node {vector: [2, 2]});

To confirm that the above node has been indexed, let’s check the vector index info again:

CALL vector_search.show_index_info() YIELD * RETURN *;

The above query results in:

+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| capacity | dimension | index_name  | label  | property | size | scalar_kind | index_type              |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| 2048     | 2         | "index_name"| "Node" | "vector" | 1    | "f16"       | "label+property_vector" |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+

We can see the size of the index has changed due to the new node.

Query vector index

Let’s search for similar nodes. Here is an example query to do that:

CALL vector_search.search("index_name", 1, [2.0, 2.0]) YIELD * RETURN *;

We expect to get the most similar node to vector [2.0, 2.0]:

+--------------------------+--------------------------+--------------------------+
| distance                 | node                     | similarity               |
+--------------------------+--------------------------+--------------------------+
| -1.19209e-07             | (:Node {vector: [2, 2]}) | 1                        |
+--------------------------+--------------------------+--------------------------+

The distance is effectively 0 (the tiny negative value is floating-point error) because the two vectors compared with cosine similarity are identical, giving a similarity value of 1.

Add more nodes and expand search

Let’s add a few more nodes:

CREATE (n:Node {vector: [1, 2]});
CREATE (n:Node {vector: [1, 1]});
CREATE (n:Node {vector: [0, 1]});
CREATE (n:Node {vector: [0, 0]});

Let’s see the status of the index now:

CALL vector_search.show_index_info() YIELD * RETURN *;

The size is now 5, due to 4 additional nodes:

+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| capacity | dimension | index_name  | label  | property | size | scalar_kind | index_type              |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+
| 2048     | 2         | "index_name"| "Node" | "vector" | 5    | "f16"       | "label+property_vector" |
+----------+-----------+-------------+--------+----------+------+-------------+-------------------------+

Let’s again search for the top five similar nodes to the vector [2.0, 2.0] (to compare it to all nodes we have):

CALL vector_search.search("index_name", 5, [2.0, 2.0]) YIELD * RETURN *;

Notice how we changed the limit to get the top five nearest neighbors. Here are the results:

+--------------------------+--------------------------+--------------------------+
| distance                 | node                     | similarity               |
+--------------------------+--------------------------+--------------------------+
| -1.19209e-07             | (:Node {vector: [1, 1]}) | 1                        |
| -1.19209e-07             | (:Node {vector: [2, 2]}) | 1                        |
| 0.0513167                | (:Node {vector: [1, 2]}) | 0.948683                 |
| 0.292893                 | (:Node {vector: [0, 1]}) | 0.707107                 |
| 1                        | (:Node {vector: [0, 0]}) | 0                        |
+--------------------------+--------------------------+--------------------------+

Since cosine similarity was used as a metric, we have two nodes with the same similarity to the query vector: [1, 1] and [2, 2].
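The similarity column above can be reproduced in plain Python. Returning 0 for the zero vector is an assumption matching the output shown, since cosine similarity is mathematically undefined for a zero vector:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0  # assumed zero-vector handling

query = [2.0, 2.0]
for v in ([1, 1], [2, 2], [1, 2], [0, 1], [0, 0]):
    print(v, round(cosine_similarity(v, query), 6))
```

Running this prints similarities matching the result table: 1.0 for [1, 1] and [2, 2], about 0.948683 for [1, 2], about 0.707107 for [0, 1], and 0 for [0, 0].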