embeddings

The embeddings module computes sentence embeddings for text — either for the string properties of nodes in the graph, or for an ad‑hoc list of strings. Two backends are supported:

  • Local (default): a SentenceTransformer model from Hugging Face, running on CPU or CUDA inside the Memgraph process.
  • Remote: any embedding provider supported by LiteLLM, such as OpenAI, Ollama, Voyage, Mistral, or Bedrock, as well as any OpenAI-compatible endpoint.

Routing is decided by the model_name configuration key: a LiteLLM‑style provider prefix (for example openai/text-embedding-3-small, ollama/nomic-embed-text) is executed remotely; anything else (a bare name like all-MiniLM-L6-v2 or an HF path like BAAI/bge-small-en-v1.5) is loaded locally. See Remote providers for the full list and credentials.
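
The routing rule can be pictured with a minimal Python sketch. This is not the module's actual code: the REMOTE_PREFIXES set and the is_remote helper are hypothetical names, and the prefixes are simply the ones listed in the credentials table on this page. The point it illustrates is that an HF path such as BAAI/bge-small-en-v1.5 also contains a slash, so only a recognized provider prefix can select the remote path.

```python
# Hypothetical sketch of prefix-based routing; not the module's implementation.
REMOTE_PREFIXES = {
    "openai", "azure", "cohere", "voyage", "jina_ai", "mistral", "ollama",
    "bedrock", "vertex_ai", "huggingface", "together_ai", "fireworks_ai",
    "replicate", "deepinfra", "groq",
}


def is_remote(model_name: str) -> bool:
    """Return True when model_name starts with a known provider prefix."""
    prefix, sep, _rest = model_name.partition("/")
    # A bare name has no "/" at all. An HF path has one, but its first
    # segment (the organisation) is not a provider prefix.
    return bool(sep) and prefix in REMOTE_PREFIXES


print(is_remote("openai/text-embedding-3-small"))  # True
print(is_remote("BAAI/bge-small-en-v1.5"))         # False
print(is_remote("all-MiniLM-L6-v2"))               # False
```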

| Trait | Value |
|---|---|
| Module type | algorithm |
| Implementation | Python |
| Parallelism | parallel |

Procedures

node_sentence()

The procedure computes sentence embeddings from the string properties of nodes and stores each embedding as a property on its node.

Input:

  • input_nodes: List[Vertex] (OPTIONAL) ➡ The list of nodes to compute the embeddings for. If not provided, the embeddings are computed for all nodes in the graph.
  • configuration: mgp.Map (OPTIONAL) ➡ User-defined parameters from query module. Defaults to {}.

Configuration options:

| Name | Type | Default | Description |
|---|---|---|---|
| embedding_property | string | "embedding" | The name of the property to store the embeddings in. |
| excluded_properties | List[string] | ["embedding"] | The list of properties to exclude from the embeddings computation. |
| model_name | string | "all-MiniLM-L6-v2" | Model to use. A bare name or HF path (e.g. "BAAI/bge-small-en-v1.5") is loaded locally via sentence-transformers. A LiteLLM provider prefix (e.g. "openai/text-embedding-3-small", "ollama/nomic-embed-text") routes the call to that remote provider (see Remote providers). |
| return_embeddings | bool | False | Whether to return the embeddings as an additional output or not. |
| batch_size | int | 2000 | The batch size to use for the embeddings computation (local path only). |
| chunk_size | int | 48 | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk (local multi-GPU path only). |
| device | NULL\|string\|int\|List[string\|int] | NULL | The device to use for the embeddings computation (see below). Ignored on the remote path. |
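
To make the relationship between batch_size and chunk_size concrete, here is a small Python sketch of the grouping described above. The split_into_chunks helper is a hypothetical illustration, not the module's code; it only shows that with the defaults one chunk covers up to 2000 × 48 = 96,000 strings.

```python
from typing import List, Sequence


def split_into_chunks(
    items: Sequence[str], batch_size: int = 2000, chunk_size: int = 48
) -> List[List[List[str]]]:
    """Group items into batches, then group batches into chunks."""
    batches = [
        list(items[i : i + batch_size]) for i in range(0, len(items), batch_size)
    ]
    # Each chunk holds up to chunk_size batches; in the multi-GPU case
    # one spawned process would handle one chunk.
    return [batches[i : i + chunk_size] for i in range(0, len(batches), chunk_size)]


texts = [f"text {i}" for i in range(10)]
chunks = split_into_chunks(texts, batch_size=2, chunk_size=3)
print([[len(batch) for batch in chunk] for chunk in chunks])  # [[2, 2, 2], [2, 2]]
```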

The following keys only apply when model_name routes to a remote provider:

| Name | Type | Default | Description |
|---|---|---|---|
| api_base | string\|NULL | NULL | Override the provider's HTTP endpoint. Needed when pointing at a self-hosted Ollama on a non-default host or an enterprise proxy. For most providers, LiteLLM already knows the default. |
| input_type | string | "document" | Forwarded to providers that distinguish document vs query embeddings (Voyage, Cohere). Use "query" when embedding retrieval queries. |
| dimensions | int\|NULL | NULL | Target output dimension for providers that support server-side truncation (e.g. OpenAI text-embedding-3-*). |
| timeout | int | 60 | Per-HTTP-call timeout in seconds. |
| num_retries | int | 3 | Automatic retries on transient failures (429, 5xx, connection errors), handled by LiteLLM with backoff. |
| normalize | bool | True | L2-normalize returned vectors client-side so behavior matches the local path (which normalizes by default). |
| remote_batch_size | int\|NULL | NULL | Items per HTTP request. If NULL, defaults to 96 for Cohere (its hard per-call cap; exceeding it causes HTTP 400) and 256 for all other providers. Set it explicitly for throughput tuning. |
| concurrency | int | 4 | Number of in-flight HTTP requests per call. Capped at 32 to bound the worst-case thread count when many Cypher queries run concurrently. Higher values improve single-query latency for large inputs, but the total number of in-flight requests is multiplied by the number of concurrent Cypher queries, so be mindful of provider rate limits. |
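
The defaults for remote_batch_size and the cap on concurrency can be sketched in Python as follows. This is an illustration under stated assumptions, not the module's implementation: the helper names are hypothetical, and detecting Cohere by the "cohere/" prefix is an assumption made for the example.

```python
from typing import List, Optional

COHERE_MAX_BATCH = 96   # per-call cap described in the table above
DEFAULT_BATCH = 256     # default for all other providers
MAX_CONCURRENCY = 32    # upper bound on in-flight requests per call


def effective_batch_size(model_name: str, remote_batch_size: Optional[int]) -> int:
    """Pick the number of items per HTTP request."""
    if remote_batch_size is not None:
        return remote_batch_size
    # Assumption: Cohere models are recognised by their provider prefix.
    return COHERE_MAX_BATCH if model_name.startswith("cohere/") else DEFAULT_BATCH


def effective_concurrency(concurrency: int = 4) -> int:
    """Clamp the requested concurrency to the documented cap."""
    return min(concurrency, MAX_CONCURRENCY)


def plan_requests(
    n_items: int, model_name: str, remote_batch_size: Optional[int] = None
) -> List[int]:
    """Return the number of items carried by each HTTP request."""
    size = effective_batch_size(model_name, remote_batch_size)
    return [min(size, n_items - start) for start in range(0, n_items, size)]


print(plan_requests(1000, "openai/text-embedding-3-small"))  # [256, 256, 256, 232]
print(plan_requests(200, "cohere/embed-english-v3.0"))       # [96, 96, 8]
print(effective_concurrency(100))                            # 32
```

With the defaults, 1000 strings sent to a non-Cohere provider become four requests, all of which fit within the default concurrency of 4.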

The device parameter can be one of the following:

  • NULL (default) - Use first GPU if available, otherwise use CPU.
  • "cpu" - Use CPU for computation.
  • "cuda" or "all" - Use all available CUDA devices for computation.
  • "cuda:id" - Use a specific CUDA device for computation.
  • id (integer) - Use the device with that id for computation.
  • [id1, id2, ...] - Use a list of device ids for computation.
  • ["cuda:id1", "cuda:id2", ...] - Use a list of CUDA devices for computation.
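
The accepted device values can be summarized with a short Python sketch that maps each form to a list of device strings. The resolve_devices helper is hypothetical, and two things in it are assumptions rather than documented behavior: integer ids are treated as CUDA device indices, and the number of visible GPUs is passed in explicitly (in real code it could come from torch.cuda.device_count()).

```python
from typing import List, Union

Device = Union[None, str, int, List[Union[str, int]]]


def resolve_devices(device: Device, cuda_count: int) -> List[str]:
    """Map a `device` configuration value to a list of device strings."""
    if device is None:
        # Default: first GPU if one is available, otherwise CPU.
        return ["cuda:0"] if cuda_count > 0 else ["cpu"]
    if isinstance(device, list):
        resolved: List[str] = []
        for item in device:
            resolved.extend(resolve_devices(item, cuda_count))
        return resolved
    if isinstance(device, int):
        return [f"cuda:{device}"]  # assumption: integer ids index CUDA devices
    if device in ("cuda", "all"):
        return [f"cuda:{i}" for i in range(cuda_count)]
    return [device]  # "cpu" or "cuda:id" passes through unchanged


print(resolve_devices(None, cuda_count=0))            # ['cpu']
print(resolve_devices("all", cuda_count=2))           # ['cuda:0', 'cuda:1']
print(resolve_devices([0, "cuda:1"], cuda_count=2))   # ['cuda:0', 'cuda:1']
```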

Note: If you’re running on a GPU device, make sure to start your container with the --gpus=all flag.
For more details, see the Install MAGE documentation.

Output:

  • success: bool ➡ Whether the embeddings computation was successful. false on any unrecoverable error (network, auth, invalid input); the procedure never throws.
  • embeddings: List[List[float]]|NULL ➡ The list of embeddings. Only returned if the return_embeddings parameter is set to true in the configuration, otherwise NULL.
  • dimension: int|NULL ➡ The dimension of the embeddings. NULL when success is false, or for empty input on the remote path.

Usage:

To compute the embeddings across the entire graph with the default parameters, use the following query:

CALL embeddings.node_sentence()
YIELD success;

To compute the embeddings for a specific list of nodes, use the following query:

MATCH (n)
WITH n ORDER BY id(n)
LIMIT 5
WITH collect(n) AS subset
CALL embeddings.node_sentence(subset)
YIELD success;

To run the computation on specific device(s), use the following query:

WITH {device: "cuda:1"} AS configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success;

To return the embeddings as an additional output, use the following query:

WITH {return_embeddings: True} AS configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings;

text()

The procedure computes the embeddings for a list of strings and returns them as a list.

Input:

  • strings: List[string] ➡ The list of strings to compute the embeddings for.
  • configuration: mgp.Map (OPTIONAL) ➡ User defined parameters from query module. Defaults to {}.

Configuration options:

| Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "all-MiniLM-L6-v2" | Model to use. A bare name or HF path (e.g. "BAAI/bge-small-en-v1.5") is loaded locally via sentence-transformers. A LiteLLM provider prefix (e.g. "openai/text-embedding-3-small", "ollama/nomic-embed-text") routes the call to that remote provider (see Remote providers). |
| batch_size | int | 2000 | The batch size to use for the embeddings computation (local path only). |
| chunk_size | int | 48 | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk (local multi-GPU path only). |
| device | NULL\|string\|int\|List[string\|int] | NULL | The device to use for the embeddings computation. Ignored on the remote path. |

The following keys only apply when model_name routes to a remote provider:

| Name | Type | Default | Description |
|---|---|---|---|
| api_base | string\|NULL | NULL | Override the provider's HTTP endpoint. Needed when pointing at a self-hosted Ollama on a non-default host or an enterprise proxy. For most providers, LiteLLM already knows the default. |
| input_type | string | "document" | Forwarded to providers that distinguish document vs query embeddings (Voyage, Cohere). Use "query" when embedding retrieval queries. |
| dimensions | int\|NULL | NULL | Target output dimension for providers that support server-side truncation (e.g. OpenAI text-embedding-3-*). |
| timeout | int | 60 | Per-HTTP-call timeout in seconds. |
| num_retries | int | 3 | Automatic retries on transient failures (429, 5xx, connection errors), handled by LiteLLM with backoff. |
| normalize | bool | True | L2-normalize returned vectors client-side so behavior matches the local path (which normalizes by default). |
| remote_batch_size | int\|NULL | NULL | Items per HTTP request. If NULL, defaults to 96 for Cohere (its hard per-call cap; exceeding it causes HTTP 400) and 256 for all other providers. Set it explicitly for throughput tuning. |
| concurrency | int | 4 | Number of in-flight HTTP requests per call. Capped at 32 to bound the worst-case thread count when many Cypher queries run concurrently. Higher values improve single-query latency for large inputs, but the total number of in-flight requests is multiplied by the number of concurrent Cypher queries, so be mindful of provider rate limits. |
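
For readers unfamiliar with the normalize option, the following Python sketch shows what L2 normalization does to a single vector. It is a generic illustration of the technique, not the module's code, and the l2_normalize name is hypothetical.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector so that its Euclidean (L2) length becomes 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # an all-zero vector cannot be scaled; return a copy
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

After normalization every vector has length 1, which is why vectors from remote providers become directly comparable with those produced on the local path.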

Output:

  • success: bool ➡ Whether the embeddings computation was successful. false on any unrecoverable error; the procedure never throws.
  • embeddings: List[List[float]]|NULL ➡ The list of embeddings, or NULL on failure.
  • dimension: int|NULL ➡ The dimension of the embeddings. NULL when success is false.

Usage:

To compute the embeddings for a list of strings, use the following query:

CALL embeddings.text(["Hello", "World"])
YIELD success, embeddings;
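
Once the embeddings have been returned to a client, a common next step is to compare them. The Python sketch below assumes the vectors have already been fetched from the embeddings output; the cosine_similarity helper is a hypothetical name, and the sample vectors are made up for illustration rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen in this sketch for all-zero vectors
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings.
hello = [0.6, 0.8, 0.0]
world = [0.8, 0.6, 0.0]
print(round(cosine_similarity(hello, world), 2))  # 0.96
```

When the vectors are already L2-normalized, the two norms are 1 and the dot product alone gives the same result.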

model_info()

The procedure returns information about the model used for the embeddings computation.

Input:

  • configuration: mgp.Map (OPTIONAL) ➡ User defined parameters from query module. Defaults to {}. The key model_name is used to specify the name of the model to use for the embeddings computation.

Output:

  • info: mgp.Map ➡ The information about the model used for the embeddings computation. Contents:
| Name | Type | Description |
|---|---|---|
| model_name | string | The model name that was supplied (e.g. "all-MiniLM-L6-v2", "openai/text-embedding-3-small"). |
| dimension | int | The dimension of the embeddings. On the remote path this is discovered by making a single probe call to the provider (cached per (model_name, api_base) for the process). |
| max_sequence_length | int\|NULL | The maximum input sequence length for the local SentenceTransformer model. NULL on the remote path, because providers don't expose this uniformly. |
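
The per-process caching of the probed dimension can be pictured with the following Python sketch. It is an assumption-laden illustration, not the module's code: cached_dimension, the module-level dictionary, and the fake_probe stand-in for a real provider call are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Process-wide cache keyed by (model_name, api_base).
_dimension_cache: Dict[Tuple[str, Optional[str]], int] = {}


def cached_dimension(
    model_name: str,
    api_base: Optional[str],
    probe: Callable[[str, Optional[str]], int],
) -> int:
    """Return the embedding dimension, probing the provider at most once per key."""
    key = (model_name, api_base)
    if key not in _dimension_cache:
        _dimension_cache[key] = probe(model_name, api_base)
    return _dimension_cache[key]


probe_calls = []


def fake_probe(model_name: str, api_base: Optional[str]) -> int:
    """Stand-in for a real provider call; records each invocation."""
    probe_calls.append((model_name, api_base))
    return 1536


cached_dimension("openai/text-embedding-3-small", None, fake_probe)
cached_dimension("openai/text-embedding-3-small", None, fake_probe)
print(len(probe_calls))  # 1, the second lookup is served from the cache
```

Keying on api_base as well as model_name matters because the same model name served from a different endpoint may be a different deployment.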

Remote providers

To use a remote embedding provider, set model_name to a LiteLLM provider-prefixed name. The call is then routed through LiteLLM, while any bare name or HF path is still loaded locally via sentence-transformers. A few working examples:

  • "openai/text-embedding-3-small" — OpenAI, 1536 dims (or set dimensions for server‑side truncation)
  • "openai/text-embedding-3-large" — OpenAI, 3072 dims
  • "azure/<your-deployment-name>" — Azure OpenAI
  • "ollama/nomic-embed-text" — self‑hosted Ollama
  • "cohere/embed-english-v3.0" — Cohere (pass input_type: "query" for queries)
  • "voyage/voyage-3" — Voyage AI (pass input_type: "query" for queries)
  • "mistral/mistral-embed" — Mistral
  • "jina_ai/jina-embeddings-v3" — Jina
  • "bedrock/amazon.titan-embed-text-v2:0" — AWS Bedrock

For the authoritative list of supported providers and model identifiers see the LiteLLM providers index.

Credentials

API keys are not accepted through Cypher — they are read from the Memgraph process environment by LiteLLM, using the canonical variable name for each provider. Set the relevant variable in the environment where Memgraph is running (container -e flag, systemd unit, shell export, etc.):

| Provider prefix | Environment variables LiteLLM reads |
|---|---|
| openai/ | OPENAI_API_KEY (and OPENAI_API_BASE if set) |
| azure/ | AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION |
| cohere/ | COHERE_API_KEY |
| voyage/ | VOYAGE_API_KEY |
| jina_ai/ | JINA_AI_API_KEY |
| mistral/ | MISTRAL_API_KEY |
| ollama/ | No key; api_base defaults to http://localhost:11434 |
| bedrock/ | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME |
| vertex_ai/ | GOOGLE_APPLICATION_CREDENTIALS (path to a service-account JSON) |
| huggingface/ | HUGGINGFACE_API_KEY |
| together_ai/ | TOGETHERAI_API_KEY |
| fireworks_ai/ | FIREWORKS_AI_API_KEY |
| replicate/ | REPLICATE_API_KEY |
| deepinfra/ | DEEPINFRA_API_KEY |
| groq/ | GROQ_API_KEY |

If the required variable is missing, the call returns success: false and the reason is logged by Memgraph (look for lines starting with Remote path failed:).
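
A quick pre-flight check can catch a missing variable before the first query is sent. The Python sketch below is a hypothetical helper, not part of the module: missing_env_vars and REQUIRED_ENV are illustrative names, and the mapping only copies a subset of the table above. It has to run in the same environment as the Memgraph process to be meaningful.

```python
import os
from typing import Dict, List, Mapping

# Subset of the table above: provider prefix -> required environment variables.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"],
    "cohere": ["COHERE_API_KEY"],
    "voyage": ["VOYAGE_API_KEY"],
    "jina_ai": ["JINA_AI_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "ollama": [],  # no key needed
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
}


def missing_env_vars(
    model_name: str, environ: Mapping[str, str] = os.environ
) -> List[str]:
    """List the required variables that are unset or empty for model_name."""
    prefix = model_name.split("/", 1)[0]
    return [name for name in REQUIRED_ENV.get(prefix, []) if not environ.get(name)]


print(missing_env_vars("openai/text-embedding-3-small", {}))  # ['OPENAI_API_KEY']
print(missing_env_vars("ollama/nomic-embed-text", {}))        # []
```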

Running the container with API access

When Memgraph runs in Docker, provider API keys must be passed through to the container’s environment, not just exported in your host shell. For example, with OpenAI:

docker run -p 7687:7687 \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  memgraph/memgraph-mage

You can verify the variable made it across with docker inspect --format '{{.Config.Env}}' <container>.

The same pattern applies to every other provider — replace OPENAI_API_KEY with the variable(s) from the table above (e.g. -e COHERE_API_KEY=..., -e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=...).

Reaching a local Ollama from Memgraph

Ollama is a special case because the daemon runs outside Memgraph. Three common setups:

  1. Memgraph runs natively (not in Docker) — the default api_base (http://localhost:11434) works out of the box.
  2. Memgraph in Docker, Ollama on the host — bind Ollama to all interfaces and point api_base at the host gateway:
    # on the host
    OLLAMA_HOST=0.0.0.0:11434 ollama serve
    # Linux only: add the host-gateway alias on docker run
    docker run -p 7687:7687 \
      --add-host=host.docker.internal:host-gateway \
      memgraph/memgraph-mage
    Then in Cypher: api_base: "http://host.docker.internal:11434".
  3. Both in Docker on a shared network (most portable):
    docker network create mg_net
    docker run -d --name ollama --network mg_net -v ollama:/root/.ollama ollama/ollama
    docker exec ollama ollama pull nomic-embed-text
    docker run -d --name memgraph --network mg_net -p 7687:7687 memgraph/memgraph-mage
    Then in Cypher: api_base: "http://ollama:11434".

Credentials live in the Memgraph process environment, not in Cypher. This keeps API keys out of query logs and audit trails. If you need different keys per graph or per tenant, run separate Memgraph instances with their own environment.

Examples

OpenAI — embed a list of strings:

CALL embeddings.text(
  ["hello world", "graph databases are fun"],
  {model_name: "openai/text-embedding-3-small"}
)
YIELD success, embeddings, dimension
RETURN success, dimension, size(embeddings) AS n;

OpenAI — smaller vectors via server‑side truncation:

CALL embeddings.text(
  ["hello"],
  {model_name: "openai/text-embedding-3-small", dimensions: 768}
)
YIELD dimension
RETURN dimension;  // -> 768

OpenAI — write back embeddings to nodes:

MATCH (n:Doc) WITH collect(n) AS nodes
CALL embeddings.node_sentence(nodes, {model_name: "openai/text-embedding-3-small"})
YIELD success, dimension
RETURN success, dimension;

Ollama — embed against a local daemon:

CALL embeddings.text(
  ["hello world", "graph databases are fun"],
  {model_name: "ollama/nomic-embed-text"}
)
YIELD success, embeddings, dimension
RETURN success, dimension, size(embeddings) AS n;

If Memgraph is in a container and Ollama is on the host, pass the reachable URL:

CALL embeddings.text(
  ["hello world"],
  {model_name: "ollama/nomic-embed-text",
   api_base: "http://host.docker.internal:11434"}
)
YIELD success, dimension RETURN success, dimension;

Cohere / Voyage — embedding queries rather than documents:

CALL embeddings.text(
  ["what is a graph database?"],
  {model_name: "voyage/voyage-3", input_type: "query"}
)
YIELD success, embeddings, dimension
RETURN success, dimension;

Inspect the active remote model:

CALL embeddings.model_info({model_name: "openai/text-embedding-3-small"})
YIELD info RETURN info;
// -> {model_name: "openai/text-embedding-3-small", dimension: 1536, max_sequence_length: null}

Example

Create the following graph:

CREATE (a:Node {id: 1, Title: "Stilton", Description: "A stinky cheese from the UK"}),
(b:Node {id: 2, Title: "Roquefort", Description: "A blue cheese from France"}),
(c:Node {id: 3, Title: "Cheddar", Description: "A yellow cheese from the UK"}),
(d:Node {id: 4, Title: "Gouda", Description: "A Dutch cheese"}),
(e:Node {id: 5, Title: "Parmesan", Description: "An Italian cheese"}),
(f:Node {id: 6, Title: "Red Leicester", Description: "The best cheese in the world"});

Run the following queries to compute the embeddings and then read them back:

CALL embeddings.node_sentence()
YIELD success;
 
MATCH (n) 
WHERE n.embedding IS NOT NULL 
RETURN n.Title, n.embedding;

Results:

+---------+
| success |
+---------+
| true    |
+---------+
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| n.Title                                                              | n.embedding                                                          |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| "Stilton"                                                            | [-0.0485366, -0.021823, 0.0159757, 0.0376443, 0.00594089, -0.0044... |
| "Roquefort"                                                          | [-0.0252884, 0.0250485, -0.0249728, 0.0571037, 0.0386177, 0.03863... |
| "Cheddar"                                                            | [-0.0129724, -0.00756301, -0.00379329, 0.0037531, -0.0134941, 0.0... |
| "Gouda"                                                              | [0.0128716, 0.025435, -0.0288951, 0.0177759, -0.0624398, 0.043577... |
| "Parmesan"                                                           | [-0.0755439, 0.00906182, -0.010977, 0.0208911, -0.0527448, 0.0085... |
| "Red Leicester"                                                      | [-0.0244318, -0.0280038, -0.0373183, 0.0284436, -0.0277753, 0.066... |
+----------------------------------------------------------------------+----------------------------------------------------------------------+

To compute the embeddings for a list of strings, use the following query:

CALL embeddings.text(["Hello", "World"])
YIELD success, embeddings;

Results:

+----------------------------------------------------------+----------------------------------------------------------------------------------+
| success                                                  | embeddings                                                                       |
+----------------------------------------------------------+----------------------------------------------------------------------------------+
| true                                                     | [[-0.0627718, 0.0549588, 0.0521648, 0.08579, -0.0827489, -0.074573, 0.0685547... |
+----------------------------------------------------------+----------------------------------------------------------------------------------+

To get the information about the model used for the embeddings computation, use the following query:

CALL embeddings.model_info()
YIELD info;

Results:

+----------------------------------------------------------------------------+
| info                                                                       |
+----------------------------------------------------------------------------+
| {dimension: 384, max_sequence_length: 256, model_name: "all-MiniLM-L6-v2"} |
+----------------------------------------------------------------------------+