embeddings
The embeddings module computes sentence embeddings for text — either for the string properties of nodes in the graph, or for an ad‑hoc list of strings. Two backends are supported:
- Local (default): a SentenceTransformer model from Hugging Face, running on CPU or CUDA inside the Memgraph process.
- Remote: any embedding provider supported by LiteLLM, e.g. OpenAI, Ollama, Voyage, Mistral, or Bedrock (and any OpenAI-compatible endpoint).
Routing is decided by the model_name configuration key: a LiteLLM‑style
provider prefix (for example openai/text-embedding-3-small,
ollama/nomic-embed-text) is executed remotely; anything else (a bare name
like all-MiniLM-L6-v2 or an HF path like BAAI/bge-small-en-v1.5) is loaded
locally. See Remote providers for the full list and
credentials.
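The routing rule can be sketched in Python. This is a minimal illustration, not the module's actual code: `KNOWN_PROVIDER_PREFIXES` and `route` are hypothetical names, and the prefix set is only a subset of the providers named on this page. It assumes a model name counts as remote only when the text before the first `/` is a known provider prefix, which is why an HF path such as `BAAI/bge-small-en-v1.5` still resolves to the local path.

```python
# Hypothetical sketch of prefix-based routing; the real module may decide differently.
KNOWN_PROVIDER_PREFIXES = {
    "openai", "azure", "cohere", "voyage", "jina_ai",
    "mistral", "ollama", "bedrock",
}


def route(model_name: str) -> str:
    """Return "remote" for a provider-prefixed name, otherwise "local"."""
    prefix, sep, rest = model_name.partition("/")
    # A slash alone is not enough: HF paths such as "BAAI/..." also contain one.
    if sep and rest and prefix in KNOWN_PROVIDER_PREFIXES:
        return "remote"
    return "local"
```

With this sketch, `route("ollama/nomic-embed-text")` yields `"remote"`, while `route("all-MiniLM-L6-v2")` and `route("BAAI/bge-small-en-v1.5")` both yield `"local"`.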
| Trait | Value |
|---|---|
| Module type | algorithm |
| Implementation | Python |
| Parallelism | parallel |
Procedures
node_sentence()
The procedure computes the sentence embeddings on the string properties of nodes. Embeddings are created as a property of the nodes in the graph.
Input:
- input_nodes: List[Vertex] (OPTIONAL) ➡ The list of nodes to compute the embeddings for. If not provided, the embeddings are computed for all nodes in the graph.
- configuration: mgp.Map (OPTIONAL) ➡ User defined parameters from query module. Defaults to {}.
Configuration options:
| Name | Type | Default | Description |
|---|---|---|---|
| embedding_property | string | "embedding" | The name of the property to store the embeddings in. |
| excluded_properties | List[string] | ["embedding"] | The list of properties to exclude from the embeddings computation. |
| model_name | string | "all-MiniLM-L6-v2" | Model to use. A bare name or HF path (e.g. "BAAI/bge-small-en-v1.5") is loaded locally via sentence-transformers. A LiteLLM provider prefix (e.g. "openai/text-embedding-3-small", "ollama/nomic-embed-text") routes the call to that remote provider; see Remote providers. |
| return_embeddings | bool | False | Whether to return the embeddings as an additional output or not. |
| batch_size | int | 2000 | The batch size to use for the embeddings computation (local path only). |
| chunk_size | int | 48 | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk (local multi-GPU path only). |
| device | NULL\|string\|int\|List[string\|int] | NULL | The device to use for the embeddings computation (see below). Ignored on the remote path. |
The following keys only apply when model_name routes to a remote provider:
| Name | Type | Default | Description |
|---|---|---|---|
| api_base | string\|NULL | NULL | Override the provider's HTTP endpoint. Needed when pointing at a self-hosted Ollama on a non-default host or an enterprise proxy. For most providers, LiteLLM already knows the default. |
| input_type | string | "document" | Forwarded to providers that distinguish document vs query embeddings (Voyage, Cohere). Use "query" when embedding retrieval queries. |
| dimensions | int\|NULL | NULL | Target output dimension for providers that support server-side truncation (e.g. OpenAI text-embedding-3-*). |
| timeout | int | 60 | Per-HTTP-call timeout in seconds. |
| num_retries | int | 3 | Automatic retries on transient failures (429, 5xx, connection errors), handled by LiteLLM with backoff. |
| normalize | bool | True | L2-normalize returned vectors client-side so behavior matches the local path (which normalizes by default). |
| remote_batch_size | int\|NULL | NULL | Items per HTTP request. If NULL, defaults to 96 for Cohere (its hard per-call cap; exceeding it causes HTTP 400) and 256 for every other provider. Set an explicit value for throughput tuning. |
| concurrency | int | 4 | Number of in-flight HTTP requests per call. Capped at 32 to bound the worst-case thread count when many Cypher queries run concurrently. Higher values improve single-query latency for large inputs but multiply against concurrent Cypher queries, so be mindful of provider rate limits. |
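How the `remote_batch_size` and `concurrency` defaults interact can be sketched in Python. This is an illustrative sketch under stated assumptions, not the module's implementation: the function and constant names are hypothetical, the lower bound of 1 on concurrency is an assumption, and only the numeric defaults (96, 256, 32) come from the table above.

```python
from typing import List, Optional

COHERE_BATCH_CAP = 96        # documented default for Cohere
DEFAULT_REMOTE_BATCH = 256   # documented default for every other provider
MAX_CONCURRENCY = 32         # documented upper bound on in-flight requests


def effective_batch_size(model_name: str, remote_batch_size: Optional[int]) -> int:
    """Pick the items-per-request value: explicit setting first, then the default."""
    if remote_batch_size is not None:
        return remote_batch_size
    if model_name.startswith("cohere/"):
        return COHERE_BATCH_CAP
    return DEFAULT_REMOTE_BATCH


def effective_concurrency(concurrency: int = 4) -> int:
    """Clamp the requested concurrency to [1, 32]; the lower bound is an assumption."""
    return max(1, min(concurrency, MAX_CONCURRENCY))


def split_batches(items: List[str], size: int) -> List[List[str]]:
    """Split the input into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, 200 strings sent to a `cohere/` model with no explicit `remote_batch_size` would be split into batches of 96, 96, and 8 items, so three HTTP requests in total.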
The device parameter can be one of the following:
- NULL (default) - Use first GPU if available, otherwise use CPU.
- "cpu" - Use CPU for computation.
- "cuda" or "all" - Use all available CUDA devices for computation.
- "cuda:id" - Use a specific CUDA device for computation.
- id - Use a specific device for computation.
- [id1, id2, ...] - Use a list of device ids for computation.
- ["cuda:id1", "cuda:id2", ...] - Use a list of CUDA devices for computation.
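The accepted `device` values can be read as a mapping onto a list of device strings. The Python sketch below illustrates that reading only; `resolve_devices` is a hypothetical helper, the number of visible CUDA devices is passed in as `cuda_count` so that no GPU library is needed, and treating a bare integer id as a CUDA device index is an assumption.

```python
from typing import List, Union

DeviceSpec = Union[None, str, int, List[Union[str, int]]]


def resolve_devices(device: DeviceSpec, cuda_count: int) -> List[str]:
    """Map a `device` configuration value to a list of device strings."""
    if device is None:
        # Default: first GPU if one is visible, otherwise CPU.
        return ["cuda:0"] if cuda_count > 0 else ["cpu"]
    if isinstance(device, list):
        resolved: List[str] = []
        for item in device:
            resolved.extend(resolve_devices(item, cuda_count))
        return resolved
    if isinstance(device, int):
        return [f"cuda:{device}"]  # assumption: a bare id names a CUDA device
    if device in ("cuda", "all"):
        return [f"cuda:{i}" for i in range(cuda_count)]
    return [device]  # "cpu" or "cuda:id" pass through unchanged
```

On a machine with two visible GPUs, this sketch turns `"all"` into `["cuda:0", "cuda:1"]` and `[0, "cuda:1"]` into the same list.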
Note: If you’re running on a GPU device, make sure to start your container
with the --gpus=all flag.
For more details, see the Install MAGE
documentation.
Output:
- success: bool ➡ Whether the embeddings computation was successful. false on any unrecoverable error (network, auth, invalid input); the procedure never throws.
- embeddings: List[List[float]]|NULL ➡ The list of embeddings. Only returned if the return_embeddings parameter is set to true in the configuration, otherwise NULL.
- dimension: int|NULL ➡ The dimension of the embeddings. NULL when success is false, or for empty input on the remote path.
Usage:
To compute the embeddings across the entire graph with the default parameters, use the following query:
CALL embeddings.node_sentence()
YIELD success;
To compute the embeddings for a specific list of nodes, use the following query:
MATCH (n)
WITH n ORDER BY id(n)
LIMIT 5
WITH collect(n) AS subset
CALL embeddings.node_sentence(subset)
YIELD success;
To run the computation on specific device(s), use the following query:
WITH {device: "cuda:1"} AS configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success;
To return the embeddings as an additional output, use the following query:
WITH {return_embeddings: True} AS configuration
CALL embeddings.node_sentence(NULL, configuration)
YIELD success, embeddings;
text()
This procedure can be used to return a list of embeddings when given a list of strings.
Input:
- strings: List[string] ➡ The list of strings to compute the embeddings for.
- configuration: mgp.Map (OPTIONAL) ➡ User defined parameters from query module. Defaults to {}.
Configuration options:
| Name | Type | Default | Description |
|---|---|---|---|
| model_name | string | "all-MiniLM-L6-v2" | Model to use. A bare name or HF path (e.g. "BAAI/bge-small-en-v1.5") is loaded locally via sentence-transformers. A LiteLLM provider prefix (e.g. "openai/text-embedding-3-small", "ollama/nomic-embed-text") routes the call to that remote provider; see Remote providers. |
| batch_size | int | 2000 | The batch size to use for the embeddings computation (local path only). |
| chunk_size | int | 48 | The number of batches per "chunk". This is used when computing embeddings across multiple GPUs, as this has to be done by spawning multiple processes. Each spawned process computes the embeddings for a single chunk (local multi-GPU path only). |
| device | NULL\|string\|int\|List[string\|int] | NULL | The device to use for the embeddings computation. Ignored on the remote path. |
The following keys only apply when model_name routes to a remote provider:
| Name | Type | Default | Description |
|---|---|---|---|
| api_base | string\|NULL | NULL | Override the provider's HTTP endpoint. Needed when pointing at a self-hosted Ollama on a non-default host or an enterprise proxy. For most providers, LiteLLM already knows the default. |
| input_type | string | "document" | Forwarded to providers that distinguish document vs query embeddings (Voyage, Cohere). Use "query" when embedding retrieval queries. |
| dimensions | int\|NULL | NULL | Target output dimension for providers that support server-side truncation (e.g. OpenAI text-embedding-3-*). |
| timeout | int | 60 | Per-HTTP-call timeout in seconds. |
| num_retries | int | 3 | Automatic retries on transient failures (429, 5xx, connection errors), handled by LiteLLM with backoff. |
| normalize | bool | True | L2-normalize returned vectors client-side so behavior matches the local path (which normalizes by default). |
| remote_batch_size | int\|NULL | NULL | Items per HTTP request. If NULL, defaults to 96 for Cohere (its hard per-call cap; exceeding it causes HTTP 400) and 256 for every other provider. Set an explicit value for throughput tuning. |
| concurrency | int | 4 | Number of in-flight HTTP requests per call. Capped at 32 to bound the worst-case thread count when many Cypher queries run concurrently. Higher values improve single-query latency for large inputs but multiply against concurrent Cypher queries, so be mindful of provider rate limits. |
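The `normalize` option refers to L2 normalization: each vector is divided by its Euclidean length so that the result has unit norm. A minimal Python sketch of that operation follows; `l2_normalize` is a hypothetical helper, and returning an all-zero vector unchanged is an assumption rather than documented behavior.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # assumption: leave an all-zero vector unchanged
    return [x / norm for x in vector]
```

Once vectors have unit length, the dot product of two embeddings equals their cosine similarity, which is one practical reason to keep remote and local outputs on the same scale.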
Output:
- success: bool ➡ Whether the embeddings computation was successful. false on any unrecoverable error; the procedure never throws.
- embeddings: List[List[float]]|NULL ➡ The list of embeddings, or NULL on failure.
- dimension: int|NULL ➡ The dimension of the embeddings. NULL when success is false.
Usage:
To compute the embeddings for a list of strings, use the following query:
CALL embeddings.text(["Hello", "World"])
YIELD success, embeddings;
model_info()
The procedure returns the information about the model used for the embeddings computation.
Input:
- configuration: mgp.Map (OPTIONAL) ➡ User defined parameters from query module. Defaults to {}. The key model_name is used to specify the name of the model to use for the embeddings computation.
Output:
- info: mgp.Map ➡ The information about the model used for the embeddings computation. Contents:
| Name | Type | Description |
|---|---|---|
| model_name | string | The model name that was supplied (e.g. "all-MiniLM-L6-v2", "openai/text-embedding-3-small"). |
| dimension | int | The dimension of the embeddings. On the remote path this is discovered by making a single probe call to the provider (cached per (model_name, api_base) for the process). |
| max_sequence_length | int\|NULL | The maximum input sequence length for the local SentenceTransformer model. NULL on the remote path, because providers don't expose this uniformly. |
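The probe-and-cache behavior described for `dimension` can be sketched as a process-wide dictionary keyed by `(model_name, api_base)`. This is a minimal sketch, not the module's code: `cached_dimension` and the `probe` callback are hypothetical, and the probe stands in for whatever single embedding request the module actually sends.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Process-wide cache: one entry per (model_name, api_base) pair.
_dimension_cache: Dict[Tuple[str, Optional[str]], int] = {}


def cached_dimension(
    model_name: str,
    api_base: Optional[str],
    probe: Callable[[str, Optional[str]], List[float]],
) -> int:
    """Return the embedding dimension, calling `probe` at most once per key."""
    key = (model_name, api_base)
    if key not in _dimension_cache:
        # The probe returns one embedding vector; its length is the dimension.
        _dimension_cache[key] = len(probe(model_name, api_base))
    return _dimension_cache[key]
```

Keying on both values matters because the same model name served from a different `api_base` may be a different deployment.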
Remote providers
To use a remote embedding provider, set model_name to a LiteLLM
provider‑prefixed name. The procedure routes the call through
LiteLLM; any bare name or HF path
continues to load locally via sentence-transformers as before. A few
working examples:
- "openai/text-embedding-3-small": OpenAI, 1536 dims (or set dimensions for server-side truncation)
- "openai/text-embedding-3-large": OpenAI, 3072 dims
- "azure/<your-deployment-name>": Azure OpenAI
- "ollama/nomic-embed-text": self-hosted Ollama
- "cohere/embed-english-v3.0": Cohere (pass input_type: "query" for queries)
- "voyage/voyage-3": Voyage AI (pass input_type: "query" for queries)
- "mistral/mistral-embed": Mistral
- "jina_ai/jina-embeddings-v3": Jina
- "bedrock/amazon.titan-embed-text-v2:0": AWS Bedrock
For the authoritative list of supported providers and model identifiers see the LiteLLM providers index.
Credentials
API keys are not accepted through Cypher — they are read from the Memgraph
process environment by LiteLLM, using the canonical variable name for each
provider. Set the relevant variable in the environment where Memgraph is
running (container -e flag, systemd unit, shell export, etc.):
| Provider prefix | Env vars LiteLLM reads |
|---|---|
| openai/ | OPENAI_API_KEY (and OPENAI_API_BASE if set) |
| azure/ | AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION |
| cohere/ | COHERE_API_KEY |
| voyage/ | VOYAGE_API_KEY |
| jina_ai/ | JINA_AI_API_KEY |
| mistral/ | MISTRAL_API_KEY |
| ollama/ | No key; api_base defaults to http://localhost:11434 |
| bedrock/ | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME |
| vertex_ai/ | GOOGLE_APPLICATION_CREDENTIALS (path to a service-account JSON) |
| huggingface/ | HUGGINGFACE_API_KEY |
| together_ai/ | TOGETHERAI_API_KEY |
| fireworks_ai/ | FIREWORKS_AI_API_KEY |
| replicate/ | REPLICATE_API_KEY |
| deepinfra/ | DEEPINFRA_API_KEY |
| groq/ | GROQ_API_KEY |
If the required variable is missing, the call returns success: false and
the reason is logged by Memgraph (look for lines starting with
Remote path failed:).
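Because a missing variable only surfaces as success: false plus a log line, it can help to check the environment before starting Memgraph. The Python sketch below is one hypothetical way to do that: `REQUIRED_ENV` copies a few rows of the table above, `missing_env_vars` is an illustrative helper rather than part of the module, and the check is only meaningful when it runs in the same environment as the Memgraph process.

```python
import os
from typing import Dict, List, Mapping

# Subset of the credentials table above; extend as needed.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"],
    "cohere": ["COHERE_API_KEY"],
    "voyage": ["VOYAGE_API_KEY"],
    "mistral": ["MISTRAL_API_KEY"],
    "ollama": [],  # no key required
}


def missing_env_vars(model_name: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List the variables for the model's provider prefix that are unset or empty."""
    prefix = model_name.partition("/")[0]
    return [name for name in REQUIRED_ENV.get(prefix, []) if not env.get(name)]
```

An empty list means every variable known to this sketch is present; it does not verify that the key itself is valid.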
Running the container with API access
When Memgraph runs in Docker, provider API keys must be passed through to the container’s environment, not just exported in your host shell. For example, with OpenAI:
docker run -p 7687:7687 \
-e OPENAI_API_KEY="$OPENAI_API_KEY" \
memgraph/memgraph-mage
You can verify the variable made it across with
docker inspect --format '{{.Config.Env}}' <container>.
The same pattern applies to every other provider — replace OPENAI_API_KEY
with the variable(s) from the table above (e.g. -e COHERE_API_KEY=...,
-e AWS_ACCESS_KEY_ID=... -e AWS_SECRET_ACCESS_KEY=...).
Reaching a local Ollama from Memgraph
Ollama is a special case because the daemon runs outside Memgraph. Three common setups:
- Memgraph runs natively (not in Docker): the default api_base (http://localhost:11434) works out of the box.
- Memgraph in Docker, Ollama on the host: bind Ollama to all interfaces and point api_base at the host gateway:
      # on the host
      OLLAMA_HOST=0.0.0.0:11434 ollama serve
      # Linux only: add the host-gateway alias on docker run
      docker run -p 7687:7687 \
        --add-host=host.docker.internal:host-gateway \
        memgraph/memgraph-mage
  Then in Cypher: api_base: "http://host.docker.internal:11434".
- Both in Docker on a shared network (most portable):
      docker network create mg_net
      docker run -d --name ollama --network mg_net -v ollama:/root/.ollama ollama/ollama
      docker exec ollama ollama pull nomic-embed-text
      docker run -d --name memgraph --network mg_net -p 7687:7687 memgraph/memgraph-mage
  Then in Cypher: api_base: "http://ollama:11434".
Credentials live in the Memgraph process environment, not in Cypher. This keeps API keys out of query logs and audit trails. If you need different keys per graph or per tenant, run separate Memgraph instances with their own environment.
Examples
OpenAI — embed a list of strings:
CALL embeddings.text(
["hello world", "graph databases are fun"],
{model_name: "openai/text-embedding-3-small"}
)
YIELD success, embeddings, dimension
RETURN success, dimension, size(embeddings) AS n;
OpenAI - smaller vectors via server-side truncation:
CALL embeddings.text(
["hello"],
{model_name: "openai/text-embedding-3-small", dimensions: 768}
)
YIELD dimension
RETURN dimension; // -> 768
OpenAI - write back embeddings to nodes:
MATCH (n:Doc) WITH collect(n) AS nodes
CALL embeddings.node_sentence(nodes, {model_name: "openai/text-embedding-3-small"})
YIELD success, dimension
RETURN success, dimension;
Ollama - embed against a local daemon:
CALL embeddings.text(
["hello world", "graph databases are fun"],
{model_name: "ollama/nomic-embed-text"}
)
YIELD success, embeddings, dimension
RETURN success, dimension, size(embeddings) AS n;
If Memgraph is in a container and Ollama is on the host, pass the reachable URL:
CALL embeddings.text(
["hello world"],
{model_name: "ollama/nomic-embed-text",
api_base: "http://host.docker.internal:11434"}
)
YIELD success, dimension
RETURN success, dimension;
Cohere / Voyage - embedding queries rather than documents:
CALL embeddings.text(
["what is a graph database?"],
{model_name: "voyage/voyage-3", input_type: "query"}
)
YIELD success, embeddings, dimension
RETURN success, dimension;
Inspect the active remote model:
CALL embeddings.model_info({model_name: "openai/text-embedding-3-small"})
YIELD info RETURN info;
// -> {model_name: "openai/text-embedding-3-small", dimension: 1536, max_sequence_length: null}
Example
Create the following graph:
CREATE (a:Node {id: 1, Title: "Stilton", Description: "A stinky cheese from the UK"}),
(b:Node {id: 2, Title: "Roquefort", Description: "A blue cheese from France"}),
(c:Node {id: 3, Title: "Cheddar", Description: "A yellow cheese from the UK"}),
(d:Node {id: 4, Title: "Gouda", Description: "A Dutch cheese"}),
(e:Node {id: 5, Title: "Parmesan", Description: "An Italian cheese"}),
(f:Node {id: 6, Title: "Red Leicester", Description: "The best cheese in the world"});
Run the following query to compute the embeddings:
CALL embeddings.node_sentence()
YIELD success;
MATCH (n)
WHERE n.embedding IS NOT NULL
RETURN n.Title, n.embedding;
Results:
+---------+
| success |
+---------+
| true |
+---------+
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| n.Title | n.embedding |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
| "Stilton" | [-0.0485366, -0.021823, 0.0159757, 0.0376443, 0.00594089, -0.0044... |
| "Roquefort" | [-0.0252884, 0.0250485, -0.0249728, 0.0571037, 0.0386177, 0.03863... |
| "Cheddar" | [-0.0129724, -0.00756301, -0.00379329, 0.0037531, -0.0134941, 0.0... |
| "Gouda" | [0.0128716, 0.025435, -0.0288951, 0.0177759, -0.0624398, 0.043577... |
| "Parmesan" | [-0.0755439, 0.00906182, -0.010977, 0.0208911, -0.0527448, 0.0085... |
| "Red Leicester" | [-0.0244318, -0.0280038, -0.0373183, 0.0284436, -0.0277753, 0.066... |
+----------------------------------------------------------------------+----------------------------------------------------------------------+
To compute the embeddings for a list of strings, use the following query:
CALL embeddings.text(["Hello", "World"])
YIELD success, embeddings;
Results:
+----------------------------------------------------------+----------------------------------------------------------------------------------+
| success | embeddings |
+----------------------------------------------------------+----------------------------------------------------------------------------------+
| true | [[-0.0627718, 0.0549588, 0.0521648, 0.08579, -0.0827489, -0.074573, 0.0685547... |
+----------------------------------------------------------+----------------------------------------------------------------------------------+
To get the information about the model used for the embeddings computation, use the following query:
CALL embeddings.model_info()
YIELD info;
Results:
+----------------------------------------------------------------------------+
| info |
+----------------------------------------------------------------------------+
| {dimension: 384, max_sequence_length: 256, model_name: "all-MiniLM-L6-v2"} |
+----------------------------------------------------------------------------+