
Atomic GraphRAG Pipelines

Memgraph provides a comprehensive suite of tools and algorithms that make building GraphRAG systems both efficient and scalable. These tools, known as “Atomic GraphRAG Pipelines,” empower you (or an agent) to implement sophisticated pipelines as atomic database queries.

Here’s how each feature plays a role in enhancing your Retrieval-Augmented Generation workflows:

  1. Vector search: A pivotal first step for extracting relevant information during retrieval. Vector search allows you to perform semantic searches, matching on context or meaning rather than exact terms.
  2. Deep-path traversals: Enable rapid navigation through complex relationships in your graph, supporting multi-hop queries and reasoning across interconnected data.
  3. Leiden community detection: A faster, more accurate algorithm than Louvain, Leiden ensures well-connected communities, making it ideal for relevance expansion in retrieval processes.
  4. Louvain and dynamic community detection: Louvain is a foundational clustering algorithm for discovering communities, while dynamic community detection adapts in real time, accommodating graph updates without compromising cluster integrity.
  5. PageRank and dynamic PageRank: PageRank ranks nodes based on their influence or importance. Its dynamic counterpart adjusts these rankings as new data is ingested, ensuring retrieval results remain relevant.
  6. Text search: Facilitates keyword-based searches, allowing you to locate specific entities or relationships within your graph quickly. This complements vector search for hybrid retrieval strategies.
  7. Geospatial search: Geospatial search enables you to integrate and retrieve location-based insights, enhancing the relevance and precision of responses.
  8. Run-time schema tracking via the SHOW SCHEMA INFO query: Tracks schema changes as they happen, enabling seamless updates and adaptations in your graph structure without manual intervention. This is especially useful if you’re including the schema in the LLM context.
  9. Real-time data ingestion: Memgraph supports seamless integration with streaming platforms like Kafka, Redpanda, and Pulsar, allowing you to continuously grow and update your knowledge graph. Combine this with Memgraph MAGE or custom procedures to dynamically query and analyze incoming data.
  10. Embedding procedures embeddings.text and embeddings.node_sentence: Let you compute embeddings during the preprocessing or retrieval stages.
  11. llm.complete function: Lets you call any LLM from within any Cypher query.
  12. Server-side parameters: Can serve many purposes; in this context, they help you manage configuration for any given query or pipeline.
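To make the first building block concrete, here is a minimal sketch of what semantic vector search does conceptually: rank stored items by cosine similarity to a query embedding. Memgraph's vector index performs this at scale inside the database; the node names and toy embedding values below are assumptions for illustration only, not real model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical node embeddings (in practice, produced by an embedding model).
nodes = {
    "Cristiano Ronaldo": [0.9, 0.1, 0.2],
    "Lionel Messi":      [0.85, 0.15, 0.25],
    "Invoice #42":       [0.1, 0.9, 0.8],
}

def vector_search(query_embedding, k=2):
    """Return the k nodes most semantically similar to the query."""
    ranked = sorted(nodes, key=lambda n: cosine(query_embedding, nodes[n]),
                    reverse=True)
    return ranked[:k]

print(vector_search([0.88, 0.12, 0.2]))
```

The point is that nearness in embedding space, not string matching, decides the result: a query vector close to Cristiano Ronaldo's embedding retrieves the two football players before the semantically unrelated invoice node.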
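Deep-path traversal, the second building block, can likewise be sketched as a breadth-first expansion that collects everything within k hops of a start node. Memgraph executes this natively over the stored graph; the adjacency dict and names here are illustrative stand-ins.

```python
from collections import deque

# Toy adjacency list standing in for a property graph (names are illustrative).
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob":   ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave":  ["Frank"],
    "Eve":   [],
    "Frank": [],
}

def k_hop_neighborhood(start, k):
    """Collect every node reachable within k hops of `start` (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(start)
    return seen
```

A 2-hop query from "Alice" reaches Dave and Eve but not Frank, which is exactly the kind of bounded multi-hop reasoning a retrieval step uses to gather surrounding context without pulling in the whole graph.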
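PageRank's role in ranking retrieved nodes by influence can be illustrated with plain power iteration. This is a simplified sketch of the classic algorithm, not Memgraph's implementation; the three-node graph is an assumption chosen so the effect is easy to see.

```python
def pagerank(adj, damping=0.85, iters=50):
    """Plain power-iteration PageRank over an adjacency dict."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if not outs:  # dangling node: spread its rank evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

# A and B both point at C, so C accumulates the most rank.
adj = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(adj)
```

In a retrieval pipeline, scores like these let you prefer structurally important nodes among semantically similar candidates; the dynamic variant keeps them current as data streams in.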

Question and Pipeline Types

With these building blocks, you can design a tailored GraphRAG pipeline to answer any type of question in the most effective way. There is a logical progression in how these pipelines are applied: Text2Cypher is the most precise and efficient option when a concrete value or node in the graph directly contains the answer. If the answer isn’t encapsulated in a single node or relationship, a form of QFS (Query-Focused Summarization) comes into play, synthesizing information from across the entire dataset to construct an appropriate response.

  • Analytical - When the answer exists as a concrete value in your graph and you need an exact, deterministic result, use Text2Cypher. The system translates your natural language question into a Cypher query, so you never need to learn the query language yourself. This works well for business metrics that live as properties on nodes or relationships: “What is the CAC for our enterprise segment last quarter?”, “Which portfolio companies have an ROI above 30%?”, or “How many open invoices are tied to this customer?”

  • Local - To retrieve a focused, contextually rich subgraph rather than a single value, use Local Graph Search. A vector search surfaces the most semantically relevant entry point, and then graph traversal expands outward through relationships to gather surrounding context. The classic example: searching for “Ronaldo” in a sports recommendation graph should return Cristiano Ronaldo first, but the second result should be Messi, not the Brazilian Ronaldo, because the graph encodes co-occurrence and engagement patterns showing that users who interact with Cristiano Ronaldo content are far more likely to engage with Messi content next. The relationship matters more than the name.

  • Global - For questions that cannot be answered by finding something specific in the dataset, but instead require reasoning across the entire corpus to surface what is absent, underrepresented, or emergent, apply Query-focused Summarization. QFS can also be used as the inverse of local search: instead of retrieving what exists, it synthesizes what is missing or not well-covered. Example use cases:

    • Legal: Given millions of case files and a new incoming case, “What are the recommended next steps and which precedents are most relevant to upsell?”
    • Finance: Given a corpus of S-1 filings, “What risks or topics does this sector consistently fail to address?” or “What does Nvidia’s S-1 not cover compared to peers?”
    • Publishing: “I want to write a book, which important topics in this domain are not yet well covered in the existing literature?”
