
Atomic GraphRAG Pipelines

Memgraph provides a set of tools and algorithms for building efficient, scalable GraphRAG systems. Collectively known as “Atomic GraphRAG Pipelines,” they let you (or an agent) implement sophisticated retrieval pipelines as single, atomic database queries.

Here’s how each feature contributes to Retrieval-Augmented Generation (RAG) workflows:

  1. Vector search: Typically the first step, pivot search, which finds the nodes most relevant to a question and uses them as entry points into the graph. Vector search is semantic: it matches on context or meaning rather than exact terms.
  2. Deep-path traversals: Enable fast navigation through complex relationships in your graph, supporting multi-hop queries and reasoning across interconnected data.
  3. Leiden community detection: An improvement on Louvain that is typically faster and guarantees well-connected communities, which makes it well suited for relevance expansion during retrieval.
  4. Louvain and dynamic community detection: Louvain is a foundational clustering algorithm for discovering communities, while dynamic community detection updates communities incrementally as the graph changes instead of recomputing them from scratch.
  5. PageRank and dynamic PageRank: PageRank ranks nodes by their influence or importance. Its dynamic counterpart updates these rankings as new data is ingested, so retrieval results stay relevant.
  6. Text search: Enables keyword-based search, so you can quickly locate specific entities or relationships in your graph. It complements vector search in hybrid retrieval strategies.
  7. Geospatial search: Lets you store and retrieve location-based data, improving the relevance and precision of responses to location-dependent questions.
  8. Run-time schema tracking (SHOW SCHEMA INFO query): Tracks schema changes as they happen, so an up-to-date view of your graph structure is always available without manual intervention. This is especially useful when you include the schema in the LLM context.
  9. Real-time data ingestion: Memgraph integrates with streaming platforms such as Kafka, Redpanda, and Pulsar, so you can continuously grow and update your knowledge graph. Combine this with Memgraph MAGE or custom procedures to query and analyze data as it arrives.
  10. Embedding procedures (embeddings.text and embeddings.node_sentence): Compute embeddings during preprocessing or at retrieval time.
  11. llm.complete function: Lets you call an LLM from within any Cypher query.
  12. Server-side parameters: Have many uses; in this context, they help you manage configuration for a query or pipeline.
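As a rough illustration of how these primitives combine, the following Python sketch chains three of them (vector pivot search, multi-hop expansion, and PageRank ranking) over a small in-memory graph. It does not call Memgraph or any of its procedures: the graph, the 3-dimensional embeddings, and the function names (`cosine`, `pivot_search`, `expand`, `pagerank`, `retrieve`) are hypothetical. In an atomic pipeline, the equivalent steps would run inside a single database query.

```python
import math
from collections import deque

# Hypothetical toy knowledge graph: node -> list of neighbors (undirected,
# every edge is listed in both directions).
GRAPH = {
    "France": ["Paris", "Wine", "Euro"],
    "Paris": ["France", "Louvre"],
    "Wine": ["France", "Bordeaux", "Burgundy"],
    "Euro": ["France"],
    "Louvre": ["Paris"],
    "Bordeaux": ["Wine"],
    "Burgundy": ["Wine"],
}

# Hypothetical 3-dimensional embeddings; real ones would come from an
# embedding model (for example, computed during preprocessing).
EMBEDDINGS = {
    "France": [0.9, 0.1, 0.0],
    "Paris": [0.7, 0.3, 0.0],
    "Wine": [0.2, 0.9, 0.1],
    "Euro": [0.1, 0.0, 0.9],
    "Louvre": [0.6, 0.4, 0.1],
    "Bordeaux": [0.1, 0.8, 0.2],
    "Burgundy": [0.1, 0.9, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pivot_search(query_vec, k):
    """Vector search: return the k nodes whose embeddings best match the query."""
    return sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]), reverse=True)[:k]


def expand(pivots, hops):
    """Multi-hop traversal: breadth-first search up to `hops` edges from the pivots."""
    seen = set(pivots)
    frontier = deque((p, 0) for p in pivots)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nbr in GRAPH[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes every node has at least one neighbor."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in graph}
        for node, nbrs in graph.items():
            share = rank[node] / len(nbrs)
            for nbr in nbrs:
                new[nbr] += damping * share
        rank = new
    return rank


def retrieve(query_vec, k=1, hops=1, top=3):
    """Chain the steps: pivots -> neighborhood -> top nodes by PageRank (ties by name)."""
    context = expand(pivot_search(query_vec, k), hops)
    ranks = pagerank(GRAPH)
    return sorted(context, key=lambda n: (-ranks[n], n))[:top]


if __name__ == "__main__":
    print(retrieve([0.2, 0.9, 0.1]))
```

Here the query vector matches "Wine", the one-hop expansion adds "France", "Bordeaux", and "Burgundy", and PageRank orders that neighborhood, so the call returns `['Wine', 'France', 'Bordeaux']` (the two wine regions tie and are ordered by name). Ranking by global PageRank is only one simple choice; a real pipeline might also weight nodes by similarity, hop distance, or community membership.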

Building on these primitives, there are three main types of GraphRAG pipelines:

  • Analytical: When you need an exact database query to fetch specific data (e.g., deterministic counting), use Text2Cypher.
  • Local: When you need a focused subset of your data for more contextual answers, use Local Graph Search.
  • Global: When answering requires processing or summarizing the entire dataset, use Query-focused Summarization.
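For the global case, one common formulation of query-focused summarization works map-reduce style over communities (for example, those found by Leiden): summarize each community, keep the summaries most relevant to the question, and combine them into an answer. The Python sketch below assumes that formulation, which may differ from Memgraph’s own pipeline. The community assignment and node texts are hypothetical, `summarize` is a stand-in for an LLM call such as llm.complete, and keyword overlap stands in for the LLM’s relevance judgment.

```python
import re
from collections import defaultdict

# Hypothetical community assignment, e.g. the output of Leiden: node -> community id.
COMMUNITY = {
    "France": 0, "Paris": 0, "Louvre": 0, "Euro": 0,
    "Wine": 1, "Bordeaux": 1, "Burgundy": 1,
}

# Hypothetical text attached to each node (standing in for node properties).
TEXT = {
    "France": "France is a country in Western Europe.",
    "Paris": "Paris is the capital of France.",
    "Louvre": "The Louvre is a museum in Paris.",
    "Euro": "France uses the euro as its currency.",
    "Wine": "Wine is produced in several French regions.",
    "Bordeaux": "Bordeaux is a wine region in southwest France.",
    "Burgundy": "Burgundy is a wine region in eastern France.",
}


def summarize(texts):
    """Hypothetical stand-in for an LLM call: simply concatenates the texts."""
    return " ".join(texts)


def tokens(text):
    """Lower-cased word set used for the toy relevance score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def global_answer(query, top=1):
    """Map-reduce over communities: summarize, score against the query, combine."""
    groups = defaultdict(list)
    for node, community in COMMUNITY.items():
        groups[community].append(TEXT[node])
    # Map step: one summary per community.
    summaries = {c: summarize(texts) for c, texts in groups.items()}
    # Score each summary by how many query words it contains (ties by community id).
    q = tokens(query)
    ranked = sorted(summaries.items(), key=lambda item: (-len(q & tokens(item[1])), item[0]))
    # Reduce step: combine the most relevant community summaries into one answer.
    return summarize([summary for _, summary in ranked[:top]])


if __name__ == "__main__":
    print(global_answer("Which wine regions does the graph mention?"))
```

With `top=1` the reduce step is trivial; raising `top` merges several community summaries, which is where a real LLM call would do the actual summarization.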
