
Knowledge retrieval

Knowledge retrieval involves querying a knowledge graph to extract the most relevant information for your use case. In the context of GraphRAG, this step ensures that the right data is passed to the LLM for processing, enabling accurate, context-rich responses.

Choosing the best retrieval strategy depends on your data, the type of insights you’re seeking, and the complexity of your queries. Memgraph provides a suite of powerful tools and techniques to optimize retrieval for your knowledge graph.

[Figure: knowledge retrieval illustrated on a medical records example graph]

Retrieval types

Finding relevant data in the vast number of nodes and relationships in your graph is not easy, which is why Memgraph provides several strategies to facilitate it.

When searching for a particular piece of information, it is important to know where to start: how to find the node in the graph that best describes what you’re looking for. The search for that starting point, or pivot node, is also called a pivot search. Once you reach the pivot node, you have located the part of the knowledge graph that holds the relevant information. There are various strategies for finding a pivot node, some of which are listed below.

Pivot search is often just the start of the journey toward providing valuable context that grounds the LLM and helps it avoid hallucinations. To expand the search beyond the pivot node, you can perform relevance expansion, which applies more advanced strategies to gather additional relevant information.

Pivot search

Pivot search is the starting point of any retrieval process, identifying the core entities and relationships relevant to your query.

Vector search

This retrieval method uses vector embeddings to find entities based on semantic similarity rather than exact matches. It’s ideal for scenarios where meaning, context, or relatedness is more important than specific terms (e.g., finding similar topics or concepts).


Perform vector search on your knowledge graph to extract semantically similar nodes, setting the foundation for more advanced retrieval.

Query example:

Which European city is known for its romantic atmosphere and iconic landmarks like the Eiffel Tower?

Vector search matches nodes semantically associated with “romantic atmosphere” and “iconic landmarks”, leading to nodes like Paris.
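A minimal Cypher sketch of that flow, assuming a vector index named entity_embeddings already exists and that the question has been embedded client-side into a $query_embedding parameter (the exact procedure signature and yielded fields may differ between Memgraph versions, so check the vector search reference):

```cypher
// Hypothetical index and parameter names, for illustration only.
// The "entity_embeddings" index is assumed to cover the embedding property
// of the nodes, and $query_embedding must come from the same embedding
// model that populated it.
CALL vector_search.search("entity_embeddings", 5, $query_embedding)
YIELD node, similarity
RETURN node.name AS entity, similarity
ORDER BY similarity DESC;
```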

Read more in the vector search docs.

Text search (experimental)

Currently, in Memgraph, text search is a retrieval technique that finds nodes based on exact matches in their text properties. It’s best used for queries involving specific terms, phrases, or entity names (e.g., “Find all nodes containing the keyword ‘climate change’”).


Use text search as a direct approach to narrow down specific nodes before expanding context.

Query example:

What is the capital of France?

Text search can find all nodes that have “France” as a property value.
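As a sketch, assuming a text index named cityIndex has been created on the relevant label (the query-string syntax comes from the underlying search engine and may differ across versions):

```cypher
// Assumes an existing text index, e.g.: CREATE TEXT INDEX cityIndex ON :City;
// "data.name:France" targets the indexed "name" property; verify the exact
// query syntax in the text search docs for your Memgraph version.
CALL text_search.search("cityIndex", "data.name:France")
YIELD node
RETURN node;
```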

Read more in text search docs.

Geospatial search

Geospatial search enables you to integrate and retrieve location-based insights, making it a valuable tool for applications involving geographic data. This functionality relies on the ability to store Point types representing locations with latitude and longitude, and on point indexes that keep proximity and other spatial queries fast.


Query example:

Find cities within 50 km of Paris.

Geospatial search retrieves cities based purely on proximity to Paris, in this case, Versailles.
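A Cypher sketch of such a query, assuming City nodes store their coordinates as a Point in a location property (point.distance returns the distance in meters, so 50 km is 50000):

```cypher
// Assumes :City nodes with a WGS-84 Point stored in the "location" property,
// ideally backed by a point index for fast spatial filtering.
MATCH (paris:City {name: "Paris"}), (city:City)
WHERE city <> paris
  AND point.distance(paris.location, city.location) < 50000
RETURN city.name AS city;
```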

Hybrid knowledge retrieval

Hybrid knowledge retrieval combines multiple retrieval methods - such as text search, vector search, and geospatial search - for enhanced precision and versatility. It’s perfect for multi-dimensional queries involving semantic meaning, specific terms, and spatial constraints.

Hybrid retrieval is particularly valuable in complex queries where results must satisfy multiple dimensions simultaneously. For example:

Find all nearby Italian restaurants with excellent reviews and vegetarian options.
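One possible sketch of such a query, combining a semantic pivot with property and proximity filters (the index name, node properties, and the $query_embedding and $user_location parameters are all illustrative assumptions):

```cypher
// Hypothetical hybrid retrieval: vector search provides the semantic pivot,
// then structured and geospatial filters narrow down the candidates.
CALL vector_search.search("restaurant_embeddings", 20, $query_embedding)
YIELD node, similarity
WITH node, similarity
WHERE node.cuisine = "Italian"
  AND node.rating >= 4.5
  AND point.distance(node.location, $user_location) < 2000
RETURN node.name AS restaurant, node.rating AS rating, similarity
ORDER BY similarity DESC;
```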

Relevance expansion

Relevance expansion techniques allow you to extract deeper insights from your graph by analyzing related entities and connections beyond the initial query. This step is optional but can significantly enhance the quality of responses in a GraphRAG system.

Why you should broaden the search

  1. LLMs deliver more accurate responses when given expanded, meaningful data.
  2. You can customize the retrieval process to align with the specific needs of your application.
  3. You extract more value from the semantic connections in your graph.

Deep path traversals

Deep path traversal algorithms are essential for exploring and analyzing complex relationships within a graph database. In Memgraph, these algorithms are built into the core system, enabling efficient and effective graph queries without additional overhead.

Deep path traversals enable reasoning across your graph, providing insights that go beyond surface-level connections. This expanded information is sent to the LLM as part of the input prompt to provide better context.
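For example, once a pivot node has been found, a bounded breadth-first expansion can collect its surrounding context (the label, property, and hop limit below are illustrative):

```cypher
// Expand up to two hops around a previously found pivot node using
// Memgraph's built-in BFS traversal; LIMIT keeps the returned context bounded.
MATCH path = (pivot:Entity {name: "Memgraph"})-[*BFS ..2]-(neighbor)
RETURN path
LIMIT 100;
```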


Read more in the deep path traversals docs.

Community detection

With community detection algorithms (e.g., Leiden or Louvain), you can group similar nodes that share strong relationships into the same community and summarize the important information about each community into a community node. This strategy lets you answer global questions quickly, without searching across the whole graph, and in that way provides meaningful insights to the LLM.
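A sketch using the MAGE community detection module, labeling each node with its community and then building simple summary nodes (in a GraphRAG pipeline, an LLM would typically generate the community summaries instead of the plain collect() used here):

```cypher
// Run MAGE community detection and store each node's community id.
CALL community_detection.get()
YIELD node, community_id
SET node.community_id = community_id;

// Create one summary node per community; the aggregation here is a
// placeholder for an LLM-generated community summary.
MATCH (n)
WHERE n.community_id IS NOT NULL
WITH n.community_id AS cid, collect(n.name) AS members
MERGE (c:Community {id: cid})
SET c.members = members;
```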


Centrality measures

By incorporating PageRank or betweenness centrality into your retrieval strategy, you add robust methods for emphasizing central or influential nodes, which can improve the relevance and depth of query results.

PageRank identifies influential nodes in the graph, helping prioritize highly connected or globally significant information.

Betweenness centrality highlights nodes that act as bridges between communities or topics, ensuring the retrieval covers diverse or connecting concepts.

For example, if there are multiple nodes as potential answers, you can rank the nodes by their PageRank scores to get the most important ones.
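For instance, a sketch that ranks nodes with the MAGE PageRank module (yielded field names may vary slightly between MAGE versions):

```cypher
// Compute PageRank over the whole graph and keep the most influential nodes;
// in practice you would intersect this ranking with the candidate answers.
CALL pagerank.get()
YIELD node, rank
RETURN node.name AS entity, rank
ORDER BY rank DESC
LIMIT 10;
```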

How to choose the best retrieval strategy

When choosing a retrieval strategy, it is important to know your data and to have a clear scope of the queries you want the LLM to be able to answer. Categorizing queries effectively in your GraphRAG system can help optimize your two-step process: pivot search and relevance expansion. Here’s how you can systematically approach query categorization (a combined sketch of the two steps follows the list):

  1. Fact-based queries

    • Pivot search: Vector or text search
    • Relevance expansion: Minimal or none
    • Example question: “Where was Memgraph founded?”
  2. Relational queries

    • Pivot search: Vector or text search
    • Relevance expansion: 1-hop or 2-hop traversal
    • Example question: “Who collaborates on Memgraph?”
  3. Contextual queries

    • Pivot search: Vector or text search
    • Relevance expansion: Community detection
    • Example question: “Can you give me more details about Memgraph?”
  4. Path-based queries

    • Pivot search: Vector or text search
    • Relevance expansion: Deep path traversal
    • Example question: “How is Dominik connected to Memgraph?”
  5. Geospatial queries

    • Pivot search: Geospatial search
    • Relevance expansion: 1-hop traversal
    • Example question: “How many employees work within a 50-mile radius?”
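As an illustration of the two steps together, here is a sketch for a relational query such as “Who collaborates on Memgraph?”: a vector-search pivot followed by a 1-hop expansion (the index name and the $query_embedding parameter are assumptions):

```cypher
// Step 1: pivot search - find the single most relevant node semantically.
CALL vector_search.search("entity_embeddings", 1, $query_embedding)
YIELD node, similarity
WITH node AS pivot
// Step 2: relevance expansion - a 1-hop traversal around the pivot.
MATCH (pivot)-[r]-(related)
RETURN pivot.name AS pivot_entity, type(r) AS relationship, related.name AS related_entity;
```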
💡

As previously mentioned, there is no ideal recipe for every use case. Still, the classification above might help you better understand how to start building or how to optimize your GraphRAG system. Get inspired and see examples and demos of how we and others have built GraphRAG systems.