Knowledge retrieval
Knowledge retrieval involves querying a knowledge graph to extract the most relevant information for your use case. In the context of GraphRAG, this step ensures that the right data is passed to the LLM for processing, enabling accurate, context-rich responses.
Choosing the best retrieval strategy depends on your data, the type of insights you’re seeking, and the complexity of your queries. Memgraph provides a suite of powerful tools and techniques to optimize retrieval for your knowledge graph.
Retrieval types
Finding relevant data among the vast number of nodes and relationships in your graph is not easy, which is why Memgraph offers several strategies to facilitate it.
When searching for specific information, it is important to know where to start: how to find the node in the graph that best describes what you’re looking for. The search for that starting point, or pivot node, is called a pivot search. Once you reach the pivot node, you have located the part of the knowledge graph that holds the relevant information. There are various strategies for finding a pivot node, some of which are listed below.
Pivot search is often just the start of the journey: it provides valuable context that grounds the LLM and helps it avoid hallucinations. To expand the search further, you can perform relevance expansion, which uses advanced strategies to obtain additional relevant information.
Pivot search combined with relevance expansion is generally more reliable than directly generating Cypher queries from natural language, because both steps can leverage pre-built tools that keep the resulting query accurate and contextually appropriate. If you do generate Cypher from natural language, effective prompt engineering is essential.
Pivot search
Pivot search is the starting point of any retrieval process, identifying the core entities and relationships relevant to your query.
Vector search
This retrieval method uses vector embeddings to find entities based on semantic similarity rather than exact matches. It’s ideal for scenarios where meaning, context, or relatedness is more important than specific terms (e.g., finding similar topics or concepts).
Perform vector search on your knowledge graph to extract semantically similar nodes, setting the foundation for more advanced retrieval.
Query example:
Which European city is known for its romantic atmosphere and iconic landmarks like the Eiffel Tower?
Vector search matches nodes semantically associated with “romantic atmosphere” and “iconic landmarks”, leading to nodes like Paris.
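As a minimal sketch, assuming a vector index named chunk_embeddings already exists on :Chunk(embedding) and that your application computes the question embedding into a $query_embedding parameter (index name, label, and parameter are illustrative), the search could look like this:
// Return the 5 nodes whose stored embeddings are closest to the query embedding.
// The query embedding is computed outside the database with the same model used at ingestion time.
CALL vector_search.search("chunk_embeddings", 5, $query_embedding)
YIELD node, distance
RETURN node, distance
ORDER BY distance;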
Read more
- Vector search docs
- Simplify Data Retrieval with Memgraph’s Vector Search
- Building a Movie Similarity Search Engine with Vector Search in Memgraph
Text search (experimental)
In Memgraph, text search is currently a retrieval technique that finds nodes through their properties based on exact text matches. It’s best used for queries involving specific terms, phrases, or entity names (e.g., “Find all nodes containing the keyword ‘climate change’”).
Use text search as a direct approach to narrow down specific nodes before expanding context.
Query example:
What is the capital of France?
Text search can find all nodes that have France as their property value.
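A minimal sketch, assuming a text index named cityIndex was created upfront (e.g., CREATE TEXT INDEX cityIndex ON :City;) and that the indexed nodes have a name property (index, label, and property names are illustrative):
// Find nodes whose name property contains the term "France".
// Properties are referenced with the "data." prefix in the search string.
CALL text_search.search("cityIndex", "data.name:France")
YIELD node
RETURN node;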
Read more in text search docs.
Geospatial search
Geospatial search enables you to integrate and retrieve location-based insights, making it a valuable tool for applications involving geographic data. This functionality relies on storing Point types that represent locations with latitude and longitude, and on leveraging point indexes for optimized query performance, allowing for quick proximity and spatial queries.
Query example:
Find cities within 50 km of Paris.
Geospatial search retrieves cities based purely on proximity to Paris, in this case, Versailles.
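A minimal sketch, assuming :City nodes store their coordinates in a Point property named location that is covered by a point index (label and property names are illustrative):
// A point index, e.g. CREATE POINT INDEX ON :City(location);, speeds up the distance filter.
MATCH (paris:City {name: "Paris"}), (city:City)
WHERE city <> paris
  AND point.distance(paris.location, city.location) < 50000  // 50 km, distance is in meters
RETURN city.name;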
Hybrid knowledge retrieval
Hybrid knowledge retrieval combines multiple retrieval methods - such as text search, vector search, and geospatial search - for enhanced precision and versatility. It’s perfect for multi-dimensional queries involving semantic meaning, specific terms, and spatial constraints.
Hybrid retrieval is particularly valuable in complex queries where results must satisfy multiple dimensions simultaneously. For example:
Find all nearby Italian restaurants with excellent reviews and vegetarian options.
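One way to approach such a query is to take pivot candidates from a semantic search and then filter them with spatial and property constraints. A hedged sketch, assuming a vector index restaurant_embeddings on :Restaurant(embedding), a location Point property, a numeric rating property, and application-supplied $query_embedding and $user_location parameters (all names are illustrative):
// 1. Semantic pivot: restaurants similar to "vegetarian Italian food".
CALL vector_search.search("restaurant_embeddings", 20, $query_embedding)
YIELD node, distance
// 2. Relevance filter: keep candidates that also satisfy spatial and rating constraints.
WITH node AS restaurant, distance
WHERE point.distance(restaurant.location, $user_location) < 2000
  AND restaurant.rating >= 4.5
RETURN restaurant.name, restaurant.rating, distance
ORDER BY distance
LIMIT 5;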
Relevance expansion
Relevance expansion techniques allow you to extract deeper insights from your graph by analyzing related entities and connections beyond the initial query. This step is optional but can significantly enhance the quality of responses in a GraphRAG system.
Why you should broaden the search
- LLMs can deliver more accurate responses with expanded, meaningful data.
- Customize the retrieval process to align with the specific needs of your application.
- Extract more value from the semantic connections in your graph.
Deep path traversals
Deep path traversal algorithms are essential for exploring and analyzing complex relationships within a graph database. In Memgraph, these algorithms are built into the core system, enabling efficient and effective graph queries without additional overhead.
Deep path traversals enable reasoning across your graph, providing insights that go beyond surface-level connections. This expanded information is sent to the LLM as part of the input prompt to provide better context.
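For instance, Memgraph’s built-in breadth-first search syntax can expand from the pivot node up to a bounded number of hops and return the surrounding paths as context (the label, name, and hop limit below are illustrative):
// Expand up to 3 hops in any direction from the pivot node using built-in BFS.
MATCH path = (pivot:Person {name: "Dominik"})-[*BFS ..3]-(related)
RETURN path;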
Read more:
- Deep path traversal algorithms docs
- Memgraph’s Deep Path Traversal Capabilities
- Memgraph Cypher Implementation: Flexibility and Advanced Traversals
Community detection
With community detection algorithms (e.g., Leiden or Louvain), you can group strongly related, similar nodes into the same community and summarize the important information about each community in a dedicated community node. This strategy lets you answer global questions quickly, without searching across the whole graph, and in that way provides meaningful insights to the LLM.
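A minimal sketch using the Louvain-based community_detection module from MAGE; writing the community id back to each node is one simple way to later group and summarize communities (the property name is illustrative):
// Run Louvain community detection (MAGE) and persist each node's community id.
CALL community_detection.get()
YIELD node, community_id
SET node.community_id = community_id;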
Centrality measures
By incorporating PageRank or betweenness centrality into your retrieval strategy, you add robust methods for emphasizing central or influential nodes, which can improve the relevance and depth of query results.
PageRank identifies influential nodes in the graph, helping prioritize highly connected or globally significant information.
Betweenness centrality highlights nodes that act as bridges between communities or topics, ensuring the retrieval covers diverse or connecting concepts.
For example, if multiple nodes are potential answers, you can rank them by their PageRank scores to surface the most important ones, as in the sketch below.
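A hedged sketch of that idea, assuming the candidate answers produced by the pivot search are passed in as a $candidate_names parameter (the parameter and property names are illustrative):
// Compute PageRank (MAGE) and rank only the candidate nodes, most important first.
CALL pagerank.get()
YIELD node, rank
WITH node, rank
WHERE node.name IN $candidate_names
RETURN node.name AS candidate, rank
ORDER BY rank DESC;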
Cypher generation
LLMs are trained on publicly available resources, including openCypher. Memgraph supports openCypher but has also extended its Cypher implementation with advanced constructs. While these enhancements improve usability, they can make it harder for LLMs to generate optimal Cypher queries tailored for Memgraph.
To address this, we’re continually increasing compatibility with each release and enhancing Memgraph’s error messages. Here’s a prompt that has proven particularly helpful in this process and is mostly used in integrations with other AI tools:
MEMGRAPH_GENERATION_TEMPLATE = """Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database.
You will utilize a provided database schema to understand the structure, nodes and relationships within the Memgraph database.
Instructions:
- Use provided node and relationship labels and property names from the
schema which describes the database's structure. Upon receiving a user
question, synthesize the schema to craft a precise Cypher query that
directly corresponds to the user's intent.
- Generate valid executable Cypher queries on top of Memgraph database.
Any explanation, context, or additional information that is not a part
of the Cypher query syntax should be omitted entirely.
- Use Memgraph MAGE procedures instead of Neo4j APOC procedures.
- Do not include any explanations or apologies in your responses.
- Do not include any text except the generated Cypher statement.
- For queries that ask for information or functionalities outside the direct
generation of Cypher queries, use the Cypher query format to communicate
limitations or capabilities. For example: RETURN "I am designed to generate
Cypher queries based on the provided schema only."
Schema:
{schema}
With all the above information and instructions, generate Cypher query for the
user question.
The question is:
{question}"""
Still, other advanced techniques can be applied. In GraphChat, a Memgraph Lab feature, we currently provide the following prompt:
Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database. You will utilize a provided database schema to understand the structure, nodes and relationships within the Memgraph database.
Rules:
Use provided node and relationship labels and property names from the schema which describes the database's structure. Schema is defined after keyword Schema down below. Upon receiving a user question, synthesize the schema to craft a precise Cypher query that directly corresponds to the user's intent.
Generate valid executable Cypher queries on top of Memgraph database. Any explanation, context, or additional information that is not a part of the Cypher query syntax should be omitted entirely.
Use Memgraph MAGE procedures instead of Neo4j APOC procedures.
Do not include any explanations or apologies in your responses.
Do not include any text except the generated Cypher statement.
For queries that ask for information or functionalities outside the direct generation of Cypher queries, use the Cypher query format to communicate limitations or capabilities. For example: RETURN "I am designed to generate Cypher queries based on the provided schema only."
We also attach the schema, the conversation history (if needed), and constraints for write permissions (e.g., if it is a DELETE command, dismiss it with the message: Sorry, can’t do that! DELETE queries are not permitted.).
Also, we attach some Cypher specifics to the prompt:
Atom expressions
Atom expressions are not supported, so if you need to do something like this:
MATCH (n)
RETURN n, size((n)-->()) AS edgeCount
Use the following structure with OPTIONAL MATCH or a combination of queries using the keyword WITH:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN n, count(r) AS edgeCount
To further improve Cypher query generation, we attach the failed attempts in the form of the invalid Cypher queries and the error messages they produced, together with the following instructions:
Generate a new correct version of the Cypher query:
Do not include any explanations or apologies in your responses.
Do not wrap the Cypher query in backticks.
How to choose the best retrieval strategy
When choosing a retrieval strategy, it is important to know your data and to have a clear scope of the queries you want the LLM to be able to answer. Categorizing queries effectively in your GraphRAG system can help optimize the two-step process of pivot search and relevance expansion. Here’s how you can systematically approach query categorization:
- Fact-based queries
  - Pivot search: Vector or text search
  - Relevance expansion: Minimal or none
  - Example question: “Where was Memgraph founded?”
- Relational queries
  - Pivot search: Vector or text search
  - Relevance expansion: 1-hop or 2-hop traversal
  - Example question: “Who collaborates on Memgraph?”
- Contextual queries
  - Pivot search: Vector or text search
  - Relevance expansion: Community detection
  - Example question: “Can you give me more details about Memgraph?”
- Path-based queries
  - Pivot search: Vector or text search
  - Relevance expansion: Deep path traversal
  - Example question: “How is Dominik connected to Memgraph?”
- Geospatial queries
  - Pivot search: Geospatial search
  - Relevance expansion: 1-hop traversal
  - Example question: “How many employees work within a 50-mile radius?”
As previously mentioned, there is no single recipe that fits every use case. Still, the above classification can help you understand how to start building or how to optimize your GraphRAG. Get inspired by examples and demos of how we and others have built GraphRAG systems.