How Would Microsoft GraphRAG Work Alongside a Graph Database?

By Sara Tilly
10 min read · February 24, 2025

Last week, Memgraph hosted a community call to explore a cutting-edge topic in AI—Hierarchical GraphRAG. If you missed it, here’s a deep dive into the session with Jacob Coles, data scientist at Redfield, who walked us through an advanced approach to GraphRAG using hierarchical modeling. The discussion covered everything from the limitations of naive RAG to how Leiden community detection in Memgraph 3.0 enables more structured, large-scale knowledge retrieval.

Watch the full webinar recording on our YouTube channel:

Talking Point 1: The Problem with Naive RAG: Context Windows and Fragmented Knowledge

Retrieval-augmented generation (RAG) typically works by chunking documents into manageable pieces, storing them in a vector database, fetching the most similar chunks at query time, and passing them to a large language model (LLM). The issue?

  • Limited context window: LLMs can only process so much text at once. The model might miss the bigger picture if crucial info is scattered across multiple chunks.
  • Semantic search inefficiencies: Even with vector embeddings, retrieval is local—you’re just grabbing the most similar text, not reasoning over the relationships between concepts.
  • Disjointed knowledge: If the information needed to answer a question spans multiple chunks, the LLM has to piece it together on the fly. That’s not always reliable.
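
To make that pipeline concrete, here's a minimal sketch of naive RAG in Python. The bag-of-words "embedding" is a toy stand-in for a real embedding model, and the final LLM call is reduced to printing the assembled prompt:

```python
# A minimal sketch of the naive RAG pipeline: chunk, embed, retrieve by
# similarity, stuff the context window. The bag-of-words "embedding" is a
# toy stand-in for a real embedding model.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy embedding: a word-count vector. A real system would call an
    # embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Each chunk is indexed independently, which is exactly why knowledge
# spanning several chunks ends up fragmented at retrieval time.
chunks = [
    "The Time Traveller built a machine to move through time.",
    "The Eloi live above ground; the Morlocks live below.",
    "He returns from the future with a withered flower.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

query = "Who lives underground?"
query_vec = embed(query)
top = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:2]

context = "\n".join(chunk for chunk, _ in top)
print(f"Context:\n{context}\n\nQuestion: {query}")  # would go to the LLM
```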

This is where GraphRAG helps: it structures knowledge as a graph rather than treating documents as standalone entities.

Talking Point 2: GraphRAG and Structuring Data for Better Retrieval

Instead of retrieving isolated text chunks, GraphRAG extracts relationships between concepts, transforming unstructured text into a knowledge graph.

Here’s how it works:

  1. Entity-relationship extraction: Documents are parsed into subject-predicate-object triples (e.g., Steve Jobs - founded - Apple).
  2. Graph indexing: These relationships are stored in Memgraph, allowing for querying over structured knowledge (see the sketch after this list).
  3. Semantic search on graph nodes: Instead of only fetching text chunks, queries traverse the graph, leveraging context-rich connections between entities.
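
As a rough illustration of steps 1 and 2, the sketch below persists extracted triples in Memgraph. Memgraph speaks the Bolt protocol, so the standard neo4j Python driver can connect to it; the connection details, the Entity label, and the relationship types are illustrative assumptions, not the exact setup from the webinar:

```python
# A rough illustration of steps 1-2: persisting extracted triples in
# Memgraph via the Bolt protocol.
from neo4j import GraphDatabase

triples = [
    ("Steve Jobs", "FOUNDED", "Apple"),
    ("Steve Jobs", "LED", "Macintosh project"),
]

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    for subject, predicate, obj in triples:
        # MERGE deduplicates entities by name (the simple name-based
        # resolution discussed later in this post). Relationship types
        # cannot be query parameters, so the predicate is interpolated;
        # sanitize it in real use.
        session.run(
            f"MERGE (s:Entity {{name: $s}}) "
            f"MERGE (o:Entity {{name: $o}}) "
            f"MERGE (s)-[:{predicate}]->(o)",
            s=subject, o=obj,
        )

driver.close()
```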

But there’s still a catch—most GraphRAG implementations struggle with large-scale reasoning. That’s where hierarchical modeling comes in.

Talking Point 3: Hierarchical GraphRAG: Beyond Local Search

Inspired by Microsoft’s GraphRAG, Redfield’s approach introduces hierarchical community detection to organize and summarize graph data at different levels.

  • Leiden Community Detection: Groups related nodes based on connectivity, forming a hierarchical structure.
  • Multi-level summarization: Each community gets a summary, allowing queries to retrieve high-level insights instead of individual nodes (a minimal sketch follows this list).
  • Global vs. local retrieval: Users can now search across hierarchical levels, getting both specific details and high-level overviews.
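
Here's a minimal sketch of the multi-level summarization idea, assuming per-level community assignments are already available (e.g., from Leiden). The summarize() helper is a hypothetical placeholder for an LLM call:

```python
# A minimal sketch of multi-level summarization over community
# assignments. summarize() stands in for an LLM call.
from collections import defaultdict

def summarize(texts: list[str]) -> str:
    # Placeholder: a real system would prompt an LLM to condense `texts`.
    return " / ".join(texts)

# entity -> community id, per hierarchy level (level 0 = coarsest here).
membership = {
    0: {"Steve Jobs": 0, "Apple": 0, "Alan Turing": 0},
    1: {"Steve Jobs": 10, "Apple": 10, "Alan Turing": 11},
}

entity_descriptions = {
    "Steve Jobs": "Co-founded Apple in 1976.",
    "Apple": "Consumer technology company.",
    "Alan Turing": "Pioneer of theoretical computer science.",
}

# Build one summary per community at each level. A global query can then
# hit coarse levels, while a local query drills into fine-grained ones.
summaries: dict[tuple[int, int], str] = {}
for level, assignment in membership.items():
    groups = defaultdict(list)
    for entity, community in assignment.items():
        groups[community].append(entity_descriptions[entity])
    for community, texts in groups.items():
        summaries[(level, community)] = summarize(texts)

print(summaries[(1, 10)])  # level-1 summary of the Jobs/Apple community
```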

For example, instead of just retrieving all references to Steve Jobs, this method clusters and summarizes related knowledge under categories like Tech Founders, Apple History, or Innovators in Computing.

Talking Point 4: Memgraph 3.0 Features That Enable Hierarchical GraphRAG

Memgraph 3.0 introduces two key features that make hierarchical GraphRAG scalable and efficient:

  • Leiden Community Detection: This built-in algorithm helps group entities into meaningful clusters, enabling multi-level reasoning.
  • Vector indexing for embeddings: Allows semantic search over entire communities, making it easier to fetch high-level insights rather than just raw entities.

This combination means faster, more relevant queries and a richer knowledge graph that can support more advanced AI applications.
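
As a hedged sketch, the snippet below shows what calling these two features from a Python client might look like. The procedure names (leiden_community_detection.get, vector_search.search) follow Memgraph's MAGE and vector-search docs as I understand them; verify the exact signatures and YIELD fields against the documentation for your Memgraph version. The index name, node property, and query vector are placeholders:

```python
# A hedged sketch of Leiden community detection plus vector search in
# Memgraph 3.0, driven over Bolt. Procedure names and YIELD fields are
# assumptions to check against the docs.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

with driver.session() as session:
    # 1. Group entities into communities with Leiden.
    for record in session.run(
        "CALL leiden_community_detection.get() "
        "YIELD node, community_id "
        "RETURN node.name AS name, community_id"
    ):
        print(record["name"], "->", record["community_id"])

    # 2. Semantic search over community-summary embeddings via the
    #    vector index (assumed to have been created beforehand).
    query_embedding = [0.1] * 1536  # placeholder for a real embedding
    for record in session.run(
        "CALL vector_search.search('community_index', 5, $q) "
        "YIELD node, similarity "
        "RETURN node.summary AS summary, similarity",
        q=query_embedding,
    ):
        print(record["summary"], record["similarity"])

driver.close()
```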

Talking Point 5: Live Demo: Applying Hierarchical GraphRAG to "The Time Machine"

Jacob ran a live demo using H.G. Wells’ The Time Machine to illustrate the power of community-driven retrieval.

A chatbot built on Memgraph GraphRAG was asked to identify the book’s key themes. The system first retrieved individual entities like time travel and dystopian future. It then leveraged hierarchical summaries to generalize over multiple entities, returning a more insightful and complete answer.

When community detection was turned off, the chatbot’s responses were more fragmented and less accurate—demonstrating the effectiveness of hierarchical knowledge retrieval.

Challenges & Future Improvements

While Hierarchical GraphRAG solves many of the issues found in naive RAG approaches, it’s not without its challenges. One major limitation is entity resolution—right now, entities are matched primarily by name, which can lead to issues when different people or concepts share identical labels. This is something that could be improved with more advanced disambiguation techniques.

Another challenge lies in indexing costs. Extracting entities, generating embeddings, and building hierarchical communities require significant computation. While the upfront cost is high, the long-term payoff comes in the form of more efficient querying, reducing the need for excessive context window stuffing.

Finally, graph traversal optimizations remain an area for growth. While Memgraph’s community detection enables structured reasoning, more advanced ranking and traversal algorithms could further improve retrieval efficiency and relevance. Microsoft’s own implementation has additional query ranking techniques that are worth exploring for future iterations.

Despite these challenges, Hierarchical GraphRAG represents a big step forward in knowledge retrieval, making LLMs more structured, context-aware, and scalable.

Watch the full webinar recording—How Would Microsoft GraphRAG Work Alongside a Graph Database?

Q&A

We’ve compiled the questions and answers from the community call Q&A session.

Note that we’ve paraphrased them slightly for brevity. For complete details, watch the entire video.

1. Could you give a little more detail about how entity resolution was proposed?

  • Jacob: The entity resolution method we used was part of the LlamaIndex package, which was already pre-built. It operates in a fairly simplistic way—if two entities share the same name, they are automatically resolved as being the same. While this approach is fast and easy to implement, it has limitations in real-world applications where disambiguation is crucial. To improve accuracy, a more probabilistic model or a method that incorporates additional properties beyond just name matching would likely be needed.
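
For illustration, here's a minimal sketch of this name-based resolution. The normalization rule and property merging are assumptions, not LlamaIndex's exact logic:

```python
# Name-based entity resolution: entities with the same normalized name
# collapse into one. This shows both the speed of the approach and its
# main failure mode.
def normalize(name: str) -> str:
    return " ".join(name.lower().split())

def resolve(entities: list[dict]) -> dict[str, dict]:
    resolved: dict[str, dict] = {}
    for entity in entities:
        key = normalize(entity["name"])
        if key in resolved:
            # Same name => same entity. This is exactly where two distinct
            # people or concepts with identical labels get merged wrongly.
            resolved[key].update(entity)
        else:
            resolved[key] = dict(entity)
    return resolved

people = [
    {"name": "Michael Jordan", "field": "basketball"},
    {"name": "Michael Jordan", "field": "machine learning"},  # collision
]
print(resolve(people))  # both collapse into a single entity
```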

2. Any thoughts on using Memgraph with dynamic data that is frequently updated? Are there ways to avoid completely rebuilding the index every time?

  • Jacob: This largely depends on how dynamic your entity resolution process is and how it handles updates. One of the biggest challenges with dynamic graphs is maintaining consistency without excessive reindexing. That being said, you don’t need to rebuild the entire index every time. Memgraph allows for incremental updates, so you can add new data and modify the graph without fully reconstructing it. This makes it a practical solution for handling evolving datasets in real-time applications.
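
As a sketch of what incremental ingestion could look like, the snippet below merges new facts into the existing graph instead of rebuilding it. The `stale` flag used to defer community and summary refreshes is an illustrative pattern, not a built-in Memgraph feature; connection details are assumptions as before:

```python
# Incremental ingestion: MERGE reuses existing nodes, so new facts extend
# the graph without a rebuild. The `stale` flag is an illustrative way to
# defer expensive recomputation.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))

def add_fact(subject: str, predicate: str, obj: str) -> None:
    with driver.session() as session:
        session.run(
            f"MERGE (s:Entity {{name: $s}}) "
            f"MERGE (o:Entity {{name: $o}}) "
            f"MERGE (s)-[:{predicate}]->(o) "
            f"SET s.stale = true, o.stale = true",
            s=subject, o=obj,
        )

add_fact("Tim Cook", "LEADS", "Apple")

# A periodic job can then re-run community detection and re-summarize only
# communities containing stale nodes, instead of reindexing everything.
```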

3. Who handles entity extraction? Does the user need to do it, or can you throw it to an LLM? How does that work, and is it good or bad?

  • Jacob: The role of the LLM is deeply integrated at multiple stages. It’s easy to overlook because it's used every step of the way, just in slightly different ways. One major use case is entity extraction.

    The process works by feeding the raw text to the LLM along with a structured prompt. The prompt specifies the task: Extract the subject, relation, and object from this text and return it in JSON format using this specific schema. The LLM processes the text and returns structured data.

    At that point, additional processing might be necessary—some might use regex or other parsing techniques to refine the output. However, depending on how well the prompt is designed, post-processing might not even be required. In short, the LLM handles entity extraction automatically, but how much manual intervention is needed depends on the setup and the specificity of the prompt design.

    I also briefly mentioned this before, but we use a predefined schema in knowledge graph creation, which allows for more structured extraction. Another critical factor is which language model you use—this can make a big difference in performance and cost. For our task, we used GPT-4o Mini, which provided reasonable accuracy while keeping costs manageable. If you require more precision, you might opt for full GPT-4o, but the cost scales significantly, especially when processing thousands of documents. Choosing the right model is a balance between accuracy, performance, and budget constraints.
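
To make the extraction step concrete, here's a minimal sketch using the OpenAI Python SDK with GPT-4o Mini and a JSON-constrained response. The prompt wording and output schema are illustrative, not Redfield's actual setup:

```python
# LLM-driven triple extraction with a structured prompt and JSON output.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def extract_triples(text: str) -> list[dict]:
    prompt = (
        "Extract subject-predicate-object triples from the text below. "
        'Return JSON of the form {"triples": [{"subject": str, '
        '"predicate": str, "object": str}]}.\n\nText: ' + text
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(response.choices[0].message.content)["triples"]

print(extract_triples("Steve Jobs founded Apple in 1976."))
```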

4. Can communities overlap?

  • Jacob: In a sense, yes—higher-level communities contain lower-level communities within them. For example, a level-zero community is connected to multiple level-one communities, forming a hierarchical structure. However, communities on the same level do not overlap, as far as I know. Entities belong to only one community at any given level. If someone has a different understanding, feel free to correct me, but that’s my current interpretation of how it works.
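
A tiny sketch of that containment (the structure is assumed for illustration): each lower-level community has exactly one parent, so levels nest while communities on the same level stay disjoint:

```python
# Hierarchy as a parent map: one parent per community => no overlap
# within a level, only nesting across levels.
parent = {10: 0, 11: 0, 12: 1}  # level-1 community -> level-0 community

assert parent[10] == parent[11] == 0  # both nest inside community 0
assert parent[12] == 1                # 12 belongs to a different parent
```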

5. Have you tried Groq with DeepSeek or Llama 3.3?

  • Jacob: I haven’t, unfortunately. There’s been a lot of hype around DeepSeek, and from what I’ve heard, it’s quite good, especially when it comes to model training efficiency. But personally, I don’t work on training large models, so I haven’t tested it firsthand. I still think GPT-4 is currently better, but DeepSeek seems promising. As for Llama 3.3, I haven’t experimented with it yet either, but I’d be curious to see how it performs in real-world tasks.

6. GraphRAG is evolving quickly, and different implementations—like Microsoft’s GraphRAG—are gaining traction. What do you think about this approach, and if you were to start over, would you take a different direction?

  • Jacob: I think it’s a really solid approach. One of its biggest strengths is its ability to generalize over large amounts of information, making it much more efficient for question answering in production. I’ve built chatbots before, and when you rely on a straightforward vector database, you often retrieve too many documents—sometimes not even the most relevant ones. With GraphRAG, you’re offloading a lot of the processing upfront, structuring knowledge in advance rather than on demand. Once those communities and summaries are built, you don’t need to reprocess them every time a query is made.

    That said, I feel like what I’ve demonstrated so far is really just a starting point. There’s so much more to explore—especially in terms of graph traversal and querying strategies. The approach I showed today primarily focuses on creating communities, generating summaries, and indexing everything. But the next step is figuring out how to optimize querying, how to better traverse the graph, and how to improve ranking mechanisms.

    Since Microsoft’s GraphRAG paper came out last year, I expect this field to evolve significantly. It’s fascinating to see how new language models are emerging, making these techniques even more accessible and cost-effective. Running private, high-performance models is becoming increasingly viable, which could further improve the efficiency and scalability of GraphRAG solutions in the near future.

Upcoming Events

Sign up for another RAG-related webinar scheduled for Feb 27, 2025: Memgraph + DeepSeek for Retrieval Augmented Generation (RAG)
