How To Do GraphRAG with DeepSeek

By Sara Tilly
6 min read · March 6, 2025

When you're dealing with hundreds of pages of internal documentation, acronyms no one's heard of, and the looming threat of GenAI hallucinations, you start looking for ways to ground your large language models in reality.

FI Consulting, a D.C.-based team specializing in complex data solutions, shared how they're doing that in our latest Memgraph Community Call.

Robert Chang (Director of Data Science), Sinan Yildirim (Senior Data Scientist), and James Whyte (Senior Solutions Architect) from FI Consulting walked us through their Retrieval-Augmented Generation (RAG) pipeline, where Memgraph and DeepSeek come together to keep answers accurate, complete, and contextual to their business.

Spoiler: it’s not just about dumping your documents into a vector database and praying for relevance. They’re using a graph to inject structure, relationships, and domain knowledge into the mix—because general-purpose LLMs don’t know that “RD” means “Office of Rural Development” at FI, not “Research and Development.”

Watch the full webinar recording: Memgraph + DeepSeek for Retrieval-Augmented Generation (RAG).

The RAG Problem: Accuracy and Context Are Everything

“We don’t want creativity. We want the answer,” Robert said, summing up the problem many teams face when using GenAI with sensitive internal knowledge. Hallucinations aren’t just annoying; they can erode trust in the system before it even launches.

That’s why FI Consulting designed their architecture to do three things:

  1. Filter down massive documents into manageable, meaningful chunks.
  2. Prioritize the correct internal language, acronyms, and domain-specific terms.
  3. Return accurate, verifiable answers grounded in their own data.

Why FI Consulting Combines Graphs and LLMs

Robert kicked things off by explaining why they built their RAG system this way—and why graphs are non-negotiable if you care about high-quality, trustworthy answers.

Here are the key takeaways.

LLMs hallucinate

Even the best models can return incomplete or inaccurate answers, especially when asked to pull from massive, unstructured datasets.

Graphs ground the context

By feeding the LLM only the most relevant, filtered slices of data—based on relationships, metadata, and domain-specific entities—you dramatically improve accuracy and reduce noise.

Domain knowledge matters

FI's internal language is full of acronyms and abbreviations (“RD” means “Office of Rural Development” to them, not “Research and Development”). They encode this context in the graph to keep the LLM from getting confused.
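
One way to encode that mapping is as nodes in the graph itself, so retrieval resolves "RD" the FI way before the LLM ever weighs in. Here's a minimal sketch in Python over Memgraph's Bolt endpoint; the Acronym/Alias labels, properties, and aliases are illustrative assumptions, not FI Consulting's actual schema:

```python
# Sketch: encoding internal acronyms and their aliases as graph nodes.
# Labels, properties, and the alias list are illustrative assumptions.
from neo4j import GraphDatabase  # Memgraph speaks the Bolt protocol

driver = GraphDatabase.driver("bolt://localhost:7687")

with driver.session() as session:
    session.run(
        """
        MERGE (a:Acronym {abbr: $abbr})
        SET a.expansion = $expansion
        WITH a
        UNWIND $aliases AS alias
        MERGE (s:Alias {text: alias})
        MERGE (s)-[:REFERS_TO]->(a)
        """,
        abbr="RD",
        expansion="Office of Rural Development",
        aliases=["Rural Development", "rural development office"],
    )
```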

Quality answers need constraints

Want projects from 2020–2022? The LLM won’t reliably filter that on its own. But with a graph, you can pre-filter the data by date, region, or any other attribute before the LLM ever sees it.
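
For instance, a date pre-filter might look like the sketch below. The Project/Chunk labels, the DESCRIBED_IN relationship, and the `year` property are our assumptions for illustration, not FI's schema:

```python
# Sketch: let the graph enforce the 2020-2022 constraint before the LLM
# is involved. Labels and the `year` property are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687")

with driver.session() as session:
    records = session.run(
        """
        MATCH (p:Project)-[:DESCRIBED_IN]->(c:Chunk)
        WHERE p.year >= 2020 AND p.year <= 2022
        RETURN c.text AS text
        """
    )
    context = [r["text"] for r in records]
# `context` is all the LLM ever sees -- out-of-range projects never reach it.
```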

Graph + LLM = the best of both worlds

The graph handles structure and retrieval. The LLM handles natural language generation. Each plays to its strengths.

At the end of the day, Robert’s core message was simple:

Stop making your LLM do all the work. Let your graph handle what it’s good at—filtering, relationships, and domain knowledge—so your GenAI system can stay accurate, fast, and useful.

Inside the Demo: How FI Consulting Built Their RAG Workflow

During the live demo, Sinan gave a tour of how FI Consulting's RAG system works in practice.

  • Chunking strategy: Break down massive, 100+ page documents into a hierarchy of “parents” (2,000 tokens) and “children” (200 tokens). This makes embedding and retrieval manageable and precise.
  • Vector embeddings: Each child chunk gets embedded (1,536 dimensions) and stored directly in Memgraph alongside the raw text, creating a tightly coupled graph of documents and vectors (see the ingestion sketch after this list).
  • Named Entity Recognition (NER): They extract key entities (like agencies, acronyms, and org-specific terms) to build rich context into the graph. Synonyms, aliases, and domain lingo are all handled through a custom NER file, ensuring searches understand the language of FI Consulting.
  • Smart retrieval: Instead of mindlessly searching everything, they filter queries through NER nodes first, narrowing the search space before running vector similarity (FAISS) and re-ranking steps. This keeps responses fast and focused (see the retrieval sketch below).
  • Fallback logic: When there’s no direct match, they use graph traversals to pull in related entities and neighboring documents, which keeps the system from returning dead ends.
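
To make the chunking and storage concrete, here is a minimal ingestion sketch. The tokenizer (tiktoken), the stub embedder, and the Parent/Child schema are all assumptions for illustration; only the sizes (2,000/200 tokens, 1,536 dimensions) come from the talk:

```python
# Minimal ingestion sketch: split documents into 2,000-token parents and
# 200-token children, embed each child, and store everything in Memgraph.
# tiktoken and the stub embedder are illustrative choices, not FI's code.
import tiktoken                      # assumed token counter
from neo4j import GraphDatabase      # Memgraph accepts Bolt clients

PARENT_TOKENS, CHILD_TOKENS = 2000, 200
enc = tiktoken.get_encoding("cl100k_base")

def split_tokens(text: str, size: int) -> list[str]:
    """Cut text into consecutive chunks of at most `size` tokens."""
    toks = enc.encode(text)
    return [enc.decode(toks[i:i + size]) for i in range(0, len(toks), size)]

def embed(text: str) -> list[float]:
    """Stand-in for a real embedding model returning 1,536-dim vectors."""
    return [0.0] * 1536  # swap in an actual embedder here

driver = GraphDatabase.driver("bolt://localhost:7687")

def ingest(doc_id: str, text: str) -> None:
    with driver.session() as session:
        for p_idx, parent in enumerate(split_tokens(text, PARENT_TOKENS)):
            for c_idx, child in enumerate(split_tokens(parent, CHILD_TOKENS)):
                session.run(
                    """
                    MERGE (p:Parent {doc: $doc, idx: $p_idx})
                    SET p.text = $parent
                    CREATE (c:Child {idx: $c_idx, text: $child,
                                     embedding: $embedding})
                    CREATE (p)-[:HAS_CHILD]->(c)
                    """,
                    doc=doc_id, p_idx=p_idx, parent=parent,
                    c_idx=c_idx, child=child, embedding=embed(child),
                )
```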

This is how you get “clean”, accurate, domain-aware answers without overloading the LLM or the user with irrelevant context.
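
And here is a sketch of the retrieval side, including the fallback logic: narrow the candidate set through entity nodes, run FAISS similarity over just those chunks, and fall back to a one-hop graph traversal when nothing matches directly. Again, the Entity/MENTIONS/RELATED_TO schema is our illustrative assumption:

```python
# Retrieval sketch: NER-filtered candidates from Memgraph, FAISS similarity
# over only those chunks, and a one-hop graph fallback when nothing matches.
# The Entity/MENTIONS/RELATED_TO schema is an illustrative assumption.
import faiss
import numpy as np
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687")

DIRECT = """
MATCH (e:Entity)<-[:MENTIONS]-(c:Child)
WHERE e.name IN $entities
RETURN c.text AS text, c.embedding AS embedding
"""

# Fallback: walk out to related entities' chunks instead of dead-ending.
FALLBACK = """
MATCH (e:Entity)-[:RELATED_TO]-(:Entity)<-[:MENTIONS]-(c:Child)
WHERE e.name IN $entities
RETURN c.text AS text, c.embedding AS embedding
"""

def fetch(query: str, entities: list[str]) -> list[dict]:
    with driver.session() as session:
        return [r.data() for r in session.run(query, entities=entities)]

def retrieve(query_vec: list[float], entities: list[str], k: int = 5) -> list[str]:
    # Narrow the search space through NER nodes before any vector math.
    cands = fetch(DIRECT, entities) or fetch(FALLBACK, entities)
    if not cands:
        return []
    vecs = np.array([c["embedding"] for c in cands], dtype="float32")
    index = faiss.IndexFlatL2(vecs.shape[1])   # exact L2 similarity index
    index.add(vecs)
    q = np.array([query_vec], dtype="float32")
    _, ids = index.search(q, min(k, len(cands)))
    return [cands[i]["text"] for i in ids[0]]
```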

Want the full walkthrough? Check out the complete demo in the webinar on our YouTube channel.

The Infrastructure

To wrap things up, James shared how FI Consulting deployed the whole system—and the good news is, you don't need a data center full of GPUs to pull this off.

Key points from their setup:

  • Deployment:
    • Hosted on Azure using virtual machines.
    • Memgraph runs in a Docker container with persistent volumes, ensuring seamless backups.
    • The setup was quick and painless, thanks to solid Memgraph docs, which are "much easier than other graph vendors," according to James.
  • LLM hosting:
    • Using DeepSeek (quantized version) served through Ollama for lightweight, local LLM inference (see the sketch after this list).
    • Started with an NVIDIA A100 GPU (great performance, but ~$3,000/month).
    • Optimized to an NVIDIA T4, bringing costs down to ~$500/month while maintaining good response times.
  • Why Memgraph worked:
    • Low latency and high throughput, even with hundreds of thousands of nodes.
    • In-memory architecture helped power real-time graph analytics without bottlenecks.
    • Easy data migrations and backups across servers with minimal overhead.
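
For reference, querying a locally served, quantized DeepSeek model through Ollama's HTTP API can be as simple as the sketch below; the model tag and the prompt are assumptions for illustration, not FI Consulting's exact setup:

```python
# Sketch: calling a locally served DeepSeek model via Ollama's HTTP API.
# The model tag is an assumption -- use whatever `ollama list` shows.
import json
import urllib.request

def ask(prompt: str, model: str = "deepseek-r1:7b") -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Grounded generation: only graph-retrieved chunks go in as context.
context_chunks = ["RD refers to the Office of Rural Development."]  # from retrieval
print(ask("Answer using only this context:\n"
          + "\n".join(context_chunks)
          + "\n\nQuestion: What does RD stand for?"))
```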

“You don’t need massive infrastructure to make this work,” James said. “Memgraph kept things lightweight and cost-effective without sacrificing performance.”

Bottom line, FI Consulting built a RAG system that’s not only accurate and domain-aware but also affordable and scalable—without burning through GPU credits.
