
How to build a multi-agent RAG system with LlamaIndex
In the previous blog post, we introduced a single-agent GraphRAG system that combined the power of knowledge graphs, Memgraph, and LlamaIndex to enhance information retrieval for LLMs. Building on that foundation, this post explores how multi-agent systems can take things even further.
Multi-agent RAG systems allow different agents to specialize and collaborate. By distributing responsibilities such as retrieval, computation, and coordination across multiple agents, we can build more dynamic and capable pipelines.
In this example, we build a multi-agent GraphRAG system using LlamaIndex and Memgraph, integrating retrieval-augmented generation (RAG) with graph-based querying and tool-using agents. We'll explore how to:
- Set up Memgraph as a graph store for structured knowledge retrieval.
- Use LlamaIndex to create a Property Graph Index and perform Memgraph's vector search on embedded data.
- Implement function agents for both arithmetic operations and semantic retrieval.
- Design an AgentWorkflow that combines retrieval and computation to answer complex queries.
By the end, we'll have a fully functional GraphRAG pipeline capable of answering structured queries while performing calculations on retrieved data.
Prerequisites
- Make sure you have Docker running in the background.
- Run Memgraph. The easiest way to run it is by using the following commands:
For Linux/macOS:
curl https://install.memgraph.com | sh
For Windows:
iwr https://windows.memgraph.com | iex
- Install the necessary dependencies:
%pip install llama-index llama-index-graph-stores-memgraph python-dotenv
Environment setup
Create a .env file that contains your OpenAI API key:
OPENAI_API_KEY=sk-proj-...
Create the script
Let's first load our .env file and set the LLM model we want to use. In this example, we're using OpenAI's gpt-4 model.
from dotenv import load_dotenv
load_dotenv()
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
# Set the default LLM for all LlamaIndex components
Settings.llm = OpenAI(model="gpt-4", temperature=0)
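Optionally, you can also set a global default embedding model through Settings. We pass the embedding model explicitly when building the index later, so this is just an alternative:

from llama_index.embeddings.openai import OpenAIEmbedding

# Optional: global default embedding model
Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-ada-002")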
Define calculator tools
Next, we define addition and subtraction tools for calculations, along with a calculator agent. The agent's role is to perform basic arithmetic operations using the tools it has access to.
from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b

# Create the calculator agent with access to the arithmetic tools
calculator_agent = FunctionAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant.",
    tools=[
        FunctionTool.from_defaults(fn=add),
        FunctionTool.from_defaults(fn=subtract),
    ],
    llm=OpenAI(model="gpt-4"),
)
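Before wiring the calculator agent into a workflow, you can sanity-check it on its own. Here's an optional, minimal sketch; FunctionAgent.run is a coroutine, so it needs an event loop:

import asyncio

async def test_calculator():
    # The agent should route this prompt to the add tool
    response = await calculator_agent.run(user_msg="What is 7 + 5?")
    print(response)

asyncio.run(test_calculator())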
Load the dataset
Besides the basic operations, we also want to create a RAG pipeline and perform retrieval on a dataset of our choice. In this example, we're using a PDF file about the 2023 Canadian federal budget, stored in the data directory. Let's load that dataset:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
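As an optional sanity check, confirm the PDF was picked up before building the index:

# SimpleDirectoryReader typically yields one Document per PDF page
print(f"Loaded {len(documents)} document(s)")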
Memgraph graph store
We'll now establish a connection to Memgraph, using MemgraphPropertyGraphStore from LlamaIndex. This allows us to store and retrieve structured data efficiently, enabling graph-based querying for retrieval-augmented generation (RAG) pipelines.
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

graph_store = MemgraphPropertyGraphStore(
    username="",  # Your Memgraph username, default is ""
    password="",  # Your Memgraph password, default is ""
    url="bolt://localhost:7687",  # Connection URL for Memgraph
)
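Optionally, you can verify the connection by running a plain Cypher query through the store. This is a minimal check, assuming structured_query behaves as it does in LlamaIndex's other property graph stores:

# Should print [{'ok': 1}] if Memgraph is reachable
print(graph_store.structured_query("RETURN 1 AS ok"))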
Create a knowledge graph in Memgraph
This section builds a Property Graph Index using PropertyGraphIndex from LlamaIndex. This index allows us to store and retrieve structured knowledge in a graph database (Memgraph) while leveraging OpenAI embeddings for semantic search.
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.embeddings.openai import OpenAIEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-4", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
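Before putting agents on top, it can help to inspect what the index actually retrieves. Here's a short sketch using the index's retriever; the sample question is just an illustration:

# Retrieve graph paths (and their source text) relevant to a question
retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("What is the total amount of the 2023 Canadian federal budget?")
for node in nodes:
    print(node.text)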
RAG Pipeline: query engine and retrieval agent
Let's now set up a Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex's QueryEngineTool and FunctionAgent. The pipeline enables efficient data retrieval from a structured knowledge base (Memgraph) and provides contextual responses using OpenAI's GPT-4.
First, we convert the Property Graph Index into a query engine, allowing structured queries over the indexed data.
from llama_index.core.tools import QueryEngineTool

query_engine = index.as_query_engine()

# Expose the RAG pipeline as a tool the agents can call
budget_tool = QueryEngineTool.from_defaults(
    query_engine,
    name="canadian_budget_2023",
    description="A RAG engine with some basic facts about the 2023 Canadian federal budget.",
)

retriever_agent = FunctionAgent(
    name="retriever",
    description="Manages data retrieval",
    system_prompt="You are a retrieval assistant.",
    tools=[
        budget_tool,
    ],
    llm=OpenAI(model="gpt-4"),
)
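You can also query the engine directly, without any agents, to confirm the pipeline works end to end:

# Direct query, bypassing the agents
response = query_engine.query("What is the total amount of the 2023 Canadian federal budget?")
print(response)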
Creating and running the workflow
Finally, and most importantly, let's create an AgentWorkflow that ties together the previously defined agents, including the calculator and retriever agents. This workflow enables us to run a sequence of operations involving both data retrieval and arithmetic computations, allowing the agents to interact with one another.
We define an async function to execute the workflow, sending a user query that asks for both the total amount of the 2023 Canadian federal budget and an additional calculation (adding 3 billion).
from llama_index.core.agent.workflow import AgentWorkflow
import asyncio

# Create the workflow; the calculator agent is the root and can
# hand off retrieval questions to the retriever agent
workflow = AgentWorkflow(
    agents=[calculator_agent, retriever_agent], root_agent="calculator"
)

# Define an async function to run the workflow
async def run_workflow():
    response = await workflow.run(
        user_msg=(
            "What is the total amount of the 2023 Canadian federal budget? "
            "Add 3 billion to that budget using tools"
        )
    )
    print(response)

# Run the async function using asyncio
asyncio.run(run_workflow())
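Note: asyncio.run() works in a plain Python script. In a Jupyter notebook, an event loop is already running, so use top-level await instead:

# In a notebook cell:
await run_workflow()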
Conclusion
By moving from a single-agent to a multi-agent setup, we've unlocked a new level of flexibility and intelligence in GraphRAG systems. Each agent can specialize in a distinct task: retrieving knowledge, performing calculations, or orchestrating workflows, allowing the overall system to handle more complex queries with greater efficiency. In the next blog post, we'll take this further by incorporating Memgraph algorithms as tools into the pipeline, enabling even richer interactions and capabilities.
Additional resources
Check out our ai-demos repository for more detailed examples, or watch some of our previous webinars featuring interesting LLM use cases, such as: