
How to build a multi-agent RAG system with LlamaIndex
In the previous blog post, we introduced a single-agent GraphRAG system that combined the power of knowledge graphs, Memgraph, and LlamaIndex to enhance information retrieval for LLMs. Building on that foundation, this post explores how multi-agent systems can take things even further.
Multi-agent RAG systems allow different agents to specialize and collaborate. By distributing responsibilities such as retrieval, computation, and coordination across multiple agents, we can build more dynamic and capable pipelines.
In this example, we build a multi-agent GraphRAG system using LlamaIndex and Memgraph, integrating retrieval-augmented generation (RAG) with graph-based querying and tool-using agents. We'll explore how to:
- Set up Memgraph as a graph store for structured knowledge retrieval.
- Use LlamaIndex to create a Property Graph Index and perform Memgraph's vector search on embedded data.
- Implement function agents for both arithmetic operations and semantic retrieval.
- Design an AgentWorkflow that combines retrieval and computation to answer complex queries.
By the end, we'll have a fully functional GraphRAG pipeline capable of answering structured queries while performing calculations on retrieved data.
Prerequisites
- Make sure you have Docker running in the background.
- Run Memgraph. The easiest way to run it is by using the following commands:
For Linux/macOS:
curl https://install.memgraph.com | sh
For Windows:
iwr https://windows.memgraph.com | iex
- Install the necessary dependencies:
%pip install llama-index llama-index-graph-stores-memgraph python-dotenv
Environment setup
Create a .env file that contains your OpenAI API key:
OPENAI_API_KEY=sk-proj-...
Create the script
Let's first load our .env file and set the LLM model we want to use. In this example, we're using OpenAI's gpt-4 model.
from dotenv import load_dotenv
load_dotenv()
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
# Set the default LLM for all LlamaIndex components
Settings.llm = OpenAI(model="gpt-4", temperature=0)
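Optionally, you can also set a global default embedding model through Settings. We pass the embedding model explicitly when building the index later, so this is just an alternative:

from llama_index.embeddings.openai import OpenAIEmbedding

# Optional: global default embedding model
Settings.embed_model = OpenAIEmbedding(model_name="text-embedding-ada-002")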
Define calculator tools
Next, we define addition and subtraction tools for calculations, along with a calculator agent. The agent's role is to perform basic arithmetic operations using the tools it has access to.
from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent

def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b

# Create the calculator agent with access to the arithmetic tools
calculator_agent = FunctionAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant.",
    tools=[
        FunctionTool.from_defaults(fn=add),
        FunctionTool.from_defaults(fn=subtract),
    ],
    llm=OpenAI(model="gpt-4"),
)
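Before wiring the calculator agent into a workflow, you can sanity-check it on its own. Here's an optional, minimal sketch; FunctionAgent.run is a coroutine, so it needs an event loop:

import asyncio

async def test_calculator():
    # The agent should route this prompt to the add tool
    response = await calculator_agent.run(user_msg="What is 7 + 5?")
    print(response)

asyncio.run(test_calculator())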
Load the dataset
Besides the basic operations, we also want to create a RAG pipeline and perform retrieval on a dataset of our choice. In this example, we're using a PDF file about the 2023 Canadian federal budget, stored in the data directory. Let's load that dataset:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
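As an optional sanity check, confirm the PDF was picked up before building the index:

# SimpleDirectoryReader typically yields one Document per PDF page
print(f"Loaded {len(documents)} document(s)")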
Memgraph graph store
We'll now establish a connection to Memgraph, using MemgraphPropertyGraphStore from LlamaIndex. This allows us to store and retrieve structured data efficiently, enabling graph-based querying for retrieval-augmented generation (RAG) pipelines.
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore

graph_store = MemgraphPropertyGraphStore(
    username="",  # Your Memgraph username, default is ""
    password="",  # Your Memgraph password, default is ""
    url="bolt://localhost:7687",  # Connection URL for Memgraph
)
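Optionally, you can verify the connection by running a plain Cypher query through the store. This is a minimal check, assuming structured_query behaves as it does in LlamaIndex's other property graph stores:

# Should print [{'ok': 1}] if Memgraph is reachable
print(graph_store.structured_query("RETURN 1 AS ok"))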
Create a knowledge graph in Memgraph
This section builds a Property Graph Index using PropertyGraphIndex from LlamaIndex. This index allows us to store and retrieve structured knowledge in a graph database (Memgraph) while leveraging OpenAI embeddings for semantic search.
from llama_index.core import PropertyGraphIndex
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.embeddings.openai import OpenAIEmbedding

index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
    kg_extractors=[
        SchemaLLMPathExtractor(
            llm=OpenAI(model="gpt-4", temperature=0.0)
        )
    ],
    property_graph_store=graph_store,
    show_progress=True,
)
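Before putting agents on top, it can help to inspect what the index actually retrieves. Here's a short sketch using the index's retriever; the sample question is just an illustration:

# Retrieve graph paths (and their source text) relevant to a question
retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("What is the total amount of the 2023 Canadian federal budget?")
for node in nodes:
    print(node.text)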
RAG Pipeline: query engine and retrieval agent
Let's now set up a Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex's QueryEngineTool and FunctionAgent. The pipeline enables efficient data retrieval from a structured knowledge base (Memgraph) and provides contextual responses using OpenAI's GPT-4.
First, we convert the Property Graph Index into a query engine, allowing structured queries over the indexed data.
from llama_index.core.tools import QueryEngineTool

query_engine = index.as_query_engine()

# Expose the RAG pipeline as a tool the agents can call
budget_tool = QueryEngineTool.from_defaults(
    query_engine,
    name="canadian_budget_2023",
    description="A RAG engine with some basic facts about the 2023 Canadian federal budget.",
)

retriever_agent = FunctionAgent(
    name="retriever",
    description="Manages data retrieval",
    system_prompt="You are a retrieval assistant.",
    tools=[
        budget_tool,
    ],
    llm=OpenAI(model="gpt-4"),
)
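You can also query the engine directly, without any agents, to confirm the pipeline works end to end:

# Direct query, bypassing the agents
response = query_engine.query("What is the total amount of the 2023 Canadian federal budget?")
print(response)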
Creating and running the workflow
Finally, and most importantly, let's create an AgentWorkflow that ties together the previously defined agents, including the calculator and retriever agents. This workflow enables us to run a sequence of operations involving both data retrieval and arithmetic computations, allowing the agents to interact with one another.
We define an async function to execute the workflow, sending a user query that asks for both the total amount of the 2023 Canadian federal budget and an additional calculation (adding 3 billion).
from llama_index.core.agent.workflow import AgentWorkflow
import asyncio

# Create the workflow; the calculator agent is the root and can
# hand off retrieval questions to the retriever agent
workflow = AgentWorkflow(
    agents=[calculator_agent, retriever_agent], root_agent="calculator"
)

# Define an async function to run the workflow
async def run_workflow():
    response = await workflow.run(
        user_msg=(
            "What is the total amount of the 2023 Canadian federal budget? "
            "Add 3 billion to that budget using tools"
        )
    )
    print(response)

# Run the async function using asyncio
asyncio.run(run_workflow())
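Note: asyncio.run() works in a plain Python script. In a Jupyter notebook, an event loop is already running, so use top-level await instead:

# In a notebook cell:
await run_workflow()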
Conclusion
By moving from a single-agent to a multi-agent setup, we've unlocked a new level of flexibility and intelligence in GraphRAG systems. Each agent can specialize in a distinct task: retrieving knowledge, performing calculations, or orchestrating workflows, allowing the overall system to handle more complex queries with greater efficiency. In the next blog post, we'll take this further by incorporating Memgraph algorithms as tools into the pipeline, enabling even richer interactions and capabilities.
Additional resources
Check out our ai-demos repository for more detailed examples, or watch some of our previous webinars featuring interesting LLM use cases, such as: