GraphRAG with Memgraph
LLMs have the knowledge they were trained on. By building a RAG, you’re expanding that knowledge with your data. It is important to understand how to structure and model the data and how to find and extract relevant information for LLM to provide more accurate responses personalized to your specific data.
GraphRAG is a RAG system that combines the strengths of knowledge graphs and LLMs. Knowledge graphs are a structured representation of information where entities and their relationships are organized to enable reasoning and insights.
Here are the main strengths of graph in a RAG system:
- Relational context - Knowledge graph structure holds the information about semantics.
- Improved retrieval accuracy - Having retrieval strategies specific to graphs, such as community detection and impact analytics.
- Multi-hop reasoning - Ability to traverse through data neighborhoods.
- Efficient information navigation - Scanning subgraphs instead of full datasets.
- Dynamically evolving knowledge - Updating graph in real time.
Graph structure is a prerequisite for GraphRAG, and a graph database is even better. A GraphRAG application running in production needs scalability, real-time performance, incremental updates and persistence. Having a graph database as a part of the GraphRAG is especially useful if other application parts also rely on the graph database.
Key Memgraph features
Memgraph is a graph database that stores your knowledge graph, and it ensures durability of stored data for backup and recovery. Refer to our data modeling and knowledge graph modeling guide for tips and tricks on how to model your knowledge.
With Memgraph as an in-memory graph database, you can quickly traverse through your graph with deep path traversals and not worry about latency.
You can ingest streaming data into Memgraph with Kafka, Redpanda or Pulsar which you can then query with (dynamic) MAGE algorithms or your custom procedures. That allows you to have a growing knowledge graph that’s being updated on-the-fly.
Here are the most useful features in Memgraph to build a GraphRAG:
- Deep-path traversals
- Vector search (experimental) - Usually used in the first step of finding and extracting relevant information (pivot search)
- Leiden community detection: Proven to be a better and faster version of Louvain community detection, guaranteeing well-connected communities. It is usually used in the second step of finding and extracting relevant information (relevance expansion).
- Louvain community detection
- Dynamic community detection
- PageRank
- Dynamic PageRank
- Text search
- Run-time schema tracking
Here is how those features fit into the GraphRAG architecture:
Tools
GraphChat is a Memgraph Lab feature that allows users to extract insights from a graph database by asking questions in plain English. It incorporates elements of GraphRAG. This two-phase Generative AI app first generates Cypher queries from the text and then summarizes the query results in the final response.
Integrations
Memgraph offers several integrations with popular AI frameworks to help you customize and build your own GenAI application from scratch. Below are some of the libraries integrated with Memgraph.
LangChain
LangChain is a framework for developing applications powered by large language models (LLMs). Currently, Memgraph’s LangChain integration supports creating a knowledge graph from unstructured data and querying with natural language. You can follow the example on LangChain docs or go through quick start below.
Installation
To install all the required packages, run:
pip install langchain langchain-openai neo4j --user
Environment setup
Before you get started, make sure you have Memgraph running in the background.
Then, instantiate MemgraphGraph
in your Python code. This object holds the
connection to the running Memgraph instance. Make sure to set up all the
environment variables properly.
import os
from langchain_community.chains.graph_qa.memgraph import MemgraphQAChain
from langchain_community.graphs import MemgraphGraph
from langchain_openai import ChatOpenAI
url = os.environ.get("MEMGRAPH_URI", "bolt://localhost:7687")
username = os.environ.get("MEMGRAPH_USERNAME", "")
password = os.environ.get("MEMGRAPH_PASSWORD", "")
graph = MemgraphGraph(
url=url, username=username, password=password, refresh_schema=False
)
The refresh_schema
is initially set to False
because there is still no data in
the database and we want to avoid unnecessary database calls.
To interact with the LLM, you must configure it. Here is how you can set API key as an environment variable for OpenAI:
os.environ["OPENAI_API_KEY"] = "your-key-here"
Graph construction
For the dataset, we’ll use the following text about Charles Darwin:
text = """
Charles Robert Darwin was an English naturalist, geologist, and biologist,
widely known for his contributions to evolutionary biology. His proposition that
all species of life have descended from a common ancestor is now generally
accepted and considered a fundamental scientific concept. In a joint
publication with Alfred Russel Wallace, he introduced his scientific theory that
this branching pattern of evolution resulted from a process he called natural
selection, in which the struggle for existence has a similar effect to the
artificial selection involved in selective breeding. Darwin has been
described as one of the most influential figures in human history and was
honoured by burial in Westminster Abbey.
"""
To construct the graph, first initialize LLMGraphTransformer
from the desired
LLM and convert the document to the graph structure.
llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")
llm_transformer = LLMGraphTransformer(llm=llm)
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
The graph structure in the GraphDocument
format can be forwarded to the
add_graph_documents()
procedure to import in into Memgraph:
# Make sure the database is empty
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")
# Create KG
graph.add_graph_documents(graph_documents)
The add_graph_documents()
procedure transforms the list of graph_documents
into appropriate Cypher queries and executes them in Memgraph.
In the below image, you can see how the text was transformed into a knowledge graph and stored into Memgraph.
For additional options, check the full guide on the LangChain docs.
Querying
In the end, you can query the knowledge graph:
chain = MemgraphQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
model_name="gpt-4-turbo",
allow_dangerous_requests=True,
)
print(chain.invoke("Who Charles Robert Darwin collaborated with?")["result"])
Here is the result:
MATCH (:Person {id: "Charles Robert Darwin"})-[:COLLABORATION]->(collaborator)
RETURN collaborator;
Alfred Russel Wallace
In the image below, you can see what’s happening under the hood to get the answer.
LlamaIndex
LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models. Currently, Memgraph’s integration supports creating a knowledge graph from unstructured data and querying with natural language. You can follow the example on LlamaIndex docs or go through quick start below.
Installation
To install LlamaIndex and Memgraph graph store, run:
pip install llama-index llama-index-graph-stores-memgraph
Environment setup
Before you get started, make sure you have Memgraph running in the background.
To use Memgraph as the underlying graph store for LlamaIndex, define your graph store by providing the credentials used for your database:
from llama_index.graph_stores.memgraph import MemgraphPropertyGraphStore
username = "" # Enter your Memgraph username (default "")
password = "" # Enter your Memgraph password (default "")
url = "" # Specify the connection URL, e.g., 'bolt://localhost:7687'
graph_store = MemgraphPropertyGraphStore(
username=username,
password=password,
url=url,
)
Additionally, a working OpenAI key is required:
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>" # Replace with your OpenAI API key
Dataset
For the dataset, we’ll use a text about Charles Darwin stored in the
/data/charles_darwin/charles.txt
file:
Charles Robert Darwin was an English naturalist, geologist, and biologist,
widely known for his contributions to evolutionary biology. His proposition that
all species of life have descended from a common ancestor is now generally
accepted and considered a fundamental scientific concept. In a joint publication
with Alfred Russel Wallace, he introduced his scientific theory that this
branching pattern of evolution resulted from a process he called natural
selection, in which the struggle for existence has a similar effect to the
artificial selection involved in selective breeding. Darwin has been described
as one of the most influential figures in human history and was honoured by
burial in Westminster Abbey.
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/charles_darwin/").load_data()
The data is now loaded in the documents variable which we’ll pass as an argument in the next step of index creation and graph construction.
Graph construction
LlamaIndex provides multiple graph constructors. In this example, we’ll use the SchemaLLMPathExtractor, which allows to both predefine the schema or use the one LLM provides without explicitly defining entities.
from llama_index.core import PropertyGraphIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
index = PropertyGraphIndex.from_documents(
documents,
embed_model=OpenAIEmbedding(model_name="text-embedding-ada-002"),
kg_extractors=[
SchemaLLMPathExtractor(
llm=OpenAI(model="gpt-4", temperature=0.0),
)
],
property_graph_store=graph_store,
show_progress=True,
)
In the below image, you can see how the text was transformed into a knowledge graph and stored into Memgraph.
Querying
Labeled property graphs can be queried in several ways to retrieve nodes and paths and in LlamaIndex, several node retrieval methods at once can be combined.
If no sub-retrievers are provided, the defaults are LLMSynonymRetriever.
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query("Who did Charles Robert Darwin collaborate with?")
print(str(response))
In the image below, you can see what’s happening under the hood to get the answer.
Resources
- Cedars-Sinai: Using Graph Databases for Knowledge-Aware Automated Machine Learning: A webinar on utilizing AutoML and GraphRAG to improve predicitions and drug discovery for Alzheimer’s disease.
- Optimizing Insulin Management: The Role of GraphRAG in Patient Care: A webinar on GraphRAG with Memgraph which enhances AI decision-making to improve healthcare outcomes.
- Enhancing LLM Chatbot Efficiency with GraphRAG: A webinar from which you can learn how Microchip Technology leveraged Memgraph to improve the efficiency of their LLM-powered chatbot.
- From Questions to Queries: How to Talk to Your Graph Database With LLMs?: A webinar on GraphChat, an AI feature in Memgraph Lab that lets you query graph data using natural language.
- LLMs, Memgraph and Orbit: Modeling Community Networks: A webinar on how Orbit leverages LLMs with Memgraph to model, simulate, and enrich these dynamic conversational ecosystems.
- GenAI Stack: A blog post about Memgraph’s demo app showcasing natural language querying utilizing LangChain integration with various LLMs.
- Querying Memgraph through an LLM: Community call with Brett Brewer, former VP at Microsoft, demoing LangChain integration with Memgraph to query with natural language.
Want to learn more?
To learn more, check out Enhancing AI with graph databases and LLMs bootcamp and on-demand resources. Stay up to date with Memgraph events and watch videos from the AI, LLMs and GraphRAG YouTube playlist.