Integrating Memgraph with LlamaIndex: Building GenAI Apps

By Memgraph

7 min read | November 29, 2024

The recent webinar explored the practical overlap between graph databases and generative AI (GenAI), breaking down how Memgraph and LlamaIndex work together to build smarter, more dynamic applications. Here's a straight-to-the-point recap of the key takeaways.

This webinar brought together Laurie Voss, VP of Developer Relations at LlamaIndex, and Matea Pesic, DX Engineer at Memgraph, to dive into the mechanics of transforming unstructured data into actionable knowledge.

Watch the full webinar recording:

In the meantime, here are the key talking points from the webinar.

Talking Point 1: Introduction to LlamaIndex

Laurie Voss introduced LlamaIndex as a framework in Python and TypeScript for building generative AI applications, especially knowledge assistants. It supports developers with tools for parsing unstructured data, constructing knowledge graphs, and enabling efficient retrieval techniques.

Talking Point 2: LlamaParse and LlamaCloud

Laurie highlighted LlamaParse, a document parsing service that handles PDFs, Word documents, and other formats, ensuring data compatibility with LLMs. LlamaCloud was described as an enterprise service for end-to-end retrieval-augmented generation (RAG) without requiring developers to manage intermediate steps.

Talking Point 3: GraphRAG in LlamaIndex

The session then covered GraphRAG (graph-based retrieval-augmented generation), which uses graph databases like Memgraph to construct and query knowledge graphs. Relationships within the graph enrich context retrieval, making it an essential tool for generative AI workflows.

Talking Point 4: Challenges in RAG and Solutions

The discussion then turned to challenges in RAG pipelines, such as limited summarization capabilities, trouble with comparison questions, and difficulty handling multipart questions. Solutions include improving data quality and adopting agentic querying strategies.

Talking Point 5: Agentic Strategies and Workflows

Laurie explained how agents can improve RAG pipelines by incorporating reasoning, memory, multi-turn queries, and error correction. He introduced three reasoning loops (sequential, DAG-based, and tree-based) to address complex tasks and enhance efficiency. Workflows were presented as tools for creating multi-agent systems, allowing agents to collaborate on sophisticated tasks.

Talking Point 6: Introduction to Memgraph

For audience members not familiar with Memgraph, Matea covered the basics before going deeper. Memgraph is an in-memory graph database designed for real-time analytics, making it well suited to applications that require fast data processing and dynamic updates. Memgraph includes tools like the MAGE Library, which offers a wide range of graph algorithms, and Memgraph Lab, a visualization and querying tool for exploring data and creating knowledge graphs.

Matea highlighted Memgraph’s advanced graph capabilities, including deep graph traversal algorithms like BFS and DFS, and community detection tools, such as the newly added Leiden algorithm, which improves performance and accuracy. These tools allow developers to extract relevant insights and traverse data more efficiently.
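To make the traversal idea concrete, here is a minimal, plain-Python sketch of a depth-limited BFS over a toy adjacency list. It is not Memgraph's implementation or API; the graph contents and the function name bfs_neighborhood are assumptions made up for illustration.

```python
from collections import deque

# Toy adjacency list; node names are illustrative only.
GRAPH = {
    "Charles Darwin": ["HMS Beagle", "On the Origin of Species"],
    "HMS Beagle": ["Galapagos Islands"],
    "On the Origin of Species": ["Natural Selection"],
    "Galapagos Islands": [],
    "Natural Selection": [],
}


def bfs_neighborhood(graph, start, max_depth):
    """Return {node: depth} for every node within max_depth hops of start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand nodes that sit on the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in depths:  # visit each node once
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths
```

Calling bfs_neighborhood(GRAPH, "Charles Darwin", 1) returns only the start node and its direct neighbors, which is the kind of bounded "data neighborhood" a retrieval step might hand to an LLM as context.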

Talking Point 7: Memgraph in GenAI Applications

Matea explained how Memgraph complements LlamaIndex in constructing and querying knowledge graphs. She emphasized Memgraph’s ability to traverse data neighborhoods, use graph algorithms like BFS, DFS, and community detection, and handle real-time updates critical for LLMs.

Talking Point 8: GraphChat in Memgraph Lab

Matea showcased GraphChat, a feature in Memgraph Lab that allows users to interact with graph data using natural language. It translates user queries into Cypher, executes them, and delivers natural language responses.

Talking Point 9: LlamaIndex Integration with Memgraph

Matea delved into how Memgraph enhances the capabilities of LlamaIndex, focusing on two critical aspects: knowledge graph creation and querying methods.

Memgraph seamlessly integrates with LlamaIndex to transform unstructured data into a property graph, converting raw information into a structured, queryable format. Matea demonstrated this process using a text file on Charles Darwin, showcasing how LlamaIndex's graph constructors, like the schema extractor, work with Memgraph to extract entities and relationships and store them as a knowledge graph.
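As a rough illustration of what "storing entities and relationships as a property graph" can look like, the sketch below turns extracted triples into parameterized Cypher MERGE statements. It does not use the LlamaIndex or Memgraph APIs: the triples are hand-written stand-ins for extractor output, and the labels, relationship types, and the function name triple_to_cypher are hypothetical.

```python
# Hand-written triples standing in for LLM extractor output (hypothetical values).
# Shape: (subject, subject_label, relation, object, object_label)
TRIPLES = [
    ("Charles Darwin", "PERSON", "WROTE", "On the Origin of Species", "BOOK"),
    ("Charles Darwin", "PERSON", "SAILED_ON", "HMS Beagle", "SHIP"),
]


def triple_to_cypher(triple):
    """Build a Cypher MERGE query plus parameters for one triple.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are interpolated; names are passed as parameters instead.
    """
    subj, subj_label, rel, obj, obj_label = triple
    query = (
        f"MERGE (a:{subj_label} {{name: $subj}}) "
        f"MERGE (b:{obj_label} {{name: $obj}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return query, {"subj": subj, "obj": obj}


if __name__ == "__main__":
    for t in TRIPLES:
        query, params = triple_to_cypher(t)
        print(query, params)
```

MERGE is used rather than CREATE so that re-running the statements does not duplicate nodes with the same label and name. In a real setup the labels and relationship types should come from a trusted schema, since they are interpolated into the query text.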

She also addressed common challenges in knowledge graph construction, such as entity resolution, where inconsistencies like duplicate nodes (e.g., "Charles Darwin" vs. "Charles Robert Darwin") arise. She outlined potential solutions, including preprocessing data, postprocessing graphs, or leveraging Memgraph's in-memory schema insights to optimize node merging and graph integrity.
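The postprocessing route can be sketched in a few lines of plain Python. The example below uses a deliberately naive rule, assumed here purely for illustration: two names are treated as the same entity when one name's words are a subset of the other's, and the longer name is kept as canonical. This is not the method shown in the webinar, and the function names are hypothetical.

```python
def tokens(name):
    """Lowercased set of words in a name."""
    return frozenset(name.lower().split())


def resolve_entities(names):
    """Map each name to a canonical name using a naive token-subset rule."""
    canonical = []  # canonical names chosen so far, longest names first
    mapping = {}
    # Process longer names first so shorter variants can attach to them.
    for name in sorted(names, key=lambda n: (-len(tokens(n)), n)):
        match = next((c for c in canonical if tokens(name) <= tokens(c)), None)
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


def merge_edges(edges, mapping):
    """Rewrite edge endpoints to canonical names and drop exact duplicates."""
    return sorted({(mapping.get(s, s), r, mapping.get(o, o)) for s, r, o in edges})
```

A rule this simple will over-merge in practice (a bare "Darwin" would attach to whichever Darwin is seen first), so a real pipeline would likely combine it with embeddings, schema information, or manual review.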

Once the knowledge graph was constructed, Matea shifted focus to querying it using LlamaIndex’s LLMSynonymRetriever. For instance, querying “Who did Charles Darwin collaborate with?” yielded results from the graph, such as "Alfred Russel Wallace." LLMSynonymRetriever then translates this data into a natural language response, providing developers with accessible insights.
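The general idea behind synonym-based retrieval can be sketched without any libraries: expand the question into keywords and synonyms, then pull back the triples whose nodes match. The code below is only a conceptual sketch, not LlamaIndex's implementation. A fixed synonym table stands in for the LLM call, and the triples, table contents, and function names are assumptions.

```python
# Toy knowledge graph as (subject, relation, object) triples (illustrative only).
TRIPLES = [
    ("Charles Darwin", "COLLABORATED_WITH", "Alfred Russel Wallace"),
    ("Charles Darwin", "SAILED_ON", "HMS Beagle"),
    ("Gregor Mendel", "STUDIED", "Pea Plants"),
]

# Stand-in for the LLM: a fixed table of keyword -> synonyms (hypothetical).
SYNONYMS = {
    "charles darwin": ["darwin", "charles robert darwin"],
    "collaborate": ["collaborated", "worked with"],
}


def expand_keywords(question):
    """Return keywords found in the question together with their synonyms."""
    text = question.lower()
    expanded = set()
    for keyword, synonyms in SYNONYMS.items():
        if keyword in text:
            expanded.add(keyword)
            expanded.update(synonyms)
    return expanded


def retrieve(question, triples):
    """Return triples whose subject or object matches an expanded keyword."""
    keywords = expand_keywords(question)
    return [t for t in triples if t[0].lower() in keywords or t[2].lower() in keywords]
```

For the question from the demo, retrieve("Who did Charles Darwin collaborate with?", TRIPLES) returns the two Darwin triples and skips the Mendel one. In a full pipeline, those triples would then be passed to an LLM as context for the final answer.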

Talking Point 10: Upcoming Features in Memgraph

Matea announced that Memgraph has rolled out the vector search feature, designed to improve retrieval speed and accuracy. She also discussed ongoing optimizations using schema information and future integration enhancements.

Q&A

We’ve compiled the questions and answers from the community call Q&A session. Note that we’ve paraphrased them slightly for brevity. For complete details, watch the entire video.

Does the schema LLM extractor in LlamaIndex extract properties and assign them to entities and relationships?

Answer: While LlamaIndex can assign properties to entities and relationships, it doesn’t do so by default, which can be surprising to some users. However, it is possible to configure this feature to ensure properties are set appropriately during graph construction.

What’s the difference between Memgraph and Neo4j?

Answer: The main advantage of Memgraph is its performance as an in-memory graph database, which allows faster data processing compared to disk-based solutions like Neo4j. Memgraph's dynamic environment enables real-time updates with algorithms like Dynamic Community Detection and PageRank, offering an edge in dynamic knowledge graphs. Additionally, Memgraph is open source and has a supportive community.

Should attributes like "died in" and "year" be modeled as properties instead of separate entities in the demo you’ve shown?

Answer: Users have the option to define schemas manually or rely on LLMs for schema definition. Predefining schemas can help adjust models to specific use cases and remove outliers.

Are there any demos available for updating knowledge graphs?

Answer: Although this work is still under development, Memgraph and LlamaIndex support updating existing graphs by pulling schema information or dynamically adding updates. There are ongoing efforts to create Jupyter Notebook demos for these capabilities.

What scenarios make users move from basic RAG to GraphRAG?

Answer: GraphRAG is ideal when working with data that inherently has complex relationships, such as existing knowledge graphs. For example, a car repair application using a graph of interconnected parts benefits from GraphRAG’s ability to traverse and query these relationships effectively.

How was the integration experience between LlamaIndex and Memgraph?

Answer: The integration process was straightforward, thanks to LlamaIndex's developer-friendly setup. This ease of use allowed the Memgraph team to quickly create graph-based demos and applications without external tools complicating the process.

Related Resources

Memgraph docs
LlamaIndex docs

Memgraph Academy

If you are new to the GraphRAG scene, check out a few short and easy-to-follow lessons from our subject matter experts. For free. Start with: