Building Entity Graphs: From Unstructured Text to Graphs in Minutes

By Marko Budiselic
7 min read | November 11, 2025

Most of the world’s knowledge doesn’t live in structured databases. It lives in PDFs, reports, meeting notes, and chat messages. Within this unstructured text lie hidden connections:

  • who works with whom
  • which concepts depend on each other
  • how ideas evolve
  • and much more

These relationships, though implicit, define the truth within a specific domain or context. Yet traditional search and analytics tools can't uncover them. By transforming these documents into an entity graph, a network of people, places, organizations, and concepts, we can make knowledge explicit, discover patterns, and enable reasoning beyond keyword matching.

Unstructured2Graph, just released as part of the Memgraph AI Toolkit, makes these connections quickly visible by converting text into an entity graph that can be queried, analyzed, and reasoned over.

Understanding Unstructured Text

Unlike structured data stored in rows and columns, unstructured text is information that doesn't follow a predefined format: sentences, paragraphs, and whole documents. Analyzing it directly isn't easy because the text conveys meaning through context, grammar, and nuance rather than explicit structure.

Machines must interpret relationships, resolve ambiguities, and extract entities from natural language to make sense of it. An entity graph organizes this extracted knowledge into nodes, representing people, places, organizations, or concepts, and edges representing their relationships.

It allows us to query and explore connections between different pieces of information, enabling more intelligent search, recommendations, and analytics.

For example, from the sentence "Alice joined Memgraph in 2021 to work on graph database technology," we can extract entities:

  • Alice
  • Memgraph
  • Graph database technology
  • 2021

You can then connect them with edges such as:

  • (Alice)-[:WORKS_FOR]->(Memgraph)
  • (Memgraph)-[:DEVELOPS]->(Graph database technology)

The resulting graph reveals how entities relate, forming a foundation for deeper insights.
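
As a rough illustration, the example above can be held in memory as a list of (subject, relation, object) triples before it ever reaches a database. The `Triple` type and the `nodes_of` helper are hypothetical names chosen for this sketch; they are not part of any Memgraph library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the entity graph: subject -[relation]-> obj."""
    subject: str
    relation: str
    obj: str


# Edges taken from the example sentence about Alice and Memgraph.
triples = [
    Triple("Alice", "WORKS_FOR", "Memgraph"),
    Triple("Memgraph", "DEVELOPS", "Graph database technology"),
]


def nodes_of(edges):
    """Collect the distinct entities on either end of an edge, in first-seen order."""
    seen = []
    for edge in edges:
        for name in (edge.subject, edge.obj):
            if name not in seen:
                seen.append(name)
    return seen


print(nodes_of(triples))
```

Note that "Memgraph" appears in both edges but becomes a single node, which is what makes the result a graph rather than a list of isolated facts.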

How to Go From Unstructured To Graph

Follow this step-by-step how-to guide on Unstructured2Graph to get started immediately. Here’s all that it covers:

Preprocessing

Transforming raw documents into a graph requires preprocessing through:

  1. Text Extraction: Start by pulling readable content from PDF, DOCX, or TXT documents. Tools such as pdfminer.six, PyMuPDF, or textract can parse these formats. However, extraction is often challenging because text may appear inside tables, figures, or even embedded images, which makes it hard to preserve its order and meaning.
  2. Text Cleaning: Once text is extracted, clean it so that it's ready for analysis. This step involves removing formatting artifacts, page numbers, headers, and other non-semantic noise, normalizing whitespace, and fixing broken encodings or hidden characters.

Clean, consistent text provides the foundation for accurate natural language processing, allowing entity recognition and relationship extraction models to work effectively.

Without these preprocessing steps, the resulting entity graph would be fragmented or misleading, since even minor inconsistencies in text structure can distort how entities and relationships are detected.
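
To make the cleaning step concrete, here is a minimal sketch that uses only the Python standard library. The rules (Unicode normalization, hidden-character removal, page-number lines, whitespace collapsing) are illustrative assumptions, not the rules Unstructured2Graph applies internally, and `clean_text` is a hypothetical helper name.

```python
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Apply a few generic cleanup rules to extracted text."""
    # Normalize Unicode so ligatures and full-width forms become plain characters.
    text = unicodedata.normalize("NFKC", raw)
    # Drop zero-width characters, soft hyphens, and byte-order marks.
    text = re.sub(r"[\u200b\u200c\u200d\u00ad\ufeff]", "", text)
    # Remove lines that hold only a page number such as "12" or "Page 12".
    text = re.sub(r"(?im)^[ \t]*(?:page[ \t]+)?\d+[ \t]*$", "", text)
    # Collapse runs of spaces and tabs, then limit consecutive blank lines to one.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n\s*\n+", "\n\n", text)
    return text.strip()


sample = "Alice  joined\u200b Memgraph.\n\nPage 3\n\nShe works on   graphs."
print(clean_text(sample))
```

A rule this blunt also deletes a legitimate line that contains only a number, so real pipelines usually tune such patterns per document type.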

Beyond cleaning, the text also needs chunking: splitting it into smaller, more manageable pieces that are easier to process and that enable downstream retrieval techniques such as vector search.

Libraries such as Unstructured.io cover extraction, cleaning, and chunking.
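
The idea behind chunking can be sketched with a simple word-window splitter. This is a deliberately naive stand-in, assuming fixed-size windows with overlap; dedicated libraries typically split along document structure instead. The function name and default sizes are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40):
    """Split text into word-based chunks where neighbors share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


print(chunk_words("a b c d e f g h i j", size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the price of processing some words twice.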

Entity & Relationship Extraction

Once the text is extracted and cleaned, you identify the key entities (people, organizations, locations, and other concepts) using tools such as spaCy or similar NLP models. Then you extract relationships to identify how these entities connect, for instance, who works for whom or where an organization operates.

However, traditional NLP models can be complex to set up or train for domain-specific text (like legal or financial documents), and even smaller transformer models often fall short on it. Fortunately, Large Language Models simplify this process, as they can perform entity and relationship extraction even when the text isn't perfectly cleaned or structured.
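
A hand-rolled version of LLM-driven extraction usually has two parts: a prompt that asks for structured output and a parser that tolerates imperfect replies. The sketch below shows only those two parts. The prompt wording and JSON shape are assumptions made for illustration (they are not the prompts LightRAG uses), and the model call itself is left out because it depends on your provider.

```python
import json

PROMPT_TEMPLATE = (
    "Extract entities and relationships from the text below.\n"
    'Reply with JSON only: {{"triples": [[subject, RELATION, object], ...]}}\n\n'
    "Text:\n{text}"
)


def build_prompt(chunk: str) -> str:
    """Fill the template with one chunk of text."""
    return PROMPT_TEMPLATE.format(text=chunk)


def parse_triples(reply: str):
    """Parse the model's JSON reply, skipping anything malformed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    triples = []
    for item in data.get("triples", []):
        is_triple = isinstance(item, list) and len(item) == 3
        if is_triple and all(isinstance(part, str) for part in item):
            subject, relation, obj = (part.strip() for part in item)
            # Normalize relation names to an UPPER_SNAKE_CASE style.
            triples.append((subject, relation.upper().replace(" ", "_"), obj))
    return triples


reply = '{"triples": [["Alice", "works for", "Memgraph"], ["incomplete"]]}'
print(parse_triples(reply))
```

Defensive parsing matters here because models occasionally return partial or non-JSON output, and one bad reply should not stop a long ingestion run.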

After extracting entities and relationships, you can load the results into a graph database like Memgraph. When the data is properly prepared, this step becomes straightforward, producing a connected knowledge network that captures the meaning hidden within the text. This structured representation transforms unorganized information into a searchable, queryable, and analyzable graph.
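
If you load extracted data yourself, one common pattern is to turn each (subject, relation, object) triple into a Cypher `MERGE` query so that repeated entities are not duplicated. The sketch below only builds the query text and parameters; the `Entity` label, the `name` property, and the `merge_statement` helper are assumptions for illustration, not the schema Unstructured2Graph produces.

```python
import re


def merge_statement(subject: str, relation: str, obj: str):
    """Return a Cypher MERGE query and its parameters for one triple."""
    # Relationship types cannot be passed as query parameters, so reduce
    # the relation to a safe identifier before placing it in the query text.
    rel_type = re.sub(r"[^A-Za-z0-9_]+", "_", relation.strip()).strip("_").upper()
    if not rel_type or rel_type[0].isdigit():
        raise ValueError(f"unusable relationship type: {relation!r}")
    query = (
        "MERGE (a:Entity {name: $subject}) "
        "MERGE (b:Entity {name: $object}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"subject": subject, "object": obj}


query, params = merge_statement("Alice", "works for", "Memgraph")
print(query)
print(params)
```

Entity names travel as parameters rather than being spliced into the query string, which avoids quoting problems and injection through document content.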

Simplifying Extraction with LightRAG and Unstructured2Graph

To make this process easier, Memgraph provides Unstructured2Graph in the AI Toolkit, which combines the libraries:

  • unstructured for text parsing
  • LightRAG for ontology generation and LLM-driven extraction

These libraries incorporate expert prompt engineering and well-designed extraction flows, eliminating the need to manually craft prompts or extensively clean the text. Together, they automate entity and relationship extraction, chunking, and ingestion into Memgraph.

A minimal setup can look like this (pseudo-code):

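# Pseudo-code: run inside an async function; see the AI Toolkit for exact imports and setup.
rag = MemgraphLightRAGWrapper()
for document in unstructured2graph.make_chunks(["input.pdf"]):
    for chunk in document.chunks:
        # Each chunk goes to LightRAG, which extracts entities and relationships.
        await rag.ainsert(input=chunk.text)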

After executing this snippet, you’ll have a clear entity graph automatically built inside Memgraph.

The MemgraphLightRAGWrapper from the AI Toolkit repository provides helpful utilities for managing the extraction process, for example, disabling embedding generation when it isn't needed or linking text chunks to the entities extracted from them.

This makes transforming documents into structured knowledge graphs both intuitive and production-ready.

To further understand how embeddings fit into this pipeline, see "What Are Embeddings and How Memgraph Deals With Them."

Performance Considerations

An important consideration when using LLMs for entity extraction is their speed and cost.

In tests using GPT-4o Mini, processing took around 10 seconds per document and required several LLM calls. When dealing with thousands of pages or documents, this quickly adds up, making the overall process time-consuming and potentially expensive.
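
A back-of-the-envelope estimate shows how quickly this adds up. The sketch below uses the roughly 10 seconds per document reported above; the `parallel_workers` parameter is an assumption that processing scales perfectly across concurrent workers, which ignores rate limits and other real-world overhead. The function name is hypothetical.

```python
def estimate_hours(documents: int,
                   seconds_per_document: float = 10.0,
                   parallel_workers: int = 1) -> float:
    """Estimate wall-clock hours needed to process a batch of documents."""
    if parallel_workers < 1:
        raise ValueError("parallel_workers must be at least 1")
    total_seconds = documents * seconds_per_document / parallel_workers
    return total_seconds / 3600


# 5,000 documents processed one after another: 50,000 seconds.
print(round(estimate_hours(5000), 1))
```

At 5,000 documents, sequential processing takes 50,000 seconds, or just under 14 hours, before any retries are counted.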

The unstructured2graph library already provides basic estimates of how long it will take to process all your data. Over time, the goal is to add more options for speeding up the process while also determining the exact cost of knowledge graph construction.

If you want to experiment with running graph construction from any unstructured content, the best place to start is the examples under the unstructured2graph project.

The Memgraph team continues to optimize for faster throughput and better cost control.

Now, let’s return to the graph.

Querying and Applying Your Graph

Once the data is ingested, you can explore your entity graph in Memgraph Lab or through any language driver you prefer. The following query is likely the simplest way to retrieve the entire entity graph from Memgraph.

MATCH (n)-[r]->(m) RETURN n, r, m;

Figure: visualization of the entity graph built from unstructured text.
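
The same query can also be run from Python. The sketch below assumes a local Memgraph instance on the default Bolt port 7687 with authentication disabled, and uses the `neo4j` Python driver, which speaks the Bolt protocol that Memgraph supports. The helper names are hypothetical, and the summary step is just one simple way to get a feel for what was extracted.

```python
from collections import Counter

QUERY = "MATCH (n)-[r]->(m) RETURN n, r, m;"


def fetch_relationship_types(uri="bolt://localhost:7687", auth=("", "")):
    """Run QUERY and return the type of every relationship it yields."""
    # Imported here so the summary helper below works without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            return [record["r"].type for record in session.run(QUERY)]


def summarize(rel_types):
    """Count relationships per type, most frequent first."""
    return Counter(rel_types).most_common()


# With a running instance: summarize(fetch_relationship_types())
print(summarize(["WORKS_FOR", "DEVELOPS", "WORKS_FOR"]))
```

On a large graph, returning every node and edge can be slow, so adding a `LIMIT` clause to the query is a sensible precaution.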

Entity graphs have wide applications across various real-world scenarios:

  • Competitive intelligence: companies use them to extract insights from reports, revealing relationships between organizations, markets, and technologies.
  • Scientific research: researchers build them to summarize scientific papers, connecting concepts, authors, and findings for faster discovery.
  • Legal analysis: legal teams analyze contracts by linking clauses, obligations, and involved parties to uncover dependencies or risks.
  • Retrieval-Augmented Generation (RAG): developers integrate entity graphs into RAG pipelines to enhance chatbot performance and deliver more accurate, context-aware answers by grounding large language models in structured, domain-specific knowledge.

The graph transforms unstructured text into actionable insights and supports more intelligent decision-making across all these use cases.

Follow our How-To Guide on Unstructured2Graph to build your first entity graph pipeline.

Wrapping Up

Unstructured2Graph is more than just a data processing utility; it’s a bridge between the vast, messy world of text and the structured intelligence of graphs.

Whether you’re connecting research papers, analyzing internal reports, or building a domain-specific RAG system, Unstructured2Graph makes it simple to turn unstructured information into connected knowledge that AI systems can understand.

With Memgraph, these entity graphs aren’t static; they’re live, queryable, and ready for exploration. So if your organization’s most valuable knowledge is still locked away in documents, it’s time to start connecting the dots.

Try Unstructured2Graph today and build your first graph in a few minutes.

See how quickly you can go from text to insight.
