Knowledge graph

From the business perspective, a knowledge graph (KG) represents how you organize and connect your data. Hence, it encodes the knowledge about your use case, which can be explored and queried. Knowledge is encoded in the graph's structure in the form of nodes, relationships, and properties.

From the operational and developer perspective, a knowledge graph is simply a specific way of organizing and storing your data as a graph. By following particular organization techniques, you can encode complex relationships between different entities in your domain.

Knowledge graphs help developers to:

  • Understand complex relationships: They reveal hidden patterns and connections between entities, which might not be obvious in traditional relational systems.
  • Enhance data discovery: By representing data as a graph, you can traverse relationships and uncover new insights or trends in the data.
  • Power AI applications: Knowledge graphs are foundational in building recommendation engines, semantic search systems, and various agent-driven applications, offering more context-aware and intelligent results.
  • Enable flexible querying: Queries can be more dynamic and powerful, allowing for complex graph traversals and relationship-based searches that are harder to achieve with relational databases.

This guide will help you understand how to model knowledge and store it in Memgraph. The example below introduces the basic concepts of modeling a knowledge graph.

Storing a knowledge graph in Memgraph

Memgraph is based on the label property graph (LPG) model, which is one of the most flexible ways to describe the domain knowledge in the graph format.

Resource Description Framework (RDF) is another popular way to model knowledge, but it is more rigid and less flexible than LPG.

On top of being an LPG database, Memgraph is also a high-performance, in-memory graph database that can handle complex queries and large datasets. It is optimized for fast and deep path traversals, which are essential for knowledge graph applications.

To import your structured data into Memgraph, please follow the import best practices.

In case you're dealing with unstructured data, you'll need a pipeline that structures it into an appropriate model before importing it into Memgraph. You can also try out one of our integrations with popular AI frameworks for knowledge graph creation.
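The output of such a pipeline is typically a set of (entity, relationship, entity) triplets that are then written to the graph. As a sketch, a single extracted triplet might translate into idempotent Cypher like this (the names are illustrative):

```cypher
// Hypothetical triplet extracted from text: (Jane Smith, HAS, Data Analysis).
// MERGE keeps the write idempotent - re-running it won't create duplicates.
MERGE (p:Person {name: "Jane Smith"})
MERGE (s:Skill {name: "Data Analysis"})
MERGE (p)-[:HAS]->(s);
```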

Example

Dataset

In this example, we'll walk through the process of creating a knowledge graph for a project management scenario. Let's assume you have a dataset that contains information about people, their skills, the companies they work for, the tasks they need to do, the tasks that are part of each project, and the projects that clients request. So, you have a network of nodes and relationships representing data without the added layer of meaning or semantics. Initially, this would be the graph model:

Initial Graph Model
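To make the initial model concrete, here is a minimal Cypher sketch of it. The node labels follow the scenario above, while the relationship type PART_OF, the property names, and the values are illustrative assumptions:

```cypher
// Skills are plain list properties here - not yet modeled as entities.
CREATE (p:Person {name: "Jane Smith", Skills: ["Data Analysis"]})
CREATE (t:Task {name: "Data Cleaning", Skills: ["Data Analysis"]})
CREATE (pr:Project {name: "Data Preprocessing"})
CREATE (p)-[:WORKS_ON]->(t)
CREATE (t)-[:PART_OF]->(pr);
```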

The goal is to create a knowledge graph - a structured representation of information where entities and their relationships are organized to enable reasoning and insights.

The main purpose of having a knowledge graph is to get valuable insights from data to improve the area of work you're dealing with. Another reason to have a knowledge graph is to enable an LLM to answer questions for you (GraphRAG). In this project management example, a knowledge graph can help you with:

  • Efficient Resource Allocation
  • Project Planning and Tracking
  • Skill Gap Analysis
  • Improved Client Management
  • Cross-domain expertise and Flexibility
  • Cost Optimization

The focus of the knowledge graph could be to answer questions like:

  • Do people have sufficient skills to perform all the tasks they are working on?
  • Are we able to execute all the projects?
  • Are there skill gaps that we need to fill?
  • Does the budget cover the cost of people with sufficient skills to work on that project?
  • Do employees work on tasks that match their domain preferences?

Answering questions without the knowledge graph

Let's start with the initial model we have in the database.

To answer the question "Do people have sufficient skills to perform all the tasks they are working on?", first come up with a Cypher query. For example, match all the people that work on tasks, and then filter them by the skills required to perform those tasks.

Here is what the query would look like:

MATCH (p:Person)-[:WORKS_ON]->(t:Task)
WHERE all(skill IN t.Skills WHERE skill IN p.Skills)
RETURN *;

It doesn't look like a complicated query, but a prerequisite for writing it is knowing the structure of the graph - which skill properties the person and task nodes carry, and how they relate to each other. The knowledge is built into the query instead of being encoded in the graph.

Cypher queries can become complex for complicated questions, but if you have a set of predefined questions and someone who knows Cypher, it's possible to deal with the graph as is by inferring the knowledge from the graph model and writing specific queries to answer specific questions.

Now imagine you decided to enable a software agent in the form of an LLM to answer the same question.

One approach is to enrich the prompt for the LLM with the schema of the graph. This approach has its limitations due to the size of the LLM's context window.

Another, and probably better, approach is to encode the knowledge into the graph. The LLM then needs to find a part of the graph to focus on, based on the question asked, and expand the relevant context further by traversing the graph. In GraphRAG terms, you need to find a pivot point and perform relevance expansion. The pivot point is usually found by performing a vector search - finding the nodes in the graph most similar to the question, which is embedded in the same space. Relevance expansion is the process of expanding the data around the pivot point and providing additional relevant context to the LLM to ground it. For example, relevance expansion can be done by performing two hops in the graph from the pivot point. Learn more about these terms in our GraphRAG docs.
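As a sketch of the pivot-point step, a vector search can be invoked from Cypher. The index name, parameter, and exact procedure signature below are assumptions - check the vector search docs for your Memgraph version:

```cypher
// Assumes a vector index ("project_index") built over node embeddings,
// and $question_embedding produced by the same embedding model as the nodes.
// Returns the most similar node - the pivot point for relevance expansion.
CALL vector_search.search("project_index", 1, $question_embedding)
YIELD node, similarity
RETURN node, similarity;
```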

Encoding knowledge into the graph

The previous example revolves around skills and how they relate to people and tasks. To improve the graph ontology, a skill needs to become a first-class entity - a node, in this case.

The improved graph model looks like this:

Improved knowledge graph

Now the knowledge about skills and how they're related to people and tasks is encoded into the graph model.

Notice how the HAS relationship between Person and Skill and the NEEDS relationship between Task and Skill now semantically describe their meaning. You can practically read sentences from the graph model.
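If skills start out as list properties on nodes (as in the earlier query that read p.Skills and t.Skills), a refactoring query along these lines can lift them into Skill nodes. This is a sketch that assumes that property naming:

```cypher
// Lift each entry of the Skills list property into its own Skill node.
MATCH (p:Person)
UNWIND p.Skills AS skillName
MERGE (s:Skill {name: skillName})
MERGE (p)-[:HAS]->(s);

// Same refactoring for tasks: required skills become NEEDS relationships.
MATCH (t:Task)
UNWIND t.Skills AS skillName
MERGE (s:Skill {name: skillName})
MERGE (t)-[:NEEDS]->(s);
```

After the migration, the list properties can be dropped with REMOVE so the skill knowledge lives only in the graph structure.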

Here is the updated Cypher query that answers the question "Do people have sufficient skills to perform all the tasks they are working on?" based on the new graph model:

MATCH (p:Person)-[h:HAS]->(s:Skill)<-[n:NEEDS]-(t:Task)
WHERE exists((p)-[:WORKS_ON]->(t))
RETURN *;

Here is the query that answers the question "Which people don't have sufficient skills to perform the tasks they are working on?":

MATCH (p:Person)-[w:WORKS_ON]->(t:Task)-[n:NEEDS]->(s:Skill)
WHERE NOT exists((p)-[:HAS]->(s))
RETURN *;

Although you still need to write the query to fetch the data, the query uses graph structure and has the semantics of the graph model. That can be used to infer knowledge from the graph and more easily find the answers to questions that were not even defined in the first place.

Knowledge graph creation

The process of encoding the knowledge includes modeling the graph to represent the domain knowledge accurately. There is no one-size-fits-all approach to modeling a knowledge graph, as it depends on the specific use case and the questions you want to answer.

In general, you want to extract all the entities and relationships that are relevant to your domain and represent them as nodes and relationships in the graph. This often means pulling out node properties and relationships from the data and creating a graph schema that captures their connections.
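As the model grows, it also pays off to back frequently matched properties with indexes and uniqueness constraints. Here is a sketch in Memgraph's syntax, using labels from the example model:

```cypher
// Index the properties that queries match against.
CREATE INDEX ON :Person(name);
CREATE INDEX ON :Skill(name);
// Guarantee each skill exists exactly once, so MERGE always reuses it.
CREATE CONSTRAINT ON (s:Skill) ASSERT s.name IS UNIQUE;
```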

Here is a complete graph model from the previous example that will help answer more domain questions:

Complete knowledge graph

Notice how the graph model grew - from four node labels and three relationship types to six node labels and nine relationship types - and all with the same data. This graph model is more complex and semantically rich, and it can be used to answer more complicated questions about the project management scenario. Even just by glancing at the model, you have a better understanding of what is happening.

Now it's possible to answer this question: "Across all the projects, are there required skills that no one has?". Without writing the query, it should be clear from the graph model how to answer it.

Here is the query:

MATCH (p:Project)-[:REQUIRES]->(s:Skill)
OPTIONAL MATCH (person:Person)-[:HAS]->(s)
WITH p, s, collect(person) AS peopleWithSkill
WHERE size(peopleWithSkill) = 0
RETURN p.name AS UnfulfillableProject, collect(s.name) AS MissingSkills;

In the example dataset, there is a project that requires a skill that no one in the company has. This is the result of the above query:

+----------------------+----------------------+
| UnfulfillableProject | MissingSkills        |
+----------------------+----------------------+
| "AI Integration"     | ["Machine Learning"] |
+----------------------+----------------------+

Although we again wrote the query to fetch the data, the knowledge about the skills and projects is encoded into the graph model.

If you want to create this knowledge graph in Memgraph, run the queries from our repository (opens in a new tab).

Knowledge graphs and LLMs

By combining knowledge graphs with LLMs, you're creating a GraphRAG system.

For GraphRAG, it is important that the underlying graph model really captures the semantics of the business and the questions it needs to answer. There are various pieces to the GraphRAG puzzle, since it combines knowledge graphs and LLMs, but one way to improve the system is to plan how to model the knowledge graph properly and iterate until you get it right.

Returning to the example, let's say the LLM needs to answer the question "Are we able to execute the Data Preprocessing project?".

In the background, the question gets embedded, a vector search is performed, and a pivot node is found. In this case, the pivot node is (:Project {name:"Data Preprocessing"}). The next step is relevance expansion, performed with the following query:

MATCH path=(p:Project {name:"Data Preprocessing"})-[r *..2]-(n) RETURN path;

Everything that's returned is considered relevant information, which is then forwarded to the LLM within the prompt.

Here is an example of the LLM's response:


Project Overview

- Project Name: Data Preprocessing
- Budget: $50,000
- Start Date: 2024-10-01
- End Date: 2024-11-30
- Total Duration: 60 days

Tasks and Requirements

- Task Name: Data Cleaning
- Description: Clean and preprocess raw data
- Estimated Man-Days: 10 days
- Required Skill: Data Analysis (inferred from the relationship)

Personnel Involved

- Person: Jane Smith
- Cost per Day: $600
- Skills: Data Analysis (inferred from the relationship)

Conclusion

Based on the analysis, the project does have a sufficient level of skilled
personnel to fulfill the requirements, as Jane Smith is qualified for the
necessary task. Additionally, the budget is adequate to cover her costs, with a
significant amount remaining for any additional needs or tasks that may arise.

If you look closely, the LLM has inferred knowledge from the knowledge graph. Without you having to anticipate the specific things you want to ask, it derives the answer from the data in the knowledge graph.

Now pass in the same data and ask the question: "Do employees work on tasks and projects that match their domain preferences?"

Here is the response from the LLM:

Yes, it appears that people are working on tasks they are interested in:

Task Interest Alignment: Jane Smith, who is assigned to work on the "Data
Cleaning" task, has a preference for the Data Science domain, which is related
to this task. This alignment suggests she is working within an area of interest.

Skills and Preferences: Jane possesses the Data Analysis skill, which is also a
requirement for the project and a direct need for the "Data Cleaning" task. This
indicates that her skills and interests align with her assigned
responsibilities. 

This structure suggests that team members like Jane are indeed
working on tasks that match both their skills and their domain preferences,
which can enhance motivation and project outcomes.

Want to learn more?

Check out our knowledge graph whitepaper (opens in a new tab) to learn the importance of graphs in Enterprises.

If you're more of a visual type, watch the guide on Memgraph Academy (opens in a new tab) and see how combining graph databases with LLMs can make AI more effective.

In case you need help, schedule a 30-minute session with one of our engineers to discuss how Memgraph fits your architecture. Our engineers are highly experienced in helping companies of all sizes integrate Memgraph and get the most out of it in their projects. Talk to us about data modeling, query optimization, infrastructure requirements, or migrating from your existing graph database. No nonsense or sales pitch, just tech.