Graph data modeling best practices

Modeling your data for a graph database like Memgraph can be incredibly powerful, but it’s easy to hit a few roadblocks if you’re not careful. Whether you’re a newcomer to graph databases or transitioning from relational systems, understanding common pitfalls and best practices will set you up for success. Here’s a guide to help you avoid mistakes and design clean, efficient, and performant graph models.

Don’t overcomplicate the model

It’s tempting to model every last detail of your data as nodes and relationships, but this can lead to a bloated graph structure that’s inefficient and hard to maintain. Instead, stick to simple and direct relationships between key entities. Ask yourself: “Does this node or relationship add real value to my model?” If the answer is no, leave it out. Focus on what matters most for your use case.

Avoid data duplication

Coming from a relational database background, you might feel the urge to duplicate data—like adding redundant properties or repeating nodes—just as you would with foreign keys and JOIN tables. This bloats your graph and wastes resources. Instead of duplicating data, use relationships to connect nodes. For example, instead of adding a worker’s information to every order, store it on a single Worker node and link it to Order nodes with relationships. Graph databases excel at representing relationships between entities. When building your model, think about the connections in your data. How do your entities interact? How can relationships help you query insights faster?
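
The Worker and Order example above can be sketched in plain Python to show why a single node plus relationships is easier to maintain than copied properties. This is only an in-memory illustration, not Memgraph code: the sample values, the HANDLED relationship type, and the helper names are all hypothetical.

```python
# Hypothetical sample data; all names and values are made up for illustration.

# Duplicated model: the worker's details are copied onto every order.
orders_duplicated = [
    {"order_id": 1, "worker_name": "Ana", "worker_email": "ana@example.com"},
    {"order_id": 2, "worker_name": "Ana", "worker_email": "ana@example.com"},
]

# Graph-style model: one Worker node, linked to Order nodes by relationships.
nodes = {
    "w1": {"label": "Worker", "name": "Ana", "email": "ana@example.com"},
    "o1": {"label": "Order", "order_id": 1},
    "o2": {"label": "Order", "order_id": 2},
}
relationships = [
    ("w1", "HANDLED", "o1"),
    ("w1", "HANDLED", "o2"),
]


def orders_handled_by(worker_id):
    """Follow HANDLED relationships from a worker to its orders."""
    return [
        nodes[end]["order_id"]
        for start, rel_type, end in relationships
        if start == worker_id and rel_type == "HANDLED"
    ]


def worker_of(order_node_id):
    """Follow a HANDLED relationship backwards from an order to its worker."""
    for start, rel_type, end in relationships:
        if end == order_node_id and rel_type == "HANDLED":
            return nodes[start]
    return None


# Changing the email touches one node, and every linked order sees the new value.
nodes["w1"]["email"] = "ana@new.example.com"

print(orders_handled_by("w1"))   # [1, 2]
print(worker_of("o2")["email"])  # ana@new.example.com
```

In the duplicated version, the same change would have to be applied to every order record, and any record that is missed ends up inconsistent.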

Consider different data types

When modeling a graph in Memgraph, it’s essential to optimize memory usage, as Memgraph is an in-memory database. By choosing the right property data types, you can significantly reduce resource consumption and improve performance.

Use efficient data types such as integers where applicable. Booleans are another lightweight data type that minimizes memory usage.

Optimize date and time storage by storing dates and times as temporal data types rather than strings. For example, take the datetime value "2021-10-05T14:15:00". Stored as a string, it requires 1 B per character: 19 characters plus 3 B of overhead, for 22 B in total. Stored as a temporal type, it takes up only 15 B of memory.
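
The arithmetic in this example can be reproduced with a few lines of Python. The per-character cost, the overhead, and the temporal size are the figures quoted above rather than measurements, and the node count is a hypothetical value chosen only to show how the difference adds up.

```python
# Figures quoted in the text above; they are not measured by this script.
STRING_BYTES_PER_CHAR = 1   # 1 B per character
STRING_OVERHEAD_BYTES = 3   # fixed overhead per string property
TEMPORAL_BYTES = 15         # size of the same value stored as a temporal type


def string_property_bytes(value: str) -> int:
    """Estimate the size of a string property using the figures above."""
    return len(value) * STRING_BYTES_PER_CHAR + STRING_OVERHEAD_BYTES


value = "2021-10-05T14:15:00"
as_string = string_property_bytes(value)        # 19 characters + 3 B overhead
saved_per_value = as_string - TEMPORAL_BYTES

# Hypothetical dataset size, used only to illustrate the scale of the saving.
node_count = 1_000_000
saved_total_mb = saved_per_value * node_count / 1_000_000

print(as_string)        # 22
print(saved_per_value)  # 7
print(saved_total_mb)   # 7.0
```

A saving of 7 B per value looks small, but it is paid on every node that carries the property, so it grows linearly with the size of the graph.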

When working with properties that have repeated values, use enums like "Pending", "Approved", and "Rejected" instead of strings because they take up less memory and are faster to compare, improving query performance.

Sometimes, using strings or integers to represent dates might seem straightforward, but it limits your ability to query time-based data efficiently. Use Memgraph’s built-in temporal data types like date and datetime. These not only save storage space but also allow for more expressive queries.

To index or not

Forgetting to create indexes can make queries painfully slow, but over-indexing can hurt write performance. Finding the right balance is crucial.

Only create indexes on properties that are frequently queried—like name, id, or other search-critical fields. Avoid indexing every property unless it’s absolutely necessary, and periodically review the effectiveness of your indexes as your dataset grows.
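
One way to apply this rule is to count how often each property appears in query filters and index only the ones above a threshold. The Python sketch below does that and emits Cypher statements in the `CREATE INDEX ON :Label(property);` form. The filter log, the Person label, the property names, and the threshold are all hypothetical, and the script only builds the statement text; it does not connect to a database.

```python
from collections import Counter

# Hypothetical record of (label, property) pairs seen in query filters.
filter_log = (
    [("Person", "name")] * 3
    + [("Person", "id")] * 2
    + [("Person", "nickname")]
)

# Assumed cut-off: index a property only if it is filtered on at least this often.
MIN_USES = 2

usage = Counter(filter_log)

# Build one label-property index statement per frequently used property.
index_statements = [
    f"CREATE INDEX ON :{label}({prop});"
    for (label, prop), uses in sorted(usage.items())
    if uses >= MIN_USES
]

for statement in index_statements:
    print(statement)
# CREATE INDEX ON :Person(id);
# CREATE INDEX ON :Person(name);
```

The rarely filtered nickname property is left unindexed, which avoids paying the write cost of an index that few queries would use. Re-running the count as the workload changes is a simple way to review index choices periodically.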

Switch to a graph way of thinking

Treating graph queries like SQL JOINs can lead to inefficient and overly complex queries. A graph database isn’t a relational database, and graph traversals are fundamentally different.

Think in terms of relationships and traversals. Instead of joining tables, focus on exploring the connections between nodes. Traversals are intuitive and designed for exploring networks—take advantage of this!

Measure the performance of your queries

Building a great data model is one thing, but ignoring performance testing can leave you with queries that are slow and inefficient when it matters most. Run real-world test queries as you build your model.

Memgraph executes queries based on the generated query plan. Use the PROFILE clause to analyze query execution, understand the impact of each operator, and identify bottlenecks early on. It’s easier to fix performance issues in the design phase than after deploying your graph. Changing a single operator in a query plan can make a significant difference in data pipeline and query performance. To change the query plan, adjust the graph model or index the data properly.
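
A typical profiling loop is to profile a query, change the model or add an index, and profile the same query again to compare the plans. The Python sketch below only assembles the Cypher text for such a loop. The Worker and Order labels, the HANDLED relationship type, and the `profiled` helper are hypothetical, and actually running the statements would require a client connected to Memgraph, which is not shown here.

```python
def profiled(query: str) -> str:
    """Prefix a Cypher query with PROFILE to request per-operator statistics."""
    return f"PROFILE {query.strip()}"


# Hypothetical lookup: all orders handled by one worker, found by name.
lookup = 'MATCH (w:Worker {name: "Ana"})-[:HANDLED]->(o:Order) RETURN o;'

# Statements to run in order: profile, add an index, profile again.
statements = [
    profiled(lookup),
    "CREATE INDEX ON :Worker(name);",
    profiled(lookup),
]

for statement in statements:
    print(statement)
```

Comparing the two profiles shows whether the index changed the operator used to find the starting Worker node, and how much of the total execution time that operator accounts for before and after.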

Final thoughts

By following these best practices, you’ll create a graph model that’s not only efficient but also easy to maintain and scale. Always think critically about your data, your goals, and how your graph model can help achieve them. If you have questions or need help, book a call with our engineers from the Developer Experience team.

Here are some relevant resources that will come in handy when you’re modeling your graph:

Storage memory usage: Learn how much memory it takes to store a node, or how much memory one integer property takes.
Storage memory calculator: Estimate how much memory your graph will take and the total recommended storage space.
Querying best practices: In the end, it all comes down to the performance of your queries. Learn how to optimize them.
