
Cypher Generation vs Tool Invocation: Designing Reliable AI for Graph Databases
Introduction
As a graph database built for performance and developer experience, Memgraph implements openCypher, the most widely adopted standard for graph query languages. While openCypher offers a rich and intuitive syntax to interact with graph databases, Memgraph extends it further to improve user experience and expressive power. For instance, deep path traversals with built-in filtering are natively supported, making complex queries more straightforward and faster to write.
But in an era increasingly dominated by natural language interfaces, there’s a catch. With the rise of large language models (LLMs), users expect to type any question and get an immediate, accurate result, without needing to know Cypher. This raises an important question:
Should we limit the advancement of our query capabilities just to make LLMs more comfortable?
Or is there a better way to bridge the gap between expressive graph querying and LLM-powered interfaces?
This article breaks down two fundamentally different approaches to building natural language interfaces for graph databases:
- Letting the LLM generate Cypher directly
- Letting the LLM invoke pre-built tools, each mapped to well-defined Cypher logic
We'll weigh the trade-offs between the two and explain why Memgraph favors a tool-first approach, one that doesn't sacrifice power, safety, or user control.
Direct Cypher Generation
These days, many developers are tasked with building chatbots that let anyone in the company casually “talk” to a database. The idea is compelling. It bridges the knowledge gap, giving everyone easy access to data and quick answers to their questions. In theory, this should boost team productivity across the board.
The appeal of this approach lies in its flexibility, versatility, and creative freedom. With a good enough prompt and a powerful model, an LLM can, in theory, generate any Cypher query for any user question, from simple data lookups to complex graph algorithms. There's no need to predefine a list of allowed actions or restrict users to a set of buttons or filters. It's open-ended and adaptive — a conversational layer over raw data.
But make no mistake: working with LLMs is not the same as waving a magic wand. Building a chatbot that works consistently well across scenarios is far from simple.
Here’s how Cypher generation with LLMs typically works at a basic level (a code sketch follows the list):
- A user asks a natural language question.
- That question is passed to the LLM via a prompt instructing it to generate a Cypher query.
- The LLM produces the Cypher query it believes will retrieve the correct data.
- The query is executed against the database.
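To make that flow concrete, here's a minimal sketch of the generate-then-execute loop in Python. It assumes an OpenAI-style chat client and a Bolt connection to Memgraph (the neo4j driver works because Memgraph speaks the Bolt protocol); the model name, URI, and prompt wording are illustrative, not prescriptive.

```python
# Minimal generate-then-execute loop: natural language in, Cypher out,
# query run against Memgraph over Bolt. Model name and URI are examples.
from openai import OpenAI
from neo4j import GraphDatabase

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
driver = GraphDatabase.driver("bolt://localhost:7687")

SYSTEM_PROMPT = (
    "You are a Cypher expert. Given a user question, respond with a single "
    "Cypher query for Memgraph and nothing else."
)

def answer(question: str) -> list[dict]:
    # 1) Ask the LLM to translate the question into Cypher.
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    cypher = response.choices[0].message.content.strip()

    # 2) Execute whatever string came back -- this blind trust is exactly
    #    where the fragility discussed below creeps in.
    with driver.session() as session:
        return [record.data() for record in session.run(cypher)]

print(answer("Which 5 users have the most followers?"))
```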
Now you might wonder: What’s in that prompt that helps the LLM know how to write a Cypher query?
That’s where the real complexity begins, and prompt engineering becomes critical.
To get good results, you need to understand your database deeply and craft prompts that reflect that knowledge. For example, with Memgraph, you’d want to include information about MAGE, Memgraph’s built-in graph analytics library. That way, the model knows how to call procedures like pagerank.get() correctly. Without it, the model might fall back on generic syntax, or worse, syntax tailored to other Cypher-compatible databases.
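As a hedged illustration, here's one way such a prompt fragment might look. The pagerank.get() signature matches the MAGE documentation; the surrounding wording and the choice of which procedures to document are entirely up to your workload.

```python
# Baking Memgraph-specific knowledge into the system prompt so the model
# emits MAGE procedure calls instead of generic or foreign Cypher dialects.
MAGE_HINTS = """
You are generating Cypher for Memgraph, not for other Cypher databases.
Memgraph ships the MAGE library of graph algorithms, invoked as stored
procedures, for example:

  CALL pagerank.get() YIELD node, rank
  RETURN node, rank ORDER BY rank DESC LIMIT 10;

Prefer MAGE procedures over hand-rolled algorithm implementations.
"""

SYSTEM_PROMPT = (
    "You are a Cypher expert. Respond with a single Cypher query only.\n"
    + MAGE_HINTS
)
```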
This leads to another important point: LLMs are trained on massive datasets and are biased toward the most common or popular implementations.
Since other Cypher-supporting vendors have been around longer and have more online documentation and examples, LLMs are more likely to mirror their Cypher dialect, not Memgraph’s.
Regardless of the database, another strategy to improve generation quality is to include the database schema in the prompt. This gives the model context about node labels, properties, and relationships, often leading to better queries.
But there’s a trade-off: large schemas mean large prompts, and token limits quickly become a real problem.
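Here's a sketch of schema-aware prompting. It assumes the MAGE llm_util module is available, whose llm_util.schema procedure returns an LLM-friendly summary of node labels, properties, and relationships; if your build doesn't include it, any schema introspection query can stand in.

```python
# Fetch a prompt-ready schema description from Memgraph (assumes the
# MAGE llm_util module is loaded) and inject it into the system prompt.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687")

def fetch_schema() -> str:
    with driver.session() as session:
        record = session.run(
            'CALL llm_util.schema("prompt_ready") YIELD schema RETURN schema'
        ).single()
        return record["schema"]

# Keep an eye on the schema's size -- every character of it eats
# directly into your token budget.
schema_text = fetch_schema()
SYSTEM_PROMPT = (
    "You are a Cypher expert for Memgraph.\n"
    f"The database schema is:\n{schema_text}\n"
    "Respond with a single Cypher query only."
)
```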
Some teams go a step further and retrain or fine-tune the model to better understand their specific schema, query patterns, or Cypher extensions. While this can improve generation quality, it comes with serious downsides. Retraining is slow, costly, and hard to iterate. And if your schema changes frequently or if you’re using a database like Memgraph (where new features, procedures, or custom queries are often added), retraining quickly becomes a bottleneck, not a solution.
So while direct Cypher generation offers flexibility and a low barrier to entry, it comes with fragility, bias, performance issues, and safety concerns. All of these become more obvious as your application grows.
That’s exactly where tool invocation comes in. Instead of teaching the model to write perfect Cypher every time, we give it something simpler: a set of well-defined tools it can choose from.
Tool Invocation
Luckily, just as developers were sinking serious time into prompt-based Cypher generation, AI agents entered the picture. An AI agent is a system that uses reasoning to select and execute predefined tools or actions in order to fulfill a user's goal based on natural language input.
Now, when thinking about graph databases and Cypher, the natural question is:
What kind of tools can we provide to an agent to empower it for reliable and meaningful communication with the database? And why should we?
The idea is to bridge the gap we discussed earlier. Instead of expecting the LLM to perfectly generate Cypher for a specific database setup or stuffing the entire Cypher syntax and database context into a massive prompt, we offload that responsibility into tools. These tools essentially form a lightweight API layer toward Memgraph.
Some tools might wrap basic functionality, like:
- show_schema_info(), to let the agent introspect the database

Others might represent custom, use-case-specific queries, such as:
- find_nearby_fraudulent_users(user_id), for a fraud detection system
- run_pagerank_on_subgraph(topic), for graph analytics scenarios
Think of it this way: if you’ve ever had a Cypher query you constantly rerun, that’s likely a candidate to become a tool. Once turned into a tool, it becomes something the agent can reliably reuse, with consistent structure and known parameters.
For example, if your use case involves identifying potential fraud rings by exploring users one or two hops from a known fraudulent user, you could wrap that query into a reusable tool. That way, when someone asks the chatbot to “find suspicious users connected to X,” the LLM doesn’t invent a query, it simply calls the tool you already trust. Your chatbot becomes your fraud detection agent, not just a fancy text interface.
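For illustration, such a tool might look like the sketch below. The :User label, TRANSACTED_WITH relationship, and is_fraudulent property are hypothetical placeholders for your own data model; the *1..2 variable-length expansion performs the one-to-two-hop traversal.

```python
# A fixed, trusted Cypher query wrapped as a reusable tool. Labels,
# relationship types, and properties here are hypothetical examples.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687")

FRAUD_RING_QUERY = """
MATCH (seed:User {id: $user_id, is_fraudulent: true})
      -[:TRANSACTED_WITH *1..2]-(suspect:User)
WHERE suspect.id <> $user_id
RETURN DISTINCT suspect.id AS id, suspect.name AS name
"""

def find_nearby_fraudulent_users(user_id: str) -> list[dict]:
    """Return users within two hops of a known fraudulent user."""
    with driver.session() as session:
        result = session.run(FRAUD_RING_QUERY, user_id=user_id)
        return [record.data() for record in result]
```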
This approach is called tool invocation. Instead of letting the LLM generate Cypher directly, we provide it with a curated set of tools, each one backed by a specific Cypher query. The LLM doesn't need to invent anything, it just decides which tool to use based on the user's input. In a way, tools act like APIs toward Memgraph. They have clear inputs, produce predictable outputs, and are easy to maintain and reason about.
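To show how the model picks a tool instead of writing Cypher, here's a sketch using the OpenAI function-calling (tools) API. The JSON schema describes the find_nearby_fraudulent_users function from the previous snippet; any agent framework with tool support follows the same pattern.

```python
# The LLM sees only a tool description, decides to call it, and we run
# our trusted, pre-written Cypher -- no query generation involved.
import json
from openai import OpenAI

llm = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "find_nearby_fraudulent_users",
        "description": "Find users within two hops of a known fraudulent user.",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}]

response = llm.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Find suspicious users connected to u_42"}],
    tools=TOOLS,
)

# The model returns a tool call, not Cypher; dispatch it to the function
# defined in the previous snippet.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(find_nearby_fraudulent_users(**args))
```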
The main advantage here is control. You know exactly what queries are being run in your database. You define the logic, optimize performance where needed, and ensure nothing dangerous or unintended slips through. The added bonus is that it becomes easier to debug and audit what your AI assistant is doing, since it’s just calling tools and not improvising Cypher on the spot.
Of course, this doesn’t come for free. The more coverage you want, the more tools you have to define. If you only expose five tools, the agent can only answer five kinds of questions. But if you thoughtfully design your toolset to reflect common patterns and use cases in your domain, your agent becomes far more capable and more trustworthy.
Introducing tools doesn’t limit what the AI can do. Quite the opposite! We’re giving it a solid foundation to work from. Rather than guessing, it executes. Rather than hallucinating, it reuses.
This approach is what powers the AI Toolkit for Memgraph. Instead of relying on prompt engineering or retraining models, we’re giving developers a structured way to build reliable AI assistants for graph data.
Why Memgraph Benefits from a Tool-Based Approach
Memgraph is an in-memory graph database built for performance. It thrives on real-time, complex queries running against constantly changing data. Beyond speed, Memgraph offers a rich set of graph algorithms through the MAGE library and supports fast, runtime schema tracking.
What makes Memgraph unique is how dynamic everything is. Data is constantly changing. Schema evolves on the fly. Users can register custom query modules and procedures at any time. That level of flexibility is powerful, but it’s also unpredictable from an LLM’s perspective.
Expecting an LLM to keep up with all that in real time, through prompting alone or retraining, isn’t realistic. That’s why Memgraph primarily benefits from a tool-based approach to AI.
By exposing queries and procedures through clearly defined tools, you give the LLM exactly what it needs: structure, safety, and a reliable interface to interact with a dynamic system. At the same time, you get complete control over performance and behavior. You can optimize queries behind each tool, monitor their usage, and adjust their logic without retraining.
In a system like Memgraph, where things are constantly in motion, tools become the stable bridge between your evolving graph and the AI trying to make sense of it.
Conclusion
While direct Cypher generation offers flexibility and fast prototyping, it quickly runs into issues with accuracy, performance, and maintainability, especially in dynamic environments like Memgraph. That’s why we’re embracing a tool-first approach, where LLMs interact with the database through a curated set of reliable, Cypher-backed tools. This gives developers more control, improves safety, and leads to more stable and scalable AI systems.
To support this, we’ve released the AI Toolkit for Memgraph, a growing collection of ready-to-use tools designed to help developers build reliable AI agents that work with Memgraph out of the box.
And this is just the beginning. In the next blog post, we’ll dive into how the toolkit works, how to extend it with your own tools, and how it fits into your AI-powered workflows.
We’re also bringing this approach directly into Memgraph Lab. The next version of GraphChat will support tool invocation natively, giving you a much more reliable and context-aware way to query your graph using natural language.
Prefer using Cypher generation? No problem! Check out our tips & tricks for improving prompt design and generation accuracy.