Text2Cypher

Text2Cypher is the process of translating plain language questions or commands into Cypher queries (the language used to interact with graph databases like Memgraph). This automation enables users to retrieve, manipulate, or analyze graph data without needing to write Cypher themselves. By leveraging natural language processing and language models, Text2Cypher bridges the gap between human-friendly questions and database operations, making knowledge retrieval from graph data both accessible and powerful.

Text2Cypher usually involves:

  1. running the SHOW SCHEMA INFO query to retrieve the graph schema
  2. asking an LLM to generate a Cypher query that, based on the schema and the question, returns the data needed to answer the user's question
  3. often, some kind of feedback loop: the user can validate or change the query before it runs, or, if the query is incorrect (its syntax or semantics are wrong), the LLM receives the error message and tries to improve the query
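
The three steps above can be sketched as a small retry loop. This is a minimal illustration, not Memgraph's implementation: `text2cypher`, `get_schema`, `generate`, and `run_query` are hypothetical names, and the callables are assumed to wrap whatever database client and LLM you actually use.

```python
from typing import Callable, List, Optional, Tuple


def text2cypher(
    question: str,
    get_schema: Callable[[], str],      # assumed to return the SHOW SCHEMA INFO output as text
    generate: Callable[[str], str],     # assumed to call an LLM and return a Cypher string
    run_query: Callable[[str], List],   # assumed to execute Cypher and raise on invalid queries
    max_attempts: int = 3,
) -> Tuple[str, List]:
    """Return the first generated query that executes, together with its rows."""
    schema = get_schema()  # step 1: fetch the schema
    prompt = f"Schema:\n{schema}\n\nThe question is:\n{question}"
    last_error: Optional[Exception] = None

    for _ in range(max_attempts):
        query = generate(prompt)  # step 2: ask the LLM for a query
        try:
            return query, run_query(query)
        except Exception as exc:  # step 3: feed the error back and retry
            last_error = exc
            prompt += f"\n\nPrevious query:\n{query}\nError message:\n{exc}"

    raise RuntimeError(f"No valid query after {max_attempts} attempts") from last_error
```

Passing the schema reader, the LLM call, and the query runner in as functions keeps the loop independent of any particular driver or model provider, and makes it easy to put a human confirmation step inside `run_query` before anything is executed.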

This method is most effective when the user’s question requires a traditional analytical database query, such as deterministic counting or extracting statistical information.

NOTE: Sometimes the generated queries might not be correct or optimal for various reasons:

  • LLM doesn’t understand the actual user intention
  • not all schema details are well defined or understood
  • LLM can’t get the syntax right

To improve Cypher query generation from natural language, effective prompt engineering is essential.

Perfect Text2Cypher use-cases are:

Cypher generation

LLMs are trained on publicly available resources, including openCypher. Memgraph supports openCypher but has also extended its Cypher implementation with advanced constructs. While these enhancements improve usability, they can make it harder for LLMs to generate optimal Cypher queries tailored for Memgraph.

To address this, we’re continually increasing compatibility with each release and enhancing Memgraph’s error messages. Here’s a prompt that has been particularly helpful in the process; it is mostly used in integrations with other AI tools:

MEMGRAPH_GENERATION_TEMPLATE = """Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database.
You will utilize a provided database schema to understand the structure, nodes and relationships within the Memgraph database.
Instructions:
- Use provided node and relationship labels and property names from the
schema which describes the database's structure. Upon receiving a user
question, synthesize the schema to craft a precise Cypher query that
directly corresponds to the user's intent.
- Generate valid executable Cypher queries on top of Memgraph database.
Any explanation, context, or additional information that is not a part
of the Cypher query syntax should be omitted entirely.
- Use Memgraph MAGE procedures instead of Neo4j APOC procedures.
- Do not include any explanations or apologies in your responses.
- Do not include any text except the generated Cypher statement.
- For queries that ask for information or functionalities outside the direct
generation of Cypher queries, use the Cypher query format to communicate
limitations or capabilities. For example: RETURN "I am designed to generate
Cypher queries based on the provided schema only."
Schema:
{schema}

With all the above information and instructions, generate Cypher query for the
user question.

The question is:
{question}"""
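
The template has two placeholders, `{schema}` and `{question}`, which can be filled with `str.format`. The sketch below uses a shortened stand-in template so that it runs on its own; `TEMPLATE`, `build_prompt`, and the sample schema text are illustrative assumptions, not part of Memgraph.

```python
# Shortened stand-in for MEMGRAPH_GENERATION_TEMPLATE (the real one carries the
# full instruction list); only the two placeholders matter for this example.
TEMPLATE = """Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database.
Schema:
{schema}

The question is:
{question}"""


def build_prompt(schema: str, question: str, template: str = TEMPLATE) -> str:
    """Fill the schema and question placeholders of a generation template."""
    # str.format only parses braces in the template itself, so braces inside
    # the schema text (e.g. property maps) are inserted verbatim.
    return template.format(schema=schema, question=question)


prompt = build_prompt(
    schema="(:Person {name: STRING})-[:KNOWS]->(:Person)",  # made-up schema text
    question="How many people does Alice know?",
)
print(prompt)
```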

Still, other, more advanced techniques are available. In GraphChat, a Memgraph Lab feature, we currently provide the following prompt:

Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database. You will utilize a provided database schema to understand the structure, nodes and relationships within the Memgraph database.

Rules:
Use provided node and relationship labels and property names from the schema which describes the database's structure. Schema is defined after keyword Schema down below. Upon receiving a user question, synthesize the schema to craft a precise Cypher query that directly corresponds to the user's intent.
Generate valid executable Cypher queries on top of Memgraph database. Any explanation, context, or additional information that is not a part of the Cypher query syntax should be omitted entirely.
Use Memgraph MAGE procedures instead of Neo4j APOC procedures.
Do not include any explanations or apologies in your responses.
Do not include any text except the generated Cypher statement.
For queries that ask for information or functionalities outside the direct generation of Cypher queries, use the Cypher query format to communicate limitations or capabilities. For example: RETURN "I am designed to generate Cypher queries based on the provided schema only."

We also attach the schema, the conversation history (if needed), and constraints for write permissions (e.g., if it is a DELETE command, dismiss it with the message: Sorry, can’t do that! DELETE queries are not permitted.). In addition, we attach some Cypher specifics to the prompt:

Atom expressions
Atom expressions are not supported, so if you need to do something like this:
MATCH (n)
RETURN n, size((n)-->()) AS edgeCount

Use the following structure with OPTIONAL MATCH or a combination of queries using the keyword WITH:
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN n, count(r) AS edgeCount
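
The pieces described above (rules, write constraints, Cypher specifics, conversation history, and schema) could be assembled into one prompt roughly as follows. This is only a sketch: the function name `assemble_prompt`, the section headings other than `Schema:`, and the ordering of sections are assumptions, not GraphChat's actual layout.

```python
from typing import List, Optional, Tuple

# Constraint text taken from the example above; attached only in read-only mode.
DELETE_CONSTRAINT = (
    "If it is a DELETE command, dismiss that with the message: "
    "Sorry, can’t do that! DELETE queries are not permitted."
)


def assemble_prompt(
    rules: str,
    schema: str,
    question: str,
    history: Optional[List[Tuple[str, str]]] = None,  # (role, text) pairs
    read_only: bool = True,
    cypher_notes: Optional[List[str]] = None,
) -> str:
    """Join the prompt sections, skipping the optional ones that are empty."""
    sections = [rules.strip()]
    if read_only:
        sections.append("Constraints:\n" + DELETE_CONSTRAINT)
    if cypher_notes:
        sections.append("Cypher specifics:\n" + "\n\n".join(cypher_notes))
    if history:
        turns = [f"{role}: {text}" for role, text in history]
        sections.append("Conversation history:\n" + "\n".join(turns))
    # The rules say the schema is defined after the keyword "Schema".
    sections.append("Schema:\n" + schema.strip())
    sections.append("Question:\n" + question.strip())
    return "\n\n".join(sections)
```

Keeping each attachment as its own section makes it straightforward to switch the write constraint on or off per user, or to add further Cypher notes as new incompatibilities are discovered.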

To further improve Cypher query generation, we attach the failed attempts from Memgraph in the form of invalid Cypher queries and the error messages they produced, together with the following instructions:

Generate a new correct version of the Cypher query:
Do not include any explanations or apologies in your responses.
Do not wrap the Cypher query in backticks.
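
A retry prompt of this kind could be built as shown below. The instruction text is the one quoted above; the helper names (`build_retry_prompt`, `strip_code_fence`) and the "Attempt N" layout are hypothetical. Since a model may still wrap its answer in a code fence despite the instruction, the second helper removes a surrounding fence defensively while leaving backtick-escaped identifiers inside the query untouched.

```python
from typing import List, Tuple

RETRY_INSTRUCTIONS = """Generate a new correct version of the Cypher query:
Do not include any explanations or apologies in your responses.
Do not wrap the Cypher query in backticks."""

FENCE = "`" * 3  # a Markdown code fence marker


def build_retry_prompt(failed_attempts: List[Tuple[str, str]]) -> str:
    """Format (query, error message) pairs followed by the retry instructions."""
    parts = []
    for i, (query, error) in enumerate(failed_attempts, start=1):
        parts.append(f"Attempt {i} (invalid Cypher):\n{query}\nError message:\n{error}")
    parts.append(RETRY_INSTRUCTIONS)
    return "\n\n".join(parts)


def strip_code_fence(text: str) -> str:
    """Remove a surrounding multi-line code fence, if the model added one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]  # drop the opening fence and any language tag
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]  # drop the closing fence
    return "\n".join(lines).strip()
```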