Quickstart to custom query module in C++
To start working on a query module, you need a running Memgraph instance, either on a server or in a container. That server or container also serves as the development environment for your query module.
This gives you an isolated environment for developing and testing the module. The easiest way to follow this guide is to have Docker installed on your machine.
If you do not use Docker but have Memgraph installed on your Linux machine, the steps are the same; you just run the commands on your machine instead of inside the Docker container.
Run Memgraph
First, you need to run the Memgraph instance. You can do this by running the following command:
docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage
Keep in mind that you can use the Docker image with just Memgraph or with Memgraph and MAGE. It won’t change the way you develop your C++ query module. If you want to run Memgraph on a native Linux machine, follow these installation steps.
Enter the Memgraph container shell
After you have started the Memgraph instance, you need to enter the shell. You can do this by running the following command:
docker exec -u 0 -it memgraph bash
This will start bash inside the container as the root user. Since we need to install dependencies and compilers, we need to be the root user. Of course, if you are running a native Linux machine, you can skip this step.
Install the required dependencies
In order to install the required dependencies and tooling, you need to run the following command:
apt update -y
apt install -y git cmake clang clangd vim
In this guide, we will use cmake and clang to compile the query module. By default, Memgraph uses clang as its compiler, so it is recommended that you use it as well. The currently used version of clang is 17.0.2, but this is subject to change and can be checked in the latest version of the toolchain file.
Vim and Git are optional. Using Git for version control and as a backup is recommended, because deleting the container also deletes the files developed inside it. If you do not want to use Vim for day-to-day development, you can use any other text editor or IDE that you are comfortable with. The recommended approach is to connect your favorite IDE to the container. For example, Visual Studio Code has a feature for developing inside a container.
The same goes for cmake: you can use any other build system you want or call the compiler directly.
Create a directory for your query module
Now that all of the required dependencies are installed, you can create a directory for your query module. You can do this by running the following command:
mkdir hello_query_module
cd hello_query_module
Develop a sample query module
Now, you can create a sample query module. You can do this by running the following command:
vim hello_query_module.cpp
This will open the Vim editor. You can copy the following code into the editor:
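#include <cstdint>    // std::int64_t
#include <cstdlib>    // srand, rand
#include <ctime>      // time
#include <exception>  // std::exception
#include <memgraph/mgp.hpp>
#include <memgraph/mg_exceptions.hpp>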
const char *kProcedureGet = "get";
const char *kParameterStart = "start";
const char *kParameterSteps = "steps";
const char *kReturnStep = "step";
const char *kReturnNode = "node";
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const auto arguments = mgp::List(args);
const auto record_factory = mgp::RecordFactory(result);
try {
const auto start = arguments[0].ValueNode();
const auto n_steps = arguments[1].ValueInt();
srand(time(NULL));
auto current_nodes = mgp::List();
current_nodes.AppendExtend(mgp::Value(start));
std::int64_t step = 0;
while (step < n_steps) {
auto current_node = current_nodes[current_nodes.Size() - 1].ValueNode();
auto neighbours = mgp::List();
for (const auto relationship : current_node.OutRelationships()) {
neighbours.AppendExtend(mgp::Value(relationship));
}
if (neighbours.Size() == 0) {
break;
}
const auto next_node = neighbours[rand() % neighbours.Size()].ValueRelationship().To();
current_nodes.AppendExtend(mgp::Value(next_node));
step++;
}
for (std::int64_t i = 0; i < current_nodes.Size(); i++) {
auto record = record_factory.NewRecord();
record.Insert(kReturnStep, i);
record.Insert(kReturnNode, current_nodes[i].ValueNode());
}
}
catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
std::int64_t default_steps = 10;
try {
mgp::AddProcedure(RandomWalk,
kProcedureGet,
mgp::ProcedureType::Read,
{
mgp::Parameter(kParameterStart, mgp::Type::Node),
mgp::Parameter(kParameterSteps, mgp::Type::Int, default_steps)
},
{
mgp::Return(kReturnStep, mgp::Type::Int),
mgp::Return(kReturnNode, mgp::Type::Node)
},
module,
memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
The code above contains many details that are not important yet. All you need to know for now is that it is a basic query module that implements a random walk algorithm as a procedure; the remaining details are explained later in this guide.
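To see the walk logic separately from the Memgraph API, here is a minimal sketch that runs the same kind of walk over a plain adjacency list. The names AdjacencyList and RandomWalkSketch are hypothetical and not part of the C++ API, and the sketch swaps srand/rand for std::mt19937 with an explicit seed so that runs are repeatable. It takes at most n_steps hops and stops early at a node with no outgoing relationships.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the graph: adjacency[i] lists the nodes reachable
// from node i through its outgoing relationships.
using AdjacencyList = std::vector<std::vector<std::size_t>>;

// Takes at most `n_steps` hops from `start`, stopping early at a node without
// outgoing relationships. Returns the visited nodes, the start node included.
std::vector<std::size_t> RandomWalkSketch(const AdjacencyList &adjacency, std::size_t start,
                                          std::int64_t n_steps, std::uint32_t seed) {
  std::mt19937 generator(seed);
  std::vector<std::size_t> visited{start};

  for (std::int64_t step = 0; step < n_steps; ++step) {
    const auto &neighbours = adjacency.at(visited.back());
    if (neighbours.empty()) break;  // dead end: nothing left to follow

    // Pick one outgoing neighbour uniformly at random
    std::uniform_int_distribution<std::size_t> pick(0, neighbours.size() - 1);
    visited.push_back(neighbours[pick(generator)]);
  }
  return visited;
}
```

In the query module itself, the adjacency lookup corresponds to iterating over OutRelationships() of the current node, and the returned vector corresponds to the step and node records.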
Compile the query module
First, create a CMakeLists.txt file in the same directory as your hello_query_module.cpp file. You can do this by running the following command:
vim CMakeLists.txt
In the file, you can copy the following code:
cmake_minimum_required(VERSION 3.10.0)
project(hello_query_module)
# Set the C++ standard to C++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_COMPILER "clang++")
# Create a shared library named _hello_query_module
add_library(_hello_query_module SHARED hello_query_module.cpp)
Create a build directory and enter it:
mkdir build
cd build
Run cmake and then make:
cmake -DCMAKE_BUILD_TYPE=Release ..
make
Move and load the query module
After the compilation is done, you can move the query module to the Memgraph query modules directory. By default, Memgraph looks for .so files in /usr/lib/memgraph/query_modules/.
To move the compiled shared library, run the following command:
mv lib_hello_query_module.so /usr/lib/memgraph/query_modules/lib_hello_query_module.so
The other option is to update your CMakeLists.txt file so that it copies the file to the query modules directory every time you compile the shared library.
cmake_minimum_required(VERSION 3.10.0)
project(hello_query_module)
# Set the C++ standard to C++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_COMPILER "clang++")
# Create a shared library named _hello_query_module
add_library(_hello_query_module SHARED hello_query_module.cpp)
# Delete the existing file (file(REMOVE) runs when CMake configures the project, not on every build)
file(REMOVE /usr/lib/memgraph/query_modules/lib_hello_query_module.so)
# Print a message before the library is built
add_custom_command(TARGET _hello_query_module PRE_BUILD
COMMAND ${CMAKE_COMMAND} -E echo "File removed successfully"
)
# Copy the built library to the destination
add_custom_command(TARGET _hello_query_module POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy $<TARGET_FILE:_hello_query_module> /usr/lib/memgraph/query_modules/lib_hello_query_module.so
)
# Print a message after the copy command has run
add_custom_command(TARGET _hello_query_module POST_BUILD
COMMAND ${CMAKE_COMMAND} -E echo "File moved successfully"
)
Load the query module in Memgraph
If Memgraph was running previously, you need to reload shared libraries so it can see your new query module.
By running mgconsole, which is included with every Memgraph image, you can load the query module. You can do this by running the following command:
mgconsole --fit_to_screen
Once inside the mgconsole, you can load the query module by running the following command:
CALL mg.load_all();
The previous command reloads all query modules that are present in the /usr/lib/memgraph/query_modules/ directory.
Check that the query module is loaded by running the following command:
CALL mg.procedures() YIELD *;
This will return all of the procedures that are loaded in Memgraph.
Test the query module
Now that the query module is loaded, you can test it. First, create a small graph by running the following query:
CREATE (a:NodeA)-[:Relationship]->(b:NodeB);
Call the query module:
MATCH (a:NodeA)
WITH a
CALL lib_hello_query_module.get(a, 1) YIELD node, step
RETURN *;
If you experience any issues with loading the query module, it is recommended to check and follow the Memgraph logs and restart Memgraph.
Next steps
Once your basic random walk query module is working, you can start developing your own query module. It is recommended that you read this guide to the end, starting with the Query module architecture section.
Also, at this point, consider setting up a Git repository and versioning your code, so that you have a backup either via container volumes or by pushing the code to a remote repository.
Query module architecture
Query modules let you extend the Memgraph Cypher language with your own custom procedures and functions written in C++. Inside a single query module file, you can define multiple procedures and functions for different purposes:
- Read or write procedures
- Magic functions
- Batched procedures
Each of these has a specific purpose and is used in different situations, but they share the same API during the registration of the query module.
The basic parts of every query module are as follows:
#include <memgraph/mgp.hpp>
// Query procedure and magic function callbacks are defined here
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
// Register your procedures & functions here
return 0;
}
extern "C" int mgp_shutdown_module() {
// If you need to release any resources at shutdown, do it here
return 0;
}
- The mgp.hpp file contains all declarations of the C++ API for implementing query module procedures and functions. As the function signatures show, the query module is written in C++, but the C++ API is a wrapper around the C API. The file should be located in the /usr/include/memgraph directory, and it is included with every Memgraph Docker image or installation.
- To make your query procedures and functions available, they need to be registered in mgp_init_module, the C API registration function.
- Finally, you may use mgp_shutdown_module to reset any global state or release global resources at shutdown.
Registering procedures and functions as Query modules
All query procedures and functions need to be registered in mgp_init_module. The way the registration is done depends on the type of procedure or function.
Here is an example of registering a read procedure via the AddProcedure function:
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
try {
mgp::MemoryDispatcherGuard guard(memory);
mgp::AddProcedure(RandomWalk, "get", mgp::ProcedureType::Read,
{mgp::Parameter("start", mgp::Type::Node), mgp::Parameter("length", mgp::Type::Int)},
{mgp::Return("random_walk", mgp::Type::Path)}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
Here the procedure’s signature is defined and added as a read procedure (ProcedureType::Read) named get. This means that the procedure will be callable from Cypher queries, and it will be named query_module_name.get in the Cypher namespace.
The procedure takes two named parameters, start and length (the starting node and the random walk length), and it yields the computed random walk as a Path (a sequence of nodes connected by relationships) in the random_walk result field.
When the procedure is called, its arguments (and the graph context) are passed to the RandomWalk callback function.
From the perspective of a Cypher query, you would call this procedure as follows:
MATCH (a:NodeA)
WITH a
CALL query_module_name.get(a, 10) YIELD random_walk
RETURN random_walk;
In this case, query_module_name is the name of the query module (the compiled .so file) that contains the procedure. Each query module can contain multiple procedures and functions, and they will all be callable from Cypher queries; you just need to register them in the mgp_init_module function.
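The naming rule can be illustrated with a small sketch. It assumes that the module name is the file name of the shared library without its extension, which matches the lib_hello_query_module.get call used earlier in this guide. The helper CypherCallName is hypothetical and only illustrates the convention; it is not part of the Memgraph API.

```cpp
#include <string>

// Hypothetical helper: builds the name used after CALL from the path of the
// compiled module and the name the procedure was registered under.
std::string CypherCallName(const std::string &module_path, const std::string &procedure_name) {
  // Drop the directory part, if there is one
  const auto slash = module_path.find_last_of('/');
  std::string file_name = (slash == std::string::npos) ? module_path : module_path.substr(slash + 1);

  // Drop the extension (for example ".so"), if there is one
  const auto dot = file_name.find_last_of('.');
  if (dot != std::string::npos) file_name.erase(dot);

  return file_name + "." + procedure_name;
}
```

For the library built in the quickstart, CypherCallName("/usr/lib/memgraph/query_modules/lib_hello_query_module.so", "get") yields lib_hello_query_module.get.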
All procedure callbacks, including RandomWalk, share the same signature, shown below. Parameter by parameter, the callback receives the procedure arguments (args), the graph context (memgraph_graph), the result stream (result), and memory access (memory).
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
  try {
    // Read the arguments through the C++ wrapper, not the raw mgp_list pointer
    const auto start_node = arguments[0].ValueNode();
    const auto length = arguments[1].ValueInt();
    auto random_walk = mgp::Path(start_node);
    // (Random walk algorithm logic)
    auto record = record_factory.NewRecord();
    record.Insert("random_walk", random_walk);
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
In earlier versions, mgp::memory was a global object, which meant that all of the procedures and functions in a single shared library referred to the same mgp::memory object. As a result, calling such callables simultaneously from multiple threads led to incorrect memory usage. This also includes the case when the same callable is called from different user sessions. The global object was first deprecated in favor of the thread-safe MemoryDispatcherGuard guard(memory), and it has since been removed, so C++ modules should be recompiled for v2.18.1+. From v2.21 onwards, setting mgp::memory causes a compilation error, so the guard has to be used.
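The difference between a single global object and a scoped guard can be sketched with a generic RAII pattern. This is not how MemoryDispatcherGuard is implemented; MemoryHandleSketch, current_memory, and ScopedMemoryGuardSketch are hypothetical names used only to illustrate the idea that each thread registers its own memory handle for the duration of one call and restores the previous state afterwards.

```cpp
// Hypothetical stand-in for the memory handle that is passed to a callback.
struct MemoryHandleSketch {
  int id;
};

// Each thread has its own "current memory" pointer, unlike one shared global.
thread_local MemoryHandleSketch *current_memory = nullptr;

// RAII guard: registers the handle on construction, restores the previous one
// on destruction, even when the scope is left through an exception.
class ScopedMemoryGuardSketch {
 public:
  explicit ScopedMemoryGuardSketch(MemoryHandleSketch *memory) : previous_(current_memory) {
    current_memory = memory;
  }
  ~ScopedMemoryGuardSketch() { current_memory = previous_; }

  ScopedMemoryGuardSketch(const ScopedMemoryGuardSketch &) = delete;
  ScopedMemoryGuardSketch &operator=(const ScopedMemoryGuardSketch &) = delete;

 private:
  MemoryHandleSketch *previous_;
};

// Callback-shaped function: the guard is the first statement, as in the guide.
int CallbackSketch(MemoryHandleSketch *memory) {
  ScopedMemoryGuardSketch guard(memory);
  return current_memory->id;  // always the handle of this call on this thread
}
```

Because the pointer is thread-local and scoped, two sessions running the same callback on different threads never observe each other's handle, which is the problem the global object had.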
Write procedures are registered and written in the same way as read procedures. They differ from read procedures in that their graph context is mutable.
With them, you may create or delete nodes and relationships, modify their properties, and add or remove node labels.
They use the same interface as read procedures; the only difference is that the appropriate procedure type is passed to AddProcedure.
The code below registers and implements a write procedure, add_x_nodes, which adds a user-specified number of nodes to the graph. Here is the registration:
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
try {
mgp::MemoryDispatcherGuard guard(memory);
mgp::AddProcedure(AddXNodes, "add_x_nodes", mgp::ProcedureType::Write, {mgp::Parameter("number", mgp::Type::Int)},
{}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
Notice the mgp::ProcedureType::Write procedure type. Here is the implementation of the callback function AddXNodes:
void AddXNodes(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  // Declared outside the try block so that the catch block can report errors
  const auto record_factory = mgp::RecordFactory(result);
  try {
    const auto arguments = mgp::List(args);
    auto graph = mgp::Graph(memgraph_graph);
    const auto number = arguments[0].ValueInt();
    for (std::int64_t i = 0; i < number; i++) {
      graph.CreateNode();
    }
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
Procedure registration is called during the query module loading; since it binds with Memgraph resources and processes, it is important to handle any errors that might occur during the registration. Hence, the appropriate try-catch blocks are added to avoid any unexpected behavior.
Any exceptions thrown should never leave the scope of your module or registration. You may have a top-level exception handler that returns the error value and potentially logs any error messages. Exceptions that cross the module boundary may cause unexpected issues!
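A top-level handler of this kind can be sketched without the Memgraph API. The helper RunGuarded and the entry point InitModuleSketch are hypothetical names; the sketch only shows the pattern of converting every exception into a return code before it reaches the module boundary. The catch (...) branch is an addition that the examples in this guide do not use, included here to cover exceptions that do not derive from std::exception.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical helper: runs `body` and turns any escaping exception into an
// error code plus a message, so that nothing propagates to the caller.
int RunGuarded(const std::function<void()> &body, std::string &error_message) {
  try {
    body();
    return 0;
  } catch (const std::exception &e) {
    error_message = e.what();
  } catch (...) {
    error_message = "unknown error";
  }
  return 1;
}

// Hypothetical entry point shaped like mgp_init_module: 0 on success, 1 on failure.
int InitModuleSketch(bool registration_fails, std::string &error_message) {
  return RunGuarded(
      [&] {
        if (registration_fails) throw std::runtime_error("registration failed");
      },
      error_message);
}
```

In a real module, the message would typically be logged or passed to SetErrorMessage, as the examples in this guide do inside their catch blocks.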
Registering Magic functions
Magic functions are a Memgraph feature that lets the user write and call custom Cypher functions. Unlike procedures, functions are simple operations that can’t modify the graph; they return a single value and can be used in any expression or predicate.
Let’s examine an example function that multiplies the numbers passed to it. The registration is done by AddFunction in the same way as with query procedures, the difference being the absence of a procedure type argument (functions don’t modify the graph).
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
try {
mgp::MemoryDispatcherGuard guard(memory);
mgp::AddFunction(Multiply, "multiply",
{mgp::Parameter("first", mgp::Type::Int), mgp::Parameter("second", mgp::Type::Int)}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
There are two key differences in the function signature:
- the lack of a mgp_graph * parameter; the callback receives a mgp_func_context * instead (the graph is immutable in functions)
- a different result type (functions return single values, while procedures write result records to the result stream)
The difference in result type means that, to work with function results, we use a different C++ API class: Result. Our function is implemented as follows:
void Multiply(mgp_list *args, mgp_func_context *ctx, mgp_func_result *res, mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const auto arguments = mgp::List(args);
auto result = mgp::Result(res);
auto first = arguments[0].ValueInt();
auto second = arguments[1].ValueInt();
result.SetValue(first * second);
}
Registering Batched procedures
Batched readable and writeable procedures in C++ are pretty similar to batched procedures in C. The way procedures work is the same as in C API; the only difference is procedure registration.
void BatchCSVFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
...
}
void InitBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
...
}
void CleanupBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
...
}
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
try {
mgp::MemoryDispatcherGuard guard(memory);
mgp::AddBatchProcedure(BatchCSVFile, InitBatchCsvFile, CleanupBatchCsvFile,
"read_csv", mgp::ProcedureType::Read,
{mgp::Parameter("file_name", mgp::Type::String)},
{mgp::Return("row", mgp::Type::Map)}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
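The lifecycle behind the three callbacks can be sketched in plain C++. The sketch assumes, as an illustration of the general batching pattern, that the initializer runs once, the batch callback is invoked repeatedly until it produces no more rows, and the cleanup runs at the end. All names below (BatchStateSketch, InitBatchSketch, NextBatchSketch, CleanupBatchSketch, CountBatchCallsSketch) are hypothetical, and a vector of strings stands in for the CSV file.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical state shared by the three callbacks of one batched procedure.
struct BatchStateSketch {
  std::vector<std::string> rows;  // stands in for an opened CSV file
  std::size_t next_row = 0;
  bool initialized = false;
};

// Initializer: acquires the resource once, before the first batch.
void InitBatchSketch(BatchStateSketch &state, std::vector<std::string> rows) {
  state.rows = std::move(rows);
  state.next_row = 0;
  state.initialized = true;
}

// Batch callback: returns up to `batch_size` rows; an empty result means "done".
std::vector<std::string> NextBatchSketch(BatchStateSketch &state, std::size_t batch_size) {
  const std::size_t end = std::min(state.rows.size(), state.next_row + batch_size);
  std::vector<std::string> batch(state.rows.begin() + static_cast<std::ptrdiff_t>(state.next_row),
                                 state.rows.begin() + static_cast<std::ptrdiff_t>(end));
  state.next_row = end;
  return batch;
}

// Cleanup: releases the resource after the last batch.
void CleanupBatchSketch(BatchStateSketch &state) { state = BatchStateSketch{}; }

// Driver playing the role of the caller: counts how often the batch callback runs.
std::size_t CountBatchCallsSketch(std::vector<std::string> rows, std::size_t batch_size) {
  BatchStateSketch state;
  InitBatchSketch(state, std::move(rows));
  std::size_t calls = 0;
  while (true) {
    ++calls;
    if (NextBatchSketch(state, batch_size).empty()) break;
  }
  CleanupBatchSketch(state);
  return calls;
}
```

With three rows and a batch size of two, the batch callback runs three times: two rows, one row, and a final empty batch that signals the end.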
Examples of query modules API usage
Logging from the Query module
To gain insight into query module execution without a debugger, you can use logging. Logging is done by calling the mgp_log function. The function takes two arguments: the first one is the log level, and the second one is the message to be logged.
Logs are visible in the Memgraph log files; keep in mind that the visible log level is set in the Memgraph configuration file.
mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, "Hello world from Memgraph Query module procedure!");
Here is a basic example of a full query module that logs a message when its procedure is called:
#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const mgp::List arguments = mgp::List(args);
const auto record_factory = mgp::RecordFactory(result);
try {
mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO,
"Hello world from Memgraph Query module procedure!");
} catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module,
struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
try {
mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read, {},
{}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
If you hop to the logs of Memgraph, you should see something like this:
[memgraph_log] [info] Hello world from Memgraph Query module procedure!
Passing arguments to the functions and procedures
During the registration process of the procedure or function, you need to define the arguments that are going to be passed to the callback function.
The arguments are passed as a list of values that can be wrapped in the Value object.
Here is an example of a procedure that takes a simple string argument and logs a message that includes it:
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const mgp::List arguments = mgp::List(args);
const auto record_factory = mgp::RecordFactory(result);
try {
const auto input_string = arguments[0].ValueString();
std::string output_string = "Hello world from Memgraph Query module procedure! ";
output_string += input_string;
mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, output_string.c_str());
} catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module,
struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
try {
mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read,
{mgp::Parameter("string_key", mgp::Type::String)}, {},
module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
Upon being called, HelloWorld receives the list of arguments (args) passed in the query.
With the C++ API, we next retrieve the argument values from args by putting them into a list so we can use the indexing ([]) operator. In the code above, the arguments are retrieved in this line:
const auto input_string = arguments[0].ValueString();
The arguments are raw values when they are fetched from the list, so types are assigned to them with ValueString() for extra operability and expressiveness within the algorithm.
To call the Query module, you just need to pass the string argument to the procedure:
CALL lib_hello_query_module.hello("This message is passed inside the Query module procedure, hello!");
As the result of a Query module call, you should see something like this in the Memgraph logs:
[memgraph_log] [info] Hello world from Memgraph Query module procedure! This message is passed inside the Query module procedure, hello!
A simple string is not the only kind of argument you can pass to the query module. You can pass any type that is supported by Memgraph. Here is an example of a Query module call that takes a variety of different arguments:
MATCH (node:Country {name: 'Germany'}), (a)-[rel:FRIENDS_WITH {date_of_start: 2011}]->(b), path=(x)-[:FRIENDS_WITH {date_of_start: 2012}]->(y)
WITH node, rel, path
CALL lib_hello_query_module.hello(
123,
"This message is passed inside the Query module procedure, hello!",
true,
1.23,
date(),
node,
rel,
path,
["string1", "string2", "string3"]
)
RETURN *
In the Cypher query above, the MATCH clause first fetches a node, a relationship (rel), and a path that were previously created in Memgraph; after that, the data is passed to the query module.
Here is the full example of the query module that takes different types of arguments:
#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
#include <string>
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const mgp::List arguments = mgp::List(args);
const auto record_factory = mgp::RecordFactory(result);
try {
const auto input_int = arguments[0].ValueInt();
const auto input_string = arguments[1].ValueString();
const auto input_bool = arguments[2].ValueBool();
const auto input_double = arguments[3].ValueDouble();
const auto input_date = arguments[4].ValueDate();
const auto input_node = arguments[5].ValueNode();
const auto input_relationship = arguments[6].ValueRelationship();
const auto input_path = arguments[7].ValuePath();
const auto input_list_of_strings = arguments[8].ValueList();
std::string output_string =
"Hello world from Memgraph Query module procedure!\n";
output_string += "String: " + std::string(input_string) + "\n";
output_string += "Int: " + std::to_string(input_int) + "\n";
output_string += "Bool: " + std::to_string(input_bool) + "\n";
output_string += "Double: " + std::to_string(input_double) + "\n";
output_string += "Date: " + input_date.ToString() + "\n";
output_string += "Node: " + input_node.ToString() + "\n";
output_string += "Relationship: " + input_relationship.ToString() + "\n";
output_string += "Path: " + input_path.ToString() + "\n";
output_string += "List of Strings: \n";
for (auto iterator = input_list_of_strings.begin();
iterator != input_list_of_strings.end(); ++iterator) {
output_string += std::string((*iterator).ValueString()) + "\n";
}
mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, output_string.c_str());
} catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module,
struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
try {
mgp::AddProcedure(
HelloWorld, "hello", mgp::ProcedureType::Read,
{mgp::Parameter("int_key", mgp::Type::Int),
mgp::Parameter("string_key", mgp::Type::String),
mgp::Parameter("bool_key", mgp::Type::Bool),
mgp::Parameter("double_key", mgp::Type::Double),
mgp::Parameter("date_key", mgp::Type::Date),
mgp::Parameter("node_key", mgp::Type::Node),
mgp::Parameter("relationship_key", mgp::Type::Relationship),
mgp::Parameter("path_key", mgp::Type::Path),
mgp::Parameter("list_of_strings_key",
{mgp::Type::List, mgp::Type::String})},
{}, module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
Each of the passed arguments is mapped to a log message; notice that list, node, relationship, and path values are not directly convertible to a string and require a bit of processing.
For example, a Node is converted to a string by calling the ToString method. A full list of the methods that can be used on the Node is available in the Node C++ API documentation; the same goes for the rest of the supported types. The resulting log entry looks like this:
[2023-12-06 11:39:21.992] [memgraph_log] [info] Hello world from Memgraph Query module procedure!
String: This message is passed inside the Query module procedure, hello!
Int: 123
Bool: 1
Double: 1.230000
Date: 2023-12-6
Node: (id: 40, :Country, properties: {continent: Europe, language: German, name: Germany, population: 83000000})
Relationship: (id: 43, :Person, properties: {name: John})-[type: FRIENDS_WITH, id: 5, properties: {date_of_start: 2011}]->(id: 44, :Person, properties: {name: Harry})
Path: (id: 45, :Person, properties: {name: Anna})-[type: FRIENDS_WITH, id: 6, properties: {date_of_start: 2012}]->(id: 43, :Person, properties: {name: John})
List of Strings:
string1
string2
string3
Parameters can also have default values and be optional; for more information, check the C++ API documentation. This also means that arguments should be checked before being used inside the procedure.
For example, the value of some property could be null, or a list could be empty. When developing your query module, you should always check that the optional arguments are valid.
Returning the results from the query module
The results of a query module procedure are returned through Record objects. The result object that is passed to the callback function as an argument is wrapped in a RecordFactory, which creates the result records that are returned to the Cypher query call. By filling the records created by the factory, you return the results to the Cypher query call.
Here is an example of a query module that returns a single record with a single field:
#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const mgp::List arguments = mgp::List(args);
const auto record_factory = mgp::RecordFactory(result);
try {
const auto input_string = arguments[0].ValueString();
auto record = record_factory.NewRecord();
record.Insert("return_key", input_string);
} catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module,
struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
try {
mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read,
{mgp::Parameter("string_key", mgp::Type::String)},
{mgp::Return("return_key", mgp::Type::String)},
module, memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
As you can see in all previous examples, in the catch part of the try-catch block, the record_factory is used to return the error message. This is done by calling the SetErrorMessage method.
const auto record_factory = mgp::RecordFactory(result);
try {
// Rest of the code omitted for brevity
} catch (const std::exception &e) {
record_factory.SetErrorMessage(e.what());
}
When calling a procedure that returns results, such as the one above, you need to use the YIELD and RETURN keywords.
In the query below, YIELD * yields every field returned by the procedure. Otherwise, you would need to name the fields that you want to yield, which would be return_key in this case.
CALL lib_hello_query_module.hello("This message is passed inside the Query module procedure, hello!") YIELD * RETURN *;
+--------------------------------------------------------------------+
| return_key |
+--------------------------------------------------------------------+
| "This message is passed inside the Query module procedure, hello!" |
+--------------------------------------------------------------------+
The same return logic can be applied to any data type from the Record API.
Reading from the graph
The C graph object (mgp_graph) is passed to the callback function as an argument. The C++ API Graph object wraps it and is used to access the underlying Memgraph storage. The Graph object offers a wide range of methods that you can use to get information about the graph and to read, modify, and delete its contents.
Here is a basic example of how to access the graph nodes and return them as a list of nodes.
#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
const char *kReturnNodes = "nodes";
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
const mgp::RecordFactory record_factory = mgp::RecordFactory(result);
const mgp::List arguments = mgp::List(args);
const mgp::Graph graph(memgraph_graph);
mgp::List nodes{};
try {
mgp::Nodes::Iterator it = graph.Nodes().begin();
while (it != graph.Nodes().end()) {
mgp::Node node = *it;
nodes.AppendExtend(mgp::Value(node));
++it;
}
auto record = record_factory.NewRecord();
record.Insert(kReturnNodes, nodes);
} catch (const std::exception &e) {
mgp_error err = mgp_log(mgp_log_level::MGP_LOG_LEVEL_ERROR,
"Issue with running the procedure!");
record_factory.SetErrorMessage(e.what());
}
}
extern "C" int mgp_init_module(struct mgp_module *module,
struct mgp_memory *memory) {
mgp::MemoryDispatcherGuard guard(memory);
try {
mgp::AddProcedure(
HelloWorld, "hello", mgp::ProcedureType::Read, {},
{mgp::Return(kReturnNodes, {mgp::Type::List, mgp::Type::Node})}, module,
memory);
} catch (const std::exception &e) {
return 1;
}
return 0;
}
extern "C" int mgp_shutdown_module() { return 0; }
Notice that the C++ Graph object is created by passing the memgraph_graph object from the C API to the constructor.
In place of working with the raw mgp_ type arguments, use the C++ API classes that provide familiar standard library-like interfaces and do away with needing manual memory management.
After that, the Nodes iterator is used to iterate over all of the nodes in the graph.
Here is the call of the Query module from above:
CALL lib_hello_query_module.hello() YIELD * RETURN *;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nodes |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [(:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}), (:Country {continent: "Europe", language: "French", name: "France", population: 67000000}), (:Cou... |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Write to the graph
If you are using the Graph object to modify or delete parts of the graph, register your query module procedure with the procedure type mgp::ProcedureType::Write.
This makes the procedure's graph object mutable, so you can modify the graph.
Here is an example of a query module that creates a node and a relationship:
#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>

void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  const mgp::RecordFactory record_factory = mgp::RecordFactory(result);
  const mgp::List arguments = mgp::List(args);
  mgp::Graph graph(memgraph_graph);
  try {
    auto nodeA = graph.CreateNode();
    nodeA.SetProperty("name", mgp::Value("A"));
    auto nodeB = graph.CreateNode();
    nodeB.SetProperty("name", mgp::Value("B"));
    auto relationship = graph.CreateRelationship(nodeA, nodeB, "FRIENDS");
  } catch (const std::exception &e) {
    mgp_error err = mgp_log(mgp_log_level::MGP_LOG_LEVEL_ERROR,
                            "Issue with running the procedure!");
    record_factory.SetErrorMessage(e.what());
  }
}

extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  try {
    mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Write, {}, {},
                      module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

extern "C" int mgp_shutdown_module() { return 0; }
The example above uses the CreateNode and CreateRelationship methods to create two nodes, each with a name property, and a FRIENDS relationship between them.
Since the procedure does not return any values to the Cypher query, you can call it like this:
CALL lib_hello_query_module.hello();
You can then count the nodes and relationships in the graph, or match the created nodes, to confirm that the nodes and the relationship were created.
MATCH (a {name:"A"}), (b {name:"B"}), p=(a)-[*]->(b) RETURN *;
+-----------------------------------------+-----------------------------------------+-----------------------------------------+
| a | b | p |
+-----------------------------------------+-----------------------------------------+-----------------------------------------+
| ({name: "A"}) | ({name: "B"}) | ({name: "A"})-[:FRIENDS]->({name: "B"}) |
+-----------------------------------------+-----------------------------------------+-----------------------------------------+
Terminate procedure execution
Every query module procedure runs within a single transaction in Memgraph. Just as the execution of a Cypher query can be terminated with the TERMINATE TRANSACTIONS "id"; query, the execution of a procedure can be terminated as well if it takes too long to yield a response or gets stuck in an infinite loop due to unexpected input data.
The transaction ID is visible upon calling the SHOW TRANSACTIONS query.
For a procedure to be terminable, it has to call graph.CheckMustAbort(); before crucial parts of the code, such as inside while and for loops or at similar points where the procedure might become costly.
Consider the following example:
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <unordered_set>
#include <algorithm>
#include <mgp.hpp>
#include <mg_exceptions.hpp>

// Methods
constexpr char const *get = "get";

// Return object names
char const *return_field = "return";

void Test(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  const auto record_factory = mgp::RecordFactory(result);
  auto graph = mgp::Graph(memgraph_graph);
  int64_t id_ = 1;
  try {
    while (true) {
      // Checked on every iteration so the loop can be terminated.
      graph.CheckMustAbort();
      ++id_;
    }
  } catch (const mgp::MustAbortException &e) {
    std::cout << e.what() << std::endl;
    auto new_record = record_factory.NewRecord();
    new_record.Insert(return_field, id_);
  }
}

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::MemoryDispatcherGuard guard(memory);
    mgp::AddProcedure(Test, get, mgp::ProcedureType::Read, {}, {mgp::Return(return_field, mgp::Type::Int)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

extern "C" int mgp_shutdown_module() { return 0; }
Notice that graph.CheckMustAbort(); is called at the start of every iteration of the while loop. Once the transaction is terminated, the call throws mgp::MustAbortException, which ends the otherwise infinite loop instead of leaving the procedure stuck.
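The cooperative-termination pattern described above can also be tried out without Memgraph. The following is only a sketch under stated assumptions: AbortFlag, AbortRequested, and CountUntilAborted are hypothetical names that are not part of the mgp API, and a std::atomic<bool> stands in for the termination request that Memgraph would deliver.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for mgp::MustAbortException.
struct AbortRequested : std::runtime_error {
  AbortRequested() : std::runtime_error("abort requested") {}
};

// Hypothetical stand-in for the abort check offered by the graph object.
class AbortFlag {
 public:
  void Request() { requested_.store(true); }

  // Throws if termination was requested, similar in spirit to CheckMustAbort().
  void CheckMustAbort() const {
    if (requested_.load()) throw AbortRequested();
  }

 private:
  std::atomic<bool> requested_{false};
};

// Runs an otherwise endless loop. To simulate an external termination request,
// the loop raises the flag itself after `limit` iterations.
std::int64_t CountUntilAborted(AbortFlag &flag, std::int64_t limit) {
  assert(limit > 0);  // otherwise the simulated request would never fire
  std::int64_t iterations = 0;
  try {
    while (true) {
      flag.CheckMustAbort();  // checked at the start of every iteration
      ++iterations;
      if (iterations == limit) flag.Request();
    }
  } catch (const AbortRequested &) {
    // The loop has been left; the partial result is still available.
  }
  return iterations;
}
```

Calling CountUntilAborted with a fresh flag and a limit of 5 returns 5, while a flag that was already raised makes it return 0, because the very first check throws.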
Developing a query module with MAGE
In this tutorial, we will learn how to develop a query module in C++ with MAGE, using the random walk algorithm as an example.
Prerequisites
There are three options for installing and working with Memgraph MAGE:
- install the MAGE development environment from Docker Hub
- build MAGE using the docker build command
- build MAGE from source
Position yourself in the MAGE repository you cloned earlier. Once you are
there, enter the cpp
subdirectory and create a new directory called
random_walk_module
with the random_walk_module.cpp
file inside it.
cpp
└── random_walk_module
    └── random_walk_module.cpp
To make sure the module is linked with the rest of the MAGE code, we need to add a CMakeLists.txt script in the new directory and register our module in the cpp/CMakeLists.txt script as well. Refer to the existing scripts in MAGE’s query modules.
Our random_walk module contains a single procedure, get, which implements the algorithm. The procedure takes two input parameters: the starting node and the number of steps (10 by default), and it returns the generated random walk in the form of a list of step | node entries, one for each step.
All in all, we can define its signature as get(start: Node, steps: int = 10) -> [step: int | node: Node].
Let’s take a look at the structure of our query module.
#include <mg_utils.hpp>

void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph,
                mgp_result *result, mgp_memory *memory);

extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory);

extern "C" int mgp_shutdown_module() { return 0; }
If you hop to the part about query module architecture, you will notice that the structure of the query module is the same as the one in the example above.
Main algorithm
The RandomWalk algorithm is implemented in the code snippet below.
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <memgraph/mgp.hpp>
#include <memgraph/mg_exceptions.hpp>

const char *kProcedureGet = "get";
const char *kParameterStart = "start";
const char *kParameterSteps = "steps";
const char *kReturnStep = "step";
const char *kReturnNode = "node";

void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
  try {
    const auto start = arguments[0].ValueNode();
    const auto n_steps = arguments[1].ValueInt();
    srand(time(NULL));

    // The walk starts at the given node, which is reported as step 0.
    auto current_nodes = mgp::List();
    current_nodes.AppendExtend(mgp::Value(start));

    std::int64_t step = 0;
    // Take at most n_steps steps (the original `<=` took one step too many).
    while (step < n_steps) {
      auto current_node = current_nodes[current_nodes.Size() - 1].ValueNode();

      // Collect the outgoing relationships of the current node.
      auto neighbours = mgp::List();
      for (const auto relationship : current_node.OutRelationships()) {
        neighbours.AppendExtend(mgp::Value(relationship));
      }

      // Stop early at a node without outgoing relationships.
      if (neighbours.Size() == 0) {
        break;
      }

      const auto next_node = neighbours[rand() % neighbours.Size()].ValueRelationship().To();
      current_nodes.AppendExtend(mgp::Value(next_node));
      step++;
    }

    // Emit one record per visited node.
    for (std::int64_t i = 0; i < current_nodes.Size(); i++) {
      auto record = record_factory.NewRecord();
      record.Insert(kReturnStep, i);
      record.Insert(kReturnNode, current_nodes[i].ValueNode());
    }
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  std::int64_t default_steps = 10;
  try {
    mgp::AddProcedure(RandomWalk,
                      kProcedureGet,
                      mgp::ProcedureType::Read,
                      {
                        mgp::Parameter(kParameterStart, mgp::Type::Node),
                        mgp::Parameter(kParameterSteps, mgp::Type::Int, default_steps)
                      },
                      {
                        mgp::Return(kReturnStep, mgp::Type::Int),
                        mgp::Return(kReturnNode, mgp::Type::Node)
                      },
                      module,
                      memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

extern "C" int mgp_shutdown_module() { return 0; }
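To experiment with the walk logic outside Memgraph, a similar loop can be sketched over a plain adjacency list. This is a simplified stand-in and not the MAGE implementation: AdjacencyList and RandomWalkPath are hypothetical names chosen for illustration, nodes are plain indices, and a caller-supplied std::mt19937 with std::uniform_int_distribution is swapped in for srand and rand so that runs can be seeded reproducibly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical graph representation: adjacency[i] lists the out-neighbours of node i.
using AdjacencyList = std::vector<std::vector<std::size_t>>;

// Returns the visited nodes, beginning with `start`. The position of a node in
// the result is its step number. The walk takes at most `n_steps` steps and
// stops early when the current node has no out-neighbours.
std::vector<std::size_t> RandomWalkPath(const AdjacencyList &adjacency,
                                        std::size_t start, std::int64_t n_steps,
                                        std::mt19937 &rng) {
  assert(start < adjacency.size());
  std::vector<std::size_t> path{start};
  for (std::int64_t step = 0; step < n_steps; ++step) {
    const auto &neighbours = adjacency[path.back()];
    if (neighbours.empty()) break;  // dead end: nothing left to walk to
    std::uniform_int_distribution<std::size_t> pick(0, neighbours.size() - 1);
    path.push_back(neighbours[pick(rng)]);
  }
  return path;
}
```

On a chain such as 0 -> 1 -> 2, every node has at most one out-neighbour, so the walk is deterministic regardless of the seed, which makes this kind of helper easy to check in isolation.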
Compiling and running the query module
To build the query module, we need to run the following command:
python3 setup build -p /usr/lib/memgraph/query_modules/ --lang cpp
The query module should be compiled and placed in the /usr/lib/memgraph/query_modules/ directory.
Make sure you have reloaded the query modules.
CALL mg.load_all();
Now you can call the query module:
MATCH (start:Node {id: 0})
CALL random_walk.get(start, 2) YIELD step, node
RETURN step, node;
Testing
Test decoupled parts of your code that don’t depend on Memgraph like you would in any other setting. End-to-end (e2e) tests, on the other hand, depend on internal Memgraph data structures, like nodes and edges. After running Memgraph, we need to prepare the testing environment on the host machine. Position yourself in the mage directory you cloned from GitHub. The expected folder structure for each module is the following:
mage
└── e2e
    └── random_walk_test
        └── test_base
            ├── input.cyp
            └── test.yml
input.cyp
represents a Cypher script for entering the data into the database.
To simplify this tutorial, we’ll leave the database empty. test.yml
specifies
which test query should be run by the database and what the result or
exception should be. Create the files following the aforementioned directory structure.
input.cyp
MATCH (n) DETACH DELETE n;
test.yml
query: >
  MATCH (start:Node {id: 0})
  CALL random_walk.get(start, 2) YIELD step, node
  RETURN step, node

output: []
Lastly, run the e2e tests with python:
python test_e2e
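For the decoupled parts mentioned at the start of this section, one option is to factor logic out of the procedure into plain functions and check them with ordinary assertions. The sketch below assumes a hypothetical helper, PickNeighbourIndex, that isolates the index arithmetic a random walk needs. It is not part of MAGE or the mgp API, and it requires C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: maps a raw random value to an index in [0, size),
// or returns std::nullopt when there are no neighbours to pick from.
std::optional<std::size_t> PickNeighbourIndex(std::size_t random_value,
                                              std::size_t size) {
  if (size == 0) return std::nullopt;
  return random_value % size;
}

// A plain assert-based unit test that needs no running Memgraph instance.
void TestPickNeighbourIndex() {
  assert(!PickNeighbourIndex(7, 0).has_value());  // no neighbours
  assert(PickNeighbourIndex(7, 3).value() == 1);  // 7 % 3
  assert(PickNeighbourIndex(2, 3).value() == 2);  // already in range
}
```

Because the helper takes the random value as an argument instead of calling a random number generator itself, its behavior is fully deterministic and the test does not depend on a seed.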