Quickstart to custom query module in C++

To start working on a query module, you need a Memgraph instance running on a server or in a container. That server or container also serves as the development environment for your query module.

This gives you an isolated environment for developing and testing the module. The easiest way to follow this guide is to have Docker installed on your machine.

If you do not use Docker but have Memgraph installed on your Linux machine, the steps are the same; you just run the commands on your machine instead of inside the Docker container.

Run Memgraph

First, you need to run the Memgraph instance. You can do this by running the following command:

docker run -p 7687:7687 -p 7444:7444 --name memgraph memgraph/memgraph-mage

Keep in mind that you can use the Docker image with just Memgraph or with Memgraph and MAGE. It won't change the way you develop your C++ query module. If you want to run Memgraph on a native Linux machine, follow these installation steps.

Enter the Memgraph container shell

After you have started the Memgraph instance, you need to enter the shell. You can do this by running the following command:

docker exec -u 0 -it memgraph bash

This will start bash inside the container as the root user. Since we need to install dependencies and compilers, we need to be the root user. Of course, if you are running a native Linux machine, you can skip this step.

Install the required dependencies

To install the required dependencies and tooling, run the following commands:

apt update -y
apt install -y git cmake clang clangd vim

In this guide, we will use cmake and clang to compile the query module. Memgraph itself is compiled with clang, so using the same compiler is recommended. The clang version currently in use is 17.0.2, but this is subject to change and can be checked in the latest version of the toolchain file.

Vim and Git are optional. Git is recommended for version control and backup, because deleting the container also deletes the files developed inside it. If you do not want to use Vim for day-to-day development, you can use any other text editor or IDE that you feel comfortable with. The recommended approach is to connect your favorite IDE to the container; for example, Visual Studio Code has the Developing inside a Container feature.

The same goes for cmake: you can use any other build system or call the compiler directly.

Create a directory for your query module

Now that all of the required dependencies are installed, you can create a directory for your query module. You can do this by running the following command:

mkdir /hello_query_module
cd /hello_query_module

Develop a sample query module

Now, you can create a sample query module. You can do this by running the following command:

vim hello_query_module.cpp

This will open the Vim editor. You can copy the following code into the editor:

#include <cstdint>
#include <cstdlib>
#include <ctime>

#include <memgraph/mgp.hpp>
#include <memgraph/mg_exceptions.hpp>
 
const char *kProcedureGet = "get";
const char *kParameterStart = "start";
const char *kParameterSteps = "steps";
const char *kReturnStep = "step";
const char *kReturnNode = "node";
 
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
 
  try { 
 
    const auto start = arguments[0].ValueNode();
    const auto n_steps = arguments[1].ValueInt();
  
    srand(time(NULL));
  
    auto current_nodes = mgp::List();
    current_nodes.AppendExtend(mgp::Value(start));
  
    std::int64_t step = 0;
    while (step < n_steps) {
      auto current_node = current_nodes[current_nodes.Size() - 1].ValueNode();
  
      auto neighbours = mgp::List();
      for (const auto relationship : current_node.OutRelationships()) {
        neighbours.AppendExtend(mgp::Value(relationship));
      }
  
      if (neighbours.Size() == 0) {
        break;
      }
  
      const auto next_node = neighbours[rand() % neighbours.Size()].ValueRelationship().To();
  
      current_nodes.AppendExtend(mgp::Value(next_node));
      step++;
    }
  
    for (std::int64_t i = 0; i < current_nodes.Size(); i++) {
      auto record = record_factory.NewRecord();
      record.Insert(kReturnStep, i);
      record.Insert(kReturnNode, current_nodes[i].ValueNode());
    }
  }
   catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  std::int64_t default_steps = 10;
  try {
    mgp::AddProcedure(RandomWalk, 
                      kProcedureGet, 
                      mgp::ProcedureType::Read,
                      {
                        mgp::Parameter(kParameterStart, mgp::Type::Node),
                        mgp::Parameter(kParameterSteps, mgp::Type::Int, default_steps)
                      },
                      {
                        mgp::Return(kReturnStep, mgp::Type::Int),
                        mgp::Return(kReturnNode, mgp::Type::Node)
                      }, 
                      module,
                      memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

The code above contains many details that are not important at this point. All you need to know is that this is a basic query module that implements a random walk algorithm as a procedure; the details are explained later in this guide.
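To see the core idea without any Memgraph types, the walk itself can be sketched in plain C++. This is only an illustration under stated assumptions: the graph is modeled as a hypothetical adjacency map from node id to outgoing neighbour ids, and the names Adjacency and RandomWalkSketch are made up for this sketch rather than taken from the Memgraph API. It also uses the <random> facilities instead of rand().

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the graph: node id -> ids of the nodes reachable
// through outgoing relationships.
using Adjacency = std::unordered_map<std::int64_t, std::vector<std::int64_t>>;

// Walks at most `steps` relationships from `start`, choosing a random outgoing
// neighbour each time. Stops early at a node without outgoing relationships.
std::vector<std::int64_t> RandomWalkSketch(const Adjacency &graph, std::int64_t start,
                                           std::int64_t steps, std::mt19937 &rng) {
  std::vector<std::int64_t> visited{start};
  for (std::int64_t step = 0; step < steps; ++step) {
    const auto it = graph.find(visited.back());
    if (it == graph.end() || it->second.empty()) break;  // dead end
    std::uniform_int_distribution<std::size_t> pick(0, it->second.size() - 1);
    visited.push_back(it->second[pick(rng)]);
  }
  return visited;
}
```

The returned vector plays the role of current_nodes in the module: the element at index i is the node reached at step i.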

Compile the query module

First, create a CMakeLists.txt file in the same directory as your hello_query_module.cpp file. You can do this by running the following command:

vim CMakeLists.txt

In the file, you can copy the following code:

cmake_minimum_required(VERSION 3.10.0)
project(hello_query_module)
 
# Set the C++ standard to C++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_COMPILER "clang++")
 
# Create a shared library named _hello_query_module
add_library(_hello_query_module SHARED hello_query_module.cpp)

Create a build directory and move into it:

mkdir build
cd build

Run cmake and make:

cmake -DCMAKE_BUILD_TYPE=Release ..
make

Move and load the query module

After the compilation is done, you can move the query module to the Memgraph query modules directory. By default, Memgraph will look for .so files in /usr/lib/memgraph/query_modules/.

You can move the compiled shared library by running the following command:

mv lib_hello_query_module.so /usr/lib/memgraph/query_modules/lib_hello_query_module.so

The other option would be to update your CMakeLists.txt file to copy the file to the query modules directory every time you compile the shared library.

cmake_minimum_required(VERSION 3.10.0)
project(hello_query_module)
 
# Set the C++ standard to C++20
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_COMPILER "clang++")
 
# Create a shared library named _hello_query_module
add_library(_hello_query_module SHARED hello_query_module.cpp)
 
# Delete the existing file (file(REMOVE) runs when CMake configures the project)
file(REMOVE /usr/lib/memgraph/query_modules/lib_hello_query_module.so)
 
# Print a message before the library is built
add_custom_command(TARGET _hello_query_module PRE_BUILD
    COMMAND ${CMAKE_COMMAND} -E echo "File removed successfully"
)
 
# Copy the built library to the destination
add_custom_command(TARGET _hello_query_module POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy $<TARGET_FILE:_hello_query_module> /usr/lib/memgraph/query_modules/lib_hello_query_module.so
)
 
# Print a message when the file is successfully copied
add_custom_command(TARGET _hello_query_module POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E echo "File copied successfully"
)

Load the query module in Memgraph

If Memgraph was running previously, you need to reload shared libraries so it can see your new query module.

You can do this from mgconsole, which is included with every Memgraph image. Start it by running the following command:

mgconsole --fit_to_screen

Once inside the mgconsole, you can load the query module by running the following command:

CALL mg.load_all();

The previous command will reload all query modules that are present in the /usr/lib/memgraph/query_modules/ directory.

Check that the query module is loaded by running the following command:

CALL mg.procedures() YIELD *;

This will return all of the procedures that are loaded in Memgraph.

Test the query module

Now that the query module is loaded, you can test it. First, create a small graph to walk over:

CREATE (a:NodeA)-[:Relationship]->(b:NodeB);

Call the query module:

MATCH (a:NodeA)
WITH a
CALL lib_hello_query_module.get(a, 1) YIELD node, step
RETURN *;

If you experience any issues with loading the query module, it is recommended to check and follow the Memgraph logs and restart Memgraph.

Next steps

Once your basic random walk query module is working, you can start developing your own query module. It is recommended that you read this guide to the end, starting with the Query module architecture section.

At this point, also consider setting up a Git repository and versioning your code, so that you have a backup, either via container volumes or by pushing the code to a remote repository.

Query module architecture

Query modules allow you to extend the Cypher query language in Memgraph by writing custom procedures and functions in C++. Inside a query module file, you can define multiple procedures and functions for different purposes:

  1. Read or write procedures
  2. Magic functions
  3. Batched procedures

Each of these has a specific purpose and is used in different situations, but they share the same API during the registration of the query module.

The basic parts of every query module are as follows:

#include <memgraph/mgp.hpp>
 
// (Query procedure & magic function callbacks)
// missing procedure and function definitions
 
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  // Register your procedures & functions here
  return 0;
}
 
extern "C" int mgp_shutdown_module() {
  // If you need to release any resources at shutdown, do it here
  return 0;
}
 
  • The mgp.hpp file contains all declarations of the C++ API for implementing query module procedures and functions. The C++ API is a wrapper around the C API, which is why the callback and registration signatures use C types. The header should be located in the /usr/include/memgraph directory, and it is included with every Memgraph Docker image or installation.
  • To make your query procedures and functions available, they need to be registered in mgp_init_module, the C API registration function.
  • Finally, you may use mgp_shutdown_module to reset any global states or release global resources at shutdown.

Registering procedures and functions as Query modules

All query procedures and functions need to be registered in mgp_init_module. The way the registration is done depends on the type of procedure or function.

Here is an example of registering a read procedure via the AddProcedure function:

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::MemoryDispatcherGuard guard(memory);
 
    mgp::AddProcedure(RandomWalk, "get", mgp::ProcedureType::Read,
                 {mgp::Parameter("start", mgp::Type::Node), mgp::Parameter("length", mgp::Type::Int)},
                 {mgp::Return("random_walk", mgp::Type::Path)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

Here the procedure’s signature is defined and added as a readable procedure ProcedureType::Read, named get. This means that the procedure will be callable from Cypher queries, and it will be named query_module_name.get in the Cypher namespace. The procedure will take two named parameters (start and length): the starting node and random walk length, and it will yield the computed random walk as a Path (sequence of nodes connected by relationships) in the random_walk result field.

When the procedure is called, its arguments (and the graph context) will be passed to the RandomWalk callback function.

From the perspective of a Cypher query, you would call this procedure as follows:

MATCH (a:NodeA)
WITH a
CALL query_module_name.get(a, 10) YIELD random_walk
RETURN random_walk;

In this case, query_module_name is the name of the query module (compiled .so file) that contains the procedure. Each query module can contain multiple procedures and functions, and they will all be callable from Cypher queries; you just need to add them to the mgp_init_module function.

Procedure callbacks such as RandomWalk all share the signature laid out below. Parameter by parameter, the callback receives the procedure arguments (args), the graph context (memgraph_graph), the result stream (result), and memory access (memory).

void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
  
  try {
    const auto start_node = arguments[0].ValueNode();
    const auto length = arguments[1].ValueInt();
 
    auto random_walk = mgp::Path(start_node);
 
    // (Random walk algorithm logic)
 
    auto record = record_factory.NewRecord();
    record.Insert("random_walk", random_walk);
 
  }  catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}

Because mgp::memory is a global object, all procedures and functions in a single shared library refer to the same mgp::memory object. Calling them simultaneously from multiple threads, including the same callable invoked from different user sessions, therefore leads to incorrect memory usage. To avoid this constraint, use the thread-safe mgp::MemoryDispatcherGuard guard(memory) instead.

Write procedures are registered and written in the same way as read procedures. The difference is that the graph context of a write procedure is mutable.
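The general idea behind a scoped, per-thread guard can be sketched in plain C++. The code below is not Memgraph's implementation of MemoryDispatcherGuard; FakeMemory, current_memory, ScopedMemoryGuard, and IdSeenInsideGuard are hypothetical names used only to illustrate how RAII combined with thread_local storage keeps each thread's handle separate.

```cpp
#include <cstddef>

// Hypothetical stand-in for the memory handle a callback receives.
struct FakeMemory {
  std::size_t id;
};

// One slot per thread, so concurrent callbacks never share a handle.
thread_local FakeMemory *current_memory = nullptr;

// RAII guard: registers the handle for the calling thread on construction and
// restores the previous value when the scope ends, even if an exception is thrown.
class ScopedMemoryGuard {
 public:
  explicit ScopedMemoryGuard(FakeMemory *memory) : previous_(current_memory) {
    current_memory = memory;
  }
  ~ScopedMemoryGuard() { current_memory = previous_; }
  ScopedMemoryGuard(const ScopedMemoryGuard &) = delete;
  ScopedMemoryGuard &operator=(const ScopedMemoryGuard &) = delete;

 private:
  FakeMemory *previous_;
};

// Returns the id of the handle that is visible inside a guarded scope.
std::size_t IdSeenInsideGuard(FakeMemory *memory) {
  ScopedMemoryGuard guard(memory);
  return current_memory->id;
}
```

With a single global pointer, two sessions running at the same time would overwrite each other's handle; with one slot per thread, each callback only ever sees the handle it registered itself.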

With them, you may create or delete nodes and relationships, modify their properties, and add or remove node labels.

They use the same interface as read procedures; the only difference is that mgp::ProcedureType::Write is passed to AddProcedure. The code below registers a write procedure named add_x_nodes, which adds a user-specified number of nodes to the graph:

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::MemoryDispatcherGuard guard(memory);
 
    mgp::AddProcedure(AddXNodes, "add_x_nodes", mgp::ProcedureType::Write, {mgp::Parameter("number", mgp::Type::Int)},
                      {}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

Notice the mgp::ProcedureType::Write procedure type. Here is the implementation of the callback function AddXNodes:

void AddXNodes(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  // Declared outside the try block so the catch handler can report errors through it
  const auto record_factory = mgp::RecordFactory(result);
  try {
    const auto arguments = mgp::List(args);
    auto graph = mgp::Graph(memgraph_graph);
 
    const auto number_of_nodes = arguments[0].ValueInt();
    for (std::int64_t i = 0; i < number_of_nodes; i++) {
      graph.CreateNode();
    }
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}

Procedure registration is called during the query module loading; since it binds with Memgraph resources and processes, it is important to handle any errors that might occur during the registration. Hence, the appropriate try-catch blocks are added to avoid any unexpected behavior.


Any exceptions thrown should never leave the scope of your module or registration. You may have a top-level exception handler that returns the error value and potentially logs any error messages. Exceptions that cross the module boundary may cause unexpected issues!

Registering Magic function

Magic functions are a Memgraph feature that lets the user write and call custom Cypher functions. Unlike procedures, functions are simple operations that can’t modify the graph; they return a single value and can be used in any expression or predicate.

Let’s examine an example function that multiplies the numbers passed to it. The registration is done by AddFunction in the same way as with query procedures, the difference being the absence of a "function type" argument (functions don’t modify the graph).

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::MemoryDispatcherGuard guard(memory);
 
    mgp::AddFunction(Multiply, "multiply",
                     {mgp::Parameter("int", mgp::Type::Int), mgp::Parameter("int", mgp::Type::Int)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

There are two key differences in the function signature:

  • the lack of a mgp_graph * parameter (the graph is immutable in functions)
  • different result type (functions return single values, while procedures write result records to the result stream)

The difference in result type means that, to work with function results, we use a different C++ API class: Result. Our function is implemented as follows:

void Multiply(mgp_list *args, mgp_func_context *ctx, mgp_func_result *res, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
  const auto arguments = mgp::List(args);
  auto result = mgp::Result(res);
 
  auto first = arguments[0].ValueInt();
  auto second = arguments[1].ValueInt();
 
  result.SetValue(first * second);
}

Registering Batched procedures

Batched readable and writeable procedures in C++ are pretty similar to batched procedures in C. The way procedures work is the same as in C API; the only difference is procedure registration.

 
void BatchCSVFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}
 
void InitBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}
 
void CleanupBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}
 
 
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::MemoryDispatcherGuard guard(memory);
 
    mgp::AddBatchProcedure(BatchCSVFile, InitBatchCsvFile, CleanupBatchCsvFile,
                 "read_csv", mgp::ProcedureType::Read,
                 {mgp::Parameter("file_name", mgp::Type::String)},
                 {mgp::Return("row", mgp::Type::Map)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}

Examples of query modules API usage

Logging the Query module

To get insight into query module execution without a debugger, you can use logging. Logging is done by calling the mgp_log function, which takes two arguments: the log level and the message to be logged. Logs are written to the Memgraph log files; keep in mind that the log level visibility is set in the Memgraph configuration file.

 mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, "Hello world from Memgraph Query module procedure!");

Here is the basic example of a full query module that logs a message when it is started:

#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
 
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::List arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
 
  try {
    mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO,
                            "Hello world from Memgraph Query module procedure!");
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read, {},
                      {}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

If you check the Memgraph logs, you should see something like this:

[memgraph_log] [info] Hello world from Memgraph Query module procedure!

Passing arguments to the functions and procedures

During the registration process of the procedure or function, you need to define the arguments that are going to be passed to the callback function.

The arguments are passed as a list of values that can be wrapped in the Value object.

Here is an example of a procedure that takes a simple string argument and logs the message with a string argument:

  void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::List arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
 
  try {
    const auto input_string = arguments[0].ValueString();
 
    std::string output_string = "Hello world from Memgraph Query module procedure! ";
    output_string += input_string;
    mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, output_string.c_str());
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read,
                      {mgp::Parameter("string_key", mgp::Type::String)}, {},
                      module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }
 

Upon being called, HelloWorld receives the list of arguments (args) passed in the query.

With the C++ API, the argument values are retrieved from args by wrapping them in an mgp::List, which supports the indexing ([]) operator. In the code above, the argument is retrieved in this line:

  const auto input_string = arguments[0].ValueString();

The arguments are raw values at the time of their fetching from the list, so types are assigned to them with ValueString() for extra operability and expressiveness within the algorithm.

To call the Query module, you just need to pass the string argument to the procedure:

CALL lib_hello_query_module.hello("This message is passed inside the Query module procedure, hello!");

As the result of a Query module call, you should see something like this in the Memgraph logs:

[memgraph_log] [info] Hello world from Memgraph Query module procedure! This message is passed inside the Query module procedure, hello!

A simple string is not the only kind of argument you can pass to a query module. You can pass any type that Memgraph supports. Here is an example of a Query module call that takes a variety of different arguments:

MATCH (node:Country {name: 'Germany'}), (a)-[rel:FRIENDS_WITH {date_of_start: 2011}]->(b), path=(x)-[:FRIENDS_WITH {date_of_start: 2012}]->(y)
WITH node, rel, path
CALL lib_hello_query_module.hello(
    123, 
    "This message is passed inside the Query module procedure, hello!", 
    true, 
    1.23, 
    date(), 
    node, 
    rel, 
    path, 
    ["string1", "string2", "string3"]
)
RETURN *

In the Cypher query above, the MATCH clause first fetches a node, a relationship, and a path that already exist in Memgraph; that data is then passed to the query module.

Here is the full example of the query module that takes different types of arguments:

#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
#include <string>
 
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::List arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
  try {
    const auto input_int = arguments[0].ValueInt();
    const auto input_string = arguments[1].ValueString();
    const auto input_bool = arguments[2].ValueBool();
    const auto input_double = arguments[3].ValueDouble();
    const auto input_date = arguments[4].ValueDate();
    const auto input_node = arguments[5].ValueNode();
    const auto input_relationship = arguments[6].ValueRelationship();
    const auto input_path = arguments[7].ValuePath();
    const auto input_list_of_strings = arguments[8].ValueList();
 
    std::string output_string =
        "Hello world from Memgraph Query module procedure!\n";
    output_string += "String: " + std::string(input_string) + "\n";
    output_string += "Int: " + std::to_string(input_int) + "\n";
    output_string += "Bool: " + std::to_string(input_bool) + "\n";
    output_string += "Double: " + std::to_string(input_double) + "\n";
    output_string += "Date: " + input_date.ToString() + "\n";
    output_string += "Node: " + input_node.ToString() + "\n";
    output_string += "Relationship: " + input_relationship.ToString() + "\n";
    output_string += "Path: " + input_path.ToString() + "\n";
    output_string += "List of Strings:  \n";
    for (auto iterator = input_list_of_strings.begin();
         iterator != input_list_of_strings.end(); ++iterator) {
      output_string += std::string((*iterator).ValueString()) + "\n";
    }
 
    mgp_log(mgp_log_level::MGP_LOG_LEVEL_INFO, output_string.c_str());
 
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(
        HelloWorld, "hello", mgp::ProcedureType::Read,
        {mgp::Parameter("int_key", mgp::Type::Int),
         mgp::Parameter("string_key", mgp::Type::String),
         mgp::Parameter("bool_key", mgp::Type::Bool),
         mgp::Parameter("double_key", mgp::Type::Double),
         mgp::Parameter("date_key", mgp::Type::Date),
         mgp::Parameter("node_key", mgp::Type::Node),
         mgp::Parameter("relationship_key", mgp::Type::Relationship),
         mgp::Parameter("path_key", mgp::Type::Path),
         mgp::Parameter("list_of_strings_key",
                        {mgp::Type::List, mgp::Type::String})},
        {}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

Each of the passed arguments is written to the log message; notice that the list, node, relationship, and path are not directly convertible to a string and require a bit of processing.

For example, Node is converted to a string by calling the ToString method. A full list of the methods that can be used on the Node is available in the Node C++ API documentation; the same goes for the rest of the supported types.

[2023-12-06 11:39:21.992] [memgraph_log] [info] Hello world from Memgraph Query module procedure!
String: This message is passed inside the Query module procedure, hello!
Int: 123
Bool: 1
Double: 1.230000
Date: 2023-12-6
Node: (id: 40, :Country, properties: {continent: Europe, language: German, name: Germany, population: 83000000})
Relationship: (id: 43, :Person, properties: {name: John})-[type: FRIENDS_WITH, id: 5, properties: {date_of_start: 2011}]->(id: 44, :Person, properties: {name: Harry})
Path: (id: 45, :Person, properties: {name: Anna})-[type: FRIENDS_WITH, id: 6, properties: {date_of_start: 2012}]->(id: 43, :Person, properties: {name: John})
List of Strings:  
string1
string2
string3

Parameters can also be optional and have default values; for more information, check the C++ API documentation. This means arguments should be checked before they are used inside the procedure: for example, a property value could be null, or a list could be empty. When developing your query module, always check that optional arguments are valid.
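The kind of check meant here can be sketched without the Memgraph API. In the plain C++ example below, WalkArguments and ValidatedSteps are hypothetical names: the struct stands in for arguments that have already been unpacked from the argument list, with an empty std::optional modeling a null value.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, already-unpacked arguments of a procedure call.
struct WalkArguments {
  std::optional<std::int64_t> steps;  // empty models a null value
  std::vector<std::string> labels;    // the caller may pass an empty list
};

// Validates the arguments and returns the step count to use.
// Throws std::invalid_argument with a readable message on bad input.
std::int64_t ValidatedSteps(const WalkArguments &arguments, std::int64_t default_steps) {
  const std::int64_t steps = arguments.steps.value_or(default_steps);
  if (steps < 0) throw std::invalid_argument("steps must not be negative");
  if (arguments.labels.empty()) throw std::invalid_argument("labels must not be empty");
  return steps;
}
```

Inside a real callback, an exception like this would be caught by the surrounding try-catch block and reported through SetErrorMessage, as in the examples above.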

Returning the results from the query module

The results of the query module are returned through Record objects. The callback receives a result pointer (mgp_result *) as an argument, which is wrapped in a RecordFactory; each record created with NewRecord() becomes one result row returned to the Cypher query call. By filling the records with values, you return the results to the caller.

Here is an example of a query module that returns a single record with a single field:

#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
 
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::List arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
  try {
    const auto input_string = arguments[0].ValueString();
 
    auto record = record_factory.NewRecord();
    record.Insert("return_key", input_string);
 
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
} 
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Read,
                      {mgp::Parameter("string_key", mgp::Type::String)}, 
                      {mgp::Return("return_key", mgp::Type::String)},
                      module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

As you can see in all previous examples, the catch part of the try-catch block uses record_factory to return the error message. This is done by calling the SetErrorMessage method.

  const auto record_factory = mgp::RecordFactory(result);
  try {
    // Rest of the code omitted for brevity
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
 

To read the results of the query module above, use the YIELD and RETURN keywords.

The query below uses YIELD * to return every field the procedure produces. Otherwise, you would need to name the fields you want to yield, which would be return_key in this case.

CALL lib_hello_query_module.hello("This message is passed inside the Query module procedure, hello!") YIELD * RETURN *;
+--------------------------------------------------------------------+
| return_key                                                         |
+--------------------------------------------------------------------+
| "This message is passed inside the Query module procedure, hello!" |
+--------------------------------------------------------------------+

The same return logic can be applied to any data type from the Record API.

Reading from the graph

The C graph object (mgp_graph *) is passed to the callback function as an argument. The C++ API Graph object wraps it and is used to access the underlying Memgraph storage. The Graph object offers a wide range of methods for getting information about the graph and for reading, modifying, and deleting its contents.

Here is the basic example of how to access the graph nodes, and return them as a list of nodes.

#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
 
const char *kReturnNodes = "nodes";
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::RecordFactory record_factory = mgp::RecordFactory(result);
  const mgp::List arguments = mgp::List(args);
  const mgp::Graph graph(memgraph_graph);
  mgp::List nodes{};
  try {
    mgp::Nodes::Iterator it = graph.Nodes().begin();
    while (it != graph.Nodes().end()) {
      mgp::Node node = *it;
      nodes.AppendExtend(mgp::Value(node));
      ++it;
    }
    auto record = record_factory.NewRecord();
    record.Insert(kReturnNodes, nodes);
  } catch (const std::exception &e) {
    mgp_error err = mgp_log(mgp_log_level::MGP_LOG_LEVEL_ERROR,
                            "Issue with running the procedure!");
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(
        HelloWorld, "hello", mgp::ProcedureType::Read, {},
        {mgp::Return(kReturnNodes, {mgp::Type::List, mgp::Type::Node})}, module,
        memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }
 

Notice that the C++ Graph object is created by passing the memgraph_graph object from C API to the constructor.

In place of working with the raw mgp_ type arguments, use the C++ API classes that provide familiar standard library-like interfaces and do away with needing manual memory management.

After that, the Nodes iterator is used to iterate over all of the nodes in the graph.

Here is the call of the Query module from above:

CALL lib_hello_query_module.hello() YIELD * RETURN *;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| nodes                                                                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [(:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}), (:Country {continent: "Europe", language: "French", name: "France", population: 67000000}), (:Cou... |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Write to the graph

If you are using the Graph object to modify or delete parts of the graph, you should register your query module procedure with the procedure type mgp::ProcedureType::Write. This makes the procedure's graph object mutable, so you are able to modify the graph.

Here is an example of a query module that creates a node and a relationship:

#include <memgraph/mg_exceptions.hpp>
#include <memgraph/mgp.hpp>
 
void HelloWorld(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result,
                mgp_memory *memory) {
 
  mgp::MemoryDispatcherGuard guard(memory);
 
  const mgp::RecordFactory record_factory = mgp::RecordFactory(result);
  const mgp::List arguments = mgp::List(args);
  mgp::Graph graph(memgraph_graph);
  try {
    auto nodeA = graph.CreateNode();
    nodeA.SetProperty("name", mgp::Value("A"));
    auto nodeB = graph.CreateNode();
    nodeB.SetProperty("name", mgp::Value("B"));
    auto relationship = graph.CreateRelationship(nodeA, nodeB, "FRIENDS");
  } catch (const std::exception &e) {
    mgp_error err = mgp_log(mgp_log_level::MGP_LOG_LEVEL_ERROR,
                            "Issue with running the procedure!");
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  try {
    mgp::AddProcedure(HelloWorld, "hello", mgp::ProcedureType::Write, {}, {},
                      module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

The example above uses the CreateNode and CreateRelationship methods. It creates two nodes, each with a name property, and a FRIENDS relationship between them.

Since the procedure does not return any values to the Cypher query, you can call it without YIELD, like this:

CALL lib_hello_query_module.hello(); 

You can then count the nodes and relationships in the graph, or match the created nodes directly, to confirm that the nodes and the relationship were created.

MATCH (a {name:"A"}), (b {name:"B"}), p=(a)-[*]->(b) RETURN *;
+-----------------------------------------+-----------------------------------------+-----------------------------------------+
| a                                       | b                                       | p                                       |
+-----------------------------------------+-----------------------------------------+-----------------------------------------+
| ({name: "A"})                           | ({name: "B"})                           | ({name: "A"})-[:FRIENDS]->({name: "B"}) |
+-----------------------------------------+-----------------------------------------+-----------------------------------------+

Terminate procedure execution

Every query module procedure is run as one transaction in Memgraph. Just as the execution of a Cypher query can be terminated with the TERMINATE TRANSACTIONS "id"; query, the execution of a procedure can be terminated as well if it takes too long to yield a response or gets stuck in an infinite loop due to unexpected input data.

The transaction ID is visible upon calling the SHOW TRANSACTIONS query.

To make a procedure terminable, it has to call graph.CheckMustAbort(); in the costly parts of the code, such as loops or similar points where the procedure might run for a long time.

Consider the following example:

#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <algorithm>
#include <iostream>
#include <mgp.hpp>
#include <mg_exceptions.hpp>
 
// Methods
constexpr char const *get = "get";
// Return object names
char const *return_field = "return";
 
 
void Test(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
    mgp::MemoryDispatcherGuard guard(memory);
    const auto record_factory = mgp::RecordFactory(result);
    auto graph = mgp::Graph(memgraph_graph);
    int64_t id_ = 1;
    try {
        while (true) {
            graph.CheckMustAbort();
            ++id_;
        }
    } catch (const mgp::MustAbortException &e) {
        std::cout << e.what() << std::endl;
        auto new_record = record_factory.NewRecord();
        new_record.Insert(return_field, id_);
    }
}
 
 
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
    try {
        mgp::MemoryDispatcherGuard guard(memory);
        mgp::AddProcedure(Test, get, mgp::ProcedureType::Read, {}, {mgp::Return(return_field, mgp::Type::Int)}, module, memory);
    } catch(const std::exception &e) {
        return 1;
    }
    return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }

Notice that graph.CheckMustAbort(); is called at the start of every iteration of the while loop. When the transaction is terminated, the call throws mgp::MustAbortException, which ends the otherwise infinite loop and lets the procedure return the value it has reached.
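
The same cooperative pattern can be tried outside Memgraph. The sketch below is a standalone illustration, not Memgraph's implementation: AbortFlag, AbortRequested, and CountUntilAborted are hypothetical names, and the abort request is simulated from inside the loop rather than arriving from another session.

```cpp
#include <atomic>
#include <cstdint>
#include <stdexcept>

// Hypothetical exception type, standing in for mgp::MustAbortException.
struct AbortRequested : std::runtime_error {
  AbortRequested() : std::runtime_error("abort requested") {}
};

// Hypothetical abort switch; in this sketch the loop itself flips it.
struct AbortFlag {
  std::atomic<bool> requested{false};

  // Throws if an abort was requested, otherwise returns immediately.
  void CheckMustAbort() const {
    if (requested.load()) throw AbortRequested();
  }
};

// Counts upward until `limit`, polling the flag once per iteration.
// Returns the number of iterations completed before stopping.
inline std::int64_t CountUntilAborted(AbortFlag &flag, std::int64_t limit,
                                      std::int64_t abort_at) {
  std::int64_t done = 0;
  try {
    while (done < limit) {
      flag.CheckMustAbort();
      ++done;
      // Simulate an external termination request arriving mid-run.
      if (done == abort_at) flag.requested.store(true);
    }
  } catch (const AbortRequested &) {
    // Partial progress is kept; the loop simply stops early.
  }
  return done;
}
```

Because the flag is polled once per iteration, the loop stops at the first check after the request is made. A loop that never polls the flag could not be stopped this way.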

Developing a query module with MAGE

In this tutorial, we will learn how to develop a query module in C++ with MAGE, using the random walk algorithm as an example.

Prerequisites

There are three options for installing and working with Memgraph MAGE:

Position yourself in the MAGE repository you cloned earlier. Once you are there, enter the cpp subdirectory and create a new directory called random_walk_module with the random_walk_module.cpp file inside it.

cpp
└── random_walk_module
    └── random_walk_module.cpp

To make sure the module is linked with the rest of the MAGE code, we need to add a CMakeLists.txt script in the new directory and register our module in the cpp/CMakeLists.txt script as well. Refer to the existing scripts in MAGE’s query modules.

Our random_walk module contains a single procedure, get, which implements the algorithm. The procedure takes two input parameters: the starting node and the number of steps (10 by default), and it returns the generated random walk in the form of a list of step | node entries, one for each step. All in all, we can define its signature as get(start: Node, steps: int = 10) -> [step: int | node: Node].

Let’s take a look at the structure of our query module.

#include <mg_utils.hpp>
 
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph,
                     mgp_result *result, mgp_memory *memory);
 
extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory);
 
extern "C" int mgp_shutdown_module() { return 0; }
 

If you hop to the part about query module architecture, you will notice that the structure of the query module is the same as the one in the example above.

Main algorithm

The RandomWalk algorithm is implemented in the code snippet below.

#include <cstdint>  // std::int64_t
#include <cstdlib>  // srand, rand
#include <ctime>    // time

#include <memgraph/mgp.hpp>
#include <memgraph/mg_exceptions.hpp>
 
const char *kProcedureGet = "get";
const char *kParameterStart = "start";
const char *kParameterSteps = "steps";
const char *kReturnStep = "step";
const char *kReturnNode = "node";
 
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);
 
  try { 
 
    const auto start = arguments[0].ValueNode();
    const auto n_steps = arguments[1].ValueInt();
  
    srand(time(NULL));
  
    auto current_nodes = mgp::List();
    current_nodes.AppendExtend(mgp::Value(start));
  
    std::int64_t step = 0;
    // Take at most n_steps moves away from the starting node.
    while (step < n_steps) {
      auto current_node = current_nodes[current_nodes.Size() - 1].ValueNode();
  
      auto neighbours = mgp::List();
      for (const auto relationship : current_node.OutRelationships()) {
        neighbours.AppendExtend(mgp::Value(relationship));
      }
  
      if (neighbours.Size() == 0) {
        break;
      }
  
      const auto next_node = neighbours[rand() % neighbours.Size()].ValueRelationship().To();
  
      current_nodes.AppendExtend(mgp::Value(next_node));
      step++;
    }
  
    for (std::int64_t i = 0; i < current_nodes.Size(); i++) {
      auto record = record_factory.NewRecord();
      record.Insert(kReturnStep, i);
      record.Insert(kReturnNode, current_nodes[i].ValueNode());
    }
  } catch (const std::exception &e) {
    record_factory.SetErrorMessage(e.what());
  }
}
 
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  mgp::MemoryDispatcherGuard guard(memory);
 
  std::int64_t default_steps = 10;
  try {
    mgp::AddProcedure(RandomWalk, 
                      kProcedureGet, 
                      mgp::ProcedureType::Read,
                      {
                        mgp::Parameter(kParameterStart, mgp::Type::Node),
                        mgp::Parameter(kParameterSteps, mgp::Type::Int, default_steps)
                      },
                      {
                        mgp::Return(kReturnStep, mgp::Type::Int),
                        mgp::Return(kReturnNode, mgp::Type::Node)
                      }, 
                      module,
                      memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
 
extern "C" int mgp_shutdown_module() { return 0; }
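
To experiment with the walk logic without a running Memgraph instance, the following standalone sketch applies the same idea to a plain adjacency list. It is an illustration under stated assumptions, not part of MAGE: Adjacency and RandomWalkSketch are hypothetical names, nodes are identified by their index, and a seeded std::mt19937 engine replaces srand and rand so that a run can be reproduced.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical in-memory graph: adjacency[i] lists the targets of the
// outgoing relationships of node i.
using Adjacency = std::vector<std::vector<std::size_t>>;

// Returns the visited node ids, starting with `start`. The walk takes at most
// `steps` moves and stops early at a node with no outgoing relationships.
inline std::vector<std::size_t> RandomWalkSketch(const Adjacency &adjacency,
                                                 std::size_t start,
                                                 std::int64_t steps,
                                                 std::uint32_t seed) {
  std::mt19937 rng(seed);  // seeded engine, so a run can be reproduced
  std::vector<std::size_t> walk{start};
  for (std::int64_t step = 0; step < steps; ++step) {
    const auto &neighbours = adjacency[walk.back()];
    if (neighbours.empty()) break;  // dead end
    std::uniform_int_distribution<std::size_t> pick(0, neighbours.size() - 1);
    walk.push_back(neighbours[pick(rng)]);
  }
  return walk;
}
```

On a chain 0 -> 1 -> 2 the walk from node 0 always ends at node 2, regardless of the seed, because each node has at most one outgoing relationship and node 2 is a dead end.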

Compiling and running the query module

To build the query module, we need to run the following command:

python3 setup build -p /usr/lib/memgraph/query_modules/ --lang cpp

The query module should be compiled and placed in the /usr/lib/memgraph/query_modules/ directory. Make sure you have reloaded the query modules.

CALL mg.load_all();

Now you can call the query module:

MATCH (start:Node {id: 0})
  CALL random_walk.get(start, 2) YIELD step, node
  RETURN step, node;

Testing

Test decoupled parts of your code that don't depend on Memgraph like you would in any other setting. End-to-end (e2e) tests, on the other hand, depend on internal Memgraph data structures, like nodes and edges. After running Memgraph, we need to prepare the testing environment on the host machine. Position yourself in the mage directory you cloned from GitHub. The expected folder structure for each module is the following:

mage
└── e2e
    └── random_walk_test
        └── test_base
            ├── input.cyp
            └── test.yml

input.cyp represents a Cypher script for entering the data into the database. To simplify this tutorial, we'll leave the database empty. test.yml specifies which test query should be run by the database and what the result or exception should be. Create the files following the aforementioned directory structure.

input.cyp

MATCH (n) DETACH DELETE n;

test.yml

query: >
  MATCH (start:Node {id: 0})
    CALL random_walk.get(start, 2) YIELD step, node
    RETURN step, node
 
output: []

Lastly, run the e2e tests with Python:

python test_e2e