Data streamsTransformation modules

Transformation modules

In order to connect Memgraph to a data stream, it needs to know how to transform the incoming messages in order to consume them correctly. This is done with a transformation module.

To create a transformation module, you need to:

  1. Create a Python or a shared library file (module).
  2. Save the file into the Memgraph’s query_modules or internal_modules directory (default: /usr/lib/memgraph/query_modules and /var/lib/memgraph/internal_modules/).
  3. Load the file into Memgraph either on startup (automatically) or by running a CALL mg.load_all(); query.

If you are using Memgraph Lab you can create transformation module within the application.

Creating a transformation module

Memgraph supports user-defined transformations procedures written in Python and C that act on data received from a streaming engine. These transformation procedures are grouped into a module called Transformation module, which is then loaded into Memgraph on startup or later on. A transformation module consists of a transformation, a query procedure, or both.

Currently, we support transformations for Kafka, Pulsar and Redpanda streams.

The available API references are:

For examples of transformation modules check out the tutorial on implementing a Python transformation module, Python transformation examples of different format messages or an example of a transformation module written in C.

Loading modules

Modules can be loaded on startup or when the instance is already running.

Loading on startup

Memgraph attempts to load the modules from all *.so and *.py files it finds in the default (/usr/lib/memgraph/query_modules and /var/lib/memgraph/internal_modules/) directories. The *.so modules are written using the C API and the *.py modules are written using the Python API. Each file corresponds to one module. The names of these files will be mapped to module names. For example, hello.so will be mapped to the hello module and a py_hello.py script will be mapped to the py_hello module.

If you want to change the directory in which Memgraph searches for transformation modules, change or extend the --query-modules-directory flag in the main configuration file (/etc/memgraph/memgraph.conf) or supply it as a command-line parameter (e.g., when using Docker).

💡

Pass configuration flags as arguments after the image name, such as ... memgraph/memgraph-mage --log-level=TRACE --query-modules-directory=path/path.

Transfer transformation module into a Docker container:

If you are using Docker to run Memgraph, you will need to copy the transformation module file from your local directory into the Docker container where Memgraph can access it.

1. Open a new terminal and find the CONTAINER ID of the Memgraph Docker container:

docker ps

2. Copy a file from your current directory to the container with the command:

docker cp ./file_name.py <CONTAINER ID>:/usr/lib/memgraph/query_modules/file_name.py

The file is now inside your Docker container.

Loading while the instance is already running

To load a specific transformation module from a *.so and *.py files that were added to the default directories (/usr/lib/memgraph/query_modules and /var/lib/memgraph/internal_modules/) while the instance was already running, use:

CALL mg.load(module_name);

To load all transformation modules, use:

CALL mg.load_all();

Creating transformation modules within Memgraph Lab

If you are using Memgraph Lab to connect to the database instance, you can create the transformation module within the application:

  1. Go to Query Modules and click on + New Module.
  2. Give the transformation module a name and Create it.
  3. Write the transformation procedures and click Save & Close.

You will see the signature and overview of the transformation procedure that you can now use while creating a new stream.

Utility procedures for transformations

Query procedures that allow you to gain more insight into modules and transformations are written under our utility mg query module. For transformations, this module offers:

ProcedureDescription
mg.transformations() :: (name :: STRING)Lists all transformation procedures.
mg.load(module_name :: STRING) :: ()Loads or reloads the given module.
mg.load_all() :: ()Loads or reloads all modules.

For example, you can invoke mg.transformations() from mgconsole or Memgraph Lab with the following command:

CALL mg.transformations() YIELD *;

This will yield the following result:

+-------------------------------------------+-------------------------------------------------------+-------------+
| name                                      | path                                                  | is_editable |
+-------------------------------------------+-------------------------------------------------------+-------------+
| "batch.transform"                         | "/usr/lib/memgraph/query_modules/batch.py"            | true        |
+-------------------------------------------+-------------------------------------------------------+-------------+

To load a module (named e.g. hello) that wasn’t loaded on startup (probably because it was added to Memgraph’s directory once Memgraph was already running), you can invoke:

CALL mg.load("hello");

If you wish to reload an existing module, say the hello module above, use the same procedure:

CALL mg.load("hello");

To reload all existing modules and load any newly added ones, use:

CALL mg.load_all();