How to Write Custom Cypher Procedures with NetworkX and Memgraph

Table of Contents
A Macbook Pro Laptop

Introduction

NetworkX is a package for the creation, manipulation, and study of the dynamics, functions and structures of networks. It allows us to use complex graph algorithms to solve network-related problems.

Even though NetworkX is a very powerful and versatile package, it is somewhat limited in speed and efficiency because of its Python implementation and lack of quality storage. While this is not a problem with smaller data sets, in large networks this can be quite cumbersome. Using Memgraph, an in-memory graph database, as the storage solution provides additional benefits and functionalities to NetworkX.

In this post, we will focus on how to implement custom Cypher procedures using query modules and NetworkX. We will also discuss the open-source project MAGE where you can find already implemented modules and contribute your own.

Prerequisites

The prerequisites for this tutorial are:

  • Memgraph DB: a native fully distributed in-memory graph database built to handle real-time use-cases at enterprise scale. Follow the Docker Installation instructions on the Quick Start page to get started.
  • Memgraph Lab: an integrated development environment used to import data, develop, debug and profile database queries and visualize query results.

Understanding NetworkX Basics

To fully utilize the power of graphs, you first need to get a basic understanding of the underlying concepts in graph theory and their corresponding terminology when working with the NetworkX package.

The four main types of graphs that you will see throughout this and other tutorials are:

  • Graph - An undirected graph with self-loops.
  • DiGraph - A directed graph with self-loops.
  • MultiGraph - An undirected graph with self-loops and parallel edges.
  • MultiDiGraph - A directed graph with self-loops and parallel edges.

A self-loop (also called a loop or a buckle) is an edge that connects a vertex to itself while parallel edges (also called multiple edges or a multi-edge) are two or more edges that are incident to the same two nodes.

Memgraph offers a comprehensive set of thin wrappers around most of the algorithms and objects present in the NetworkX package. The wrapper functions have the capability to create a NetworkX compatible graph-like object that can stream the native database graph directly saving on memory usage significantly. For example, the NetworkX class MultiDiGraph has a MemgraphMultiDiGraph wrapper while the DiGraph class uses the MemgraphDiGraph wrapper class.

Setting up NetworkX in Memgraph

Memgraph supports extending the query language with user-written procedures. These procedures are grouped into modules, which can then be loaded on startup or later on. We are going to create such a procedure to work with the NetworkX package.

To get started, we need to create and mount a volume to access the query_modules directory. This directory contains all of the built-in query modules and it’s where you save new custom query modules. Create an empty directory ~modules on your host machine and execute the following command:

docker volume create --driver local --opt type=none  --opt device=~modules --opt o=bind modules

Now, you can start Memgraph and mount the created volume:

docker run -it --rm -v modules:/usr/lib/memgraph/query_modules -p 7687:7687 memgraph

Everything from the directory /usr/lib/memgraph/query_modules will be visible/editable in your mounted modules volume and vice versa. If you take a look at the /query_modules directory, you will find built-in query modules that come prepackaged with Memgraph. To learn more about them visit the How to Use Query Modules Provided by Memgraph? guide.

Writing a Custom Procedure for Cypher

First, we need a basic data set to start with. Let’s create a very simple graph with two cities that are connected by a road.

The cities are nodes while the road is an edge connecting these nodes. Open Memgraph Lab and in the Query tab run the following query:

CREATE (c1:City { name: 'Boston' }), (c2:City { name: 'New York' })
CREATE (c1)-[r:ROAD_TO { distance: 100}]->(c2)
RETURN c1, c2, r;
Query result in Memgraph Lab

This has created two nodes with the label City and a name property. The edge between them is of type ROAD_TO and has the property distance which represents the distance in kilometers.

We want a slightly more complex network to use with NetworkX so let’s extend it with the following queries:

MATCH (c1:City { name: 'Boston' })
CREATE (c2:City { name: 'Philadelphia' }), (c3:City { name: 'Seatle' }), (c4:City { name: 'Los Angeles' }), (c5:City { name: 'San Francisco' })
CREATE (c1)-[:ROAD_TO { distance: 110}]->(c2)
CREATE (c3)-[:ROAD_TO { distance: 150}]->(c2)
CREATE (c4)-[:ROAD_TO { distance: 200}]->(c5)
Query result in Memgraph Lab

This is a simple directed graph and while using NetworkX it would be stored as a DiGraph. If we were to duplicate all the edges and reverse the direction of the duplicates, we would end up with a MultiDiGraph because of the parallel edges. This could be accomplished by the following query:

MATCH (c0:City { name: 'New York' }), (c1:City { name: 'Boston' }), 
      (c2:City { name: 'Philadelphia' }), (c3:City { name: 'Seatle' }), 
      (c4:City { name: 'Los Angeles' }), (c5:City { name: 'San Francisco' })
CREATE (c1)<-[:ROAD_TO { distance: 100}]-(c0)
CREATE (c1)<-[:ROAD_TO { distance: 110}]-(c2)
CREATE (c3)<-[:ROAD_TO { distance: 150}]-(c2)
CREATE (c4)<-[:ROAD_TO { distance: 200}]-(c5)
Query result in Memgraph Lab

Now, let’s play with our graph in NetworkX. Create a file called hello_world.py in your mounted volume. If you followed the previous instructions on how to start Memgraph it should be in your Home directory and named modules. Add the following content to your newly created module:

import mgp
import networkx as nx
from mgp_networkx import (MemgraphMultiDiGraph, MemgraphDiGraph,
                          MemgraphMultiGraph, MemgraphGraph,
                          PropertiesDictionary)

@mgp.read_proc
def procedure(ctx: mgp.ProcCtx) -> mgp.Record(nodes=mgp.List[mgp.Vertex]):

    g = MemgraphMultiDiGraph(ctx=ctx)

    return mgp.Record(nodes=list(g.nodes()))

The first three lines are basic Python imports which all of our query modules that utilize NetworkX need to have.

Because we are working with data stored in Memgraph, a wrapper is used to reference the stored graph. The line g = MemgraphMultiDiGraph(ctx=ctx) implies that we are creating this reference from the current context ctx (automatically passed in all procedure calls) and it represents a nx.MultiDiGraph class.

The procedure returns a list of nodes from the graph. If you need a more detailed understanding of the custom objects and wrappers take a look at the query modules Reference Guide.

To call our procedure, first, it needs to be loaded. Memgraph automatically loads all procedures on start but you can also execute the following query to reload them if you added new or change existing ones:

CALL mg.load_all();

Next, we call our procedure and return the results:

CALL hello_world.procedure() YIELD nodes RETURN nodes;

What we get is a list of nodes in the graph.

Let’s change the procedure to also return the number of nodes and the number of edges in the graph.

@mgp.read_proc
def procedure(ctx: mgp.ProcCtx) -> mgp.Record(nodes=mgp.List[mgp.Vertex],
                                              number_of_nodes=int,
                                              number_of_edges=int):

    g = MemgraphMultiDiGraph(ctx=ctx)

    list_of_nodes = list(g.nodes())
    num_of_nodes = g.number_of_nodes()
    num_of_edges = g.number_of_edges()

    return mgp.Record(nodes=list_of_nodes,
                      number_of_nodes=num_of_nodes,
                      number_of_edges=num_of_edges)

Before you call the procedure, don’t forget to reload the module.

CALL mg.load_all();
CALL hello_world.procedure() YIELD * RETURN *;

With YIELD * we make sure that the procedure returns all of the results and by using RETURN * we print them all out.

Editing Graphs with NetworkX

It’s important to note that you can’t do any database write operations directly from a query module. Otherwise, it wouldn’t be possible to guarantee the integrity of the database. You can however save the whole graph or just a subgraph as a completely new object and then perform edits before returning the results. Let’s say we wanted to find out what our custom procedure would return if we removed the cities Boston and New York from our network. Create a new procedure in the same module and name it remove_node_procedure():

@mgp.read_proc
def remove_node_procedure(ctx: mgp.ProcCtx) -> mgp.Record(nodes=mgp.List[mgp.Vertex],
                                              number_of_nodes=int):

    g = MemgraphMultiDiGraph(ctx=ctx)
    g_copy = nx.MultiDiGraph(g)

    for node in list(g_copy.nodes):
        if node.properties.get('name') == 'Boston':
            g_copy.remove_node(node)

    list_of_nodes = list(g_copy.nodes)
    num_of_nodes = g_copy.number_of_nodes()

    return mgp.Record(nodes=list_of_nodes,
                      number_of_nodes=num_of_nodes)

As you can see, the returned list does not contain the city Boston.

Query result in Memgraph Lab

Passing Arguments to Custom Procedures

Let’s make a new procedure that uses the nx.is_simple_path() algorithm. A simple path in a graph is a nonempty sequence of nodes in which no node appears more than once in the sequence, and each adjacent pair of nodes in the sequence is adjacent in the graph. The simple_path_procedure() could look like this:

@mgp.read_proc
def simple_path_procedure(ctx: mgp.ProcCtx, nodes: mgp.List[mgp.Vertex]
                          ) -> mgp.Record(is_simple_path=bool):

    g = MemgraphMultiDiGraph(ctx=ctx)

    simple_path = nx.is_simple_path(g, nodes)

    return mgp.Record(is_simple_path=simple_path)

You can call the procedure by executing the following query from Memgraph Lab:

MATCH p=((c1:City {name:'Boston'})-[:ROAD_TO]->(c2:City {name:'New York'}))
CALL hello_world.simple_path_procedure(nodes(p)) YIELD is_simple_path RETURN is_simple_path;

With this query, we have created a list of nodes from the path Boston–New York and the simple_path_procedure() should return True because it is a simple path.

MAGE - Introducing Memgraph Advanced Graph Extensions

MAGE is an open-source repository provided by Memgraph that contains implementations of query modules, written by our engineers or contributed by the developer community.

In it, you can find implementations of various algorithms in multiple programming languages, all runnable inside Memgraph. Some of the available algorithms and problems that are tackled with MAGE include PageRank, Weakly Connected Components (Union Find), Jaccard similarity coefficient, Travelling Salesman Problem… and many more.

Don’t hesitate to contribute any custom modules you come up with if you think they would be of help to fellow developers. We would be more than happy to take a look and add them to the repository.

Conclusion and Next Steps

While there are some limited functionalities, Memgraph offers a straightforward way of using the NetworkX package on graph-like objects that can stream the native database graph directly. This kind of implementation improves memory usage significantly and adds in-memory efficiency to NetworkX algorithms.

For a more in-depth explanation of how to create your own custom Cypher procedures take a look at our How to Implement Query Modules? guide or go through the already implemented algorithms in MAGE.

Explore this tutorial with Memgraph Playground

We preloaded the dataset for you so you can get started in second! No sign up or download required.

Launch playground
Back to Memgraph blog

Read Next