Build C++ Graph Analytics Without Worrying About Memory

By Ante Pusic

7 min read · October 3, 2022

Extracting value from graph data seems like so much more hassle than it needs to be. Datasets are often useful only when big enough, and at that point you need to scale up your analytics pipeline. More often than not, graph analytics products need changes to make them compatible with what you’ve got. Sometimes, even these changes don’t do the trick, and you need to develop new graph algorithms or ML.

All this is possible with minimal overhead using Memgraph and its MAGE graph analytics library, even if you haven’t used graph databases before. Grab a coffee, and keep reading to find out how the new MAGE C++ API gives you the speed of C++ and the smoothness of higher-level languages for handling graph data.

Extend queries with custom graph methods

With MAGE, you can build user-defined methods as query procedures and functions, and run them with Cypher queries. Procedures are full-featured operations that can modify the graph and pass the changes for further processing. They return a stream of results which can be complex multi-field values. For simpler use cases, use functions: they take the graph as immutable and return a single value.
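To make the procedure/function split concrete, here is a minimal, self-contained sketch of the two contracts in plain C++. It does not use the MAGE API: `ToyGraph`, `ToyRecord`, `AddNodesProcedure`, and `NodeCountFunction` are hypothetical names invented for illustration. The procedure-like callable takes a mutable graph and streams multi-field records through a callback, while the function-like callable takes a `const` graph and returns one value.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a stored graph; not a Memgraph type.
struct ToyGraph {
  std::vector<std::int64_t> node_ids;
};

// One multi-field result record, like a row a procedure might emit.
struct ToyRecord {
  std::int64_t node_id;
  std::string status;
};

// Procedure-like contract: may modify the graph and emits a stream of records.
void AddNodesProcedure(ToyGraph &graph, std::int64_t count,
                       const std::function<void(const ToyRecord &)> &emit) {
  assert(count >= 0);
  for (std::int64_t i = 0; i < count; ++i) {
    const auto id = static_cast<std::int64_t>(graph.node_ids.size());
    graph.node_ids.push_back(id);    // the graph changes...
    emit(ToyRecord{id, "created"});  // ...and each change is reported as a record
  }
}

// Function-like contract: the graph is read-only and a single value comes back.
std::int64_t NodeCountFunction(const ToyGraph &graph) {
  return static_cast<std::int64_t>(graph.node_ids.size());
}
```

In this sketch the compiler enforces the difference: `NodeCountFunction` cannot call `push_back` on its `const` argument, which mirrors the idea that functions treat the graph as immutable.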

Graph users often avoid storing data within graph DBs due to performance constraints: accessing the stored graph can cause bottlenecks, especially in dynamic analytics. We have worked around this issue in two ways:

Memgraph is an in-memory database, eliminating the need for expensive read/write operations.
The new C++ API directly interfaces with the stored graph and does not create its own graph representation.

In short, if the data is in the DB, it’s ready to use.

However, high-performance code is worth nothing if you don’t know how to write it. Thus, our team has made coding graph methods easier. The following code snippet implements a procedure that adds a user-defined number of nodes to the graph:

#include <mgp.hpp>

// Universal procedure signature ↓
void AddXNodes(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::memory = memory;  // Grant memory access (for single-thread code)
  const auto arguments = mgp::List(args);  // Get the arguments
  const auto n_new_nodes = arguments[0].ValueInt();

  auto graph = mgp::Graph(memgraph_graph);  // Get the graph
  for (int i = 0; i < n_new_nodes; i++) {
    graph.CreateNode();
  }
}

// Initialization function ↓
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;  // Grant memory access (for single-thread code)

    // Add a procedure with the `void add_x_nodes(int number)` signature
    // that writes data to the graph
    mgp::AddProcedure(AddXNodes, "add_x_nodes", mgp::ProcedureType::Write,
                      {mgp::Parameter("number", mgp::Type::Int)}, {}, module, memory);
  } catch (const std::exception &e) {
    return 1;  // A non-zero return value signals that initialization failed
  }
  return 0;
}

To familiarize yourself with the new API, learn from example with the reference guide and consult the documentation.

Intuitive interface

The new C++ API is designed for humans, not robots. We followed best practices to reduce unnecessary cognitive load: the components have simple and consistent interfaces, common use cases require fewer user actions, and the API comes with developer guides and extensive documentation.

Memory management is probably the main pain point in C++ development, its usefulness notwithstanding. The new C++ API automatically manages the memory used by graph data, saving you time that would otherwise go to debugging and writing repetitive code.

Under the hood, this API builds upon the existing C API. The ability to use C++ features ties in directly with our goal of creating a simple, consistent interface. During design, we had our minds set on avoiding the cognitive friction that comes with context-switching between C++ standard library classes and ours. Compare the code snippets below. Both iterate over graph nodes, but the latter is consistent with standard C++ iterables and manages memory on its own.

// C API
auto *vertices_it = mgp::graph_iter_vertices(memgraph_graph, memory);
 
for (auto *vertex = mgp::vertices_iterator_get(vertices_it); vertex;
     vertex = mgp::vertices_iterator_next(vertices_it)) {
  // (...)
}
mgp::vertices_iterator_destroy(vertices_it);

// C++ API
for (const auto node : graph.Nodes()) {
  // (...)
}
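The general technique behind this kind of interface can be sketched with RAII. The block below is not Memgraph's implementation: it defines a hypothetical toy C-style iterator (`toy_iter_create`, `toy_iter_get`, `toy_iter_next`, `toy_iter_destroy`) and wraps it in a hypothetical `Nodes` range so that a range-based `for` loop works and the iterator is destroyed automatically. The `live_iterators` counter exists only to make leaks observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// ---- Hypothetical C-style API (a stand-in, not the Memgraph C API) ----
struct toy_vertex { int id; };
struct toy_vertices_iterator {
  const std::vector<toy_vertex> *vertices;
  std::size_t position;
};

static int live_iterators = 0;  // Number of iterators not yet destroyed

toy_vertices_iterator *toy_iter_create(const std::vector<toy_vertex> &vertices) {
  ++live_iterators;
  return new toy_vertices_iterator{&vertices, 0};
}
const toy_vertex *toy_iter_get(const toy_vertices_iterator *it) {
  return it->position < it->vertices->size() ? &(*it->vertices)[it->position] : nullptr;
}
const toy_vertex *toy_iter_next(toy_vertices_iterator *it) {
  ++it->position;
  return toy_iter_get(it);
}
void toy_iter_destroy(toy_vertices_iterator *it) {
  assert(it != nullptr);
  --live_iterators;
  delete it;
}

// ---- C++ wrapper: owns the C iterator and exposes begin()/end() ----
struct IterDeleter {
  void operator()(toy_vertices_iterator *it) const { toy_iter_destroy(it); }
};

class Nodes {
 public:
  class Iterator {
   public:
    Iterator() = default;  // End sentinel: owns no C iterator
    explicit Iterator(const std::vector<toy_vertex> &vertices)
        : handle_(toy_iter_create(vertices)), current_(toy_iter_get(handle_.get())) {}

    const toy_vertex &operator*() const { return *current_; }
    Iterator &operator++() {
      current_ = toy_iter_next(handle_.get());
      return *this;
    }
    // Iteration ends when the C iterator reports no current vertex (nullptr).
    bool operator!=(const Iterator &other) const { return current_ != other.current_; }

   private:
    // unique_ptr calls toy_iter_destroy when the Iterator goes out of scope.
    std::unique_ptr<toy_vertices_iterator, IterDeleter> handle_;
    const toy_vertex *current_ = nullptr;
  };

  explicit Nodes(const std::vector<toy_vertex> &vertices) : vertices_(vertices) {}
  Iterator begin() const { return Iterator(vertices_); }
  Iterator end() const { return Iterator(); }

 private:
  const std::vector<toy_vertex> &vertices_;
};

// Usage: no explicit create/destroy calls appear in user code.
int SumNodeIds(const std::vector<toy_vertex> &vertices) {
  int sum = 0;
  for (const auto &node : Nodes(vertices)) {
    sum += node.id;
  }
  return sum;
}
```

Because the wrapper owns the handle through `std::unique_ptr` with a custom deleter, cleanup also happens on early `return` or when an exception leaves the loop, which is the case that manual `destroy` calls tend to miss.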

Going further

In its initial release, the new C++ API has made it easier to develop query modules with user-defined procedures and functions. To this end, we have cut down boilerplate code and made data accessible through a simple, consistent, and high-performance graph API. Besides query modules, Memgraph supports transformation modules, which enable the database to ingest data from streams. For the next release, we plan to add support for transformation modules, as well as for restricting graph operations to subgraphs or extending them to work with multiple graph iterations simultaneously. We will maintain the focus on simplicity and consistency when adding new features and incrementally improve what we’ve already got.

What’s next?

Thanks for sticking with us! We’ve worked hard to make graph analytics fast, scalable, and easy to develop and use, so come try MAGE out. You can build graph analytics tailored to your needs in C++ or Python, or even contribute to MAGE and make your work available to everybody. Tell us what you’re doing on our Discord server. We’ll be happy to talk about it with you!