Run algorithms

Run algorithms

Built-in deep path traversal algorithms (BFS, DFS, Weighted shortest path, All shortest paths) are runned by using their designated clauses.

Running algorithms and procedures from query modules available in the MAGE library are runned using the CALL clause.

Run procedures from MAGE library

The MAGE library containes query modules, each with their own procedures.

To call the procedures within queries use the following Cypher syntax:

CALL module.procedure([optional parameter], arg1, "string_argument", ...) YIELD res1, res2, ...;

Every procedure has a first optional parameter and the rest of them depend on the procedure you are trying to call. The optional parameter must be result of the aggregation function project(). If such a parameter is provided, all operations will be executed on a projected graph, i.e., a subgraph. Otherwise, you will work on the whole graph stored inside Memgraph.

Each procedure returns zero or more records, where each record contains named fields. The YIELD clause is used to select fields you are interested in or all of them (*). If you are not interested in any fields, omit the YIELD clause. The procedure will still run, but the record fields will not be stored in variables. If you are trying to YIELD fields that are not a part of the produced record, the query will result in an error.

Procedures can be standalone, or a part of a larger query when we want the procedure to work on data the query is producing.

For example:

MATCH (node) CALL module.procedure(node) YIELD result RETURN *;

When the CALL clause is a part of a larger query, results from the query are returned using the RETURN clause. If the CALL clause is followed by a clause that only updates the data and doesn't read it, RETURN is unnecessary. It is the Cypher convention that read-only queries need to end with a RETURN, while queries that update something don't need to RETURN anything.

Also, if the procedure itself writes into the database, all the rest of the clauses in the query can only read from the database, and the CALL clause can only be followed by the YIELD clause and/or RETURN clause.

If a procedure returns a record with the same field name as some variable we already have in the query, that field name can be aliased with some other name using the AS sub-clause:

MATCH (result) CALL module.procedure(42) YIELD result AS procedure_result RETURN *;

Managing query modules from Memgraph Lab

You can inspect query modules in Memgraph Lab (v2.0 and newer). Just navigate to Query Modules.

There you can see all the loaded query modules, delete them, or see procedures and transformations they define by clicking on the arrow icon.

By expanding procedures you can receive information about the procedure's signature, input and output variables and their data type, as well as the CALL query you can run directly from the Query Modules view.

Run procedures on subgraph

When executing any MAGE procedure, the algorithm is executed on the whole network.

This can be impractical when:

  • the graph is heterogeneous, and you want to run the module only on specific labels,
  • the graph is too large, and you only want to use the analytics to update only a portion of it,
  • the network contains multiple diverse data models and graphs, and running analytics on mixed graphs at once might yield unexpected results.

That is why Memgraph enables module execution on subgraphs and graph projections. The insights yielded by graph algorithms can then affect only the necessary nodes in the graph, making the data more consistent and up to its specifications.

Built-in deep path traversal algorithms cannot be run on subgraphs.

Available graph projections

Graph projection function in Memgraph is called project(), and it is used in the following way:

MATCH p=(n)-[r]->(m)
WITH project(p) AS subgraph
RETURN subgraph;

The path is specified first which denotes source and target nodes as well as relationships connecting them. The function project then constructs a subgraph out of all the generated paths.

Because the matched pattern actually includes all the nodes and the relationships in the graph, the result of this query is a projection of the whole graph. To isolate a certain part of the graph, constraints need to be added to either labels, edge types, or properties, like in the query below:

MATCH p=(n:SpecificLabel)-[r:REL_TYPE]->(m:SpecificLabel)
WITH project(p) AS subgraph
RETURN subgraph;

The query above will return a subgraph of SpecificLabel nodes connected with the relationships of type REL_TYPE.

Calling query modules on graph projections

If you want to run query modules on subgraphs, specify the projected graph as the first argument of the query module.

CALL module.procedure([optional graph parameter], argument1, argument2, ...) YIELD * RETURN *;

If the optional graph projection parameter is not included as the first argument, the query module will be executed on the whole graph.

Example of creating a projection and executing a procedure on a subgraph:

MATCH p=(n:SpecificLabel)-[r:REL_TYPE]->(m:SpecificLabel)
WITH project(p) AS subgraph
CALL module.procedure(subgraph, argument1, argument2, ...) YIELD * RETURN *;

Practical example with Twitter influencers

In this practical example, PageRank algorithm will be executed on a fictional Twitter dataset. PageRank execution is grouped by the Twitter hashtag, and each Tweet can have a different number of retweets.

CREATE (n:Tweet {id: 1, hashtag: "#WorldCup", content: "Cool world cup! #WorldCup"});
CREATE (n:Tweet {id: 2, hashtag: "#WorldCup", content: "The ball is round #WorldCup!"});
 
CREATE (n:Tweet {id: 3, hashtag: "#christmas", content: "The town is so shiny during #christmas!"});
CREATE (n:Tweet {id: 4, hashtag: "#christmas", content: "Why didn't I get any presents for #christmas?"});
 
MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "Croatia first this year!"});
MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "Farewall Dani Alves!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "This is the best WC ever!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "It's not so hot this time of the year in Qatar."});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "What are your predictions?"});
MATCH (n:Tweet {id: 3}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "Don't be a Grinch!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "I'll give you a present!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "Santa Claus will visit you tonight!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "This year save me from tears"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "All I want for Christmas is youuuu"});

Running PageRank on the whole network

To run the PageRank algorithms available in the MAGE library, use the following query:

CALL pagerank.get() YIELD node, rank
SET node.rank = rank;

The PageRank algorithm will take into account all the nodes in the graph. It doesn't really make sense to correlate tweets about World Cup with tweets about Christmas, as they are thematically quite different and should be analyzed separately.

Running PageRank on a subgraph

To run the PageRank on a subset of Christmas tweets only, that portion of the graph is saved as a projection and used as the first argument of the query module:

MATCH p=(n:Tweet {hashtag: "#christmas"})-[r]->(m)
WITH project(p) AS christmas_graph
CALL pagerank.get(christmas_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.content, rank
ORDER BY rank DESC;

The above query successfully updated the rank of the Christmas tweets only!

Let's do the same on the World Cup tweets by changing the value of the hashtag property:

MATCH p=(n:Tweet {hashtag: "#WorldCup"})-[r]->(m)
WITH project(p) AS christmas_graph
CALL pagerank.get(christmas_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.content, rank
ORDER BY rank DESC;

Mapping custom procedure names to existing query procedures

If you want to replace procedure names your application calls without changing the application code, you can define the mapping of the old and new procedure names in a JSON file, then set the path to the files as the value of the query-callable-mappings-path configuration flag.

Example of a JSON file:

{
    "db.components": "mgps.components",
    "util.validate": "mgps.validate"
}

Control procedure memory usage

When running a procedure, Memgraph controls the maximum memory usage that the procedure may consume during its execution. By default, the upper memory limit when running a procedure is 100 MB. If your query procedure requires more memory to yield its results, you can increase the memory limit using the following syntax:

CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 KB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 MB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY UNLIMITED YIELD result;

The limit can either be specified to a specific value (either in KB or in MB), or it can be set to unlimited.