Run algorithms
Built-in deep path traversal algorithms (BFS, DFS, Weighted shortest path, All shortest paths) are runned by using their designated clauses.
Running algorithms and procedures from query modules available in the MAGE
library are runned using the CALL
clause.
Run procedures from MAGE library
The MAGE library containes query modules, each with their own procedures.
To call the procedures within queries use the following Cypher syntax:
CALL module.procedure([optional parameter], arg1, "string_argument", ...) YIELD res1, res2, ...;
Every procedure has a first optional parameter and the rest of them depend on
the procedure you are trying to call. The optional parameter must be result of
the aggregation function
project()
. If such a
parameter is provided, all operations will be executed on a projected graph,
i.e., a subgraph. Otherwise, you will work on the whole
graph stored inside Memgraph.
Each procedure returns zero or more records, where each record contains named
fields. The YIELD
clause is used to select fields you are interested in or all
of them (*). If you are not interested in any fields, omit the YIELD
clause.
The procedure will still run, but the record fields will not be stored in
variables. If you are trying to YIELD
fields that are not a part of the
produced record, the query will result in an error.
Procedures can be standalone, or a part of a larger query when we want the procedure to work on data the query is producing.
For example:
MATCH (node) CALL module.procedure(node) YIELD result RETURN *;
When the CALL
clause is a part of a larger query, results from the query are
returned using the RETURN
clause. If the CALL
clause is followed by a clause
that only updates the data and doesn’t read it, RETURN
is unnecessary. It is
the Cypher convention that read-only queries need to end with a RETURN
, while
queries that update something don’t need to RETURN
anything.
Also, if the procedure itself writes into the database, all the rest of the
clauses in the query can only read from the database, and the CALL
clause can
only be followed by the YIELD
clause and/or RETURN
clause.
If a procedure returns a record with the same field name as some variable we
already have in the query, that field name can be aliased with some other name
using the AS
sub-clause:
MATCH (result) CALL module.procedure(42) YIELD result AS procedure_result RETURN *;
Managing query modules from Memgraph Lab
You can inspect query modules in Memgraph Lab (v2.0 and newer). Just navigate to Query Modules.
There you can see all the loaded query modules, delete them, or see procedures and transformations they define by clicking on the arrow icon.
By expanding procedures you can receive information about the procedure’s
signature, input and output variables and their data type, as well as the CALL
query you can run directly from the Query Modules view.
Run procedures on subgraph
When executing any MAGE procedure, the algorithm is executed on the whole network.
This can be impractical when:
- the graph is heterogeneous, and you want to run the module only on specific labels,
- the graph is too large, and you only want to use the analytics to update only a portion of it,
- the network contains multiple diverse data models and graphs, and running analytics on mixed graphs at once might yield unexpected results.
That is why Memgraph enables module execution on subgraphs and graph projections. The insights yielded by graph algorithms can then affect only the necessary nodes in the graph, making the data more consistent and up to its specifications.
Built-in deep path traversal algorithms cannot be run on subgraphs.
Available graph projections
Graph projection function in Memgraph is called project(), and it is used in the following way:
MATCH p=(n)-[r]->(m)
WITH project(p) AS subgraph
RETURN subgraph;
The path is specified first which denotes source and target nodes as well as
relationships connecting them. The function project
then constructs a subgraph
out of all the generated paths.
Because the matched pattern actually includes all the nodes and the relationships in the graph, the result of this query is a projection of the whole graph. To isolate a certain part of the graph, constraints need to be added to either labels, edge types, or properties, like in the query below:
MATCH p=(n:SpecificLabel)-[r:REL_TYPE]->(m:SpecificLabel)
WITH project(p) AS subgraph
RETURN subgraph;
The query above will return a subgraph of SpecificLabel
nodes connected with
the relationships of type REL_TYPE
.
Calling query modules on graph projections
If you want to run query modules on subgraphs, specify the projected graph as the first argument of the query module.
CALL module.procedure([optional graph parameter], argument1, argument2, ...) YIELD * RETURN *;
If the optional graph projection parameter is not included as the first argument, the query module will be executed on the whole graph.
Example of creating a projection and executing a procedure on a subgraph:
MATCH p=(n:SpecificLabel)-[r:REL_TYPE]->(m:SpecificLabel)
WITH project(p) AS subgraph
CALL module.procedure(subgraph, argument1, argument2, ...) YIELD * RETURN *;
Practical example with Twitter influencers
In this practical example, PageRank algorithm will be executed on a fictional Twitter dataset. PageRank execution is grouped by the Twitter hashtag, and each Tweet can have a different number of retweets.
CREATE (n:Tweet {id: 1, hashtag: "#WorldCup", content: "Cool world cup! #WorldCup"});
CREATE (n:Tweet {id: 2, hashtag: "#WorldCup", content: "The ball is round #WorldCup!"});
CREATE (n:Tweet {id: 3, hashtag: "#christmas", content: "The town is so shiny during #christmas!"});
CREATE (n:Tweet {id: 4, hashtag: "#christmas", content: "Why didn't I get any presents for #christmas?"});
MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "Croatia first this year!"});
MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "Farewall Dani Alves!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "This is the best WC ever!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "It's not so hot this time of the year in Qatar."});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", content: "What are your predictions?"});
MATCH (n:Tweet {id: 3}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "Don't be a Grinch!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "I'll give you a present!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "Santa Claus will visit you tonight!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "This year save me from tears"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", content: "All I want for Christmas is youuuu"});
Running PageRank on the whole network
To run the PageRank algorithms available in the MAGE library, use the following query:
CALL pagerank.get() YIELD node, rank
SET node.rank = rank;
The PageRank algorithm will take into account all the nodes in the graph. It doesn’t really make sense to correlate tweets about World Cup with tweets about Christmas, as they are thematically quite different and should be analyzed separately.
Running PageRank on a subgraph
To run the PageRank on a subset of Christmas tweets only, that portion of the graph is saved as a projection and used as the first argument of the query module:
MATCH p=(n:Tweet {hashtag: "#christmas"})-[r]->(m)
WITH project(p) AS christmas_graph
CALL pagerank.get(christmas_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.content, rank
ORDER BY rank DESC;
The above query successfully updated the rank of the Christmas tweets only!
Let’s do the same on the World Cup tweets by changing the value of the hashtag property:
MATCH p=(n:Tweet {hashtag: "#WorldCup"})-[r]->(m)
WITH project(p) AS christmas_graph
CALL pagerank.get(christmas_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.content, rank
ORDER BY rank DESC;
Mapping custom procedure names to existing query procedures
If you want to replace procedure names your application calls without changing
the application code, you can define the mapping of the old and new procedure
names in a JSON file, then set the path to the files as the value of the
query-callable-mappings-path
configuration
flag.
Example of a JSON file:
{
"db.components": "mgps.components",
"util.validate": "mgps.validate"
}
Control procedure memory usage
When running a procedure, Memgraph controls the maximum memory usage that the
procedure may consume during its execution. By default, the upper memory limit
when running a procedure is 100 MB
. If your query procedure requires more
memory to yield its results, you can increase the memory limit using the
following syntax:
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 KB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 MB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY UNLIMITED YIELD result;
The limit can either be specified to a specific value (either in KB
or in
MB
), or it can be set to unlimited.