periodic

The periodic module enables users to execute a query periodically in batches. In this case, the name periodic doesn’t indicate that the query is executed after a time interval, but rather that, due to the complexity of the query, the results of some input source are batched to speed up execution.

If you enabled authentication in Memgraph, set the following environment variables to be able to use the procedures in the query module:

MG_HOST
MG_PORT
MG_USERNAME
MG_PASSWORD

These environment variables are different from the MEMGRAPH environment variables and need to be set in addition to them.

⚠️

As the results are batched and executed in different transactions, every executed batch is committed by itself. If an issue occurs while running this procedure, the already committed batches cannot be rolled back.

Source code

Trait	Value
Module type	module
Implementation	C++
Parallelism	sequential

Procedures

The procedure executes a query periodically in batches.

`iterate()`

Input:

subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default.
input_query: string ➡ The input query which will yield the results that need to be batched.
running_query: string ➡ The query which will be executed on the batched results:
config: Map[string, int] ➡ Procedure parameters:
- batch_size: Integer ➡ The value specifies the number of results in a batch.

Output:

success: boolean ➡ true if the procedure executed successfully as a whole, false otherwise.
number_of_executed_batches: Integer ➡ The number of executed batches (possibly a fraction of the full number if the procedure returned success: false).

Usage:

To batch results, use the following query:

CALL periodic.iterate(
  "LOAD CSV FROM '/tmp/file.csv' WITH HEADER AS row RETURN row.node_id AS node_id, row.supernode_id AS supernode_id", 
  "MATCH (s:SuperNode {supernode_id: supernode_id}), (n:Node {node_id: node_id}) CREATE (s)-[:HAS_REL_TO]->(n)", 
  {batch_size: 5000}) 
YIELD success, number_of_executed_batches
RETURN success, number_of_executed_batches;

`delete()`

Input:

subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default.
config: Map[string, int] ➡ Procedure parameters:
- labels: List[string] ➡ The value specifies the which node labels to search when deleting.
- edge_types: List[string] ➡ The value specifies the which edge types to search when deleting.
- batch_size: integer ➡ The value specifies the batch size used when deleting periodically.

Output:

success: boolean ➡ true if the procedure executed successfully as a whole, false otherwise.
number_of_executed_batches: Integer ➡ The number of executed batches (possibly a fraction of the full number if the procedure returned success: false).
number_of_deleted_nodes: Integer ➡ The number of deleted nodes during periodic delete.
number_of_deleted_relationships: Integer ➡ The number of deleted relationships during periodic delete.

Usage:

Batch size in every query is arbitrary. It is advised to be up to a million, but highly depends on the workload. Please test the batch size according to your workload.

To batch delete only the nodes, and their corresponding edges, use the following query:

CALL periodic.delete({batch_size: 100000, labels: ["MY_LABEL"]}) 
YIELD success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships
RETURN success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships;

To batch delete only the relationships, use the following query:

CALL periodic.delete({batch_size: 100000, edge_types: ["MY_EDGE_TYPE"]}) 
YIELD success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships
RETURN success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships;

To batch delete everything, use the following query:

CALL periodic.delete({batch_size: 100000}) 
YIELD success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships
RETURN success, number_of_executed_batches, number_of_deleted_nodes, number_of_deleted_relationships;

Example

The example shows how to import relationships that create a supernode from a CSV file.

Create indexes

First, create indexes on all relevant labels and properties to speed up import.

CREATE INDEX ON :SuperNode;
CREATE INDEX ON :SuperNode(supernode_id);
CREATE INDEX ON :Node;
CREATE INDEX ON :Node(node_id);

Create nodes

At the end of the example, one node will be connected to the other million nodes. The following query will create a million and one nodes.

CREATE (:SuperNode {supernode_id: 1});
FOREACH (i IN range(1, 1000000) | CREATE (:Node {id: i}));

The CSV file

The CSV file contains the data needed to create the relationships, the ID of a start node and the end node of a relationship:

supernode_id,node_id
1,1
1,2
1,3
1,4
1,5
1,6
...
1,999998
1,999999
1,1000000

`periodic.iterate()`

The procedure will execute the relationship creation between a supernode and all the other nodes based on the CSV file in batches:

CALL periodic.iterate(
  "LOAD CSV FROM '/tmp/file.csv' WITH HEADER AS row RETURN row.node_id AS node_id, row.supernode_id AS supernode_id",
  "MATCH (s:SuperNode {supernode_id: supernode_id}), (n:Node {node_id: node_id}) CREATE (s)-[:HAS_REL_TO]->(n)",
  {batch_size: 5000})
YIELD success, number_of_executed_batches
RETURN success, number_of_executed_batches;

Result:

+------------------+----------------------------+
| success          | number_of_executed_batches |
+------------------+----------------------------+
| true             | 200                        |
+------------------+----------------------------+

path refactor