cugraph

NVIDIA cuGraph is a graph analytics library that is part of NVIDIA’s RAPIDS open-source data science suite containing machine learning tools and libraries for various applications in data science. It can be used from Memgraph on machines that meet the system requirements.

This set of modules is built on top of NVIDIA cuGraph and provides a set of wrappers for most of the algorithms present in the cuGraph repository.

TraitValue
Module typemodule
ImplementationCUDA
Graph directionundirected/directed
Edge weightsunweighted/weighted
Parallelismparallelized

Modules and procedures

The cugraph module is a collection of distinct GPU-powered modules with their own procedures.

You can execute these algorithms on graph projections, subgraphs or portions of the graph.

cugraph.balanced_cut_clustering.get()

Procedure for finding the balanced cut clustering of the graph’s nodes.

Input:

  • num_clusters: integer ➡ Number of clusters.
  • num_eigenvectors: integer (default=2) ➡ Number of eigenvectors to be used (must be less than or equal to num_clusters).
  • ev_tolerance: float (default=0.00001) ➡ Tolerance used by the eigensolver.
  • ev_max_iter: integer (default=100) ➡ Maximum number of iterations for the eigensolver.
  • kmean_tolerance: float (default=0.00001) ➡ Tolerance used by the k-means solver.
  • kmean_max_iter: integer (default=100) ➡ Maximum number of iterations for the k-means solver.
  • weight_property: string (default="weight") ➡ The values of the given relationship. Property are used as weights by the algorithm. If this property is not set for a relationship, the fallback value is 1.0.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • cluster: integer ➡ Cluster of a node.

Usage:

FInd the balaned cut clustering using the following query:

CALL cugraph.balanced_cut_clustering.get(3)
YIELD node, cluster
RETURN node, cluster;

cugraph.betweenness_centrality.get()

Procedure for finding betweenness centrality scores for all nodes in the graph.

Input:

  • normalized: boolean (default=True) ➡ Normalize the output.
  • directed: boolean (default=True) ➡ Graph directedness. (default True)
  • weight_property: string (default="weight") ➡ The values of the given relationship property are used as weights by the algorithm. If this property is not set for a relationship, the fallback value is 1.0.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • betweenness_centrality: float ➡ Betweenness centrality score of a node.

Usage:

Calculate betweenness centrality scores using the following query:

CALL cugraph.betweenness_centrality.get()
YIELD node, betweenness_centrality
RETURN node, betweenness_centrality;

cugraph.generator.rmat()

Generate a graph using a Recursive MATrix (R-MAT) graph generation algorithm and load it in Memgraph.

Input:

  • scale: integer (default=4) ➡ Scale factor to set the number of nodes in the graph.
  • num_edges: integer (default=100) ➡ Number of relationships in the generated graph.
  • node_labels: mgp.List[string] (default=[]) ➡ Labels on created nodes. Defaults to empty list.
  • edge_type: string (default="RELATIONSHIP") ➡ Relationship type, defines the name of the relationship.
  • a: double (default=0.57) ➡ First partition probability.
  • b: double (default=0.19) ➡ Second partition probability.
  • c: double (default=0.19) ➡ Third partition probability.
  • seed: integer (default=0) ➡ RNG (random number generator) seed value
  • clip_and_flip: boolean (default=False) ➡ Controls whether to generate relationships only in the lower triangular part (including the diagonal) of the graph adjacency matrix (if set to True) or not (if set to False).

Output:

The generated graph is loaded into Memgraph.

  • message: string ➡ Success message if the graph is loaded.

Usage:

Use the following query to generate a graph using Recursive MATrix (R-MAT) graph generation algorithm:

CALL cugraph.generator.rmat()
YIELD message;

cugraph.hits.get()

Find HITS authority and hub values for all nodes in the graph. The HITS algorithm computes two numbers for each node: its authority, which estimates the value of its content, and its hub value, which estimates the value of its links to other nodes.

Whereas the HITS algorithm was designed for directed graphs, this implementation does not check if the input graph is directed and will execute on undirected graphs.

Input:

  • tolerance: float (default=1e-5) ➡ HITS approximation tolerance (custom values not supported by NVIDIA cuGraph).
  • max_iterations: integer (default=100) ➡ Maximum number of iterations before returning an answer (custom values not supported by NVIDIA cuGraph).
  • normalized: boolean (default=True) ➡ Normalize the output (False is not supported by NVIDIA cuGraph).
  • directed: boolean (default=True) ➡ Graph directedness.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • hubs: float ➡ Hub value of a node.
  • authorities: float ➡ Authority value of a node.

Usage:

Use the following query to get the hub and authority value of nodes:

CALL cugraph.hits.get()
YIELD node, hubs, authorities
RETURN node, hubs, authorities;

cugraph.katz_centrality.get()

Find Katz centrality scores for all nodes in the graph.

Input:

  • alpha: float (default=None) ➡ Attenuation factor defining the walk length importance. If not specified, calculated as 1 / max(out_degree).
  • beta: float (default=1.0) ➡ Weight scalar (currently not supported by NVIDIA cuGraph).
  • epsilon: float (default=1e-6) ➡ Set the tolerance for the approximation, this parameter should be a small magnitude value.
  • max_iterations: integer (default=100) ➡ Maximum number of iterations before returning an answer.
  • normalized: boolean (default=True) ➡ Normalize the output.
  • directed: boolean (default=True) ➡ Graph directedness.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • katz_centrality: float ➡ Katz centrality score of a node.

Usage:

Use the following query to calculate the Katz centrality score:

CALL cugraph.katz_centrality.get()
YIELD node, katz_centrality
RETURN node, katz_centrality;

cugraph.leiden.get()

Partition the graph into communities using the Leiden method.

Input:

  • max_iterations: integer (default=100) ➡ Maximum number of iterations (levels) of the algorithm.
  • resolution: float (default=1.0) ➡ Controls community size (lower values lead to fewer, larger communities and vice versa).

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • partition: integer ➡ Partition of a node.

Usage:

Use the following query to partition the graph into communities:

CALL cugraph.leiden.get()
YIELD node, partition
RETURN node, partition;

cugraph.louvain.get()

Partition the graph into communities using the Louvain method.

Input:

  • max_iterations: integer (default=100) ➡ Maximum number of iterations (levels) of the algorithm.
  • resolution: float (default=1.0) ➡ Controls community size (lower values lead to fewer, larger communities and vice versa).
  • directed: boolean (default=True) ➡ Graph directedness.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • partition: integer ➡ Partition of a node.

Usage:

Use the following query to partition the graph into communities:

CALL cugraph.louvain.get()
YIELD node, partition
RETURN node, partition;

cugraph.pagerank.get()

Find PageRank scores for all nodes in the graph.

Input:

  • max_iterations: integer (default=100) ➡ The maximum number of iterations before returning an answer (default 100). Use it to limit the execution time or do an early exit before the solver reaches the convergence tolerance.
  • damping_factor: float (default=0.85) ➡ The damping factor represents the probability to follow an outgoing edge.
  • stop_epsilon: float (default=1e-5) ➡ The convergence tolerance for PageRank approximation. Lowering tolerance improves the approximation, but setting this parameter too low can ensue in non-convergence due to numerical round-off. Values between 0.01 and 0.00001 are usually acceptable.
  • weight_property: string (default="weight") ➡ The values of the given relationship property are used as weights by the algorithm. If this property is not set for a relationship, the fallback value is 1.0.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • pagerank: float ➡ PageRank score of a node.

Usage:

Use the following query to get PageRank scores:

CALL cugraph.pagerank.get()
YIELD node, pagerank
RETURN node, pagerank;

cugraph.personalized_pagerank.get()

Find personalized PageRank scores for all nodes in the graph.

Input:

  • personalization_vertices: mgp.List[mgp.Vertex] ➡ Graph nodes with personalization values.
  • personalization_values: mgp.List[float] ➡ Above nodes’ personalization values.
  • weight_property: string (default="weight") ➡ The values of the given relationship. property are used as weights by the algorithm. If this property is not set for a relationship, the fallback value is 1.0.
  • damping_factor: float (default=0.85) ➡ The damping factor represents the probability to follow an outgoing edge.
  • stop_epsilon: float (default=1e-5) ➡ The convergence tolerance for PageRank approximation. Lowering tolerance improves the approximation, but setting this parameter too low can ensue in non-convergence due to numerical round-off. Values between 0.01 and 0.00001 are usually acceptable.
  • max_iterations: integer (default=100) ➡ The maximum number of iterations before returning an answer. Use it to limit the execution time or do an early exit before the solver reaches the convergence tolerance.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • pagerank: float ➡ PageRank score of a node.

Usage:

use the following query to calculate the personalized PageRank score:

MATCH (n: Node {id: 1}), (m: Node {id: 2})
CALL cugraph.pagerank.get([n, m], [0.2, 0.5])
YIELD node, pagerank
RETURN node, pagerank;

cugraph.spectral_clustering.get()

Find the spectral clustering of the graph’s nodes.

Input:

  • num_clusters: integer ➡ Number of clusters.
  • num_eigenvectors: integer (default=2) ➡ Number of eigenvectors to be used (must be less than or equal to num_clusters).
  • ev_tolerance: float (default=0.00001) ➡ Tolerance used by the eigensolver.
  • ev_max_iter: integer (default=100) ➡ Maximum number of iterations for the eigensolver.
  • kmean_tolerance: float (default=0.00001) ➡ Tolerance used by the k-means solver.
  • kmean_max_iter: integer (default=100) ➡ Maximum number of iterations for the k-means solver.
  • weight_property: string (default="weight") ➡ The values of the given relationship property are used as weights by the algorithm. If this property is not set for a relationship, the fallback value is 1.0.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • cluster: integer ➡ Cluster of a node.

Usage:

Use the following query to find the spectral clustering of the node:

CALL cugraph.spectral_clustering.get(3)
YIELD node, cluster
RETURN node, cluster;