community_detection

Community in graphs mirrors real-world communities, like social circles. In a graph, communities are sets of nodes. M. Girvan and M. E. J. Newman note that nodes in a community connect more intensely with each other than with outside nodes.

This module employs the Louvain method (opens in a new tab) for community detection based on Blondel's paper Fast unfolding of communities in large networks (opens in a new tab) using the Grappolo (opens in a new tab) parallel approach.

Louvain's algorithm is from the modularity maximization community detection family. It starts with each node in its own community, then uses a greedy approach to merge communities for the best modularity score. With an O(nlogn)\mathcal{O}(n\log{}n) runtime for nn nodes, it suits large graphs. It's further enhanced by parallelization via a distance-1 graph coloring and a coarsening technique preserving communities.

TraitValue
Module typealgorithm
ImplementationC++
Graph directionundirected
Relationship weightsweighted / unweighted
Parallelismparallel

Procedures

get()

Computes graph communities using the Louvain method.

Input:

  • subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default.

  • weight: string (default=null) ➡ Specifies the name of the property containing the edge weight. Users can set their own weight property; if this property is not specified, the algorithm uses the weight edge attribute by default. If neither is set, each edge's weight defaults to 1. To utilize a custom weight property, the user must set the --storage-properties-on-edges=true flag.

  • coloring: boolean (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.

  • min_graph_shrink: integer (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.

  • community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).

  • coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • community_id: integer ➡ Community ID. Defaults to 1-1 if the node does not belong to any community.

Usage:

Use the following query to detect communities:

CALL community_detection.get()
YIELD node, community_id;

get_subgraph()

Computes graph communities over a subgraph using the Louvain method.

Input:

  • subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run. If subgraph is not specified, the algorithm is computed on the entire graph by default.

  • subgraph_nodes: List[Node] ➡ List of nodes in the subgraph.

  • subgraph_relationships: List[Relationship] ➡ List of relationships in the subgraph.

  • weight: string (default=null) ➡ Specifies the name of the property containing the edge weight. Users can set their own weight property; if this property is not specified, the algorithm uses the weight edge attribute by default. If neither is set, each edge's weight defaults to 1. To utilize a custom weight property, the user must set the --storage-properties-on-edges=true flag.

  • coloring: bool (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.

  • min_graph_shrink: int (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.

  • community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).

  • coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

  • node: Vertex ➡ A graph node for which the algorithm was performed and returned as part of the results.
  • community_id: int ➡ Community ID. Defaults to 1-1 if the node does not belong to any community.

Usage:

Use the following query to compute communities in a subgraph:

MATCH (a)-[e]-(b)
WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships
CALL community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id;

Example

Database state

The database contains the following data:

Created with the following Cypher queries:

MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);

Detect communities

Get communities using the following query:

CALL community_detection.get()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;

Results show which nodes belong to community 1, and which to community 2:

┌─────────────────────────┬─────────────────────────┐
│ node_id                 │ community_id            │
├─────────────────────────┼─────────────────────────┤
│ 0                       │ 1                       │
│ 1                       │ 1                       │
│ 2                       │ 1                       │
│ 3                       │ 2                       │
│ 4                       │ 2                       │
│ 5                       │ 2                       │
└─────────────────────────┴─────────────────────────┘