community_detection

Community in graphs mirrors real-world communities, like social circles. In a graph, communities are sets of nodes. M. Girvan and M. E. J. Newman note that nodes in a community connect more intensely with each other than with outside nodes.

This module employs the Louvain method (opens in a new tab) for community detection based on Blondel's paper Fast unfolding of communities in large networks (opens in a new tab) using the Grappolo (opens in a new tab) parallel approach.

Louvain's algorithm is from the modularity maximization community detection family. It starts with each node in its own community, then uses a greedy approach to merge communities for the best modularity score. With an O(nlogn)\mathcal{O}(n\log{}n) runtime for nn nodes, it suits large graphs. It's further enhanced by parallelization via a distance-1 graph coloring and a coarsening technique preserving communities.

TraitValue
Module typealgorithm
ImplementationC++
Graph directionundirected
Relationship weightsweighted / unweighted
Parallelismparallel

Procedures

get()

Computes graph communities using the Louvain method.

Input:

  • weight: string (default=null) ➡ Specifies the default relationship weight. If not set, the algorithm uses the weight relationship attribute when present and otherwise treats the graph as unweighted.
  • coloring: boolean (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.
  • min_graph_shrink: integer (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.
  • community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).
  • coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

  • node: Vertex ➡ Graph node.
  • community_id: integer ➡ Community ID. Defaults to 1-1 if the node does not belong to any community.

Usage:

Use the following query to detect communities:

CALL community_detection.get()
YIELD node, community_id;

get_subgraph()

Computes graph communities over a subgraph using the Louvain method.

Input:

  • subgraph_nodes: List[Node] ➡ List of nodes in the subgraph.
  • subgraph_relationships: List[Relationship] ➡ List of relationships in the subgraph.
  • weight: str (default=null) ➡ Specifies the default relationship weight. If not set, the algorithm uses the weight relationship attribute when present and otherwise treats the graph as unweighted.
  • coloring: bool (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.
  • min_graph_shrink: int (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.
  • community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).
  • coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

  • node: Vertex ➡ Graph node.
  • community_id: int ➡ Community ID. Defaults to 1-1 if the node does not belong to any community.

Usage:

Use the following query to compute communities in a subgraph:

MATCH (a)-[e]-(b)
WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships
CALL community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id;

Example

Database state

The database contains the following data:

Created with the following Cypher queries:

MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);

Detect communities

Get communities using the following query:

CALL community_detection.get()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;

Results show which nodes belong to community 1, and which to community 2:

┌─────────────────────────┬─────────────────────────┐
│ node_id                 │ community_id            │
├─────────────────────────┼─────────────────────────┤
│ 0                       │ 1                       │
│ 1                       │ 1                       │
│ 2                       │ 1                       │
│ 3                       │ 2                       │
│ 4                       │ 2                       │
│ 5                       │ 2                       │
└─────────────────────────┴─────────────────────────┘