community_detection

Community in graphs mirrors real-world communities, like social circles. In a graph, communities are sets of nodes. M. Girvan and M. E. J. Newman note that nodes in a community connect more intensely with each other than with outside nodes.

This module employs the Louvain method (opens in a new tab) for community detection based on Blondel's paper Fast unfolding of communities in large networks (opens in a new tab) using the Grappolo (opens in a new tab) parallel approach.

Louvain's algorithm is from the modularity maximization community detection family. It starts with each node in its own community, then uses a greedy approach to merge communities for the best modularity score. With an $\mathcal{O}(n\log{}n)$ runtime for $n$ nodes, it suits large graphs. It's further enhanced by parallelization via a distance-1 graph coloring and a coarsening technique preserving communities.

Source code

Trait	Value
Module type	algorithm
Implementation	C++
Graph direction	undirected
Relationship weights	weighted / unweighted
Parallelism	parallel

Procedures

You can execute this algorithm on graph projections, subgraphs or portions of the graph.

`get()`

Computes graph communities using the Louvain method.

Input:

subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run.
weight: string (default=null) ➡ Specifies the default relationship weight. If not set, the algorithm uses the weight relationship attribute when present and otherwise treats the graph as unweighted.
coloring: boolean (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.
min_graph_shrink: integer (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.
community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).
coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

node: Vertex ➡ Graph node.
community_id: integer ➡ Community ID. Defaults to $-1$ if the node does not belong to any community.

Usage:

Use the following query to detect communities:

CALL community_detection.get()
YIELD node, community_id;

`get_subgraph()`

Computes graph communities over a subgraph using the Louvain method.

Input:

subgraph: Graph (OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by the project() function, on which the algorithm is run.
subgraph_nodes: List[Node] ➡ List of nodes in the subgraph.
subgraph_relationships: List[Relationship] ➡ List of relationships in the subgraph.
weight: str (default=null) ➡ Specifies the default relationship weight. If not set, the algorithm uses the weight relationship attribute when present and otherwise treats the graph as unweighted.
coloring: bool (default=False) ➡ If set, use the graph coloring heuristic for effective parallelization.
min_graph_shrink: int (default=100000) ➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.
community_alg_threshold: double (default=0.000001) ➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).
coloring_alg_threshold: double (default=0.01) ➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using the community_alg_threshold value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher than community_alg_threshold.

Output:

node: Vertex ➡ Graph node.
community_id: int ➡ Community ID. Defaults to $-1$ if the node does not belong to any community.

Usage:

Use the following query to compute communities in a subgraph:

MATCH (a)-[e]-(b)
WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships
CALL community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id;

Example

Database state

The database contains the following data:

Created with the following Cypher queries:

MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);

Detect communities

Get communities using the following query:

CALL community_detection.get()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;

Results show which nodes belong to community 1, and which to community 2:

┌─────────────────────────┬─────────────────────────┐
│ node_id                 │ community_id            │
├─────────────────────────┼─────────────────────────┤
│ 0                       │ 1                       │
│ 1                       │ 1                       │
│ 2                       │ 1                       │
│ 3                       │ 2                       │
│ 4                       │ 2                       │
│ 5                       │ 2                       │
└─────────────────────────┴─────────────────────────┘