community_detection
Community in graphs mirrors real-world communities, like social circles. In a graph, communities are sets of nodes. M. Girvan and M. E. J. Newman note that nodes in a community connect more intensely with each other than with outside nodes.
This module employs the Louvain method (opens in a new tab) for community detection based on Blondel's paper Fast unfolding of communities in large networks (opens in a new tab) using the Grappolo (opens in a new tab) parallel approach.
Louvain's algorithm is from the modularity maximization community detection family. It starts with each node in its own community, then uses a greedy approach to merge communities for the best modularity score. With an runtime for nodes, it suits large graphs. It's further enhanced by parallelization via a distance-1 graph coloring and a coarsening technique preserving communities.
Trait | Value |
---|---|
Module type | algorithm |
Implementation | C++ |
Graph direction | undirected |
Relationship weights | weighted / unweighted |
Parallelism | parallel |
Procedures
You can execute this algorithm on graph projections, subgraphs or portions of the graph.
get()
Computes graph communities using the Louvain method.
Input:
subgraph: Graph
(OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by theproject()
function, on which the algorithm is run.weight: string (default=null)
➡ Specifies the default relationship weight. If not set, the algorithm uses theweight
relationship attribute when present and otherwise treats the graph as unweighted.coloring: boolean (default=False)
➡ If set, use the graph coloring heuristic for effective parallelization.min_graph_shrink: integer (default=100000)
➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.community_alg_threshold: double (default=0.000001)
➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).coloring_alg_threshold: double (default=0.01)
➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using thecommunity_alg_threshold
value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher thancommunity_alg_threshold
.
Output:
node: Vertex
➡ Graph node.community_id: integer
➡ Community ID. Defaults to if the node does not belong to any community.
Usage:
Use the following query to detect communities:
CALL community_detection.get()
YIELD node, community_id;
get_subgraph()
Computes graph communities over a subgraph using the Louvain method.
Input:
subgraph: Graph
(OPTIONAL) ➡ A specific subgraph, which is an object of type Graph returned by theproject()
function, on which the algorithm is run.subgraph_nodes: List[Node]
➡ List of nodes in the subgraph.subgraph_relationships: List[Relationship]
➡ List of relationships in the subgraph.weight: str (default=null)
➡ Specifies the default relationship weight. If not set, the algorithm uses theweight
relationship attribute when present and otherwise treats the graph as unweighted.coloring: bool (default=False)
➡ If set, use the graph coloring heuristic for effective parallelization.min_graph_shrink: int (default=100000)
➡ The graph coarsening optimization stops upon shrinking the graph to this many nodes.community_alg_threshold: double (default=0.000001)
➡ Controls how long the algorithm iterates. When the gain in modularity goes below the threshold, iteration is over. Valid values are between 0 and 1 (exclusive).coloring_alg_threshold: double (default=0.01)
➡ If coloring is enabled, controls how long the algorithm iterates. When the gain in modularity goes below this threshold, a final iteration is performed using thecommunity_alg_threshold
value. Valid values are between 0 and 1 (exclusive). This parameter's value should be higher thancommunity_alg_threshold
.
Output:
node: Vertex
➡ Graph node.community_id: int
➡ Community ID. Defaults to if the node does not belong to any community.
Usage:
Use the following query to compute communities in a subgraph:
MATCH (a)-[e]-(b)
WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships
CALL community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id;
Example
Database state
The database contains the following data:
Created with the following Cypher queries:
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
Detect communities
Get communities using the following query:
CALL community_detection.get()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;
Results show which nodes belong to community 1, and which to community 2:
┌─────────────────────────┬─────────────────────────┐
│ node_id │ community_id │
├─────────────────────────┼─────────────────────────┤
│ 0 │ 1 │
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 2 │
│ 4 │ 2 │
│ 5 │ 2 │
└─────────────────────────┴─────────────────────────┘