node_similarity

Abstract#

To quantify how similar two nodes in a graph are, we need a numerical value that represents the similarity between them. Many node similarity measures exist; this module currently implements the following:

  • cosine similarity
  • Jaccard similarity
  • overlap similarity

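The cosine similarity is computed using the following formula:

$$\text{cosine}(A, B) = \frac{|A \cap B|}{\sqrt{|A| \cdot |B|}}$$

The Jaccard similarity is computed using the following formula:

$$\text{jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The overlap similarity is computed using the following formula:

$$\text{overlap}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$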

Set A contains all neighbors of one node, and set B contains all neighbors of the other node. In every formula, the numerator is the cardinality of the intersection of A and B (in other words, the number of common neighbors). The denominators differ between the measures, but each one is derived from the cardinalities of A and B.
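The set-based description above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard definitions of the three measures (cosine divides by the square root of |A|·|B|, Jaccard by |A ∪ B|, overlap by the smaller of |A| and |B|); the function names and the choice to return 0.0 for an empty denominator are assumptions made for this example, not the module's actual code.

```python
import math


def cosine(a: set, b: set) -> float:
    """Common neighbors divided by sqrt(|A| * |B|)."""
    denominator = math.sqrt(len(a) * len(b))
    # Assumption: a node without neighbors yields similarity 0.0.
    return len(a & b) / denominator if denominator else 0.0


def jaccard(a: set, b: set) -> float:
    """Common neighbors divided by the size of the union."""
    denominator = len(a | b)
    return len(a & b) / denominator if denominator else 0.0


def overlap(a: set, b: set) -> float:
    """Common neighbors divided by the size of the smaller set."""
    denominator = min(len(a), len(b))
    return len(a & b) / denominator if denominator else 0.0


# Hypothetical neighbor sets: two common neighbors ("C" and "D").
a = {"B", "C", "D"}
b = {"C", "D", "E", "F"}

print(cosine(a, b), jaccard(a, b), overlap(a, b))
```

All three functions share the same numerator, so the measures differ only in how strongly they penalize nodes with many neighbors that are not shared.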

The algorithm is implemented so that it ignores whether the graph is directed or undirected and treats the edges as if they were undirected. It also ignores multiple edges between two nodes and treats them as if there were only one edge.
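One way to picture this behavior is to build the neighbor sets from an edge list while discarding direction and duplicates. The sketch below is only an illustration under that reading; `undirected_neighbors` and the sample edge list are hypothetical and do not come from the module.

```python
from collections import defaultdict


def undirected_neighbors(edges):
    """Map each node to the set of nodes it shares at least one edge with."""
    neighbors = defaultdict(set)
    for source, target in edges:
        # Record the edge in both directions so direction is ignored.
        neighbors[source].add(target)
        neighbors[target].add(source)
    # Sets drop duplicates, so parallel edges count only once.
    return dict(neighbors)


# Hypothetical directed edge list with a reversed and a repeated edge.
edges = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A")]
neighbors = undirected_neighbors(edges)

print(neighbors["A"])
```

Here node A ends up with exactly two neighbors, B and C, even though three of the four edges connect A and B.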

| Trait | Value |
| --- | --- |
| Module type | algorithm |
| Implementation | Python |
| Graph direction | undirected |
| Edge weights | unweighted |
| Parallelism | sequential |

Procedures#

cosine(node1, node2, mode)#

Input:#

  • node1: Union[Vertex, Tuple[Vertex]] ➡ The first node or a tuple of nodes.
  • node2: Union[Vertex, Tuple[Vertex]] ➡ The second node or a tuple of nodes.
  • mode: str("cartesian") ➡ If the given arguments are tuples, this argument determines whether the similarity is calculated pairwise, between nodes at the same position in both tuples ("pairwise"), or between every node of the first tuple and every node of the second tuple ("cartesian"). The default value is "cartesian".
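The difference between the two modes can be sketched as follows. This is an illustration of the pairing logic only; `node_pairs` is a hypothetical helper, and the requirement that both tuples have equal length in "pairwise" mode is an assumption of this sketch rather than documented behavior.

```python
from itertools import product


def node_pairs(nodes1, nodes2, mode="cartesian"):
    """Return the (node1, node2) pairs that would be compared in each mode."""
    if mode == "pairwise":
        # Assumption: pairwise comparison needs tuples of equal length.
        if len(nodes1) != len(nodes2):
            raise ValueError("pairwise mode expects tuples of equal length")
        return list(zip(nodes1, nodes2))
    if mode == "cartesian":
        # Every node of the first tuple is paired with every node of the second.
        return list(product(nodes1, nodes2))
    raise ValueError(f"unknown mode: {mode!r}")


pairwise = node_pairs(("a", "b"), ("x", "y"), mode="pairwise")
cartesian = node_pairs(("a", "b"), ("x", "y"))

print(pairwise)
print(cartesian)
```

With two tuples of length two, "pairwise" yields two pairs while "cartesian" yields four, so the number of result rows grows with the product of the tuple lengths in the default mode.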

Output:#

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The cosine similarity between the first and the second node.

Usage:#

MATCH (m)
WITH COLLECT(m) AS nodes1
MATCH (n)
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.cosine(nodes1, nodes2) YIELD node1, node2, similarity
RETURN node1, node2, similarity

jaccard(node1, node2, mode)#

Input:#

  • node1: Union[Vertex, Tuple[Vertex]] ➡ The first node or a tuple of nodes.
  • node2: Union[Vertex, Tuple[Vertex]] ➡ The second node or a tuple of nodes.
  • mode: str("cartesian") ➡ If the given arguments are tuples, this argument determines whether the similarity is calculated pairwise, between nodes at the same position in both tuples ("pairwise"), or between every node of the first tuple and every node of the second tuple ("cartesian"). The default value is "cartesian".

Output:#

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The Jaccard similarity between the first and the second node.

Usage:#

MATCH (m)
WITH COLLECT(m) AS nodes1
MATCH (n)
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.jaccard(nodes1, nodes2, "cartesian") YIELD node1, node2, similarity
RETURN node1, node2, similarity

overlap(node1, node2, mode)#

Input:#

  • node1: Union[Vertex, Tuple[Vertex]] ➡ The first node or a tuple of nodes.
  • node2: Union[Vertex, Tuple[Vertex]] ➡ The second node or a tuple of nodes.
  • mode: str("cartesian") ➡ If the given arguments are tuples, this argument determines whether the similarity is calculated pairwise, between nodes at the same position in both tuples ("pairwise"), or between every node of the first tuple and every node of the second tuple ("cartesian"). The default value is "cartesian".

Output:#

  • node1: Vertex ➡ The first node.
  • node2: Vertex ➡ The second node.
  • similarity: float ➡ The overlap similarity between the first and the second node.

Usage:#

MATCH (m)
WITH COLLECT(m) AS nodes1
MATCH (n)
WITH COLLECT(n) AS nodes2, nodes1
CALL node_similarity.overlap(nodes1, nodes2, "pairwise") YIELD node1, node2, similarity
RETURN node1, node2, similarity

Example - cosine similarity#
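
As a small worked illustration, suppose one node has the neighbors b, c and d and the other has the neighbors c, d, e and f. These neighbor sets are hypothetical and chosen only to make the arithmetic easy to follow; the calculation assumes the standard set-based definition of cosine similarity.

```latex
A = \{b, c, d\}, \qquad B = \{c, d, e, f\}, \qquad A \cap B = \{c, d\}

\text{cosine}(A, B) = \frac{|A \cap B|}{\sqrt{|A| \cdot |B|}}
                    = \frac{2}{\sqrt{3 \cdot 4}}
                    \approx 0.577
```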

Example - Jaccard similarity#
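
As a small worked illustration, suppose one node has the neighbors b, c and d and the other has the neighbors c, d, e and f. These neighbor sets are hypothetical and chosen only to make the arithmetic easy to follow; the calculation assumes the standard set-based definition of Jaccard similarity.

```latex
A = \{b, c, d\}, \qquad B = \{c, d, e, f\}

A \cap B = \{c, d\}, \qquad A \cup B = \{b, c, d, e, f\}

\text{jaccard}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5} = 0.4
```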

Example - overlap similarity#
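
As a small worked illustration, suppose one node has the neighbors b, c and d and the other has the neighbors c, d, e and f. These neighbor sets are hypothetical and chosen only to make the arithmetic easy to follow; the calculation assumes the standard set-based definition of overlap similarity.

```latex
A = \{b, c, d\}, \qquad B = \{c, d, e, f\}, \qquad A \cap B = \{c, d\}

\text{overlap}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}
                     = \frac{2}{\min(3, 4)}
                     = \frac{2}{3}
                     \approx 0.667
```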