Skip to main content

pagerank_online

docs-source

Abstract

Online PageRank is a streaming algorithm made for calculating PageRank in a graph streaming scenario. Incremental- local changes are introduced in the algorithm to prevent users from recalculating PageRank values each time a change occurs in the graph (something is added or deleted).

To make it as fast as possible, the online algorithm is only the approximation of PageRank but carrying the same information - the likelihood of random walk ending in a particular vertex. The work is based on "Fast Incremental and Personalized PageRank" 1, where authors are deeply focused on providing the streaming experience of a highly popular graph algorithm.

Approximating PageRank is done simply by exploring random walks and calculating the frequency of a node within all walks. R walks are sampled by using a random walk with a stopping probability of eps.Therefore, on average, walks would have a length of 1/eps. Approximative PageRank is based on the formula below:

RankApprox(v) = X_v / (n * R / eps)

Where X_v is the number of walks where the node v appears. The theorem written in the paper explains that RankApprox(v) is sharply concentrated around its expectation, which is Rank(v).

Usage

Online PageRank should be used in a specific way. To set the parameters, the user should call a set() procedure. This function also sets the context of a streaming algorithm. get() function only returns the resulting values stored in a cache. Therefore, if you try to get values before first calling set(), the run will fail with a proper message.

To make the incremental flow, you should set the proper trigger. For that, we'll use the update() function:

CREATE TRIGGER pagerank_trigger
(BEFORE | AFTER) COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
SET node.rank = rank;

Finally, the reset() function resets the context and enables the user to start new runs.

1 Fast Incremental and Personalized PageRank, Bahman Bahmani et al.

TraitValue
Module typealgorithm
ImplementationC++
Graph directiondirected
Edge weightsunweighted
Parallelismsequential

Procedures

set(walks_per_node, walk_stop_epsilon)

Input:

  • walks_per_node: int(10) ➡ Number of sampled walks per node.
  • walk_stop_epsilon: double(0.1) ➡ The probability of stopping when deriving the random walk. On average, it will create walks of length 1 / walk_stop_epsilon.

Output:

  • node ➡ Node in the graph, for which PageRank is calculated.
  • rank ➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

Usage:

CALL pagerank.set(100, 0.2)
YIELD node, rank;

get(max_iterations, damping_factor, stop_epsilon)

* This should be used if the trigger has been set or a set function has been called before adding changes to the graph.

Output:

  • node ➡ Node in the graph, for which PageRank is calculated.
  • rank ➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

Usage:

CALL pagerank.get()
YIELD node, rank;

update(created_vertices, created_edges, deleted_vertices, deleted_edges)

Input:

  • created_vertices ➡ Vertices that were created in the last transaction.
  • created_edges ➡ Edges created in a period from the last transaction.
  • deleted_vertices ➡ Vertices deleted from the last transaction.
  • deleted_edges ➡ Edges deleted from the last transaction.

Output:

  • node ➡ Node in the graph, for which PageRank is calculated.
  • rank ➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

Usage:

CREATE TRIGGER pagerank_trigger
(BEFORE | AFTER) COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
SET node.rank = rank;

Example