# pagerank_online

## Abstract

**Online PageRank** is a streaming algorithm made for calculating
PageRank in a graph streaming scenario. Incremental- local
changes are introduced in the algorithm to prevent users from recalculating
PageRank values each time a change occurs in the graph (something is added or
deleted).

To make it as fast as possible, the online algorithm is only the approximation
of PageRank but carrying the same information - the likelihood of random walk
ending in a particular vertex. The work is based on "Fast Incremental and
Personalized
PageRank"
^{1}, where authors are deeply focused on providing the streaming experience of
a highly popular graph algorithm.

Approximating PageRank is done simply by exploring random walks and calculating
the frequency of a node within all walks. `R`

walks are sampled by using a
random walk with a stopping probability of `eps`

.Therefore, on average, walks
would have a length of `1/eps`

. Approximative PageRank is based on the formula
below:

`RankApprox(v) = X_v / (n * R / eps)`

Where `X_v`

is the number of walks where the node `v`

appears. The theorem
written in the paper explains that RankApprox(v) is sharply concentrated around
its expectation, which is Rank(v).

### Usage

Online PageRank should be used in a specific way. To set the parameters, the
user should call a `set()`

procedure. This function also sets the context of a
streaming algorithm. `get()`

function only returns the resulting values stored
in a cache. Therefore, if you try to get values before first calling `set()`

,
the run will fail with a proper message.

To make the incremental flow, you should set the proper trigger. For that, we'll
use the `update()`

function:

`CREATE TRIGGER pagerank_trigger`

(BEFORE | AFTER) COMMIT

EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *

SET node.rank = rank;

Finally, the `reset()`

function resets the context and enables the user to start
new runs.

^{1} Fast Incremental and Personalized
PageRank,
Bahman Bahmani et al.

Trait | Value |
---|---|

Module type | algorithm |

Implementation | C++ |

Graph direction | directed |

Edge weights | unweighted |

Parallelism | sequential |

## Procedures

If you want to execute this algorithm on graph projections, subgraphs or portions of the graph, be sure to check out the guide on How to run a MAGE module on subgraphs.

`set(walks_per_node, walk_stop_epsilon)`

#### Input:

`walks_per_node: integer (default=10)`

➡ Number of sampled walks per node.`walk_stop_epsilon: double (default=0.1)`

➡ The probability of stopping when deriving the random walk. On average, it will create walks of length`1 / walk_stop_epsilon`

.

#### Output:

`node`

➡ Node in the graph, for which PageRank is calculated.`rank`

➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

#### Usage:

`CALL pagerank.set(100, 0.2)`

YIELD node, rank;

`get(max_iterations, damping_factor, stop_epsilon)`

* This should be used if the trigger has been set or a `set`

function has
been called before adding changes to the graph.

#### Output:

`node`

➡ Node in the graph, for which PageRank is calculated.`rank`

➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

#### Usage:

`CALL pagerank.get()`

YIELD node, rank;

`update(created_vertices, created_edges, deleted_vertices, deleted_edges)`

#### Input:

`created_vertices`

➡ Vertices that were created in the last transaction.`created_edges`

➡ Edges created in a period from the last transaction.`deleted_vertices`

➡ Vertices deleted from the last transaction.`deleted_edges`

➡ Edges deleted from the last transaction.

#### Output:

`node`

➡ Node in the graph, for which PageRank is calculated.`rank`

➡ Normalized ranking of a node. Expresses the probability that a random surfer will finish in a certain node by a random walk.

#### Usage:

`CREATE TRIGGER pagerank_trigger`

(BEFORE | AFTER) COMMIT

EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *

SET node.rank = rank;

## Example

- Step 1: Input graph
- Step 2: Set parameters and trigger
- Step 3: Load commands
- Step 4: Running command
- Step 5: Results

`CALL pagerank_online.set(100, 0.2) YIELD *;`

CREATE TRIGGER pagerank_trigger

BEFORE COMMIT

EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *

SET node.rank = rank;

`MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b);`

MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b);

MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b);

MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);

MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);

MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b);

MERGE (a:Node {id: 4}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b);

`MATCH (node)`

RETURN node.id AS node_id, node.rank AS rank;

`+-----------+-----------+`

| node_id | rank |

+-----------+-----------+

| 0 | 0.225225 |

| 1 | 0.225225 |

| 2 | 0.225225 |

| 3 | 0.0675676 |

| 4 | 0.0765766 |

| 5 | 0.0585586 |

| 6 | 0.121622 |

+-----------+-----------+