It's the most wonderful time of the year - Dynamic PageRank and a Twitter Network

by Josip Matak
Introduction

“Merry Christmas!” - words that we read countless times on social networks at the most wonderful time of the year. Christmas brings with it an unprecedented madness on social networks, with posts of greetings, wishes, songs, pictures - everything related to Christmas and the holidays.

Twitter is no exception. Everything is ready for Christmas madness. To cheer you up, Memgraph MAGE has prepared a Christmas analysis. In such a dynamic environment, our goal is to find the most “Christmassy” person of all!

Model it till you make it

We must agree on the conditions by which we choose the most Christmassy person. Interactions on Twitter mostly take place through commenting and retweeting. Let us go with the latter.

If we search for “#Christmas” in the Twitter search engine, we will get thousands of different posts. They will be the basis of our Christmas community. Our winner definitely needs to be a user from this community!

Then, if we take the retweeting of #Christmas tweets as an interaction, we can build a graph of the Christmas community in which users share Christmas posts with each other. The graph is simple, precisely as it should be for the analysis. There is only one entity - USER - which has a link to another USER if it retweeted that user's post.

Image 1. Graph schema
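
To make the schema concrete, here is a minimal Python sketch of the same model outside the database. The user names and the `build_retweet_graph` helper are hypothetical and only serve as an illustration; each pair is read as (retweeter, original author), so an edge points from the user who retweeted to the user whose post was retweeted.

```python
# Hypothetical sample data: (retweeter, original_author) pairs.
retweets = [
    ("alice", "santa"),
    ("bob", "santa"),
    ("carol", "alice"),
    ("santa", "alice"),
]


def build_retweet_graph(pairs):
    """Return {user: set of users whose posts that user retweeted}."""
    graph = {}
    for retweeter, author in pairs:
        # Edge direction: retweeter -> author of the retweeted post.
        graph.setdefault(retweeter, set()).add(author)
        # Authors who never retweet anything must still exist as nodes.
        graph.setdefault(author, set())
    return graph


graph = build_retweet_graph(retweets)
for user in sorted(graph):
    print(user, "->", sorted(graph[user]))
```

In Memgraph the same structure would live as USER nodes connected by retweet relationships; the dictionary above is just a stand-in for reasoning about the shape of the data.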

The most Christmassy user

The Christmas spirit lasts through the holidays, and new data, in the form of retweets, arrives every moment. To deal with this, we need a streaming platform for graph analysis. Memgraph will serve well, and as previously announced, MAGE has prepared a tool to calculate the winner.

PageRank is the perfect tool for finding the most Christmassy person. It scores a user highly when they are retweeted by users who themselves score highly, so the winner is the one who got retweeted by other Christmassy people. Because data arrives at high speed, MAGE has upgraded the good old PageRank in its garage and prepared a dynamic version - Dynamic PageRank.

Image 2. Dynamic PageRank
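
For readers who want to see what the "good old" static PageRank computes, below is a minimal power-iteration sketch in Python. It is not MAGE's implementation; the graph, the user names, the damping factor of 0.85, and the tolerance are all assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank on {node: set of out-neighbours}."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        # Every node passes its rank on, split evenly among its targets.
        for v in nodes:
            for target in graph[v]:
                new[target] += damping * rank[v] / len(graph[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical retweet graph: retweeter -> authors they retweeted.
graph = {
    "alice": {"santa"},
    "bob": {"santa"},
    "carol": {"alice"},
    "dave": {"santa"},
    "santa": {"alice"},
}

ranks = pagerank(graph)
for user, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score:.3f}")
```

In this toy graph, santa ends up first because three users retweet santa, one of them the highly ranked alice. Users nobody retweets (bob, carol, dave) share the lowest score.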

We will use the dynamic implementation because of its advantages and adaptability to streaming data. Here are just a few:

  1. Changes are local - to obtain new PageRank values, it is not necessary to run the algorithm from scratch, only around the entities where the changes occurred.
  2. The update is faster than a full re-run of a static algorithm.
  3. Current values are stored and available in O(1) time.
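
The intuition behind reusing previous results can be sketched in a few lines of Python. Note that this is a deliberately simpler stand-in technique, warm-started power iteration, and not the algorithm MAGE uses; the graph, the names, and all parameters are hypothetical. When a new retweet arrives, the computation starts from the previously stored ranks instead of from a uniform guess.

```python
def pagerank(graph, start=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration; returns (ranks, iterations used).

    `start` optionally seeds the iteration with previously computed ranks.
    """
    nodes = sorted(graph)
    n = len(nodes)
    if start is None:
        rank = {v: 1.0 / n for v in nodes}
    else:
        # Reuse old ranks, give unseen nodes a uniform share, then normalise.
        rank = {v: start.get(v, 1.0 / n) for v in nodes}
        total = sum(rank.values())
        rank = {v: r / total for v, r in rank.items()}
    for iteration in range(1, max_iter + 1):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new[target] += damping * rank[v] / len(graph[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank, iteration


# Hypothetical retweet graph: retweeter -> authors they retweeted.
graph = {
    "alice": {"santa"},
    "bob": {"santa"},
    "carol": {"alice"},
    "dave": {"santa"},
    "santa": {"alice"},
}
old_ranks, _ = pagerank(graph)

# A new retweet arrives: a new user, erin, retweets santa.
graph["erin"] = {"santa"}

cold, cold_iters = pagerank(graph)                    # recompute from scratch
warm, warm_iters = pagerank(graph, start=old_ranks)   # reuse stored ranks

print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

Both runs converge to the same ranks; the warm start typically needs fewer iterations because the old ranks are already close to the new solution. A truly dynamic algorithm goes further and only touches the neighbourhood of the change, as the first point in the list describes.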

Dynamic PageRank is implemented as part of the MAGE repository in Memgraph. After installing the needed libraries, we need to set a trigger that calculates the new values promptly as entities arrive in the graph.

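// Before every commit, update the ranks for the created and deleted
// vertices and edges, and store each rank on its node.
CREATE TRIGGER pagerank_online
BEFORE COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges)
YIELD node, rank
SET node.pagerank = rank;

// Initialize the algorithm state (arguments: walks per node, walk stop epsilon).
CALL pagerank_online.set(100, 0.2) YIELD *;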

This way, we ensure that the data structures that keep the state of the algorithm are set up, and that ranks are recalculated for newly added data and stored in the pagerank property.

And the winner is…

A stream of retweets flowed into our network for several hours while we waited for a satisfactory amount of data. After a few hours and N processed retweets, the results are as follows:

Image 3. Dynamic PageRank in Memgraph Lab

The following query retrieves the pagerank property of every node, orders the nodes by rank, and limits the output to the top ten.

MATCH (n) 
RETURN n, n.pagerank AS rank
ORDER BY rank DESC
LIMIT 10;

Image 4. Twitter users ordered by PageRank

These people take Christmas seriously. We wish them, and you, the reader of this article, all the best. 🎄🎅🎁

Conclusion

This article describes how a smart algorithm can spare a large number of computations and deliver valuable insights into data faster than before.

Our team of engineers is currently tackling the problem of graph analytics algorithms on real-time data. If you want to discuss how to apply online/streaming algorithms on connected data, feel free to join our Discord server and message us.

MAGE shares his wisdom on a Twitter account. Get to know him better by following him 🐦

Last but not least, check out MAGE and don’t hesitate to give a star ⭐ or contribute with new ideas.
