Link prediction is the task of estimating the probability that a link will form between two nodes that are not yet connected in a graph. A wide range of solutions can be applied to this problem.
The problem is important in recommendation systems, co-authorship prediction, drug discovery, and many other domains.
Solution methods range from traditional to machine learning-based. Traditional methods are mostly based on neighborhood similarity: a link between two nodes is more likely to exist if they share many common neighbors.
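The neighborhood idea can be sketched in a few lines of plain Python. The graph below is made-up illustrative data, and the function names (`common_neighbors`, `jaccard`, `rank_missing_links`) are hypothetical helpers rather than part of MAGE. The common-neighbor count and the Jaccard coefficient are shown here as two well-known neighborhood scores, not necessarily the ones this page had in mind.

```python
from itertools import combinations

# Hypothetical undirected graph stored as node -> set of neighbors.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

def common_neighbors(g, u, v):
    """Number of neighbors shared by u and v."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """Shared neighbors divided by all distinct neighbors of u and v."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def rank_missing_links(g, score):
    """Order all currently unconnected node pairs by descending score."""
    candidates = [
        (u, v) for u, v in combinations(sorted(g), 2) if v not in g[u]
    ]
    return sorted(candidates, key=lambda pair: score(g, *pair), reverse=True)

for u, v in rank_missing_links(graph, jaccard):
    print(u, v, common_neighbors(graph, u, v), round(jaccard(graph, u, v), 3))
```

The pair ranked first is the one whose neighborhoods overlap the most, which is exactly the intuition described above.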
Alternatively, such a prediction can be derived from the endpoint nodes themselves by comparing them with a similarity metric. The more similar the nodes are, the greater the likelihood of a link between them. Cosine similarity and Euclidean distance are two examples of such metrics.
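As a minimal sketch of the two metrics, assume each node carries a numeric feature vector. The `features` dictionary and its values are invented for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical node feature vectors.
features = {
    "u": [1.0, 0.0, 1.0],
    "v": [1.0, 0.0, 1.0],
    "w": [0.0, 1.0, 0.0],
}

print(cosine_similarity(features["u"], features["v"]))
print(euclidean_distance(features["u"], features["w"]))
```

Note that the two metrics point in opposite directions: a higher cosine similarity suggests a more likely link, while a higher Euclidean distance suggests a less likely one.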
The most advanced models so far are based on graph embeddings, which serve as features for a downstream classification task. Likewise, graph neural network (GNN) methods can be applied to the same task.
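The embedding-based approach can be illustrated with a toy example. Everything here is an assumption made for the sketch: the 2-dimensional `embeddings` are hand-written rather than produced by a real embedding algorithm, the edge feature is the element-wise (Hadamard) product of the two endpoint embeddings, and the classifier is a tiny hand-rolled logistic regression rather than a library model.

```python
import math

# Hypothetical node embeddings; in practice an embedding algorithm produces them.
embeddings = {
    "a": [1.0, 0.9],
    "b": [0.9, 1.0],
    "c": [-1.0, 0.8],
    "d": [0.8, -1.0],
}

def edge_features(u, v):
    """Element-wise product of the endpoint embeddings."""
    return [x * y for x, y in zip(embeddings[u], embeddings[v])]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=0.5, epochs=500):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, u, v):
    """Estimated probability that a link exists between u and v."""
    x = edge_features(u, v)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Illustrative labels: 1 means a link exists, 0 means it does not.
train_pairs = [("a", "b", 1), ("c", "d", 0)]
X = [edge_features(u, v) for u, v, _ in train_pairs]
y = [label for _, _, label in train_pairs]
weights, bias = train_logistic(X, y)

print(predict(weights, bias, "a", "b"))
print(predict(weights, bias, "c", "d"))
```

With real data, the same pipeline would be trained on many existing edges and sampled non-edges, and the hand-written classifier would typically be replaced by a library implementation.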
Predicted relationships within a certain community
Unfortunately, link prediction is not yet implemented in MAGE. Be sure to raise an issue on the GitHub repo and ping us to speed up the development. ☝️
Our little magician Memgraph MAGE has recently received one more spell - the Online Node2Vec algorithm. Since he is still too scared to use it, you, as a brave spirit, will step up and use it on a real challenge to show MAGE how it’s done. This challenge includes building an Online Recommendation System using k-means clustering and the newborn spell - Online Node2Vec algorithm.
When discovering new drugs, link prediction methods can help determine how a new drug will behave. Since a database of known side effects and reactions with other drugs already exists, this knowledge can be used to predict the side effects and reactions of newly developed drugs.
One of the most obvious applications of link prediction is predicting new friendships or followers on social networks. The prediction is based on the entities two users share in the graph, whether those are friends, interests, or things they follow.