K-means Clustering


The k-means algorithm clusters data by separating the samples into k groups of roughly equal variance, minimizing a criterion known as the within-cluster sum of squares (also called inertia). The number of clusters must be specified beforehand: the algorithm does not compute the optimal number of clusters, but leaves that choice to the user. It scales well to large numbers of samples and has been applied in a wide range of fields, such as recommendation engines, fraud detection and market research.
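To make the criterion concrete, here is a minimal sketch of the standard iterative procedure (often called Lloyd's algorithm) in plain Python. It is an illustration only, not the MAGE implementation. The function names `kmeans` and `squared_distance`, the random choice of initial centroids and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's iteration; returns (centroids, labels, inertia).

    Assumes max_iter >= 1 and len(points) >= k.
    """
    rng = random.Random(seed)
    # Assumption: the initial centroids are k samples picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we treat this as converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Within-cluster sum of squares: the quantity the iteration tries to reduce.
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return centroids, labels, inertia


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, inertia = kmeans(points, k=2)
print(labels, round(inertia, 3))
```

Note that `k` is passed in by the caller, which mirrors the point above: the procedure does not choose the number of clusters itself. A common heuristic is to run it for several values of `k` and compare the resulting inertia.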




The K-means algorithm is implemented within the MAGE project. Be sure to check it out in the link above.

Use cases


Clustering is widely used in social network analysis, for example to discover user communities and to find similarities between them.