K-means algorithm

From Freepedia

(Redirected from K-means)

The K-means algorithm is an algorithm to cluster objects based on attributes into k partitions. It is a variant of the expectation-maximization algorithm in which the goal is to determine the k means of data generated from gaussian distributions. It assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize total intra-cluster variance, or, the function

<math>V = \sum_{i=1}^{k} \sum_{j \in S_i} |x_j - \mu_i |^2 </math>

where there are <math>k</math> clusters <math>S_i</math>, <math>i = 1, 2, ..., k</math> and <math>\mu_i</math> is the centroid or mean point of all the points <math>x_j \in S_i</math>.

The algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the mean point, or centroid, of each set. It constructs a new partition by associating each point with the closest centroid. Then the centroids are recalculated for the new clusters, and algorithm repeated by alternate application of these two steps until convergence, which is obtained when the points no longer switch clusters (or alternatively centroids are no longer changed).

Although the algorithm must always converge, there is no known limit on the number of iterations required. An implementation may choose to stop the algorithm after a certain number of iterations. The convergence is not guaranteed to yield a global optimum. The quality of the final solution depends largely on the initial set of clusters, and may, in practice, be much poorer than the global optimum.

Another main drawback of the algorithm is that it has to be told the number of clusters(i.e. k) to find. If the data is not naturally clustered, you get some strange results. Also, the algorithm works well only when spherical clusters are naturally available in data.

References

J. B. MacQueen (1967): "Some Methods for classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability", Berkeley, University of California Press, 1:281-297

External links




Views
Personal tools
In other languages
Similar Links