![]() |
Abstract:
A fundamental machine learning problem consists of clustering data by identifying the subset of exemplars or prototypes that maximizes the net similarity of data points to their exemplars. While many practical algorithms have been proposed for approximately solving this NP-hard problem, linear programming bounds indicate that for several tasks, there is substantial room for improving the net similarity. In this talk, I will present results for five readily-available data sets comparing the performance of the recently-introduced affinity propagation algorithm to other methods. The data sets include 400 Olivetti face images, 2000 HIV genomes, 8000 UCI mushroom feature vectors, 11000 USPS handwritten digits, and 17770 Netflix user-rated movies. The other methods include k-centers clustering, k-means clustering, the EM algorithm, Dasgupta and Schulman's k-log-k pruning technique, and hierarchical clustering.