Chapter 6

Data Mining Algorithms I: Clustering

Dan A. Simovici

Department of Mathematics and Computer Science, University of Massachusetts at Boston, Boston, MA 02125, USA

First published: 01 March 2007
Citations: 4

Summary

Clustering is the process of grouping together objects that are similar. The similarity between objects is evaluated using several types of dissimilarities (in particular, metrics and ultrametrics). After discussing partitions and dissimilarities, two basic mathematical concepts important for clustering, we focus on ultrametric spaces, which play a vital role in hierarchical clustering. Several types of agglomerative hierarchical clustering are examined, with special attention to single-link and complete-link clustering. Among the nonhierarchical algorithms, we present the k-means and PAM algorithms. The well-known impossibility theorem of Kleinberg is included to illustrate the limitations of clustering algorithms. Finally, methods for evaluating clustering quality are examined.
