Abstract
Hierarchical cluster classification involves producing a series of nested partitions of the data. The first partition consists of n single-member “clusters”; the last consists of a single group containing all n individuals. A hierarchical classification can be represented by a diagram known as a dendrogram. Properties and problems of hierarchical clustering techniques are described. The two major types of algorithm used to produce hierarchical classifications are agglomerative and divisive. Careful validation of solutions is a clear requirement in any clustering exercise. An example illustrating hierarchical methods is presented.
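The series of partitions described above, from n singletons down to one all-inclusive group, can be sketched with a naive agglomerative procedure. This is only an illustrative sketch: the function name `single_linkage`, the choice of single linkage as the fusion rule, and the five one-dimensional data values are assumptions made here, not taken from the article.

```python
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering of 1-D points (hypothetical helper).

    Returns the list of partitions, starting with n single-member
    clusters and ending with one cluster holding every index.
    """
    def dist(a, b):
        # Single linkage: distance between two clusters is the smallest
        # distance between any member of one and any member of the other.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    clusters = [frozenset([i]) for i in range(len(points))]
    partitions = [list(clusters)]
    while len(clusters) > 1:
        # Fuse the closest pair of clusters, then record the new partition.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        partitions.append(list(clusters))
    return partitions


# Made-up data with distinct gaps so that no two fusions tie.
points = [0.0, 1.0, 3.0, 7.0, 15.0]
partitions = single_linkage(points)
for partition in partitions:
    print(len(partition), sorted(sorted(c) for c in partition))
```

Each pass through the loop performs one fusion, so n observations yield n - 1 fusions and n partitions in total. The fusion order is the information a dendrogram displays. A divisive method would build the same kind of series in the opposite direction, starting from the single all-inclusive group.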