Cluster Analysis of Subjects, Hierarchical Methods

A hierarchical cluster classification partitions the data into a nested series of groupings. The first partition consists of n single-member “clusters” and the last consists of a single group containing all n individuals; each intermediate partition is obtained by merging (or splitting) groups of its neighbour in the series. A hierarchical classification can be represented by a tree diagram known as a dendrogram. The properties and problems of hierarchical clustering techniques are described. The two major types of algorithm used to produce hierarchical classifications are agglomerative methods, which successively fuse the n individuals into larger groups, and divisive methods, which successively split the complete set into finer groups. Careful validation of the solutions is essential in any clustering exercise. An example illustrating hierarchical methods is presented.
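The agglomerative strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this article: it assumes the n individuals are described only by a symmetric matrix of dissimilarities, repeatedly fuses the two closest clusters, and updates the between-cluster dissimilarities with the Lance-Williams recurrence, shown here for single, complete and group-average linkage only. Read from first to last, the merge history gives the nested series of partitions from n singletons to one group, and the merge heights are the levels at which the dendrogram joins. The helper names (`agglomerate`, `partition`, `cophenetic_correlation`) and the five-point data set are hypothetical. The cophenetic correlation between the input dissimilarities and the dendrogram heights is included only as one simple check on how faithfully a hierarchy represents the data; it is not a complete validation.

```python
from itertools import combinations


def lw_coefficients(method, n_i, n_j):
    """(alpha_i, alpha_j, beta, gamma) of the Lance-Williams recurrence."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # group average (UPGMA)
        return n_i / (n_i + n_j), n_j / (n_i + n_j), 0.0, 0.0
    raise ValueError(f"unknown method: {method!r}")


def agglomerate(dist, method="single"):
    """Naive O(n^3) agglomerative clustering of an n x n dissimilarity matrix.

    Returns the merge history [(members_i, members_j, height), ...]: it starts
    from n singleton clusters and ends when one cluster holds all n objects.
    """
    n = len(dist)
    clusters = {p: frozenset([p]) for p in range(n)}  # cluster id -> members
    d = {(p, q): dist[p][q] for p, q in combinations(range(n), 2)}

    def key(a, b):  # each dissimilarity is stored once, under (smaller id, larger id)
        return (a, b) if a < b else (b, a)

    merges, next_id = [], n
    while len(clusters) > 1:
        (i, j), h = min(d.items(), key=lambda item: item[1])  # closest pair
        n_i, n_j = len(clusters[i]), len(clusters[j])
        a_i, a_j, beta, gamma = lw_coefficients(method, n_i, n_j)
        m, next_id = next_id, next_id + 1  # id of the merged cluster
        for k in clusters:
            if k not in (i, j):
                d_ki, d_kj = d[key(k, i)], d[key(k, j)]
                d[(k, m)] = a_i * d_ki + a_j * d_kj + beta * h + gamma * abs(d_ki - d_kj)
        # Forget every dissimilarity that still refers to i or j.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        members_i, members_j = clusters.pop(i), clusters.pop(j)
        clusters[m] = members_i | members_j
        merges.append((sorted(members_i), sorted(members_j), h))
    return merges


def partition(n, merges, g):
    """The g-group partition: replay the first n - g merges of the history."""
    if not 1 <= g <= n:
        raise ValueError("g must lie between 1 and n")
    groups = [frozenset([p]) for p in range(n)]
    for members_i, members_j, _ in merges[: n - g]:
        a, b = frozenset(members_i), frozenset(members_j)
        groups = [s for s in groups if s not in (a, b)] + [a | b]
    return sorted(sorted(s) for s in groups)


def cophenetic_correlation(dist, merges):
    """Pearson correlation between input dissimilarities and dendrogram heights.

    Requires at least two distinct values in each of the two lists.
    """
    n = len(dist)
    coph = [[0.0] * n for _ in range(n)]
    for members_i, members_j, h in merges:  # height at which p and q first meet
        for p in members_i:
            for q in members_j:
                coph[p][q] = coph[q][p] = h
    pairs = list(combinations(range(n), 2))
    x = [dist[p][q] for p, q in pairs]
    y = [coph[p][q] for p, q in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Five hypothetical individuals on one variable; dissimilarity = |x_p - x_q|.
    x = [0.0, 1.0, 4.0, 5.5, 12.0]
    dist = [[abs(a - b) for b in x] for a in x]
    merges = agglomerate(dist, method="single")
    for members_i, members_j, h in merges:
        print(f"merge {members_i} + {members_j} at height {h}")
    print(partition(len(x), merges, 2))  # [[0, 1, 2, 3], [4]]
    print(cophenetic_correlation(dist, merges))
```

Because the loop rescans every remaining pair at each step, this sketch is only suitable for small n; faster agglomerative algorithms exist, and in practice an optimized library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead.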

Encyclopedia of Biostatistics