COEFFICIENTS OF ASSOCIATION AND SIMILARITY, BASED ON BINARY (PRESENCE-ABSENCE) DATA: AN EVALUATION
ZDENEK HUBÁLEK
Czechoslovak Academy of Sciences, Institute of Parasitology, Flemingovo nám. 2, 166 32 Prague, Czechoslovakia
Summary
Forty-three association (similarity) coefficients were collected and evaluated in this survey. Some of them are synonyms or direct correlates of earlier described indices (A8, A9, A12, A31, A33); others are mere transforms from one range of values to another (A10, A24, A33). Several coefficients are incompatible with the suggested admissibility conditions of minimum-maximum value (A13, A16, A27, A28, A29, A31), symmetry (A1, A2, A13, A16, A26), discrimination between positive and negative association (A27, A28, A31) or monotonicity with χ² (A19 to A24); A17 yields very low and erratic values.
As a result, 23 coefficients were excluded and the remaining 20 measures were subjected to an empirical trial on interspecific association data among fungi of the genus Chaetomium, with the use of a cluster analysis. The classification produced five main clusters of related coefficients, with several subgroups. It was then demonstrated that representative indices from different clusters yield different dendrograms of interspecific association among Chaetomium, and A34, A14, and possibly also A36 and A40, seemed to be less sensible. A set of measures that generally work well (at least for interspecific association) comprises A4 (Jaccard), A4 (Dice-Sørensen), A7 (Kulczyński), A11 (Driver-Kroeber-Ochiai) and, with some reservation, A30 (Pearson tetrachoric) and A32 (Baroni-Urbani-Buser). For some purposes, however, other 'admissible' coefficients would be more suitable, and the choice of a measure should be related to the nature of the data. It is tentatively suggested that three or so alternative coefficients be used and the results compared on the same data; moreover, significance tests on association should be carried out whenever possible.
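The suggestion to compare a few alternative coefficients on the same data can be illustrated with a minimal Python sketch. It assumes the usual 2 × 2 presence-absence notation (a = joint presences, b and c = presences of one species only, d = joint absences) and the common textbook forms of the Jaccard, Dice-Sørensen, Kulczyński and Ochiai coefficients, together with the uncorrected χ² statistic. The function names and the example data are hypothetical, and the formulas may differ in detail from the A-numbered definitions in the paper itself.

```python
from math import sqrt


def contingency(x, y):
    """Return (a, b, c, d) for two equal-length presence-absence (0/1) sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)          # both present
    b = sum(1 for i, j in zip(x, y) if i and not j)      # only the first present
    c = sum(1 for i, j in zip(x, y) if not i and j)      # only the second present
    d = len(x) - a - b - c                               # both absent
    return a, b, c, d


# Textbook forms; each assumes both species occur in at least one sample,
# so that no denominator is zero.
def jaccard(a, b, c, d):
    return a / (a + b + c)


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def kulczynski(a, b, c, d):
    return 0.5 * (a / (a + b) + a / (a + c))


def ochiai(a, b, c, d):
    return a / sqrt((a + b) * (a + c))


def chi_square(a, b, c, d):
    """Uncorrected chi-square statistic for a 2 x 2 table (all margins non-zero)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical records of two species in eight samples.
species_1 = [1, 1, 1, 0, 0, 0, 1, 0]
species_2 = [1, 1, 0, 1, 0, 0, 1, 0]
table = contingency(species_1, species_2)

for name, f in [("Jaccard", jaccard), ("Dice", dice),
                ("Kulczynski", kulczynski), ("Ochiai", ochiai),
                ("chi-square", chi_square)]:
    print(f"{name}: {f(*table):.3f}")
```

Note that none of the four similarity coefficients above uses d, whereas the χ² statistic does; this is one reason why different families of coefficients can rank the same species pairs differently.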
References
- Austin, B. & Colwell, R. R. (1977). Evaluation of some coefficients for use in numerical taxonomy of micro-organisms. International Journal of Systematic Bacteriology 27, 204–210.
- Baroni-Urbani, C. & Buser, M. W. (1976). Similarity of binary data. Systematic Zoology 25, 251–259.
- Braun-Blanquet, J. (1932). Plant Sociology; the Study of Plant Communities. McGraw-Hill, New York.
- Cancela da Fonseca, J. P. (1966). L'outil statistique en biologie du sol. III. Indices d'intérêt écologique. Revue d'Écologie et Biologie du Sol 3, 381–407.
- Cattell, R. B. (1952). Factor Analysis. Harper, New York.
- Cheetham, A. H. & Hazel, J. E. (1969). Binary (presence-absence) similarity coefficients. Journal of Paleontology 43, 1130–1136.
- Clifford, H. T. & Stephenson, W. (1975). An Introduction to Numerical Classification. Academic Press, New York.
- Cole, L. C. (1949). The measurement of interspecific association. Ecology 30, 411–424.
- Cramér, H. (1924). Quoted by Goodman & Kruskal (1959).
- Dagnelie, P. (1960). Contribution à l'étude des communautés végétales par l'analyse factorielle. Bulletin du Service de la Carte Phytogéographique (Paris), Série B, 5, 7–71.
- Dagnelie, P. (1965). L'étude des communautés végétales par l'analyse statistique des liaisons entre les espèces et les variables écologiques: principes fondamentaux. Biometrics 21, 345–361.
- Dice, L. R. (1945). Measures of the amount of ecologic association between species. Ecology 26, 297–302.
- Doolittle, M. H. (1885). The verification of predictions. Bulletin of the Philosophical Society of Washington 7, 122–127.
- Driver, H. E. & Kroeber, A. L. (1932). Quantitative expression of cultural relationships. The University of California Publications in American Archaeology and Ethnology 31, 211–256.
- Eades, D. C. (1965). The inappropriateness of the correlation coefficient as a measure of taxonomic resemblance. Systematic Zoology 14, 98–100. doi:10.2307/2411731
- Eyraud, H. (1936). Les principes de la mesure des correlations. Annales de l'Université de Lyon, Série III, Section A, 1, 30–47.
- Fager, E. W. & McGowan, J. A. (1963). Zooplankton species groups in the North Pacific. Science 140, 453–460.
- Field, J. G. (1971). A numerical analysis of changes in the soft-bottom fauna along a transect across False Bay, South Africa. Journal of Experimental Marine Biology and Ecology 7, 215–253. doi:10.1016/0022-0981(71)90007-4
- Forbes, S. A. (1907). On the local distribution of certain Illinois fishes: an essay in statistical ecology. Bulletin of the Illinois State Laboratory for Natural History 7, 273–303.
- Forbes, S. A. (1925). Method of determining and measuring the associative relations of species. Science 61, 524.
- Gilbert, N. & Wells, T. C. E. (1966). Analysis of quadrat data. Journal of Ecology 54, 675–685.
- Goodall, D. W. (1952). Quantitative aspects of plant distribution. Biological Reviews 27, 194–245.
- Goodall, D. W. (1953). Objective methods for the classification of vegetation. I. The use of positive interspecific correlation. Australian Journal of Botany 1, 39–63.
- Goodall, D. W. (1967). The distribution of the matching coefficient. Biometrics 23, 647–656.
- Goodall, D. W. (1973). Sample similarity and species correlations. In Ordination and Classification of Communities (ed. R. H. Whittaker), pp. 105–156. W. Junk, The Hague. doi:10.1007/978-94-010-2701-4_6
- Goodman, L. A. & Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association 49, 732–764.
- Goodman, L. A. & Kruskal, W. H. (1959). Measures of association for cross classifications. II. Further discussion and references. Journal of the American Statistical Association 54, 123–163.
- Gounot, M. (1969). Méthodes d'Étude Quantitative de la Végétation. Masson et Cie, Paris.
- Greig-Smith, P. (1964). Quantitative Plant Ecology, 2nd ed. Butterworths, London.
- Hamann, U. (1961). Merkmalbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der Monokotyledonen. Willdenowia 2, 639–768.
- Hubálek, Z. (1974). Dispersal of fungi of the family Chaetomiaceae by free-living birds. I. A survey of records. Česká Mykologie 28, 65–79.
- Hubálek, Z. (1976). Interspecific affinity among keratinolytic fungi associated with birds. Folia Parasitologica (Praha) 23, 267–272.
- Hubálek, Z. (1978). Coincidence of fungal species associated with birds. Ecology 59, 438–442.
- Hurlbert, S. H. (1969). A coefficient of interspecific association. Ecology 50, 1–9.
- Jaccard, P. (1901). Distribution de la flore alpine dans le Bassin des Dranses et dans quelques régions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles 37, 241–272.
- Janowitz, M. F. (1980). Similarity measures on binary data. Systematic Zoology 29, 342–359.
- Jardine, N. & Sibson, R. (1971). Mathematical Taxonomy. John Wiley and Sons, London-New York-Sydney-Toronto.
- Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika 32, 241–254.
- Kendall, M. G. & Stuart, A. (1958–1967). The Advanced Theory of Statistics, 2nd ed. Charles Griffin, London.
- Kershaw, K. A. (1964). Quantitative and Dynamic Ecology. Edward Arnold Ltd, London.
- Kulczyński, S. (1927). Zespoły roślin w Pieninach. Bulletin International de l'Académie Polonaise des Sciences et des Lettres, Classe des Sciences Mathématiques et Naturelles, Série B (Sciences Naturelles), Supplement 11, 57–203.
- McConnaughey, B. H. (1964). The determination and analysis of plankton communities. Marine Research of Indonesia Spec., 1–40.
- Michael, E. L. (1920). Marine ecology and the coefficient of association; a plea in behalf of quantitative biology. Journal of Ecology 8, 54–59. doi:10.2307/2255213
- Moore, A. W. & Russell, J. S. (1967). Comparison of coefficients and grouping procedures in numerical analysis of soil trace element data. Geoderma 1, 139–158. doi:10.1016/0016-7061(67)90006-7
- Morisita, M. (1959). Measuring of interspecific association and similarity between communities. Memoirs of the Faculty of Science, Kyushu University, Series E (Biology), 3, 65–80.
- Mountford, M. D. (1962). An index of similarity and its application to classificatory problems. In Progress in Soil Zoology (ed. P. W. Murphy), pp. 43–50. Butterworths, London.
- Nash, C. B. (1950). Associations between fish species in tributaries and shore waters of western Lake Erie. Ecology 31, 561–566.
- Ochiai, A. (1957). Zoogeographic studies on the soleoid fishes found in Japan and its neighbouring regions. Bulletin of the Japanese Society for Scientific Fisheries 22, 526–530. doi:10.2331/suisan.22.526
- Pearson, K. (1900). On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society, A, 195, 1–78. doi:10.1098/rsta.1900.0022
- Pearson, K. (1905). Quoted by Yule & Kendall (1950).
- Pearson, K. (1926). On the coefficient of racial likeness. Biometrika 18, 105–117.
- Pearson, K. & Heron, D. (1913). On theories of association. Biometrika 9, 159–315.
- Peirce, C. S. (1884). The numerical measure of the success of predictions. Science 4, 453–454.
- Pielou, E. C. (1969). An Introduction to Mathematical Ecology. Wiley-Interscience, New York.
- Poole, R. W. (1974). An Introduction to Quantitative Ecology. McGraw-Hill, New York.
- Rogers, D. J. & Tanimoto, T. T. (1960). A computer program for classifying plants. Science 132, 1115–1118.
- Russell, P. F. & Rao, T. R. (1940). On habitat and association of species of anopheline larvae in south-eastern Madras. Journal of the Malaria Institute of India 3, 153–178.
- Simpson, G. G. (1943). Mammals and the nature of continents. American Journal of Science 241, 1–31.
- Simpson, G. G. (1960). Notes on the measurement of faunal resemblance. American Journal of Science, Bradley volume, 258-A, 300–311.
- Sneath, P. H. A. (1957). Some thoughts on bacterial classification. Journal of General Microbiology 17, 184–200.
- Sneath, P. H. A. (1968). Vigour and pattern in taxonomy. Journal of General Microbiology 54, 1–11.
- Sneath, P. H. A. & Sokal, R. R. (1973). Numerical Taxonomy. W. H. Freeman, San Francisco.
- Sokal, R. R. & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. The University of Kansas Scientific Bulletin 38, 1409–1438.
- Sokal, R. R. & Sneath, P. H. A. (1963). Principles of Numerical Taxonomy. W. H. Freeman and Co., San Francisco.
- Sørensen, T. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Kongelige Danske Videnskabernes Selskab, Biologiske Skrifter 5, 1–34.
- Sorgenfrei, T. (1959). Molluscan assemblages from the marine middle Miocene of South Jutland and their environments. Danmarks Geologiske Undersøgelse, Ser. 2, no. 79, 403.
- Tarwid, K. (1960). Szacowanie zbieżności nisz ekologicznych gatunków drogą oceny prawdopodobieństwa spotykania się ich w połowach. Ekologia Polska Ser. B, 6, 115–130.
- T'Mannetje, L. (1967). A comparison of eight numerical procedures applied to the classification of some African Trifolium taxa based on Rhizobium affinities. Australian Journal of Botany 15, 521–528. doi:10.1071/BT9670521
- Whittaker, R. H. (1967). Gradient analysis of vegetation. Biological Reviews 42, 207–264.
- Williams, W. T. & Dale, M. B. (1965). Fundamental problems in numerical taxonomy. Advances in Botanical Research 2, 35–68. doi:10.1016/S0065-2296(08)60249-9
- Williams, W. T. & Lambert, J. M. (1960). Multivariate methods in plant ecology. II. The use of an electronic digital computer. Journal of Ecology 48, 689–710.
- Williams, W. T., Lambert, J. M. & Lance, G. N. (1966). Multivariate methods in plant ecology. V. Similarity analyses and information-analysis. Journal of Ecology 54, 427–445.
- Yates, F. (1934). Contingency tables involving small numbers and the χ² test. Journal of the Royal Statistical Society, Supplement 1, 217–235.
- Yule, G. U. (1900). On the association of attributes in statistics. Philosophical Transactions of the Royal Society, A, 194, 257–319.
- Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society 75, 579–642.
- Yule, G. U. & Kendall, M. G. (1950). An Introduction to the Theory of Statistics, 14th ed. Charles Griffin and Co. Ltd, London.