Categorical Data Analysis
Gary G. Koch
University of North Carolina, Chapel Hill, NC, USA
Abstract
Categorical data consist of observations that are each classified into exactly one of a set of disjoint, exhaustive groups and then counted. The groups may be defined by a single qualitative characteristic or quantitative measurement, or by a combination of them. When such categorizations are to be modeled as probabilistic “responses” or “outcomes”, statistical analysis must be based on discrete distributions and requires extensive generalization of the classical notions of regression and analysis of variance commonly used for continuous observations. This overview article reviews statistical inference for categorical data, from the basic issues in analyses of 2 × 2 contingency tables through generalized linear mixed models, with emphasis on approaches that have increasingly unified the concepts of categorical and continuous data modeling.
Further Reading
- Aerts, M., Geys, H., Molenberghs, G. & Ryan, L. M., eds. (2002) Topics in Modelling of Clustered Data. Chapman & Hall/CRC, London. doi:10.1201/9781420035889
- Agresti, A. (1984) Analysis of Ordered Categorical Data. Wiley, New York.
- Agresti, A. (1996) An Introduction to Categorical Data Analysis. Wiley, New York.
- Agresti, A. (2002) Categorical Data Analysis, 2nd Ed. Wiley, New York. doi:10.1002/0471249688
- Bishop, Y. M. M., Fienberg, S. E. & Holland, P. W. (1975) Discrete Multivariate Analysis. MIT, Cambridge.
- Breslow, N. E. & Day, N. E. (1980) Statistical Methods in Cancer Research. Volume 1: The Analysis of Case-control Studies. International Agency for Research on Cancer, Lyon.
- Breslow, N. E. & Day, N. E. (1987) Statistical Methods in Cancer Research. Volume 2: The Design and Analysis of Cohort Studies. International Agency for Research on Cancer, Lyon.
- Cameron, A. C. & Trivedi, P. K. (1998) Regression Analysis of Count Data. Cambridge University Press, New York. doi:10.1017/CBO9780511814365
- Clayton, D. & Hills, M. (1993) Statistical Models in Epidemiology. Oxford University Press, Oxford.
- Cox, D. R. & Snell, E. J. (1989) Analysis of Binary Data, 2nd Ed. Chapman & Hall, London.
- Dey, D. K., Ghosh, S. J. & Mallick, B. K., eds. (2000) Generalized Linear Models: A Bayesian Perspective. Dekker, New York. doi:10.1201/9781482293456
- Diggle, P. J., Heagerty, P., Liang, K.-Y. & Zeger, S. L. (2002) Analysis of Longitudinal Data, 2nd Ed. Clarendon, Oxford. doi:10.1093/oso/9780198524847.001.0001
- Fienberg, S. E. (1980) The Analysis of Cross-Classified Categorical Data, 2nd Ed. MIT, Cambridge.
- Finney, D. J. (1978) Statistical Method in Biological Assay, 3rd Ed. Griffin, London.
- Fleiss, J. L., Levin, B. & Paik, M. C. (2003) Statistical Methods for Rates and Proportions, 3rd Ed. Wiley, New York. doi:10.1002/0471445428
- Forthofer, R. N. & Lehnen, R. G. (1981) Public Program Analysis: A New Categorical Data Approach. Lifetime Learning Publications, Belmont.
- Gokhale, D. V. & Kullback, S. (1978) The Information in Contingency Tables. Dekker, New York.
- Good, I. J. (1965) The Estimation of Probabilities. MIT, Cambridge.
- Goodman, L. A. (1978) Analyzing Qualitative/Categorical Data: Log-linear Models and Latent Structure Analysis, J. Magidson, ed. Abt, Cambridge.
- Haberman, S. (1974) The Analysis of Frequency Data. University of Chicago, Chicago.
- Haberman, S. (1978) Analysis of Qualitative Data. Volume 1: Introductory Topics. Academic Press, New York.
- Haberman, S. (1979) Analysis of Qualitative Data. Volume 2: New Developments. Academic Press, New York.
- Hosmer, D. W. & Lemeshow, S. (2000) Applied Logistic Regression, 2nd Ed. Wiley, New York. doi:10.1002/0471722146
- Imrey, P. B., Koch, G. G. & Stokes, M. E. (1981) Categorical data analysis: some reflections on the log linear model and logistic regression. Part I: historical and methodological overview, International Statistical Review 49, 265–283.
- Imrey, P. B., Koch, G. G. & Stokes, M. E. (1982) Categorical data analysis: some reflections on the log linear model and logistic regression. Part II: data analysis, International Statistical Review 50, 35–63.
- Koch, G. G., Imrey, P. B., Singer, J. M., Atkinson, S. S. & Stokes, M. E. (1985) Analysis of Categorical Data. University of Montreal, Montreal.
- Kuritz, S. J., Landis, J. R. & Koch, G. G. (1988) A general overview of Mantel-Haenszel methods: applications and recent developments, Annual Review of Public Health 9, 123–160.
- Laird, N. M. (1978) Empirical Bayes methods for two-way contingency tables, Biometrika 65, 581–590.
- Lawson, A., Biggeri, A., Böhning, D., Lesaffre, E., Viel, J.-F. & Bertollini, R. (1999) Disease Mapping and Risk Assessment for Public Health. Wiley, Chichester.
- Leonard, T. (1975) Bayesian estimation methods for two-way contingency tables, Journal of the Royal Statistical Society, Series B 37, 23–37.
- McCullagh, P. & Nelder, J. A. (1999) Generalized Linear Models, 3rd Ed. Chapman & Hall, London.
- McCulloch, C. E. (1994) Maximum likelihood variance components estimation for binary data, Journal of the American Statistical Association 89, 330–335.
- McCulloch, C. E. & Searle, S. R. (2001) Generalized, Linear, and Mixed Models. Wiley, New York.
- Read, T. R. C. & Cressie, N. A. C. (1988) Goodness-of-fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York. doi:10.1007/978-1-4612-4578-0
- Stokes, M. E., Davis, C. S. & Koch, G. G. (2001) Categorical Data Analysis Using the SAS® System, 2nd Ed. SAS Institute, Cary; Wiley, New York.
- Vonesh, E. F. & Chinchilli, V. M. (1997) Linear and Nonlinear Models for the Analysis of Repeated Measurements. Dekker, New York, pp. 381–444.