The applications of capture-recapture models to epidemiological data
Anne Chao (Corresponding Author)
Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
Correspondence address: Institute of Statistics, National Tsing Hua University, Hsin-Chu 30043, Taiwan
P. K. Tsay
Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
Sheng-Hsiang Lin
Institute of Statistics, National Tsing Hua University, Hsin-Chu, Taiwan
Wen-Yi Shau
Graduate Institute of Clinical Medicine, National Taiwan University, Taipei, Taiwan
Day-Yu Chao
Graduate Institute of Epidemiology, National Taiwan University, Taipei, Taiwan
Abstract
Capture-recapture methodology, originally developed to estimate demographic parameters of animal populations, has also been applied to human populations. This tutorial reviews closed capture-recapture models that apply to ascertainment data, where the goal is to estimate the size of a target population from several incomplete lists of individuals. The usual epidemiological practice of merging the lists and eliminating duplicate cases is likely to produce a downwardly biased count, because the merged list misses everyone in the population who was not ascertained by any of the lists. If there are no matching errors, the duplicate (overlap) information collected from the lists can be used, under appropriate assumptions, to estimate the number of missed individuals. Three approaches and their associated estimation procedures are introduced: ecological models, log-linear models, and the sample coverage approach. Each approach incorporates, in its own way, two types of source dependence: local (list) dependence and dependence due to heterogeneity among individuals. An interactive program, CARE (for capture-recapture), developed by the authors, is demonstrated on four real data sets. One concerns an outbreak of hepatitis A virus infection in Taiwan; the other three are ascertainment data on diabetes, spina bifida and congenital anomalies in infants, previously discussed in the literature. These examples show how the capture-recapture method can correct for under-ascertainment. The limitations of the methodology are also discussed, along with some cautionary remarks. Copyright © 2001 John Wiley & Sons, Ltd.
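The core idea above, using the overlap among lists to estimate how many individuals every list missed, can be illustrated with a minimal Python sketch. It is not the CARE program and not the authors' code. It implements two textbook estimators for three lists: the time-variation model M_t from the ecological-model family (lists independent, ascertainment probability varying only by list), solved through Darroch's estimating equation, and the log-linear model with all two-way list interactions but no three-way interaction, which has a closed form for the unobserved cell. The function names and the cell counts are hypothetical; the counts were constructed to satisfy list independence exactly with N = 1000, so they are not data from the paper.

```python
"""Hypothetical sketch: two closed-population estimators for three incomplete lists.

The cell counts are invented for illustration (not from the paper) and were
constructed to follow list independence exactly, with true N = 1000.
"""
from math import prod

# Capture histories: "101" = on list 1 and list 3, not on list 2.
# The "000" cell (missed by every list) is unobservable; it is what we estimate.
CELLS = {"100": 240, "010": 160, "001": 60,
         "110": 160, "101": 60, "011": 40, "111": 40}


def list_sizes(cells):
    """Return n_j, the number of individuals on list j, for each list."""
    t = len(next(iter(cells)))
    return [sum(c for h, c in cells.items() if h[j] == "1") for j in range(t)]


def darroch_mt(cells, tol=1e-9):
    """Model M_t: solve 1 - D/N = prod_j (1 - n_j/N) for N > D by bisection.

    D is the number of distinct individuals observed. With two lists the
    equation reduces to the Lincoln-Petersen estimator n1 * n2 / m.
    """
    d = sum(cells.values())
    n = list_sizes(cells)
    if sum(n) <= d:
        raise ValueError("no overlap between lists; N is not estimable")

    def f(big_n):
        # Negative just above D, positive for large N when lists overlap.
        return (1 - d / big_n) - prod(1 - nj / big_n for nj in n)

    lo, hi = float(d), 2.0 * d
    while f(hi) <= 0:  # widen the bracket until the sign changes
        hi *= 2
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid  # root lies above mid
    return (lo + hi) / 2


def loglinear_no_three_way(cells):
    """Log-linear model with all two-way interactions and no three-way term.

    Setting the three-factor interaction to zero gives the closed form
    x000 = x111 * x100 * x010 * x001 / (x110 * x101 * x011).
    """
    x = cells
    denom = x["110"] * x["101"] * x["011"]
    if denom == 0:
        raise ValueError("a two-list overlap cell is empty; estimate undefined")
    missed = x["111"] * x["100"] * x["010"] * x["001"] / denom
    return sum(cells.values()) + missed


if __name__ == "__main__":
    print(f"M_t (independence): {darroch_mt(CELLS):.1f}")
    print(f"Log-linear, two-way interactions: {loglinear_no_three_way(CELLS):.1f}")
```

Because these hypothetical counts are independent by construction, both estimators recover N = 1000. With real lists that are dependent or heterogeneous the two models generally disagree, so the choice of model matters; neither sketch computes confidence intervals.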