Survival analysis with electronic health record data: Experiments with chronic kidney disease
Yolanda Hagar
Department of Applied Mathematics, University of Colorado at Boulder, Boulder, CO, USA
David Albers
Department of Biomedical Informatics, Columbia University, New York, NY, USA
Rimma Pivovarov
Department of Biomedical Informatics, Columbia University, New York, NY, USA
Herbert Chase
Department of Biomedical Informatics, Columbia University, New York, NY, USA
Vanja Dukic (corresponding author)
Department of Applied Mathematics, University of Colorado at Boulder, Boulder, CO, USA
Noémie Elhadad (corresponding author)
Department of Biomedical Informatics, Columbia University, New York, NY, USA
Joint senior authors for this work: Vanja Dukic ([email protected]) and Noémie Elhadad ([email protected])
Abstract
This article presents a detailed survival analysis for chronic kidney disease (CKD). The analysis is based on electronic health record (EHR) data comprising almost two decades of clinical observations collected at New York-Presbyterian, a large hospital in New York City with one of the oldest electronic health records in the United States. Our survival analysis approach centers on Bayesian multiresolution hazard modeling, with the objective of capturing the changing hazard of CKD over time, adjusted for patient clinical covariates and kidney-related laboratory tests. Special attention is paid to statistical issues common to all EHR data, such as cohort definition, missing data and censoring, variable selection, and the potential for joint survival and longitudinal modeling, all of which are discussed both in general and within the EHR CKD context.
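As an illustrative aside, the idea of a hazard that changes over time in the presence of right censoring can be sketched with a piecewise-constant hazard. This is not the authors' Bayesian multiresolution hazard model, only a simpler stand-in: within each time interval, the maximum likelihood estimate of a constant hazard is the number of observed events divided by the total person-time at risk. The function name `piecewise_hazard`, the cut points, and the toy follow-up data below are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def piecewise_hazard(
    times: Sequence[float],
    events: Sequence[int],
    cuts: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (start, end, hazard) for each half-open interval [start, end).

    hazard = observed events in the interval / person-time at risk in it.
    Censored subjects (event == 0) add person-time but no events.
    """
    results = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        exposure = 0.0
        n_events = 0
        for t, d in zip(times, events):
            # Person-time this subject contributes inside [start, end).
            exposure += max(0.0, min(t, end) - start)
            # Count the event only if it was observed within this interval.
            if d == 1 and start <= t < end:
                n_events += 1
        hazard = n_events / exposure if exposure > 0 else float("nan")
        results.append((start, end, hazard))
    return results


# Hypothetical follow-up times in years; 1 = event observed, 0 = right-censored.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 0]
estimates = piecewise_hazard(times, events, cuts=[0.0, 2.0, 4.0])
```

In this toy example the first interval accumulates 7 person-years with one event and the second accumulates 3 person-years with one event, so the estimated hazard rises from 1/7 to 1/3 per year. A multiresolution approach would place a prior over how the hazard is split across nested intervals rather than estimating each interval independently.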
Supporting Information
Filename | Description |
---|---|
sam11236-sup-0001-appendix.pdf (PDF document, 149.7 KB) | Supporting Information |