This article presents a detailed survival analysis for chronic kidney disease (CKD). The analysis is based on electronic health record (EHR) data comprising almost two decades of clinical observations collected at New York-Presbyterian, a large hospital in New York City with one of the oldest electronic health records in the United States. Our survival analysis approach centers on Bayesian multiresolution hazard modeling, with the objective of capturing the changing hazard of CKD over time, adjusted for patient clinical covariates and kidney-related laboratory tests. Special attention is paid to statistical issues common to all EHR data, such as cohort definition, missing data and censoring, variable selection, and the potential for joint survival and longitudinal modeling, all of which are discussed both in general and within the EHR CKD context.

Volume 7, Issue 5

Special Issue on Observational Healthcare Data

October 2014

Pages 385–403

Survival analysis with electronic health record data: Experiments with chronic kidney disease
