Integrating database knowledge and epidemiological design to improve the implementation of data mining methods that evaluate vaccine safety in large healthcare databases
Jennifer C. Nelson (Corresponding Author)
Biostatistics Unit, Group Health Research Institute, Seattle, WA 98101, USA
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Susan M. Shortreed
Biostatistics Unit, Group Health Research Institute, Seattle, WA 98101, USA
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Onchee Yu
Biostatistics Unit, Group Health Research Institute, Seattle, WA 98101, USA
Do Peterson
Biostatistics Unit, Group Health Research Institute, Seattle, WA 98101, USA
Roger Baxter
Vaccine Study Center and Division of Research, Northern California Kaiser Permanente, Oakland, CA 94612, USA
Bruce Fireman
Vaccine Study Center and Division of Research, Northern California Kaiser Permanente, Oakland, CA 94612, USA
Ned Lewis
Vaccine Study Center and Division of Research, Northern California Kaiser Permanente, Oakland, CA 94612, USA
Dave McClure
Epidemiology Research Center, Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA
Eric Weintraub
Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
Stan Xu
Kaiser Permanente Institute for Health Research, Denver, CO 80231, USA
Lisa A. Jackson
Biostatistics Unit, Group Health Research Institute, Seattle, WA 98101, USA
Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
on behalf of the Vaccine Safety Datalink project
Abstract
Large healthcare databases maintained by health plans have been widely used to conduct customized protocol-based epidemiological safety studies, as well as targeted routine sequential monitoring of suspected adverse events, for newly licensed vaccines. These databases also offer a rich data source for using data mining methods to discover vaccine-related adverse events that were not known before licensure, but they remain relatively under-utilized for this purpose. Initial safety applications of data mining methods to 'big healthcare data' are promising, but stronger integration of database expertise, epidemiological design, and statistical analysis strategies is needed to better leverage the available information, reduce bias, and improve reporting transparency. We enumerate the major methodological challenges in mining large healthcare databases for vaccine safety research, describe existing strategies that have been used to address these issues, and identify opportunities for methodological advances that emphasize the importance of adapting techniques used in customized protocol-based vaccine safety assessments. Investment in such research methods, and in deeper collaborations between database safety experts and data mining methodologists, has great potential to improve existing safety surveillance programs and to further increase public confidence in the safety of newly licensed vaccines.