Volume 7, Issue 5 pp. 404-412
Original Article

Big data, big results: Knowledge discovery in output from large-scale analytics

Tyler H. McCormick

Corresponding Author

Department of Statistics, University of Washington, Seattle, WA 98195, USA

Rebecca Ferrell

Department of Statistics, University of Washington, Seattle, WA 98195, USA

Alan F. Karr

National Institute of Statistical Sciences, Research Triangle Park, NC 27709, USA

Patrick B. Ryan

Janssen Research and Development & OMOP, Titusville, NJ 08560, USA

First published: 16 September 2014
Citations: 11

Abstract

Observational healthcare data, such as electronic health records and administrative claims databases, provide longitudinal clinical information at the individual level. These data cover tens of millions of patients and present unprecedented opportunities to address issues such as the post-market safety of medical products. Analyzing patient-level databases yields population-level inferences, or ‘results’, such as the strength of association between medical product exposure and subsequent outcomes, often with thousands of drugs and outcomes. In this article, by contrast, we study ‘big results’, which are the product of applying thousands of alternative analysis strategies to five large patient databases. These results were produced by the Observational Medical Outcomes Partnership. Altogether, there are more than 6 million results, comprising risk assessments for 399 medical product–outcome pairs analyzed across five observational databases using seven statistical methods, each of which has between a few dozen and a few hundred variants representing parameters or ‘tuning variables’. We focus on the value of knowledge discovery methods and the challenges in extracting clinically relevant knowledge from big results. We believe our analyses are both scientifically and methodologically valuable: they reveal how methods and algorithms perform under various circumstances, and they provide a basis for comparing these methods.
