Predictive Model Detects Candidates for Genetic Testing
Using patient EHRs, a predictive algorithm could identify patients with a potential genetic disease who might benefit from testing
Most rare diseases are genetic in origin, and although the number of individuals affected by each disorder is relatively small, when combined, they affect up to 5–10% of the global population. Many of these diseases have yet to be discovered or characterized, and it can take years to reach an accurate diagnosis. Even for patients with known genetic diseases, clinical symptoms can be heterogeneous, making it difficult to identify the underlying etiology.
Genetic and genomic testing has become a standard procedure used to diagnose a patient with an apparent genetic condition. However, despite recent advances in technology and knowledge, current approaches to determine which patients should be tested lack consistency and, under current practice, genetic testing is not equally or fully provided to those who might stand to benefit most. Even when it comes to the numerous conditions for which genetic testing is explicitly recommended and covered, most patients are still not appropriately tested.
Mining EHRs
Using thousands of deidentified patient records, researchers trained a predictive algorithm to model which patients should receive a genetic test. Compared with current methods used to determine eligibility for genetic testing, the model was able to identify more patients for testing. Moreover, it increased the proportion of tested patients who have a genetic disease.
The algorithm could be easily reproduced at other facilities that use EHRs. “The current model is based solely on diagnostic codes that are standard across most EHRs,” says research team member Douglas M. Ruderfer, PhD, Associate Professor of Medicine at Vanderbilt. “Further, a high-performing and easily portable regression model exists and was used for the external validation at Massachusetts General Hospital, an outside facility.”
While these results are promising, Dr. Ruderfer says that the model is not ready to be used in routine clinical practice. “We are currently planning a randomized controlled trial to assess the clinical impact of this model in practice,” he says. There is no particular patient population to focus on when implementing the algorithm, he says; rather, what matters is the size of the undiagnosed group. “The larger the proportion of undiagnosed patients, the more potential impact the model should have,” he explains. “Under-diagnosis could be a result of reduced resources or less familiarity with genetic testing.”
The Study
Longitudinal clinical data stored in the EHR have been used to identify patients at risk for other types of health conditions, including recent data showing that specific genetic diseases can be detected by looking for patients who present with many of the expected symptoms. Even though each genetic disease may present with its own recognizable phenotypic profile, a recurring pattern can often be detected: multiple phenotypes that are often rare and that affect multiple systems throughout the body.
Dr. Ruderfer and colleagues hypothesized that this collection of rare and diverse phenotypes “is a hallmark signature of patients with a genetic disease and can be captured from data in the EHR.” They tested their hypothesis by building a machine learning-based prediction model to identify patients with clinical profiles suggestive of a suspected genetic disorder. The model was then trained and tested on 2,286 patients who received a chromosomal microarray (CMA) test and 9,144 demographically matched controls using only diagnostic billing information from the EHR.
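As a rough illustration of what a model built "using only diagnostic billing information" might take as input, the sketch below turns each patient's set of diagnostic codes into a row of 0/1 indicators, one column per code. This is an assumption about a common way to encode such data, not the researchers' documented method; the patient IDs, the code labels (CODE_A and so on), and the function name are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def build_feature_matrix(
    patient_codes: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Encode each patient's diagnostic codes as a binary indicator row.

    Returns (patient IDs, code vocabulary, matrix), where
    matrix[i][j] is 1 if patient i has code j on record, else 0.
    """
    # Every distinct code seen across all patients becomes one column.
    vocabulary = sorted(set().union(*patient_codes.values()))
    # Sort patient IDs so the row order is deterministic.
    patients = sorted(patient_codes)
    matrix = [
        [1 if code in patient_codes[pid] else 0 for code in vocabulary]
        for pid in patients
    ]
    return patients, vocabulary, matrix


if __name__ == "__main__":
    # Hypothetical toy records; real EHR codes would replace these labels.
    records = {
        "patient_1": {"CODE_A", "CODE_C"},
        "patient_2": {"CODE_B"},
    }
    ids, codes, rows = build_feature_matrix(records)
    for pid, row in zip(ids, rows):
        print(pid, dict(zip(codes, row)))
```

A matrix like this, paired with a case or control label for each row, is the kind of input a standard classifier could be trained on.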
The majority (95%) of those who had received the CMA test were under 20 years of age (mean age, 8.1 years), and most were male (61.3%) and white (75.6%). About a quarter of the patients (24%, n = 550) had an abnormal result, including 250 with at least 1 gain and 257 with at least 1 loss. Within this group, 37% (201 of 550) had a potential diagnosis included in the CMA report. While most of the genomic coordinates were unique, several known recurrent syndromes were observed more frequently, including 22q11.2 deletion syndrome (also referred to as DiGeorge syndrome), one type of Charcot-Marie-Tooth disease, and 16p11.2 deletion syndrome.
Next, the researchers wanted to distinguish individuals who had received a CMA from matched controls to capture the clinical suspicion of a genetic disease, but in “an automated and systemized” manner. The model was trained on 80% of the data (1,818 cases; 7,326 controls), and the version with the highest area under the precision–recall curve was then applied to the remaining 20% for testing (468 cases; 1,818 controls). The trained algorithm correctly identified 87% of patients for whom genetic testing had been ordered and 96% of those for whom it had not.
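For readers unfamiliar with the metrics mentioned here, the following minimal sketch shows how they are typically computed. It assumes average precision as the estimate of area under the precision–recall curve (one common choice; the study's exact calculation is not specified in this article), and the labels and scores in the demo are made-up toy values, not study data. The function names are illustrative only.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and predictions.

    Assumes at least one positive and one negative label.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Estimate area under the precision-recall curve as average precision.

    Patients are ranked by descending score, and the precision observed
    at each true case is averaged. Tied scores are not specially handled.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # The reported training and testing counts add up to the full cohort.
    assert 1818 + 468 == 2286   # cases
    assert 7326 + 1818 == 9144  # controls

    # Toy example: 1 = test was ordered (case), 0 = control.
    labels = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity_specificity(labels, preds))
    print(average_precision(labels, scores))
```

In this framing, the 87% figure corresponds to sensitivity (cases correctly flagged) and the 96% figure to specificity (controls correctly left unflagged).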
The algorithm was evaluated for its ability to identify patients with specific genetic diseases in 6,445 patients using Vanderbilt’s DNA biobank, and it was successful in identifying patients with pathogenic copy number variations. Researchers then validated it across a larger population at Vanderbilt, and externally at a second hospital.
“This is an important problem, and the Vanderbilt group is starting to attack it,” says Wendy Chung, MD, PhD, Kennedy Family Professor of Pediatrics and Medicine and Chief of Clinical Genetics at Columbia University in New York City, weighing in on the study. “The principles are generalizable, and there will need to be iterative refinements to improve the positive predictive value, but this is a good start.”