Volume 188, Issue 4 pp. 1024-1025
THE AJMG SEQUENCE: DECODING NEWS AND TRENDS FOR THE MEDICAL GENETICS COMMUNITY BY ROXANNE NELSON

Predictive Model Detects Candidates for Genetic Testing

First published: 14 March 2022

Using patient EHRs, a predictive algorithm could identify patients with a potential genetic disease who might benefit from testing

Most rare diseases are genetic in origin, and although the number of individuals affected by each disorder is relatively small, together they affect up to 5–10% of the global population. Many of these diseases have yet to be discovered or characterized, and it can take years to reach an accurate diagnosis. Even for patients with known genetic diseases, clinical symptoms can be heterogeneous, making it difficult to identify the underlying etiology.

Genetic and genomic testing has become a standard procedure for diagnosing a patient with an apparent genetic condition. However, despite recent advances in technology and knowledge, current approaches to determining which patients should be tested lack consistency and, under current practice, genetic testing is not equally or fully provided to those who stand to benefit most. Even for the numerous conditions for which genetic testing is explicitly recommended and covered, most patients are still not appropriately tested.

There is currently no system in place that identifies the patients most likely to have a rare genetic disease and that could also guide genetic testing decision-making to improve diagnostic outcomes, reduce associated healthcare costs and the burden on patients, and enable opportunities for improved care. However, researchers at Vanderbilt University in Nashville are now exploring a potential solution: a method that would draw on information routinely available in patients’ electronic health records (EHRs) to streamline the process of characterizing rare disease gene variants (Morley et al, 2021).

Mining EHRs

Using thousands of deidentified patient records, researchers trained a predictive algorithm to model which patients should receive a genetic test. Compared with current methods used to determine eligibility for genetic testing, the model was able to identify more patients for testing. Moreover, it showed an increase in the proportion of those tested who have a genetic disease.

The algorithm could be easily reproduced at other facilities that use EHRs. “The current model is based solely on diagnostic codes that are standard across most EHRs,” says research team member Douglas M. Ruderfer, PhD, Associate Professor of Medicine at Vanderbilt. “Further, a high-performing and easily portable regression model exists and was used for the external validation at Massachusetts General Hospital, an outside facility.”

While these results are promising, Dr. Ruderfer says that the model is not yet ready to be used in routine clinical practice. “We are currently planning a randomized controlled trial to assess the clinical impact of this model in practice,” he says. There is no particular patient population to focus on when implementing the algorithm, he says; rather, what matters is the size of the undiagnosed group. “The larger the proportion of undiagnosed patients, the more potential impact the model should have,” he explains. “Under-diagnosis could be a result of reduced resources or less familiarity with genetic testing.”

The Study

Longitudinal clinical data stored in the EHR have been used to identify patients at risk for other types of health conditions, and recent data show that specific genetic diseases can be detected by looking for patients who present with many of the expected symptoms. Even though each genetic disease may present with its own recognizable phenotypic profile, a recurring pattern of multiple phenotypes that are often rare and affect multiple systems can frequently be detected across genetic diseases.

Dr. Ruderfer and colleagues hypothesized that this collection of rare and diverse phenotypes “is a hallmark signature of patients with a genetic disease and can be captured from data in the EHR.” They tested their hypothesis by building a machine learning-based prediction model to identify patients with clinical profiles suggestive of a suspected genetic disorder. The model was then trained and tested on 2,286 patients who received a chromosomal microarray (CMA) test and 9,144 demographically matched controls using only diagnostic billing information from the EHR.

The majority (95%) of those who had received the CMA test were under 20 years of age (mean age, 8.1 years), and most were male (61.3%) and white (75.6%). About a quarter of the patients (24%, n = 550) had an abnormal result, including 250 with at least 1 gain and 257 with at least 1 loss. Within this group, 37% (201 of 550) had a potential diagnosis included in the CMA report. While most of the genomic coordinates were unique, several known recurrent syndromes were observed more frequently, including 22q11.2 deletion syndrome (also referred to as DiGeorge syndrome), one type of Charcot-Marie-Tooth disease, and 16p11.2 deletion syndrome.

Next, the researchers wanted to distinguish individuals who had received a CMA from matched controls to capture the clinical suspicion of a genetic disease, but in “an automated and systemized” manner. The model was trained on 80% of the data (1,818 cases; 7,326 controls), and the version with the highest area under the precision–recall curve was then applied to the remaining 20% for testing (468 cases; 1,818 controls). The trained algorithm correctly identified 87% of cases where genetic testing had been ordered and 96% of those where genetic testing had not been ordered.
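For readers less familiar with the metrics in this paragraph, the sketch below shows one common way to compute them: the area under the precision–recall curve (here in its step-wise "average precision" form) and the share of cases and controls correctly identified at a chosen score threshold. It is a minimal illustration on made-up scores, not the Vanderbilt team's code; the function names, the toy data, and the 0.5 threshold are all assumptions, and tied scores are not given any special handling.

```python
from typing import List, Tuple


def confusion_counts(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), calling a patient positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(labels: List[int], scores: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Fraction of true cases flagged, and fraction of controls left unflagged."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Patients are ranked by descending score; precision is averaged over the
    ranks at which a true case appears.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    running = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            running += hits / rank
    return running / total_pos


# Hypothetical held-out patients: 1 = received a CMA (case), 0 = matched control.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
toy_scores = [0.95, 0.80, 0.70, 0.65, 0.40, 0.35, 0.30, 0.20, 0.15, 0.05]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"average precision={average_precision(toy_labels, toy_scores):.3f}")
```

Precision–recall summaries are often preferred over accuracy when controls outnumber cases, as they do here at roughly four to one, because they focus on how well the flagged patients match the true cases.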

The algorithm was evaluated for its ability to identify patients with specific genetic diseases in 6,445 patients using Vanderbilt’s DNA biobank, and it was successful in identifying patients with pathogenic copy number variations. Researchers then validated it across a larger population at Vanderbilt, and externally at a second hospital.

“This is an important problem, and the Vanderbilt group is starting to attack it,” says Wendy Chung, MD, PhD, Kennedy Family Professor of Pediatrics and Medicine and Chief of Clinical Genetics at Columbia University in New York City, weighing in on the study. “The principles are generalizable, and there will need to be iterative refinements to improve the positive predictive value, but this is a good start.”
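Why positive predictive value is a natural target for refinement can be seen with a short calculation. The sketch below applies the standard Bayes relationship between sensitivity, specificity, prevalence, and positive predictive value. It assumes, for illustration only, that the 87% and 96% figures from the held-out test set can be read as sensitivity and specificity; the 20% prevalence mirrors the study's roughly one-to-four ratio of cases to controls, while the 5% and 1% values are hypothetical clinic populations, not numbers from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Share of flagged patients who are true cases, by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Assumed operating point: 87% sensitivity, 96% specificity (illustrative reading).
for prevalence in (0.20, 0.05, 0.01):
    ppv = positive_predictive_value(0.87, 0.96, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}")
```

With sensitivity and specificity held fixed, the share of flagged patients who are true cases falls as true cases become rarer in the screened population, which is one reason a model tuned on a case-enriched data set may need recalibration before wider use.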
