Volume 64, Issue 7 pp. 1900-1909
RESEARCH ARTICLE

Long-term epilepsy outcome dynamics revealed by natural language processing of clinic notes

Kevin Xie

Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Ryan S. Gallagher

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Russell T. Shinohara

Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Sharon X. Xie

Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Chloe E. Hill

Department of Neurology, University of Michigan, Ann Arbor, Michigan, USA

Erin C. Conrad

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Kathryn A. Davis

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Dan Roth

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Brian Litt

Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Colin A. Ellis

Corresponding Author

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Department of Neurology, University of Pennsylvania, Philadelphia, Pennsylvania, USA

Correspondence

Colin A. Ellis, Hospital of the University of Pennsylvania, 3400 Spruce St., 3 West Gates Building, Philadelphia, PA 19104, USA.

Email: [email protected]

First published: 28 April 2023
Citations: 4

Abstract

Objective

Electronic medical records allow for retrospective clinical research with large patient cohorts. However, epilepsy outcomes are often contained in free-text notes that are difficult to mine. We recently developed and validated novel natural language processing (NLP) algorithms to automatically extract key epilepsy outcome measures from clinic notes. In this study, we assessed the feasibility of extracting these measures to study the natural history of epilepsy at our center.

Methods

We applied our previously validated NLP algorithms to extract seizure freedom, seizure frequency, and date of most recent seizure from outpatient visits at our epilepsy center from 2010 to 2022. We examined the dynamics of seizure outcomes over time using Markov model-based probability and Kaplan–Meier analyses.
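As a rough illustration of the Markov-style analysis mentioned above, the following Python sketch estimates the probability of seizure freedom at the next visit conditioned on the classifications at the prior visits, by simple counting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `next_visit_probabilities`, the default history of three visits, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def next_visit_probabilities(sequences, history=3):
    """Estimate P(seizure-free at next visit | states at the prior `history` visits).

    Each sequence is one patient's visits in chronological order,
    with True meaning "classified as seizure-free" at that visit.
    """
    # history tuple -> [count seizure-free at next visit, total count]
    counts = defaultdict(lambda: [0, 0])
    for seq in sequences:
        for i in range(history, len(seq)):
            key = tuple(seq[i - history:i])
            counts[key][0] += int(seq[i])
            counts[key][1] += 1
    return {key: free / total for key, (free, total) in counts.items()}


# Toy, made-up visit sequences (not study data).
toy = [
    [True, True, True, True, False],
    [False, False, False, False, True],
    [True, True, True, True, True],
]
probs = next_visit_probabilities(toy)
print(probs[(True, True, True)])     # seizure-free at all three prior visits
print(probs[(False, False, False)])  # seizures at all three prior visits
```

Restricting the input to patients with enough visits (for example, at least five) is a filtering step that would happen before calling the function.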

Results

Performance of our algorithms on classifying seizure freedom was comparable to that of human reviewers (algorithm F1 = .88 vs. human annotator κ = .86). We extracted seizure outcome data from 55 630 clinic notes, written by 53 unique authors, from 9510 unique patients. Of these notes, 30% were classified as seizure-free since the last visit, 48% of non-seizure-free visits contained a quantifiable seizure frequency, and 47% of all visits contained the date of most recent seizure occurrence. Among patients with at least five visits, the probability of seizure freedom at the next visit ranged from 12% in patients who had seizures at the prior three visits to 80% in patients who were seizure-free at the prior three visits. Only 25% of patients who were seizure-free for 6 months remained seizure-free after 10 years.
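The long-term figure above comes from a Kaplan–Meier analysis. The sketch below shows how such an estimate can be computed in plain Python for time to seizure recurrence with censoring. It is an illustrative sketch only: the function name `kaplan_meier`, the time units, and the toy data are assumptions and do not reproduce the study's cohort or code.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient (any consistent unit).
    events: True if seizure recurrence was observed at that time,
            False if the patient was censored (follow-up ended without recurrence).
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += int(data[i][1])
            removed += 1
            i += 1
        if recurrences:
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy, made-up follow-up data in years (not study data).
durations = [1, 2, 2, 4, 5]
events = [True, True, False, True, False]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time}: estimated probability of remaining seizure-free = {prob:.2f}")
```

Censored patients leave the at-risk set without lowering the survival estimate, which is what lets visits with incomplete follow-up still contribute to the curve.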

Significance

Our findings demonstrate that epilepsy outcome measures can be extracted accurately from unstructured clinical note text using NLP. At our tertiary center, the disease course often followed a remitting and relapsing pattern. This method represents a powerful new tool for clinical research with many potential uses and extensions to other clinical questions.

CONFLICT OF INTEREST STATEMENT

None of the authors has any conflict of interest to disclose. We confirm that we have read the Journal's position on issues involved in ethical publication and affirm that this report is consistent with those guidelines.

DATA AVAILABILITY STATEMENT

Our models are available on the Hugging Face Hub at: https://huggingface.co/CNT-UPenn. Our code is available on GitHub at: https://github.com/penn-cnt/Text_Mining_Epilepsy_Outcomes. We cannot make our clinical data available publicly due to patient privacy.
