An artificial intelligence platform provides an accurate interpretation of esophageal motility from Functional Lumen Imaging Probe Panometry studies
Wenjun Kou and Priyanka Soni are co-first authors.
Abstract
Background
Functional lumen imaging probe (FLIP) Panometry is performed at the time of sedated endoscopy and evaluates esophageal motility in response to distension. This study aimed to develop and test an automated artificial intelligence (AI) platform that could interpret FLIP Panometry studies.
Methods
The study cohort included 678 consecutive patients and 35 asymptomatic controls who completed FLIP Panometry during endoscopy and high-resolution manometry (HRM). “True” study labels for model training and testing were assigned by experienced esophagologists per a hierarchical classification scheme. The supervised deep-learning AI model generated FLIP Panometry heatmaps from raw FLIP data and, using convolutional neural networks, assigned esophageal motility labels with a two-stage prediction model. Model performance was tested on a 15% held-out test set (n = 103); the remaining studies were used for model training (n = 610).
Key Results
“True” FLIP labels across the entire cohort included 190 (27%) “normal,” 265 (37%) “not normal/not achalasia,” and 258 (36%) “achalasia.” On the test set, both the normal/not normal and the achalasia/not achalasia models achieved an accuracy of 89% (with 89%/88% recall and 90%/89% precision, respectively). Of 28 patients with achalasia (per HRM) in the test set, none were predicted as “normal” and 93% were predicted as “achalasia” by the AI model.
Conclusions
An AI platform provided accurate interpretation of FLIP Panometry esophageal motility studies from a single center compared with the impression of experienced FLIP Panometry interpreters. This platform may provide useful clinical decision support for esophageal motility diagnosis from FLIP Panometry studies performed at the time of endoscopy.
Abbreviations
- AI: Artificial intelligence
- CCv4.0: Chicago Classification v4.0
- DI: Distensibility index
- EGJ: Esophagogastric junction
- EGJOO: Esophagogastric junction outflow obstruction
- FLIP: Functional lumen imaging probe
- HRM: High-resolution manometry
- RACs: Repetitive Antegrade Contraction
- TBE: Timed barium esophagram
Key points
- Artificial intelligence (AI) could augment clinical interpretation of esophageal motility studies; this study aimed to develop an AI platform to interpret functional lumen imaging probe (FLIP) Panometry studies.
- The AI platform interpreted FLIP Panometry studies using a simple, clinically-pragmatic diagnostic scheme with accuracies of 89% compared against the impressions of experienced esophagologists.
- AI may provide useful clinical decision support for esophageal motility diagnoses from FLIP Panometry studies performed at the time of endoscopy.
1 INTRODUCTION
Evaluation for esophageal motility disorders is recommended in patients with esophageal dysphagia or chest pain when a mechanical esophageal obstruction is not detected on upper endoscopy.1 While esophageal manometry is the conventional test to diagnose esophageal motility disorders, functional lumen imaging probe (FLIP) Panometry represents a novel method to evaluate esophageal motility.2-4 We described classifying esophageal motility using FLIP Panometry and demonstrated that the FLIP Panometry motility classifications frequently paralleled the esophageal motility diagnoses provided by high-resolution manometry (HRM) and the Chicago Classification v4.0 (CCv4.0).4, 5 In particular, a normal FLIP Panometry was associated with a normal (or equivocal) HRM diagnosis in 95% of patients, while more than 99% of patients with a manometric diagnosis of achalasia had an abnormal FLIP Panometry study.
In clinical practice, FLIP Panometry and HRM may be used in a complementary manner for esophageal motility diagnosis, particularly as recommended by CCv4.0 when there is an inconclusive HRM diagnosis, such as esophagogastric junction (EGJ) outflow obstruction (EGJOO).5, 6 In some clinical scenarios, FLIP Panometry could potentially be utilized as the primary method for esophageal motility evaluation, such as if FLIP Panometry is normal (essentially ruling out a major esophageal motility disorder), or if a patient is unable to tolerate a transnasal HRM catheter.4, 7 Because FLIP is performed during endoscopy, it offers an advantage over HRM by measuring esophageal motility comfortably in a sedated patient, as well as providing esophageal motility evaluation concurrently with endoscopy.8 While the FLIP Panometry motility classification is based on pattern recognition of contractile response (secondary peristalsis) patterns and EGJ-distensibility metrics, there is a component of subjective interpretation that can sometimes be challenging based on the dynamic aspect of the FLIP study when performed during the endoscopic encounter.
Overall, an automated decision support tool to facilitate interpretation of esophageal motility findings from the FLIP Panometry study is appealing. We recently demonstrated that artificial intelligence (AI) models were able to accurately identify esophageal motility diagnoses from raw HRM data.9, 10 This study aimed to develop and test an AI platform that could predict FLIP Panometry motility diagnoses.
2 METHODS
2.1 Subjects
Consecutive adult patients (ages 18–89 years) who underwent evaluation of esophageal symptoms between November 2012 and December 2019 and completed FLIP during upper endoscopy and HRM suitable for CCv4.0 were included (Table 1; Figure 1); this study cohort (patients and controls) has been previously described.4, 11, 12 Patients with previous foregut surgery (including previous pneumatic dilation) or esophageal mechanical obstructions, including esophageal stricture, eosinophilic esophagitis, severe reflux esophagitis (Los Angeles classification C or D), or hiatal hernia >3 cm, were excluded, as these are potential causes of secondary esophageal motor abnormalities and preclude application of CCv4.0 (Figure 1).5 Additional baseline clinical evaluation with timed barium esophagram (TBE) was obtained at the discretion of the primary treating gastroenterologist. The study protocol was approved by the Northwestern University Institutional Review Board (STU00210464) as minimal risk with a waiver of informed consent for analysis of deidentified, coded patient data.
Variables | Patient cohort | Controls | Training cohort | Test cohort |
---|---|---|---|---|
n, total | 678 | – | 610 | 103 |
Patients, n (%) | – | – | 579 (95) | 99 (96) |
Controls, n (%) | – | 35 | 31 (5) | 4 (4) |
Age, mean (SD), years | 54 (17) | 30 (6) | 53 (17) | 50 (17) |
Sex, female | 389 (57) | 25 (71) | 360 (59) | 49 (52) |
Indication | ||||
Dysphagia | 612 (90) | 0 | 521 (90) | 91 (92) |
Reflux symptoms | 40 (6) | 0 | 35 (6) | 5 (5) |
Chest pain | 15 (2) | 0 | 13 (2) | 2 (2) |
Other | 11 (2) | 35 (100) | 10 (2) | 1 (1) |
Endoscopic sedation, n (%) | ||||
Conscious (midazolam/fentanyl) | 544 (80) | 0 | 495 (81) | 84 (82) |
Monitored anesthesia care (propofol) | 134 (20) | 35 (100) | 115 (19) | 19 (18) |
FLIP Panometry (true labels) | ||||
Two-stage classifications | ||||
Normal | 190 (28) | 35 (100) | 192 (32) | 33 (32) |
Not normal/not achalasia | 182 (27) | 0 | 151 (25) | 31 (30) |
Not normal/achalasia | 306 (45) | 0 | 267 (44) | 39 (38) |
Motility classification | ||||
Normal | 183 (27) | 35 (100) | 186 (31) | 32 (31) |
Weak | 43 (6) | 0 | 33 (5) | 10 (10) |
Obstruction with weak contractile response | 235 (35) | 0 | 203 (33) | 32 (31) |
Spastic-Reactive | 77 (11) | 0 | 67 (11) | 10 (10) |
Inconclusive | 140 (21) | 0 | 121 (20) | 19 (18) |
EGJ opening classification | ||||
Normal | 230 (34) | 35 (100) | 223 (37) | 42 (41) |
Borderline normal | 80 (12) | 0 | 69 (11) | 11 (11) |
Borderline-reduced | 79 (12) | 0 | 68 (11) | 11 (11) |
Reduced | 289 (43) | 0 | 250 (41) | 39 (38) |
Contractile response pattern | ||||
Normal | 105 (16) | 31 (89) | 113 (19) | 23 (22) |
Borderline | 130 (19) | 4 (11) | 121 (20) | 13 (13) |
Impaired/disordered | 175 (26) | 0 | 147 (24) | 28 (27) |
Absent | 191 (28) | 0 | 162 (27) | 29 (28) |
Spastic-reactive | 77 (11) | 0 | 67 (11) | 10 (10) |
High-resolution manometry | ||||
Chicago Classification v4.0 | ||||
Type I achalasia | 58 (9) | 0 | 53 (9) | 5 (5) |
Type II achalasia | 129 (19) | 0 | 111 (18) | 18 (18) |
Type III achalasia | 40 (6) | 0 | 35 (6) | 5 (5) |
EGJOO-conclusive | 18 (3) | 0 | 15 (3) | 3 (3) |
EGJOO-inconclusive (inconclusive TBE) | 45 (7) | 0 | 42 (7) | 3 (3) |
EGJOO-inconclusive (no TBE) | 76 (11) | 0 | 61 (10) | 15 (15) |
Hypercontractile esophagus | 15 (2) | 0 | 12 (2) | 3 (3) |
Distal esophageal spasm | 15 (2) | 0 | 15 (3) | 0 |
Absent contractility | 17 (3) | 0 | 12 (2) | 5 (5) |
Ineffective esophageal motility | 47 (7) | 3 (9) | 41 (7) | 9 (9) |
Normal motility | 218 (32) | 32 (91) | 213 (35) | 37 (36) |
Timed barium esophagram | ||||
[n (%) completed TBE] | [318 (47)] | 0 | [274 (45)] | [44 (43)] |
5 min column >5 cm | 130 (41) | – | 111 (41) | 19 (43) |
1 min column >5 cm or Tablet impaction | 77 (24) | – | 68 (25) | 9 (21) |
Normal | 111 (35) | – | 95 (35) | 16 (36) |
- Abbreviations: EGJOO, EGJ outflow obstruction; TBE, timed barium esophagram.
- Note: There were no significant differences (i.e., p-values >0.05) on comparisons between the training and test cohorts for any of the included variables. Values reflect n (%) unless otherwise specified.

2.2 FLIP study protocol and analysis
The FLIP study using 16-cm FLIP (EndoFLIP® EF-322 N; Medtronic, Inc, Shoreview, MN) was performed during sedated endoscopy as previously described.3, 13, 14 The FLIP study included stepwise 10-mL FLIP distensions (each stepwise distension volume being maintained for 30–60 s) with the FLIP catheter positioned across the EGJ (1–3 intragastric channels). The manual FLIP Panometry analysis and data labeling were performed remote from endoscopy using a customized program (available open source at http://www.wklytics.com/nmgi) and focused on the 50, 60, and 70 mL fill volumes. The FLIP Panometry analysis was performed blinded to other clinical data, including endoscopy, HRM and TBE results, as previously described and summarized in Table 2.4, 11, 12 The FLIP Panometry labels were assigned by agreement between experienced raters (DAC and JEP). The contractile response pattern was based on review of esophageal contractility during the 50, 60, and 70 mL fill volumes, with specific features and patterns of contractility (such as repetitive antegrade contractions or sustained occluding contractions) identified and then applied to assign a contractile response (CR) pattern.4, 11 EGJ opening was classified by applying the EGJ-distensibility index (DI) at the 60 mL FLIP fill volume and the maximum EGJ diameter that was achieved during the 60 mL or 70 mL fill volume.4, 15 The contractile response pattern and EGJ opening classification were then applied to assign a FLIP Panometry motility classification (Table 2).4, 15 The FLIP Panometry studies were then classified as “normal” or “not normal,” and the not normal studies were further classified as suspected “achalasia” or “not achalasia” based on previous findings evaluating the association of FLIP Panometry motility with HRM/CCv4.0 diagnoses (Figure 2).4
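As a rough illustration of the EGJ-DI referenced above, the following sketch assumes the conventional definition (narrowest cross-sectional area across the EGJ divided by the concurrent intra-balloon pressure, in mm²/mmHg) and a circular lumen. The function name and sample readings are hypothetical, and the classification thresholds summarized in Table 2 are deliberately not encoded.

```python
import math

def egj_distensibility_index(egj_diameters_mm, pressure_mmhg):
    """Return narrowest EGJ cross-sectional area (mm^2) divided by pressure (mmHg)."""
    if pressure_mmhg <= 0:
        raise ValueError("pressure must be positive to compute a distensibility index")
    narrowest_mm = min(egj_diameters_mm)
    csa_mm2 = math.pi * (narrowest_mm / 2.0) ** 2  # assumes a circular lumen
    return csa_mm2 / pressure_mmhg

# Hypothetical readings from the channels spanning the EGJ at one time point.
di = egj_distensibility_index([12.0, 10.0, 11.5], pressure_mmhg=30.0)
print(f"EGJ-DI = {di:.2f} mm^2/mmHg")
```

In practice a summary value over the fill-volume window would be needed rather than a single time point; how that summary is taken is not specified here.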
Definition | |
---|---|
FLIP panometry contractile response patterns | |
Normal contractile response | Repetitive Antegrade Contraction (RACs), defined by the RAC Rule of 6 s:
|
Borderline contractile response |
|
Impaired/Disordered contractile response |
|
Absent contractile response |
|
Spastic-reactive contractile response |
|
FLIP Panometry EGJ opening classification | |
Reduced EGJ opening |
|
Borderline EGJ opening |
|
Normal EGJ opening |
|
FLIP panometry motility classifications | |
Normal |
|
Weak |
|
Obstruction with weak contractile response |
|
Spastic-reactive |
|
Inconclusive |
|

2.3 FLIP Panometry motility interpretation models
For the AI models, the data were processed by taking the raw diameter readings from the FLIP study and transforming them into a FLIP Panometry heatmap (Figure 3), which allowed utilization of convolutional neural networks/layers for image inspection.
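The diameter-to-heatmap step can be sketched as follows. This is only a minimal illustration: the clipping range (5 to 30 mm), the 8-bit scaling, and the function name are assumptions made for the example, not the processing actually used in the study.

```python
def diameters_to_heatmap(samples, d_min=5.0, d_max=30.0):
    """Map time-ordered diameter samples to an 8-bit image (rows=channels, cols=time)."""
    n_channels = len(samples[0])
    heatmap = [[0] * len(samples) for _ in range(n_channels)]
    for t, sample in enumerate(samples):
        if len(sample) != n_channels:
            raise ValueError("every sample must have the same number of channels")
        for ch, d in enumerate(sample):
            clipped = min(max(d, d_min), d_max)  # assumed display range
            heatmap[ch][t] = round(255 * (clipped - d_min) / (d_max - d_min))
    return heatmap

# Hypothetical input: 2 time points, 3 channels (a real study has 16 channels).
raw = [[5.0, 15.0, 30.0], [4.0, 30.0, 35.0]]
image = diameters_to_heatmap(raw)
```

Placing axial position on the rows and time on the columns mirrors the usual topography layout, so a convolutional kernel sees contractions as spatially coherent bands.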

- Stage (1) “Normal” versus “not normal”: A custom multiheaded convolutional neural network was developed with the goal of each head of the model capturing different patterns identified as important in “normal” studies. Each of the inputs was passed using various kernel sizes and strides to capture the distinguishing patterns.
- Stage (2) Not normal: suspected “achalasia” versus “not achalasia”: To account for the removal of normal studies from the data, transfer learning/pretraining was leveraged to create a base model to build upon. The VGG16 network pretrained on the ImageNet dataset was selected as the base, and an additional set of linear layers was then added for fine-tuning on the Stage 2 labels.
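The way the two stages combine into a single label can be sketched as a simple cascade. The scorer callables, the 0.5 threshold, and the label strings below are hypothetical stand-ins; in the study the two scorers would be the trained stage 1 and stage 2 networks.

```python
from typing import Callable, Sequence

Heatmap = Sequence[Sequence[float]]

def two_stage_predict(
    heatmap: Heatmap,
    stage1_p_not_normal: Callable[[Heatmap], float],
    stage2_p_achalasia: Callable[[Heatmap], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> str:
    """Run stage 1, and consult stage 2 only when stage 1 says 'not normal'."""
    if stage1_p_not_normal(heatmap) < threshold:
        return "normal"
    if stage2_p_achalasia(heatmap) >= threshold:
        return "not normal/achalasia"
    return "not normal/not achalasia"

# Constant scorers so the sketch runs without any trained model.
demo = [[10.0, 12.0], [8.0, 9.0]]
label_a = two_stage_predict(demo, lambda h: 0.2, lambda h: 0.9)
label_b = two_stage_predict(demo, lambda h: 0.8, lambda h: 0.9)
label_c = two_stage_predict(demo, lambda h: 0.8, lambda h: 0.1)
```

Because stage 2 never sees studies that stage 1 calls normal, any stage 1 error toward "normal" cannot be corrected downstream, which is why the false-normal rate matters most clinically.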
The performance of each of the AI models for the primary analysis was tested on a 15% held-out test set (n = 103) that utilized stratified sampling to maintain proportionate sampling distribution of each label; the remainder of the study cohort (n = 610) was utilized for model training (Figure 1).
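A stratified hold-out split of the kind described can be sketched in plain Python. The seed, function name, and toy label counts are assumptions for illustration and do not reproduce the study's actual partition.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, test_fraction=0.15, seed=0):
    """Return (train_indices, test_indices) with per-label proportions preserved."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # per-label test quota
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Toy labels with hypothetical counts (not the study data).
labels = ["normal"] * 40 + ["not achalasia"] * 40 + ["achalasia"] * 20
train_idx, test_idx = stratified_holdout(labels)
```

Sampling within each label keeps rarer classes represented in the test set, which a purely random 15% draw would not guarantee.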
Additional clinical factors, including HRM/CCv4.0 motility diagnoses and TBE findings were examined relative to the AI model labeling, with a focus on studies with “inaccurate” predictions from the models.
2.4 HRM and TBE protocol and analysis
HRM studies and interpretation were completed as previously described using a solid-state assembly with 36 circumferential pressure sensors at 1-cm intervals (Medtronic Inc, Shoreview, MN) according to the CCv4.0.4-6 HRM studies were interpreted independent of FLIP results.
Timed barium esophagram (TBE) with barium tablet was obtained in patients at the discretion of the patients' treating physicians. The barium column height above the EGJ was measured from images obtained at 1, 2 and 5 min after ingestion of 200 mL barium. If liquid barium cleared, a 12.5 mm barium tablet was administered. TBE results were categorized for analysis based on the findings of greatest severity by: (a) 5-min column height >5 cm, (b) 1-min column height >5 cm or impaction of a 12.5 mm barium tablet (i.e., inability of the barium tablet to pass), or (c) “normal” (i.e., not meeting preceding severity criteria).
For studies with an HRM/CCv4.0 classification of EGJOO (i.e., an “inconclusive” HRM diagnosis in isolation), timed barium esophagram (TBE) findings were applied when available. Patients with HRM-EGJOO were further defined as “conclusive EGJOO” when the TBE had either (a) 5-min column height >5 cm or (b) a 1-min column height >5 cm and also impaction of a 12.5 mm barium tablet. Patients were otherwise labeled as “inconclusive EGJOO” with other “inconclusive TBE” findings, or if TBE was not completed.
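The TBE severity categories and the EGJOO refinement rule described above can be written out as a short sketch. The function names and dictionary keys are hypothetical; the thresholds are those stated in the text.

```python
from typing import Optional

def tbe_category(col_1min_cm: float, col_5min_cm: float, tablet_impacted: bool) -> str:
    """Assign the most severe TBE category that applies."""
    if col_5min_cm > 5:
        return "5-min column >5 cm"
    if col_1min_cm > 5 or tablet_impacted:
        return "1-min column >5 cm or tablet impaction"
    return "normal"

def egjoo_status(tbe: Optional[dict]) -> str:
    """Refine an HRM diagnosis of EGJOO using TBE findings, when available."""
    if tbe is None:
        return "inconclusive EGJOO (no TBE)"
    conclusive = tbe["col_5min_cm"] > 5 or (
        tbe["col_1min_cm"] > 5 and tbe["tablet_impacted"]
    )
    return "conclusive EGJOO" if conclusive else "inconclusive EGJOO (inconclusive TBE)"

# Hypothetical patient: 6 cm column at 1 min, 2 cm at 5 min, tablet passed.
example = {"col_1min_cm": 6.0, "col_5min_cm": 2.0, "tablet_impacted": False}
print(tbe_category(6.0, 2.0, False), "|", egjoo_status(example))
```

Note that the two rules differ: a 1-min column >5 cm alone places a study in the intermediate TBE category, but it makes EGJOO conclusive only when tablet impaction is also present.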
2.5 Statistical analysis
To describe the clinical characteristics of the cohort (as well as among specified subgroups), results were reported as n (%), mean (standard deviation; SD), or median (interquartile range; IQR) as appropriate based on the variable type and depending on data distribution. Comparisons of categorical variables were performed between subject subgroups using Chi-square tests. Comparisons of continuous variables were performed between subject subgroups using ANOVA/t-tests for normally distributed variables and using Kruskal-Wallis/Mann–Whitney U for non-normally distributed variables. Statistical significance was considered at a two-tailed p-value <0.05. For comparisons involving significant differences between more than two groups, post-hoc comparison testing was completed using a Bonferroni correction to address multiple comparisons.
The performance of the AI models was tested on a 15% held-out test set (n = 103) that utilized stratified sampling to maintain sampling distribution of each label. AI model performance was evaluated on recall, precision, and overall area-under-the-curve metrics on the held-out test set.
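For readers unfamiliar with the area-under-the-curve metric, a minimal sketch is given below. It uses the pairwise (Mann–Whitney) formulation, in which the AUC is the probability that a positive study receives a higher score than a negative one. The function name and the four toy scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

Unlike accuracy, this metric does not depend on a decision threshold, so it complements the recall and precision figures.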
3 RESULTS
3.1 Subjects
A total of 678 patients, mean (SD) age 54 (16) years, 57% female, and 35 asymptomatic controls, mean (SD) age 30 (6) years, 71% female, were included (Table 1). The majority (90%) of patients were evaluated for dysphagia. Among the entire subject cohort, the “true” FLIP labels were “normal” in 190 (27%) studies, “not normal”/“not achalasia” in 265 (37%) studies, and “not normal”/suspected “achalasia” in 258 (36%) studies. The FLIP Panometry motility classifications included 218 normal (31%; including all 35 controls), 43 (6%) weak, 235 (33%) obstruction with weak contractile response, 77 (11%) spastic-reactive, and 140 (20%) inconclusive. The most common HRM diagnoses were achalasia (227 patients; 32% of the cohort) and normal motility (250 subjects, comprising 218 patients and 32/35 controls; 35% of the cohort). A TBE was completed by 318 (47%) patients.
The training and test cohorts were similar with regard to demographics, true FLIP Panometry motility labels, HRM motility classifications, and proportions of TBE completion and findings (Table 1).
3.2 Performance of two-stage FLIP prediction model
The normal/not normal model achieved 89% accuracy (95% confidence interval 0.83–0.95) with 89% weighted average recall/sensitivity and 90% weighted average precision on the held-out test set (Figure 4). All four asymptomatic controls in the held-out test set were predicted as “normal,” and no patients with achalasia (achalasia on HRM or suspected “achalasia” on FLIP Panometry) were predicted as “normal” (Table 3).
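The reported confidence interval can be sanity-checked with a short calculation. The method used to derive the interval is not stated; the sketch below assumes a normal-approximation (Wald) interval, which happens to reproduce the reported bounds for an accuracy of 0.89 on n = 103. The function name is hypothetical.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Reported stage 1 accuracy and test-set size.
stage1_ci = tuple(round(x, 2) for x in wald_interval(0.89, 103))
print(stage1_ci)
```

Other interval methods (for example Wilson or exact binomial) would give slightly different bounds, so agreement here is suggestive rather than proof of the method used.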

Two-stage model prediction | Normal | Not normal, Not achalasia | Not normal, Achalasia |
---|---|---|---|
Total, n (%) | 38 (37) | 23 (22) | 42 (41) |
Patients*, n (%) | 34 (34) | 23 (23) | 42 (42) |
Controls*, n (%) | 4 (100) | 0 | 0 |
Indication (n, % patients) | |||
Dysphagia* | 29 (85) | 21 (91) | 41 (97)a |
Reflux symptoms | 4 (12) | 1 (4) | 0 |
Chest pain | 1 (3) | 0 | 1 (2) |
Other | 0 | 1 (4) | 0 |
FLIP Panometry (true labels) | |||
Motility classification* | |||
Normal | 29 (76) | 3 (13)a | 0a,b |
Weak | 6 (16) | 4 (17) | 0a,b |
Obstruction with weak contractile response | 0 | 1 (4) | 31 (74)a,b |
Spastic-Reactive | 0 | 6 (26)a | 4 (10) |
Inconclusive | 3 (8) | 9 (39)a | 7 (17) |
EGJ opening classification* | |||
Normal | 35 (92) | 7 (30)a | 0a,b |
Borderline normal | 3 (8) | 6 (26)a | 2 (5)a |
Borderline-reduced | 0 | 6 (26)a | 5 (12) |
Reduced | 0 | 4 (17)a | 35 (83)a,b |
Contractile response pattern* | |||
Normal | 23 (61) | 0a | 0a |
Borderline | 7 (18) | 6 (26) | 0a,b |
Impaired/disordered | 7 (18) | 7 (30) | 14 (33) |
Absent | 1 (3) | 4 (17) | 24 (57)a,b |
Spastic-reactive | 0 | 6 (26)a | 4 (10) |
High-resolution manometry | |||
Chicago Classification v4.0* | |||
Type I achalasia | 0 | 0 | 5 (12)a,b |
Type II achalasia | 0 | 2 (9) | 16 (38)a,b |
Type III achalasia | 0 | 0 | 5 (12) |
EGJOO-conclusive | 0 | 0 | 3 (7) |
EGJOO-inconclusive (inconclusive TBE) | 1 (3) | 2 (9) | 0 |
EGJOO-inconclusive (no TBE) | 2 (5) | 6 (26) | 7 (17) |
Hypercontractile esophagus | 2 (5) | 1 (4) | 0 |
Distal esophageal spasm | 0 | 0 | 0 |
Absent contractility | 4 (11) | 1 (4) | 0 |
Ineffective esophageal motility | 3 (8) | 4 (17) | 2 (5) |
Normal motility | 26 (68) | 7 (30)a | 4 (10)a |
Timed barium esophagram | |||
[n (%) completed TBE] | 13 (34) | 8 (35) | 23 (55) |
5 min column >5 cm* | 1 (8) | 1 (13) | 17 (74)a |
1 min column >5 cm or Tablet impaction | 4 (31) | 2 (25) | 3 (13) |
Normal* | 8 (62) | 5 (63) | 3 (13)a |
- Note: Values reflect n (%) of predicted label unless otherwise specified.
- On post-hoc, pairwise comparisons after Bonferroni correction: ap < 0.05 on comparison with “Normal;” bp < 0.05 on comparison with “Not normal; Not achalasia.”
- Abbreviations: EGJOO, EGJ outflow obstruction; TBE, timed barium esophagram.
- * p < 0.05 on comparison across three groups.
There were 65 patients from the held-out set with a predicted “not normal” FLIP that were then tested in stage 2, the achalasia/not achalasia model. This model achieved an accuracy of 89% (95% confidence interval 0.81–0.97), 88% weighted average recall/sensitivity, and 89% weighted average precision (Figure 4). Of the 28 patients with HRM/CCv4.0 diagnosis of achalasia in the test cohort, 93% were predicted as suspected achalasia by the model, as were 3/3 patients with a diagnosis of conclusive EGJOO (conclusive based on HRM and TBE).
3.3 Evaluation of “inaccurate” model predictions
“Inaccurate” model predictions from the test cohort are described in Figure 4, and each of the 18 individual studies is detailed in Table 4. Of the 8 studies inaccurately predicted as “normal,” none had a “true” FLIP label of achalasia, nor did any have an HRM/CCv4.0 diagnosis of achalasia (achalasia subtype I, II, III or conclusive EGJOO) (Figure 4). Six of the 8 had a true FLIP motility classification of “weak” (i.e., with an abnormal contractile response and normal EGJ opening), while the remaining 2 had an inconclusive FLIP motility classification (both had borderline-normal EGJ opening).
Model prediction | “True” label | FLIP motility classification | EGJ opening classification | Contractile response pattern | EGJ-DI (mm2/mmHg) | Maximum EGJ diameter (mm) | HRM/CCv4.0 diagnosis |
---|---|---|---|---|---|---|---|
Normal | Not achalasia | Weak | Normal | Impaired/disordered | 5.4 | 19.7 | Normal |
Normal | Not achalasia | Weak | Normal | Impaired/disordered | 5.5 | 16.4 | Normal |
Normal | Not achalasia | Weak | Normal | Impaired/disordered | 2.9 | 18.5 | Normal |
Normal | Not achalasia | Weak | Normal | Impaired/disordered | 2.6 | 17.2 | EGJOO—inconclusivea |
Normal | Not achalasia | Weak | Normal | Impaired/disordered | 7.6 | 19.1 | Absent contractility |
Normal | Not achalasia | Weak | Normal | Absent | 5.9 | 17.5 | Absent contractility |
Normal | Not achalasia | Inconclusive | Borderline normal | Impaired/disordered | 4.3 | 15.2 | Normal |
Normal | Not achalasia | Inconclusive | Borderline normal | Impaired/disordered | 4.0 | 14.7 | Absent contractility |
Not achalasia | Normal | Normal | Normal | Borderline | 2.2 | 18.5 | Normal |
Not achalasia | Normal | Normal | Normal | Borderline | 3.9 | 21.5 | Normal |
Not achalasia | Normal | Normal | Normal | Borderline | 3.6 | 19.2 | Normal |
Not achalasia | Achalasia | Spastic-reactive | Reduced | Spastic-reactive | 1.9 | 9.8 | Type II achalasia |
Not achalasia | Achalasia | Obstruction w/weak contractile response | Reduced | Absent | 1.9 | 11.1 | Type II achalasia |
Achalasia | Not achalasia | Inconclusive | Borderline-reduced | Impaired/disordered | 0.8 | 12.5 | Type II achalasia |
Achalasia | Not achalasia | Inconclusive | Borderline-reduced | Impaired/disordered | 2.2 | 11.6 | Normal |
Achalasia | Not achalasia | Inconclusive | Borderline normal | Absent | 2.7 | 15.1 | Normal |
Achalasia | Not achalasia | Inconclusive | Borderline normal | Impaired/disordered | 1.2 | 14.9 | EGJOO—inconclusiveb |
Achalasia | Not achalasia | Inconclusive | Borderline-reduced | Impaired/disordered | 1.7 | 13.8 | EGJOO—inconclusiveb |
- Note: Each row represents one patient/FLIP study with an inaccurate model prediction.
- Abbreviations: CCv4.0, Chicago Classification version 4.0; EGJ, esophagogastric junction; DI, distensibility index; HRM, high-resolution manometry.
- a Timed barium esophagram (TBE) was normal.
- b TBE was not completed.
Additionally, none of the 5 studies inaccurately predicted as “suspected achalasia” had a true FLIP label of “normal.” Further, all five such studies had a true FLIP motility classification of “inconclusive” with borderline EGJ opening and abnormal contractile responses (Table 4).
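The stage 1 headline metrics can be cross-checked from counts given in the tables. The confusion matrix below is a reconstruction, not a figure reported by the authors: it assumes 38 studies predicted “normal” of which 8 were inaccurate (Table 3 and Table 4), 3 true “normal” studies predicted “not normal” (Table 4), and 103 test studies in total. The function name is hypothetical.

```python
def weighted_metrics(confusion):
    """confusion[true][pred] -> (accuracy, weighted recall, weighted precision)."""
    classes = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c][c] for c in classes)
    recall_w = precision_w = 0.0
    for c in classes:
        support = sum(confusion[c].values())            # studies truly in class c
        predicted = sum(confusion[t][c] for t in classes)  # studies predicted as c
        recall = confusion[c][c] / support if support else 0.0
        precision = confusion[c][c] / predicted if predicted else 0.0
        recall_w += support / total * recall
        precision_w += support / total * precision
    return correct / total, recall_w, precision_w

# Reconstructed stage 1 counts (an inference from Tables 1, 3, and 4).
stage1 = {
    "normal": {"normal": 30, "not normal": 3},
    "not normal": {"normal": 8, "not normal": 62},
}
accuracy, recall, precision = weighted_metrics(stage1)
print(round(accuracy, 2), round(recall, 2), round(precision, 2))
```

Under these assumptions the values round to 0.89 accuracy, 0.89 weighted recall, and 0.90 weighted precision, in line with the figures reported for the normal/not normal model. Support-weighted recall is algebraically equal to accuracy, so those two figures always coincide.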
4 DISCUSSION
The major finding of this study, based on a cohort of 678 esophageal motility patients and 35 healthy controls, was that an AI model accurately interpreted esophageal motility classifications from FLIP Panometry studies with 89% accuracy in both stages of a two-stage model using a simple, clinically pragmatic classification scheme for esophageal motility disorders. Even when the AI model was “inaccurate,” the model interpretation was generally reasonable (e.g., often to adjacent classifications), and furthermore, the “inaccuracy” would typically be associated with minimal clinical consequence (i.e., no patients with achalasia were predicted as “normal”). Overall, this study suggests that an AI model for FLIP Panometry interpretation can provide reliable decision support for FLIP Panometry studies performed clinically during endoscopy.
There has been increasing use of AI and machine learning in medicine and gastroenterology, such as for assistance of visual inspection of endoscopic images for colon polyps or Barrett's esophagus.16, 17 The model described here transformed raw FLIP data into FLIP Panometry heatmaps that allowed for image inspection by deep-learning AI models, similar to those utilized on endoscopy images. The supervised, deep-learning model demonstrated herein utilized convolutional neural networks and careful parameter selection of the convolutional layers to inspect the FLIP studies in a manner somewhat similar to a clinician. While this study is the first to develop and test an AI model for FLIP Panometry motility interpretation, we have recently described using other AI approaches for HRM interpretation.9, 18 Similar to using deep-learning algorithms to interpret the graphical depiction of esophageal motility provided by esophageal pressure topography, the models described interpreted esophageal motility from esophageal diameter topography (and pressure) from FLIP Panometry data. With both manometry and FLIP, these AI approaches offer the potential to provide clinical decision support for esophageal motility interpretation.
We previously described using feature-based machine learning on FLIP Panometry metrics to facilitate prediction of HRM-based achalasia subtypes with 90% and 70% accuracy in train and test cohorts, respectively.19 Recognizing that there are advantages to both image-based deep learning and feature-based AI/machine learning approaches, we demonstrated that a multistage model integrating a balanced combination of deep-learning and feature-based AI/machine learning models accurately predicted motility diagnosis on HRM.10 Given the quantifiable physiomarkers provided with FLIP Panometry, an appealing future direction with FLIP Panometry motility interpretation will be to develop hybrid models with a similar framework.
FLIP Panometry done concurrently with endoscopy may be applied in clinical practice such that when a FLIP Panometry study is normal (at either an index endoscopy or following an inconclusive initial evaluation, e.g., EGJOO on HRM), there is a low probability for a major motility disorder. Hence achalasia, the most important diagnosis in an esophageal motility evaluation, is essentially ruled out and management may be directed toward GERD or another syndrome (Figure 5).4, 7 Given the low probability for a major esophageal motor disorder, manometry may be avoided in this scenario (or at least delayed until symptoms persist after initial treatment). Conversely, when FLIP Panometry is abnormal, the FLIP results can be incorporated with the existing clinical impression based on clinical presentation and endoscopic findings (i.e., a pretest probability) to support (or refute) achalasia. If there is a high pretest probability for achalasia (e.g., supportive clinical history and endoscopy) and a FLIP prediction of achalasia, treatment could potentially be pursued without need for HRM (especially if the patient is unable to tolerate HRM). If needed, however, the clinical motility diagnosis can ultimately be reached via complementary application of other data, for example, HRM and/or TBE.

With regard to the potential application of the described AI model in clinical practice, it is expected that less experienced FLIP Panometry users may rely more heavily on the model interpretations, while more experienced FLIP Panometry users will use the model interpretations as reassurance when their independent interpretation agrees with the model, or prompt additional scrutiny of the FLIP study when interpretations differ. In either case, the results of this study demonstrated that a “normal” FLIP interpretation resulted in zero patients with achalasia being “missed” and an initial management plan targeting GERD would have been reasonable for all such patients, even the few patients eventually diagnosed with a disorder of primary peristalsis. Further, a prediction for an abnormal FLIP Panometry would likely lead to application of complementary data with TBE or HRM (Figure 5). Notably, all patients with an HRM/CCv4.0 diagnosis of achalasia in the held-out test set were predicted as “not normal” and 93% of these patients had a predicted FLIP diagnosis of “suspected achalasia.” Overall, application of the model predictions to clinical practice would be expected to facilitate accurate detection (or exclusion) of esophageal motility disorders via FLIP Panometry and provide clinical decision support when interpreting FLIP Panometry studies done concurrently with endoscopy.
There are several strengths of this study investigating novel AI models for FLIP classification, including a large clinical cohort (including healthy controls) and comprehensive motility testing independent of FLIP (HRM interpreted per CCv4.0 criteria and a subset with TBE) to complement the clinical relevance of the model predictions. However, the study has limitations as well. While we explored the clinical relevance of the model predictions using additional data independent of FLIP (e.g., HRM and TBE), this at times may be an unfair measure by which to judge the AI model interpretations, noting that there is no singular “gold standard” test for esophageal motility disorders. Establishing an absolute ground truth for esophageal motility disorders can be challenging because FLIP, HRM, and TBE each have inherent limitations and none is a perfect test. Thus, instead of representing a limitation of model performance, discrepancies may reflect that motility on FLIP and HRM sometimes differs; for example, secondary peristalsis (FLIP) and primary peristalsis (HRM) can differ within individuals.11, 20, 21 Another limitation is that this work describes a single-center study; thus, a multicenter study is ongoing to test this model on external patient cohorts to further validate this model and demonstrate generalizability. Additionally, future work is needed to help develop models for prediction of targeted treatment outcomes, which likely will involve incorporation of data from complementary tests. AI/machine learning offers a promising approach to facilitate this multimodal integration.
In conclusion, AI models were able to accurately interpret esophageal motility per a simple, clinically pragmatic classification of esophageal motility disorders using FLIP Panometry studies from a single center, suggesting that this technology has significant potential to provide clinical decision support for FLIP Panometry studies done concurrently with endoscopy. Future work is anticipated to refine both FLIP Panometry and clinical motility diagnosis labeling, as well as to incorporate longitudinal treatment outcomes, to further develop advanced models for diagnosis of esophageal motility disorders. The promise of AI for clinical decision support is apparent and represents potential for exciting advances in gastroenterology and medicine.
DISCLOSURE
WK, PS, MWK, ME, JEP, PJK, DAC, and Northwestern University hold shared intellectual property rights and a licensing agreement with Medtronic Inc. WK: Bristol-Myers Squibb (Consulting). PJK: AstraZeneca (Consulting), Ironwood (Consulting), Reckitt (Consulting), Johnson and Johnson (Consulting). JEP: Sandhill Scientific/Diversatek (Consulting, Speaking, Grant), Takeda (Speaking), AstraZeneca (Speaking), Medtronic (Speaking, Consulting, Patent, License), Torax (Speaking, Consulting), Ironwood (Consulting). DAC: Medtronic (Speaking, Consulting), Phathom Pharmaceuticals (Consulting).
AUTHOR CONTRIBUTIONS
WK and PS contributed to drafting of the manuscript, data analysis, data interpretation, and approval of the final version. MWK contributed to data analysis, data interpretation, editing the manuscript critically and approval of the final version. ME contributed to study concept, data analysis, data interpretation, and approval of the final version. PJK contributed to editing the manuscript critically and approval of the final version. JEP contributed to study concept and design, obtaining funding, data interpretation, editing the manuscript critically, and approval of the final version. DAC contributed to study concept and design, data analysis, data interpretation, drafting of the manuscript, obtaining funding, and approval of the final version.
FUNDING INFORMATION
This work was supported by P01 DK117824 (JEP) from the Public Health Service, an American College of Gastroenterology Junior Faculty Development Award (DAC), and gifts from Joe and Nives Rizza and The Todd and Renee Schilling Charitable Fund.