Artificial intelligence and colonoscopy: Current status and future perspectives
Abstract
Background and Aim
Application of artificial intelligence in medicine is now attracting substantial attention. In the field of gastrointestinal endoscopy, computer-aided diagnosis (CAD) for colonoscopy is the most investigated area, although it is still in the preclinical phase. Because colonoscopy is carried out by humans, it is inherently an imperfect procedure. CAD assistance is expected to improve its quality regarding automated polyp detection and characterization (i.e. predicting the polyp's pathology). It could help prevent endoscopists from missing polyps as well as provide a precise optical diagnosis for those detected. Ultimately, these functions that CAD provides could produce a higher adenoma detection rate and reduce the cost of polypectomy for hyperplastic polyps.
Methods and Results
Currently, research on automated polyp detection has been limited to experimental assessments using an algorithm based on ex vivo videos or static images. Performance for clinical use was reported to have >90% sensitivity with acceptable specificity. In contrast, research on automated polyp characterization seems to surpass that for polyp detection. Prospective studies of in vivo use of artificial intelligence technologies have been reported by several groups, some of which showed a >90% negative predictive value for differentiating diminutive (≤5 mm) rectosigmoid adenomas, which exceeded the threshold for optical biopsy.
Conclusion
We introduce the potential of using CAD for colonoscopy and describe the most recent conditions for regulatory approval for artificial intelligence-assisted medical devices.
Introduction
Colorectal cancer (CRC) is a major cause of cancer-related death in both eastern and western countries. Colonoscopy with complete resection of neoplastic lesions (e.g. adenomas) is considered a reliable measure to reduce both the incidence and mortality of CRC.1, 2 A large cohort study conducted in the USA showed a roughly 70% reduction in deaths as a result of CRC after screening colonoscopy3 was instituted. The quality of the colonoscopy, however, varies according to the expertise of the endoscopist. Poorly conducted colonoscopy could impair CRC prevention. It has been established that the adenoma detection rate during colonoscopy is inversely associated with the incidence of post-colonoscopy CRC and with CRC-related mortality.4, 5 Therefore, standardizing the quality of colonoscopy by reducing the number of missed adenomas/polyps is a desirable goal.
The other concern regarding the quality of the colonoscopy is the accuracy of the optical diagnosis of recognized colorectal polyps. Under specific conditions, endoscopists are allowed to adopt visual inspection as a measure to optically predict the polyp's pathology as a substitute for histopathological evaluation. This practice was introduced by the American Society for Gastrointestinal Endoscopy in its Preservation and Incorporation of Valuable endoscopic Innovations (PIVI) statement on optical biopsy of diminutive polyps. Under the PIVI, a “diagnose-and-leave” strategy is allowed for hyperplastic polyps if the negative predictive value (NPV) for diminutive rectosigmoid adenomas is >90% when diagnosed with high confidence using an advanced endoscopic modality.6 The accuracy of such optical diagnosis, however, has been reported to be limited, especially in community-based practices or in non-expert hands, which hinders its practical implementation.7-9
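The NPV criterion can be made concrete with a small calculation. The following is a minimal sketch, assuming a 2×2 table of optical diagnoses against histopathology; the function names and the example counts are hypothetical and are not taken from any of the cited studies.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN): the fraction of 'not adenoma' calls that are correct."""
    total_negative_calls = true_negatives + false_negatives
    if total_negative_calls == 0:
        raise ValueError("no negative calls were made, so NPV is undefined")
    return true_negatives / total_negative_calls


def meets_npv_threshold(npv: float, threshold: float = 0.90) -> bool:
    """Check the 'greater than 90%' NPV criterion as quoted in the text."""
    return npv > threshold


# Hypothetical counts for high-confidence diagnoses of diminutive rectosigmoid polyps.
example_npv = negative_predictive_value(true_negatives=95, false_negatives=5)
print(f"NPV = {example_npv:.3f}, meets threshold: {meets_npv_threshold(example_npv)}")
```

Only polyps diagnosed with high confidence would enter such a table, so the proportion of polyps excluded for low confidence should be reported alongside the NPV.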
Computer-aided detection and characterization of colorectal polyps is now attracting increased attention. The concept and basic technology of computer-aided diagnosis (CAD) for colonoscopy have long been explored.10 Until the early 2010s, the research had been confined mainly to engineering fields because of the limited capability of computer algorithms and computing power. With the emergence of deep learning algorithms and significant advances in computing power (e.g. higher clock frequencies of central processing units and graphics processing units), CAD assistance that can be used in real time during colonoscopy is now being realized.11
The major roles of CAD for colonoscopy include automated polyp detection and characterization. By indicating the presence and location of polyps in real time during colonoscopy, CAD potentially draws the endoscopist's attention to polyps that are displayed on the monitor but could be overlooked visually, which, in turn, would result in a higher adenoma detection rate. In addition, by outputting the predicted pathology or endoscopic classification of the detected polyps, CAD accelerates accurate optical biopsy characterization of the colorectal polyps, which could lead to a significant reduction in the number of unnecessary polypectomies of non-neoplastic polyps (which may be misdiagnosed because of endoscopists’ lack of expertise).
In this review, we assess the current situation of incorporating the use of CAD during colonoscopy by referring mainly to physician-initiated studies in this field. We also explore the possibility of clinical implementation of this technology by addressing the situation of regulatory approval, which is a mandatory step that must be addressed.
Automated Polyp Detection
Retrospective/experimental studies
The initial studies regarding polyp detection in white light colonoscopic images were reported by two engineering groups in 2003.12, 13 These pilot studies were based on using wavelet transformation as a classifier for extracted images, which provided good accuracy (>90%) in a preliminary setting. Following these studies, various technological ideas and modifications were applied to the diagnostic algorithms by other engineering research groups.14-22 Most of these modified models were evaluated using images from public polyp databases (e.g. CVC-ColonDB and ASU-Mayo), providing a sensitivity of 48–90%. This range appears to indicate an acceptable performance, but it could not necessarily be generalized because these databases were based on a limited number of images from fewer than 20 polyps. In contrast to these evaluation methods, one Spanish group based the studies on the creation of energy maps using the original 24 videos containing 31 polyps, which provided satisfactory results of >70% sensitivity and specificity.23 That study was also distinct in that it was the first physician-initiated study in this field. However, there are various barriers that could interfere with practical realization of such a “hand-crafted” algorithm: calculation speed; computer power; relatively low sensitivity; and the false-positive rate caused by the presence of vessels, stools, and folds. This situation was suddenly changed with the emergence of deep learning, a machine-learning method in which artificial intelligence (AI) automatically searches and defines important features of given images.
Recently, three physician-initiated studies that addressed automated polyp detection using a deep-learning method were published. Misawa et al.24 developed a 3-D convolutional network model for automated polyp detection that worked nearly in real time (Fig. 1). They confirmed a sensitivity of 90% and a specificity of 63% using 50 polyp videos and 85 non-polyp videos as test sets. Subsequently, Urban et al.25 developed a CAD model that had excellent diagnostic capability in an experimental setting: its area under the receiver operating characteristic (ROC) curve for polyp recognition was 0.991, and its accuracy was 96% (Fig. 2). They also assessed the efficacy of their algorithm by comparing assessments of nine videos with CAD versus without CAD. Their results showed that the assessment with CAD identified nine more polyps than that without CAD (45 vs. 36, respectively). Most recently, Wang et al.26 reported a CAD model that provided >90% sensitivity and specificity for video-based analysis (Fig. 3). The strength of their study was that they evaluated their model using a large number of images, patients, video records, and polyps, which contributed to the reliability of the acquired data. (They obtained 27 113 static images from 1138 patients, video recordings of 138 polyps from 111 patients, and video recordings of 54 colonoscopies that contained polyps as the test sets.)
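The area under the ROC curve quoted above can be read as the probability that a randomly chosen polyp frame receives a higher score than a randomly chosen non-polyp frame. The sketch below computes it that way in plain Python; the scores are hypothetical and the function is an illustration, not the evaluation code of any cited study.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    correct_pairs = 0.0
    for positive in positive_scores:
        for negative in negative_scores:
            if positive > negative:
                correct_pairs += 1.0
            elif positive == negative:
                correct_pairs += 0.5
    return correct_pairs / (len(positive_scores) * len(negative_scores))


# Hypothetical detector scores for frames with and without a polyp.
polyp_scores = [0.95, 0.80, 0.70, 0.40]
non_polyp_scores = [0.60, 0.30, 0.20, 0.10]
print(roc_auc(polyp_scores, non_polyp_scores))
```

Because the measure depends only on how scores are ranked, it does not by itself say which operating threshold gives an acceptable false-positive rate during a live procedure.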
Prospective studies with in vivo use of AI
To date, no prospective studies assessing the real-time use of an automated polyp detection system have been reported.
Automated Polyp Characterization
Retrospective/experimental studies
Magnifying narrow-band imaging
Application of CAD to magnifying narrow-band imaging (NBI; Olympus Corp., Tokyo, Japan) has been the area most eagerly investigated in this field. The first application of CAD was reported by Tischendorf et al.27 and Gross et al.,28 who provided diagnostic accuracies of 85.3% and 93.1%, respectively. They used a similar algorithm, which extracted nine vessel features (e.g. length, brightness, perimeter) from magnifying NBI images and classified these features into a two-class pathological prediction (i.e. neoplastic or non-neoplastic) using a support vector machine. Gross et al. showed that the accuracy of CAD (93.1%) was superior to that of non-experts, supporting the hypothesis that CAD could be a powerful support for novice endoscopists. Following these studies, a research group at Hiroshima University in Japan played a significant role in the development of CAD models.29-35 Unlike the previous studies, they incorporated a histogram of visual words into the algorithm to make the image analysis more robust. Their achievement was notable because they realized real-time prediction of polyp pathology.
Recently, two research teams conducted retrospective studies on newly developed CAD systems based on a deep-learning algorithm. Byrne et al. assessed their model using 125 unaltered endoscopic videos containing diminutive polyps (Fig. 4). The AI model did not generate sufficient confidence to predict the histology of 15% of the polyps. For the remaining 106 diminutive polyps, however, the sensitivity for identifying adenomas was 98%, specificity 83%, NPV 97%, and positive predictive value (PPV) 90%.36 Similarly, Chen et al.37 assessed their model using 284 diminutive polyps. The model identified neoplastic or hyperplastic polyps with 96.3% sensitivity, 78.1% specificity, PPV 89.6%, and NPV 91.5%. Both studies met the PIVI-2 threshold required for the diagnose-and-leave strategy for diminutive hyperplastic polyps.6
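A model that declines to predict when its confidence is insufficient can be sketched as a simple reject option followed by the usual diagnostic metrics on the remaining polyps. The code below is an illustration only: the confidence cut-off of 0.8, the probabilities, and the function names are assumptions, not the method of either cited study.

```python
def high_confidence_calls(predictions, min_confidence=0.8):
    """Turn (adenoma_probability, is_adenoma) pairs into calls, dropping unsure ones."""
    calls = []
    for adenoma_probability, is_adenoma in predictions:
        # Confidence is the probability of whichever class the model favors.
        confidence = max(adenoma_probability, 1.0 - adenoma_probability)
        if confidence >= min_confidence:
            calls.append((adenoma_probability >= 0.5, is_adenoma))
    return calls


def diagnostic_metrics(calls):
    """Sensitivity, specificity, PPV and NPV from (predicted, truth) pairs."""
    tp = sum(1 for predicted, truth in calls if predicted and truth)
    fp = sum(1 for predicted, truth in calls if predicted and not truth)
    fn = sum(1 for predicted, truth in calls if not predicted and truth)
    tn = sum(1 for predicted, truth in calls if not predicted and not truth)

    def ratio(numerator, denominator):
        # Undefined ratios are reported as None rather than raising.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical model outputs paired with histopathology (True = adenoma).
predictions = [
    (0.97, True), (0.91, True), (0.55, True),
    (0.08, False), (0.12, True), (0.45, False),
    (0.03, False), (0.88, False), (0.05, False),
]
calls = high_confidence_calls(predictions)
metrics = diagnostic_metrics(calls)
print(len(calls), "of", len(predictions), "polyps received a high-confidence call")
print(metrics)
```

Raising the cut-off generally improves the metrics on the retained polyps at the cost of leaving more polyps without a prediction, which is why both figures are needed to judge such a system.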

Magnifying chromoendoscopy
Magnifying chromoendoscopy with indigo carmine or crystal violet helps endoscopists recognize the surface structure of colorectal polyps, contributing to highly accurate predictions of lesion pathology during pit pattern diagnosis (sensitivity 97.8% and specificity 91.4% when carried out by experts).38, 39 CAD systems for pit pattern diagnosis generally use one of two methods: quantitative analysis of pit structures or texture analysis of the entire endoscopic image. The first method is based on extraction of pit structures followed by their automated quantitative evaluation (e.g. of the area, perimeter, major/minor fit ellipse, circularity). Takemura et al.40 achieved an overall accuracy of 98.5% when applying this method. An alternative method—texture analysis—was described by Hafner et al.,41 who used an algorithm that used texture image features in the wavelet domain. Although both CAD methods for classifying pit patterns have shown excellent diagnostic accuracy experimentally, neither has been further evaluated to date.
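As an illustration of the quantitative descriptors listed above, the sketch below computes the area, perimeter, and circularity of a pit outline given as a polygon. It assumes the common definition of circularity, 4πA/P², which equals 1 for a perfect circle; the exact definitions used in the cited studies are not given in this review, and the example outline is hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon by the shoelace formula."""
    twice_area = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths, closing the outline back to the first point."""
    return sum(
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    )


def circularity(points):
    """4 * pi * area / perimeter^2 (assumed definition; 1.0 for a circle)."""
    perimeter = polygon_perimeter(points)
    if perimeter == 0:
        raise ValueError("degenerate outline with zero perimeter")
    return 4.0 * math.pi * polygon_area(points) / (perimeter * perimeter)


# Hypothetical pit outline: a 2 x 2 square in pixel coordinates.
square_pit = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_area(square_pit), polygon_perimeter(square_pit), circularity(square_pit))
```

Elongated or branched outlines give lower circularity than round ones, which is the kind of shape difference such a descriptor is meant to capture.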
Endocytoscopy
Endocytoscopy (H290ECI; Olympus Corp., Tokyo, Japan) and confocal laser endomicroscopy (Cellvizio; Mauna Kea Technologies Inc., Paris, France) are newly introduced in vivo contact microscopic imaging modalities. They allow endoscopists to obtain real-time cellular images with 500-fold or 1000-fold magnification power, respectively, during colonoscopy.42, 43 These devices are considered ideal partners for CAD because they always provide focused, fixed-size images, which makes image analysis by CAD easier and more robust.
Endocytoscopy with CAD has been intensively investigated by a Japanese study group. Their first model was based on automated extraction of nuclear areas stained with methylene blue followed by quantitative analysis of six nuclear features. Its accuracy for identifying neoplastic changes was 89.2%.44 In their follow-up studies, reported in 2016 and 2018, the diagnostic algorithm was improved by adding texture analysis for feature extraction and a support vector machine as a classifier, thereby producing an output image of the predicted pathology along with the probability of the diagnosis.45, 46 They also developed a more user-friendly CAD system based on endocytoscopy combined with NBI that required no prior staining.47 This system focused on the analysis of microvessels on the surface of polyps and provided an overall accuracy of 90.0%. Apart from differentiating between neoplastic and non-neoplastic lesions, they explored the possibility of using CAD to identify invasive cancer (Fig. 5).48 According to that pilot study, a newly developed algorithm could differentiate invasive cancers with sensitivity, specificity, accuracy, PPV, and NPV values of 89.4%, 98.9%, 94.1%, 98.8%, and 90.1%, respectively.

Confocal endomicroscopy
There have been four studies to assess confocal endomicroscopy. Two were focused on automated pathological prediction49, 50 and two exclusively on quantitative image quality control steps that aided the physician's interpretation of confocal endomicroscopic images.51, 52 The former two studies were conducted by Andre et al., who reported 89.6% accuracy in the differentiation of adenoma from non-neoplastic polyps,49 and Stefanescu et al., who showed 84.5% accuracy in distinguishing advanced CRC from normal colon mucosa.50 Both diagnostic algorithms were based on k-nearest neighbor classification and neural network analysis. Unfortunately, they were evaluated in experimental settings only, with no follow-up clinical trials.
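For readers unfamiliar with k-nearest neighbor classification, the sketch below shows the idea in plain Python: a new image is assigned the label held by the majority of its k closest training examples in feature space. The two-dimensional feature vectors and labels are hypothetical and do not represent the features used in the cited studies.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    if k < 1 or k > len(training_set):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort the training examples by Euclidean distance to the query.
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (feature_vector, label) pairs extracted from labeled images.
training_set = [
    ((0.9, 0.8), "neoplastic"),
    ((0.8, 0.9), "neoplastic"),
    ((0.7, 0.7), "neoplastic"),
    ((0.2, 0.1), "non-neoplastic"),
    ((0.1, 0.3), "non-neoplastic"),
    ((0.3, 0.2), "non-neoplastic"),
]
print(knn_predict(training_set, (0.75, 0.85)))
```

With two classes, an odd k avoids tied votes; the choice of k and of the distance measure are design decisions that would need validation on independent data.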
Laser-induced fluorescence spectroscopy
Kuiper et al.53 and Rath et al.54 reported a CAD system using laser-induced autofluorescence spectroscopy. This device predicts polyp pathology (i.e. neoplastic or non-neoplastic) by analyzing the light emitted by tissue after it has absorbed laser light delivered by the device. The device is incorporated into a standard biopsy forceps. Thus, endoscopists using these forceps can identify diminutive polyps, obtain the pathology predicted by CAD, and then resect them, as necessary, with the same biopsy forceps. Notably, both investigative groups evaluated the performance of this CAD system in real time. (See the section Prospective study with in vivo use of AI.)
Autofluorescence endoscopy
Another endoscopic modality in this research area is autofluorescence imaging (AFI; Olympus Corp.). The AFI endoscope analyzes natural tissue fluorescence emitted by endogenous fluorophores in the colorectal mucosa upon excitation by light. AFI provides a green/red image that can be analyzed by a computer algorithm.55 Japanese medical groups conducted experimental research based on prototype software that calculates the green/red ratio of the acquired image56-58 and evaluated its performance prospectively. (See the section Prospective study with in vivo use of AI.)
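The green/red ratio mentioned above can be illustrated with a few lines of Python. This is a minimal sketch, assuming the region of interest is available as a list of (red, green, blue) pixel tuples; the pixel values are hypothetical, and the sketch deliberately stops at the ratio because the cut-off used to separate lesion types is not stated in this review.

```python
def green_red_ratio(pixels):
    """Ratio of summed green intensity to summed red intensity over a region."""
    total_red = sum(red for red, _green, _blue in pixels)
    total_green = sum(green for _red, green, _blue in pixels)
    if total_red == 0:
        raise ValueError("red channel is zero everywhere, so the ratio is undefined")
    return total_green / total_red


# Hypothetical region of interest sampled from an autofluorescence image.
region_of_interest = [(200, 100, 90), (180, 90, 80), (220, 110, 100)]
print(green_red_ratio(region_of_interest))
```

Averaging over a region rather than using single pixels makes the value less sensitive to noise, although the choice of region remains operator dependent.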
White light endoscopy
White light endoscopy is the most widely available endoscopic modality. Hence, it would be beneficial if CAD could be designed for use with white light endoscopy. It has not been easy, however, to achieve high accuracy with this combination. Komeda et al. applied a deep-learning algorithm for white light endoscopy but reported that its accuracy was limited to 75.1%. More investigation is therefore required. This endeavor seems questionable, however, considering that an optical diagnosis using white light endoscopy is usually inferior to that using NBI or chromoendoscopy with or without magnification.59
Prospective study with in vivo use of AI
Although many retrospective/experimental studies have been conducted in this field, only five prospective studies evaluated real-time use of AI during colonoscopy.33, 53, 54, 57, 60 Kominami et al.33 prospectively evaluated the CAD model designed for magnifying NBI. In that trial, 41 patients with 118 colorectal lesions underwent real-time assessment by CAD during colonoscopy. Accuracy of the CAD diagnosis was 93.2%, and NPV was 93.3%, which met the PIVI-2 criteria.33 Moreover, the recommendation for follow-up colonoscopy based on pathology and the real-time CAD prediction of pathology were identical in 92.7% of cases, which also met the PIVI-2 criteria for the “resect-and-discard” strategy. Mori et al.60 recently conducted a larger-scale prospective study exploring the efficacy of an endocytoscopy-based CAD system. They included 791 patients and assessed 466 diminutive polyps using CAD. The NPV for diminutive rectosigmoid adenomas proved to be 93.7% in the worst-case scenario, which exceeded the PIVI-2 threshold.6 In addition, Rath et al.54 evaluated the performance of CAD with laser-induced fluorescence spectroscopy in a single-center prospective trial that included 27 patients with 137 diminutive polyps; the NPV for adenomatous polyp histology was 96.1%, which met the PIVI-2 criteria. Conversely, in another prospective study with the same device, Kuiper et al.53 reported an NPV limited to 73.5% for identifying small adenomas. Studying the performance of CAD for autofluorescence endoscopy, Aihara et al. showed that the data provided from the prospective trial in 2013 were excellent (sensitivity, specificity, PPV, and NPV of 94.2%, 88.9%, 95.6%, and 85.2%, respectively).57
Combination of Automated Detection and Characterization
Ideally, a fully automated diagnosis powered by AI is desirable for standardizing the quality of colonoscopies (i.e. a combination of automated polyp detection followed by immediate polyp characterization). The Japanese study group proposed simultaneous polyp detection and characterization with the use of two technologies they had developed: (i) a deep-learning algorithm to detect polyps in white light images; and (ii) another algorithm, designed for endocytoscopic images, that predicted the polyp's pathology when the tip of the endoscope contacted the polyp and the endoscopist captured a photograph of it (Video S1).61 Although their report had no accompanying statistical analysis, the approach might be promising given that both automated polyp detection and characterization are essential elements in clinical colonoscopy practice.
Future Directions
What evidence is required for the future?
Although the results from previous studies appear promising, the evidence supporting CAD for colonoscopy remains weak because most of the studies were conducted retrospectively and could therefore be subject to considerable selection bias, which could skew the results in favor of CAD. We did find some well-designed, prospective studies of automated polyp characterization that were statistically more reliable than the retrospective studies because they reduced the possibility of bias in lesion selection, took missing data into consideration, and evaluated the success rate of image acquisition/data interpretation as well as the accuracy of the CAD system.33, 54, 57, 60, 62 The number of such studies is limited, however, so we should further explore how to construct robust evidence to promote the implementation of CAD in colonoscopy practice. The recommended study designs for this purpose are as follows.
First, set up a prospective evaluation with real-time use of CAD. Second, use a comparative study design (i.e. with CAD vs without CAD). There are two options for this purpose: a randomized controlled trial or a single-arm study to evaluate the diagnostic value that CAD adds. Third, establish a robust end-point. From this point of view, the adenoma detection rate is preferred over the rate of polyps missed, the sensitivity, or the false-positive rate for assessing an automated polyp-detection system. Similarly, the preferable end-points for evaluating the automated polyp characterization system are those of PIVI-1 (e.g. accuracy of the surveillance interval prediction) and PIVI-2. Fourth, conduct the study in an international, multicenter setting to ensure reproducibility of the results. This point is especially important when researching CAD because the evaluation should be conducted outside the institution where the “machine” images are obtained.63 Apart from the general study design, the efficacy of CAD should be evaluated for all types of colorectal lesions. For example, the previous studies hardly assessed the performance of CAD for depressed-type neoplasms, which are considered to harbor more malignant potential than other morphological types.64 In addition, other important types of lesions such as sessile serrated lesions, colitis-associated cancer, or hereditary polyposis have not yet been explored as targets of CAD, but should also be addressed.
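As an illustration of the preferred end-point for detection studies, the sketch below computes the adenoma detection rate, taken here as the proportion of colonoscopies in which at least one adenoma is found, for two study arms. The per-procedure adenoma counts are hypothetical, and the sketch omits the statistical testing that a real comparative trial would require.

```python
def adenoma_detection_rate(adenomas_per_procedure):
    """Fraction of colonoscopies in which at least one adenoma was detected."""
    if not adenomas_per_procedure:
        raise ValueError("at least one procedure is required")
    procedures_with_adenoma = sum(1 for count in adenomas_per_procedure if count >= 1)
    return procedures_with_adenoma / len(adenomas_per_procedure)


# Hypothetical number of adenomas found in each colonoscopy of each arm.
with_cad = [0, 1, 2, 0, 1, 0, 3, 0, 1, 0]
without_cad = [0, 1, 0, 0, 1, 0, 2, 0, 0, 0]

adr_with_cad = adenoma_detection_rate(with_cad)
adr_without_cad = adenoma_detection_rate(without_cad)
print(f"ADR with CAD: {adr_with_cad:.2f}, without CAD: {adr_without_cad:.2f}")
```

Because the rate counts procedures rather than polyps, finding several adenomas in one patient raises it no more than finding one, which is part of what makes it a patient-level end-point.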
Requirement for regulatory approval and the strategy to address it
Obtaining regulatory approval is a mandatory step toward using CAD systems in colonoscopy practice. Regulatory assessment of AI-assisted devices is a relatively new area for authorities, so the assessment rules vary significantly from one country to another. Recently, the U.S. Food and Drug Administration moved to reclassify CAD software for radiology to allow an easier regulatory path to the marketplace.65 CAD that is to be used for mammography will require class II approval (it previously required class III approval). This is a welcome move toward easier approval of AI-assisted medical devices, although Park and Han cautioned that clinical validation of CAD will be crucial even after regulatory approval.63, 66 In Japan, the Pharmaceuticals and Medical Devices Agency (PMDA), a regulatory body, issued a statement on the science of AI-assisted medical devices and systems in 2018. In this document, the PMDA classified the risks of using a CAD system into five categories, the fifth of which carries a risk of treatment failure as a result of a misdiagnosis provided by CAD.67 Considering this risk stratification and other factors, the PMDA requests applicants to carry out a retrospective or prospective evaluation of the CAD system. Currently, only two CAD systems, designed for use with laser-induced fluorescence spectroscopy54, 62 and endocytoscopy,42-45 are approved by regulatory bodies (WavSTAT4; Pentax Corp., Tokyo, Japan, EndoBRAIN; Cybernet System Corp., Tokyo, Japan), but we expect that many more CAD systems designed for colonoscopy will be approved by regulatory bodies in a few years.
Conclusions
Artificial intelligence-assisted colonoscopy is no doubt an attractive option for standardizing endoscopy practice, which is necessarily imperfect because of human error. Several related technologies and their supporting evidence are growing significantly, and expectations of endoscopy societies are increasing accordingly. We must keep in mind two concerns. First, the supporting evidence on the efficacy of CAD for colonoscopy is currently weak because of the limited quality of previous study designs. Second, no recognized CAD systems have been implemented in the clinical setting for colonoscopy, so their practical usefulness is unknown. Thus, in the future, we must conduct high-quality clinical trials to accumulate this evidence, and we must also understand how to obtain regulatory approval for earlier clinical use. With these problems appropriately addressed, CAD can definitely open the door to next-generation colonoscopy.
Acknowledgment
We thank Nancy Schatken, BS, MT(ASCP), from Edanz Group (www.edanzediting.com/ac), for editing a draft of this manuscript.
Conflicts of Interest
SK, YM, and MM are inventors of the patented “Image-processing instrument and method” (No. 6059271 in Japan), with inventors’ premiums paid by Showa University. SK, YM, and MM have received speaking honoraria from Olympus Corp. KM received research funding from Cybernet Corp. None of the other authors has conflicts of interest relating to the present study.