Feasibility of deep learning-based fully automated classification of microsatellite instability in tissue slides of colorectal cancer
Funding information: National Research Foundation of Korea (NRF) grants funded by the Korean government, Grant/Award Number: NRF-2017R1D1A1B03030998
Abstract
A high level of microsatellite instability (MSI-H) occurs in about 15% of sporadic colorectal cancer (CRC) and is an important predictive marker for response to immune checkpoint inhibitors. To test the feasibility of a deep learning (DL)-based classifier as a screening tool for MSI status, we built a fully automated DL-based MSI classifier using pathology whole-slide images (WSIs) of CRCs. On small image patches from The Cancer Genome Atlas (TCGA) CRC WSI dataset, tissue/non-tissue, normal/tumor and microsatellite stable (MSS)/MSI-H classifiers were applied sequentially for fully automated prediction of MSI status. The classifiers were also tested on an independent external cohort. Furthermore, to test how expanding the training data affects the performance of the DL-based classifier, an additional classifier trained on both the TCGA and external datasets was tested. For the classifier trained on both datasets, the areas under the receiver operating characteristic curves were 0.892 and 0.972 for the TCGA and external datasets, respectively. The performance of the DL-based classifier was much better than that of previously reported histomorphology-based methods. We speculated that about 40% of CRC slides could be screened for MSI status by the DL-based classifier without molecular testing. These results demonstrate that the DL-based method has potential as a screening tool to discriminate molecular alterations in tissue slides.
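The sequential use of the three patch-level classifiers can be pictured with a minimal sketch. It is illustrative only and is not the authors' implementation: the function name predict_slide_msi, the two cutoff values, the toy probabilities and the use of a simple mean to combine patch scores into a slide score are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence

# A patch classifier maps one image patch to a probability in [0, 1].
PatchClassifier = Callable[[object], float]


def predict_slide_msi(
    patches: Sequence[object],
    tissue_prob: PatchClassifier,
    tumor_prob: PatchClassifier,
    msi_prob: PatchClassifier,
    tissue_cutoff: float = 0.5,  # assumed cutoff, not taken from the paper
    tumor_cutoff: float = 0.9,   # assumed cutoff, not taken from the paper
) -> Optional[float]:
    """Return a slide-level MSI-H score, or None if no tumor patch survives."""
    scores = []
    for patch in patches:
        # Step 1: discard background and artefact patches.
        if tissue_prob(patch) < tissue_cutoff:
            continue
        # Step 2: keep only patches with a high tumor probability.
        if tumor_prob(patch) < tumor_cutoff:
            continue
        # Step 3: score the remaining tumor patches as MSS vs MSI-H.
        scores.append(msi_prob(patch))
    if not scores:
        return None
    # Assumed aggregation: mean of the patch-level MSI-H probabilities.
    return sum(scores) / len(scores)


# Toy patches carrying made-up probabilities in place of real model outputs.
toy_patches = [
    {"tissue": 0.10, "tumor": 0.00, "msi": 0.90},  # background, dropped
    {"tissue": 0.90, "tumor": 0.20, "msi": 0.90},  # normal tissue, dropped
    {"tissue": 0.90, "tumor": 0.95, "msi": 0.25},  # tumor, kept
    {"tissue": 0.90, "tumor": 0.99, "msi": 0.75},  # tumor, kept
]
score = predict_slide_msi(
    toy_patches,
    tissue_prob=lambda p: p["tissue"],
    tumor_prob=lambda p: p["tumor"],
    msi_prob=lambda p: p["msi"],
)
print(score)
```

In practice each callable would wrap a trained network, and a slide for which the function returns None would fall back to molecular testing.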
What's new?
Microsatellite instability (MSI) levels are an important predictive biomarker for response to immune checkpoint inhibitors in colorectal cancer. To test the feasibility of a deep learning (DL)-based classifier as a screening tool for MSI status, here the authors built a fully automated DL-based MSI classifier using pathology whole-slide images of hematoxylin and eosin-stained tissue slides of colorectal cancer. By automatically removing artefacts and selecting patches with high tumour probability, the DL-based system could screen a considerable number of tissue slides for their MSI status, demonstrating its potential as a screening tool for molecular alterations in tissue slides.
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
The slide images of the TCGA datasets were downloaded from the Genomic Data Commons portal (https://portal.gdc.cancer.gov/). The SMH datasets are available from the corresponding author (HJ.J.) upon reasonable request and through collaborative investigations. The source code for the classifiers is available as open-source Python code on GitHub: https://github.com/jajman/ColonMSI/.