Volume 71, Issue 4 pp. 505-508
Editorial
Open Access

Deep learning in image segmentation for cancer

Robba Rai BAppSc (DiagRad), MHlthSc (MRI), PhD


South Western Sydney Clinical School, University of New South Wales, Liverpool, New South Wales, Australia

Liverpool and Macarthur Cancer Therapy Centre, Liverpool Hospital, Liverpool, New South Wales, Australia

Ingham Institute for Applied Medical Research, Liverpool, New South Wales, Australia

Correspondence

Robba Rai, Cancer Services Locked Bag 7103, Liverpool BC1871, NSW, Australia. Tel: +612 8738 9568; E-mail: [email protected].

First published: 06 November 2024

Graphical Abstract

This article discusses the role of deep learning (DL) in cancer imaging, focusing on its applications for automatic image segmentation. It highlights two studies that demonstrate how U-Net and convolutional neural network (CNN) based architectures have improved the speed and accuracy of body composition analysis in CT scans and rectal tumour segmentation in MRI images. While the results are promising, the article stresses the need for further research to address issues such as image quality variability across different imaging systems.

Introduction

Cancer is one of the leading causes of death worldwide. In 2020, it accounted for nearly 10 million deaths globally.1 Medical imaging plays an important role in all stages of cancer management and treatment and is used to highlight the differences between normal tissue and regions suggestive of a neoplastic process.2 It is also used quantitatively, which is useful for characterising tumour types and measuring response during cancer treatment. The Quantitative Imaging Biomarkers Alliance (QIBA), a consortium of the Radiological Society of North America (RSNA), defines quantitative imaging as ‘the extraction of quantifiable features from medical images for the assessment of normal or the severity, degree of change, or status of a disease, injury or chronic condition relative to normal’.3 Advances in imaging technologies allow for precise early detection of disease, better tumour characterisation and assessment of changes in disease over time or in response to therapy.4

Imaging modalities such as computed tomography (CT), positron emission tomography (PET), single-photon emission computed tomography (SPECT) and magnetic resonance imaging (MRI) are often used in cancer care for prediction, screening, biopsy guidance, staging, prognosis, therapy planning, treatment guidance, response assessment as well as detection of recurrence and palliation.5 Tumours are inherently spatially and temporally heterogeneous structures and can harbour small populations of cancer cells that are resistant to treatment,6 resulting in treatment failure and drug resistance,7 tumour recurrence and poor prognosis.8 Tumour heterogeneity in medical imaging can often be subjectively and qualitatively assessed by correlating varying grey levels and tumour borders with tumour invasiveness into surrounding stroma,9 oedema and regions of necrosis.10 Medical imaging is an essential diagnostic tool in all cancer treatment decisions and in certain circumstances can be used alone, without histopathology, for oncological treatment decision-making.11

The use of standard-of-care scans to acquire more quantitative information has become popular in recent years. The advantage of using standard-of-care scans is that they are abundant in hospital imaging archival systems, which is useful for retrospective research. However, there are challenges to using standard-of-care medical images for more quantitative purposes: variability in manual segmentation, the time and computing power required for image processing, and differences in image interpretation among physicians. In addition, images acquired on different imaging modalities across different centres and hospitals inherently vary in quality, which can impact image interpretation. Manual segmentation is a time-consuming and laborious task and is prone to interobserver variability. In contexts such as radiation therapy, where different imaging modalities are used to segment tumour volumes while sparing healthy tissues, inaccurate segmentation can result in geographic miss during treatment and can potentially reduce local control, impacting overall survival outcomes.12 Standard-of-care scans can also be used to extract different types of imaging features, such as texture and shape features, as in the field of radiomics, which can be used to characterise and classify tumours by phenotype.6 However, computing such a large volume of data requires substantial computational power and processing time.
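To make the radiomics idea concrete, the sketch below computes a single grey-level co-occurrence matrix (GLCM) texture feature from a segmented region of a scan slice. It is illustrative only; the synthetic image, the region coordinates and the use of scikit-image are assumptions for this sketch, not the workflow of the studies cited here.

```python
# Minimal sketch (illustrative only) of extracting one radiomics-style texture
# feature (GLCM contrast) from a segmented region of interest.
# The synthetic image and the region coordinates are assumptions; real radiomics
# pipelines compute hundreds of shape and texture features per tumour.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(0)
scan_slice = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in for a CT/MRI slice

roi = scan_slice[20:45, 20:45]  # crop the (assumed) manually segmented tumour region

# Grey-level co-occurrence matrix for pixel pairs one step apart, horizontally.
glcm = graycomatrix(roi, distances=[1], angles=[0], levels=256, symmetric=True, normed=True)
contrast = graycoprops(glcm, "contrast")[0, 0]  # one texture feature

print(f"GLCM contrast of the tumour region: {contrast:.1f}")
```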

Deep learning (DL) is emerging as a valuable tool for quantitative image analysis as it is a fast and efficient way to process copious amounts of data and allows for automation of these tasks.13 It has many applications for medical imaging, including but not limited to image segmentation, and has been shown to increase the accuracy and speed of image segmentation tasks for anatomical object localisation.14

This editorial aims to highlight two articles recently published in the Journal of Medical Radiation Sciences using DL to enhance different areas of cancer management, specifically segmentation of images on medical imaging scans.

Deep Learning

Deep learning (DL) is one of the technologies in the field of artificial intelligence (AI) that has become a promising method to analyse large volumes of medical images in a more efficient and rigorous way. DL models are composed of multiple layers of artificial neurons. These layers progressively extract higher-level features from raw input data, allowing the system to make increasingly accurate predictions or decisions. In DL, the layers include an input layer, one or more hidden layers and an output layer.15 The input layer receives the input data (training datasets), which the hidden layers then process through weighted connections. During training, input data are provided to the model along with the correct output labels. The weights in the neural network are initialised (often randomly), and the network makes a prediction based on the input data. Initial training usually involves a large dataset that has been manually labelled by humans. For example, in supervised learning, humans may label images, sentences or data points with the correct category, which the model uses to learn.

Networks built from many such interconnected layers are referred to as deep neural networks (DNNs) and mimic the way neural networks function in the human brain. DNN systems continue to learn from input data due to their interconnected nature, and this allows for more accurate results or predictions in the output layer.16
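As an illustration of the layered structure and supervised training described above, the following is a minimal sketch of a network with an input layer, one hidden layer and an output layer being trained on labelled data. The layer sizes, the synthetic dataset and the use of PyTorch are assumptions for illustration only.

```python
# Minimal sketch (illustrative only) of a small feed-forward network trained on
# labelled data. Layer sizes and the synthetic dataset are arbitrary assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 32),  # input layer -> hidden layer (weighted connections)
    nn.ReLU(),          # non-linear activation in the hidden layer
    nn.Linear(32, 2),   # hidden layer -> output layer (2 classes)
)

# Synthetic "training dataset": 100 samples with 16 features and a correct label each.
x = torch.randn(100, 16)
y = torch.randint(0, 2, (100,))

loss_fn = nn.CrossEntropyLoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):
    optimiser.zero_grad()
    prediction = model(x)          # forward pass through the layers
    loss = loss_fn(prediction, y)  # compare predictions with the correct labels
    loss.backward()                # compute gradients
    optimiser.step()               # update the weights
```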

Applications of DL in medical imaging are vast and can significantly impact image analysis and interpretation. Some applications of DL in medical imaging include image classification, segmentation, detection, reconstruction and registration.

There are different types of DL architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs) and transformers.

CNNs and U-Nets

The most common type of architecture employed for medical image segmentation is the CNN.16 CNNs are designed for grid-like data such as medical images. CNNs work by sliding filters across an image to detect features and identify patterns, creating spatial hierarchies in the data. Each convolution produces a feature map that serves as input for the next layer, so the network gradually builds a hierarchical representation of the image. Each layer has a specific filter that slides (or convolves) over its input, producing feature maps that capture increasingly complex patterns. Pooling layers downsample the data extracted by the convolution process, which also reduces the chance of overfitting. Overfitting occurs when a DL model performs very well on the training data but performs poorly on new, unseen data. This is problematic, as the goal of DL is to build models that work well on new, unseen data, not just the data they have been trained on. In the final layers of the CNN, the model makes its final decision, such as classifying the object, based on the outputs of the previous layers.
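A minimal sketch of these CNN building blocks is given below: filters convolve over the image, pooling layers downsample the resulting feature maps, and the final layers produce a classification decision. The channel counts, image size and two-class output are illustrative assumptions.

```python
# Minimal sketch (illustrative only) of a small CNN: convolutional filters build
# feature maps, pooling layers downsample them, and the final layers classify the image.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # filters convolve over the input image
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling downsamples the feature maps
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # deeper layer: more complex patterns
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),                  # final layers make the classification decision
)

image = torch.randn(1, 1, 64, 64)  # one single-channel 64 x 64 image
logits = cnn(image)                # scores for each of the 2 classes
```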

U-Nets are a type of CNN that upsample feature maps to restore the resolution of the output and are commonly used for multiclass semantic segmentation. Instead of only downsampling data through pooling layers, as in a traditional CNN, the U-Net architecture both downsamples and upsamples data using a combination of convolutions, max pooling and skip connections. This architecture allows the U-Net to maintain spatial information, making it effective even with smaller training datasets.17

In a U-Net architecture, the left side of the U, or the encoder (contracting path), is responsible for the initial downsampling of the input image and allows the system to learn what the object within an image is. The resolution of the input image is downsampled using a 2 × 2 max pooling unit, which halves the resolution at every successive level of the U-Net. As the resolution halves, the number of feature channels doubles. This allows the U-Net to learn more complex relationships within the input image. From the bottom of the U, the right side, or the decoder (expansive path), uses deconvolutions to upsample the feature maps back to the original input resolution. At each level, the resolution doubles and the number of feature channels halves. This upsampling allows the U-Net to determine where the object is within an image and its size. Skip connections transfer feature maps from the encoder to the decoder at matching spatial scales. These connections allow the decoder to incorporate both high-level and low-level features, which improves the precision and localisation of the output.
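The sketch below illustrates one level of this encoder-decoder structure with a skip connection. The channel counts are illustrative assumptions; real U-Nets use several such levels and more convolutions per level.

```python
# Minimal sketch (illustrative only) of a one-level U-Net: the encoder halves the
# resolution and doubles the feature channels, the decoder upsamples back to the
# input resolution, and a skip connection joins matching spatial scales.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                        # 2 x 2 max pooling halves the resolution
        self.bottom = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())  # channels double
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # deconvolution doubles the resolution
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)          # per-pixel class scores

    def forward(self, x):
        skip = self.enc(x)                      # encoder features at full resolution
        bottom = self.bottom(self.pool(skip))   # contracted representation
        up = self.up(bottom)                    # expansive path back to input resolution
        merged = torch.cat([skip, up], dim=1)   # skip connection: low- and high-level features
        return self.head(self.dec(merged))

seg = TinyUNet()(torch.randn(1, 1, 64, 64))     # per-pixel logits, shape (1, 2, 64, 64)
```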

In the article by Cao et al.,18 DL was used to assess the accuracy of an AI model that automatically segments and quantifies body composition using CT of the lumbar spine in colorectal cancer patients. This study used a two-dimensional U-Net architecture. Body composition analysis has become increasingly significant, as it has been reported to be associated with clinical outcomes such as survival and has also been identified as an effective predictor of chemotherapy toxicity. The study aimed to evaluate the accuracy of the AI-generated model for the automated quantification of body composition from L3 CT slices and compare it to the ground truth acquired from manual readings by experienced segmentation-trained clinicians with consultant surgeon and radiologist oversight. They used a training dataset of 270 CT slices and a validation dataset of 68 slices derived from 116 colorectal cancer patients. As is typical in the development of DL models, the training dataset was used to build the U-Net-based DL model and the validation dataset was used to assess the performance of the final model against the ground truth (manual segmentations from expert readers). Overall, the U-Net-based DL model performed well in segmenting subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT) and skeletal muscle. The average Dice coefficient was 0.98 across all measurements. The differences between the AI segmentation and the ground truth assessments by an experienced human reader were also very small, with an average area difference of less than 5 cm2 and an average radiodensity difference of less than 1 Hounsfield unit (HU). The time saving is useful in a clinical setting, with the study finding the AI model (0.17 ± 0.04 s) performing 4000 times faster per slice than a human reader (829.23 ± 127.09 s). However, the impact of HU variation arising from different kVp settings and CT systems was not considered in this study.
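For context, the Dice coefficient reported here measures the overlap between the automated segmentation and the manual ground truth, ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch of its computation, assuming binary masks, is shown below; the example masks are arbitrary.

```python
# Minimal sketch (assuming binary masks) of the Dice similarity coefficient:
# DSC = 2 * |A ∩ B| / (|A| + |B|), from 0 (no overlap) to 1 (perfect overlap).
import numpy as np

def dice_coefficient(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    auto_mask = auto_mask.astype(bool)
    manual_mask = manual_mask.astype(bool)
    intersection = np.logical_and(auto_mask, manual_mask).sum()
    total = auto_mask.sum() + manual_mask.sum()
    return 2.0 * intersection / total if total > 0 else 1.0

# Example: two highly overlapping masks give a Dice coefficient close to 1.
a = np.zeros((64, 64), dtype=bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), dtype=bool); b[11:41, 11:41] = True
print(round(dice_coefficient(a, b), 3))  # ~0.93
```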

In the second article of this edition, Zhang et al.19 aimed to evaluate the automatic segmentation performance of CNN-based models for rectal tumours on MRI and proposed a novel CNN model (named AttSEResUNet) with spatial attention and channel attention. This was compared to a standard U-Net, a ResUNet and a U-Net with an attention gate (AG) module. A total of 65 patients were enrolled in the study: 45 patients were used for training the models and the remaining 20 were used for testing and validation. The authors used T2-weighted (T2-w) imaging as this is widely available and the sequence most used clinically for preoperative staging and treatment response evaluation. Two radiologists, with 7 and 15 years of experience, manually segmented the tumour volumes on T2-w imaging. The proposed algorithm (AttSEResUNet) performed well in comparison to the observer 1 and observer 2 contours, with Dice similarity coefficients (DSC) of 0.839 and 0.856 respectively. It also had a perfect lesion recognition rate of 100% and a false recognition rate of 0, and it outperformed the other three models in all metrics when compared to the ground truth observer contours. A limitation of this study is the small sample size of only 65 patients; however, the authors performed data augmentation to increase the number of samples during the training phase. As with Cao et al.,18 the impact of image acquisition variability was not assessed in this study. MRI is more flexible than CT in terms of image acquisition, and differences in scanners, field strengths and sequence parameters can affect the quality of the acquired images, which could in turn affect the final output of the U-Net system.
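For readers unfamiliar with channel attention, the sketch below shows a generic squeeze-and-excitation style block that re-weights feature channels so that informative ones are emphasised. It is an illustration of the general mechanism only, not the authors' AttSEResUNet implementation, and the reduction ratio is an assumption.

```python
# Minimal sketch (illustrative only) of a squeeze-and-excitation style
# channel-attention block; not the authors' implementation.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        # "Squeeze": global average pool each feature channel to a single value.
        weights = self.fc(x.mean(dim=(2, 3)))      # shape (batch, channels)
        # "Excitation": re-weight the channels of the feature maps.
        return x * weights[:, :, None, None]

features = torch.randn(1, 16, 64, 64)
attended = ChannelAttention(16)(features)          # same shape, channel-reweighted
```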

Conclusion

Deep learning is a promising and exciting method for enhancing quantitative image analysis in cancer. The literature shows the advantages of DL methods, particularly U-Nets and CNNs, in semantic image segmentation. Image segmentation for automatic contouring has the potential to speed up and automate many workflows that have traditionally been time-consuming and laborious. As with all new technologies, caution is needed to ensure that results are not over-generalised. The impact of variation in image quality across different scanners and imaging protocols must be investigated further to ensure results from DL systems are accurate, consistent and reproducible.

Conflict of Interest

The author declares no conflict of interest.

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.
