Volume 7, Issue 3 pp. 189-196
ORIGINAL ARTICLE
Open Access

Auto-segmentation of the clinical target volume using a domain-adversarial neural network in patients with gynaecological cancer undergoing postoperative vaginal brachytherapy

Junfang Yan

Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, P.R. China

Xue Qin

Department of Obstetrics and Gynaecology, Luohe Central Hospital, Luohe, P.R. China

Caixia Qiao

Department of Oncology, Liaocheng Third People's Hospital, Liaocheng, P.R. China

Jiawei Zhu

Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, P.R. China

Lina Song

Department of Radiation Therapy, Cangzhou Central Hospital, Cangzhou, P.R. China

Mi Yang

Department of Oncology, Nanchong Central Hospital, Nanchong, P.R. China

Shaobin Wang

MedMind Technology Co., Ltd., Beijing, P.R. China

Lu Bai

MedMind Technology Co., Ltd., Beijing, P.R. China

Zhikai Liu

Corresponding Author

Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, P.R. China

Correspondence

Zhikai Liu and Jie Qiu, Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 1 Shuaifuyuan Wangfujing, Dongcheng District, Beijing, 100730, China.

Email: [email protected] and [email protected]

Jie Qiu

Corresponding Author

Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, P.R. China

First published: 07 August 2023
Citations: 1

Junfang Yan, Xue Qin and Caixia Qiao contributed equally to this work

Abstract

Purpose

For postoperative vaginal brachytherapy (POVBT), the diversity of applicators complicates the creation of a generalized auto-segmentation model, and creating models for each applicator seems difficult due to the large amount of data required. We construct an auto-segmentation model of POVBT using small data via domain-adversarial neural networks (DANNs).

Methods

CT images were obtained postoperatively from 90 patients with gynaecological cancer who underwent vaginal brachytherapy, including 60 and 30 treated with applicators A and X, respectively. A basal model was devised using data from the patients treated with applicator A; next, a DANN model was constructed using these same 60 patients as well as 10 of those treated with applicator X through transfer learning techniques. The remaining 20 patients treated with applicator X comprised the validation set. The model's performance was assessed using objective metrics and manual clinical evaluation.

Results

The DANN model outperformed the basal model on both objective metrics and subjective evaluation (p<0.05 for all). The median DSC and 95HD values were 0.97 and 3.68 mm in the DANN model versus 0.94 and 5.61 mm in the basal model, respectively. Multi-centre subjective evaluation by three clinicians showed that 99%, 98%, and 81% of CT slices contoured by the DANN model were acceptable versus only 73%, 77%, and 57% of those contoured by the basal model. One clinician deemed the DANN model comparable to manual delineation.

Conclusion

DANNs provide a realistic approach for the wide application of automatic segmentation in POVBT and can potentially be used to construct auto-segmentation models from small datasets.

1 INTRODUCTION

Auto-segmentation models for clinical target volume (CTV) delineation using deep learning algorithms have gained increasing interest among researchers. When successful, such methods can relieve clinicians of manual work, shorten the contouring time, and reduce inter-observer variation.1-3 Postoperative vaginal brachytherapy (POVBT), an effective treatment modality for gynaecological tumours, is unique for the variety of applicators that it uses, which in turn requires careful CTV contouring. Moreover, different CTVs may be generated for the same patient when different applicators are used.

Building an automatic segmentation model for POVBT poses two challenges. First, the diversity of applicators complicates the creation of a generalized auto-segmentation model. Second, developing a CTV auto-segmentation model for each applicator seems difficult, as constructing a satisfactory model usually requires a very large, homogeneous, high-quality dataset, and collecting sufficient data is impractical or impossible in some clinical settings.

Transfer learning is a strategy inspired by the idea that previously acquired knowledge can be used to tackle new problems faster or provide better solutions.4 Domain-adversarial neural networks (DANNs) use transfer learning techniques that incorporate domain adaptation into the process of learning a common feature representation; this method extracts features that are discriminative for the task yet invariant to the change in domains. Such networks have been applied to medical imaging modalities for different tasks, such as pelvic organ segmentation on cone beam computed tomography,5 traumatic brain injury segmentation on magnetic resonance imaging,6 and synthetic medical images acquired during endoscopy.7 For example, Brion et al. used 104 annotated CT scans and 60 non-annotated cone beam CT scans to train a DANN model for segmentation of pelvic organs in cone beam CTs, with an outlining speed of less than one second, providing some technical basis for online adaptive radiotherapy. To date, however, there have been no studies on the application of this cost-effective network in CTV auto-segmentation models for radiotherapy.

In this study, we constructed an auto-segmentation model from a small dataset using a DANN. First, we used data from 10 patients treated with applicator X, together with the readily available data from 60 patients treated with applicator A, to build an auto-segmentation model for applicator X; we then assessed this model using objective as well as multi-centre subjective evaluations. To our knowledge, ours is the first CTV auto-segmentation model for patients with gynaecological cancer treated with POVBT, and the first attempt to apply a transfer learning strategy to CTV segmentation in radiotherapy.

2 METHODS

2.1 Data processing

CT data from 90 patients with gynaecological cancer who had undergone hysterectomy followed by POVBT in our department were collected between May 2018 and March 2021. Sixty of the patients were treated with applicator A (Nucletron; Elekta, Stockholm, Sweden; Figure 1A) and 30 were treated with applicator X (patent no. ZL201320564893.3; Figure 1B). All patients underwent pelvic CT simulation with 3 mm-thick slices using a Brilliance CT Big Bore instrument (Philips Healthcare, Best, The Netherlands).

Figure 1. Two different vaginal applicators used in this study and the corresponding computed tomography axial images showing them positioned in situ. (A) applicator A, and (B) applicator X.

The CTV was defined as a 5-mm expansion of the applicator surface along a 3-cm length measured from the tip of the applicator while avoiding the nearby normal organs (bladder, rectum, sigmoid, and small intestine). Any surrounding air gaps between the applicator surface and vaginal mucosa as well as potential lesions thicker than 5 mm were considered when contouring the CTV. To ensure the delineation accuracy, all CTVs were first delineated by a junior clinician and then approved by a radiation oncologist with broad experience in treating gynaecologic malignancies.

Two models were created: a basal model using CT data from 60 patients treated with applicator A, and a DANN model created using the CT slices from the same 60 patients in addition to 10 randomly selected patients treated with applicator X. The remaining 20 patients treated with applicator X were used as the validation dataset. The manually delineated reference segmentation is referred to as “ground truth (GT)”.

2.2 Segmentation network architectures using the transfer learning technique

Inspired by the DANN, the segmentation network using transfer learning also aims to extract common features at different scales that the domain discriminator cannot distinguish correctly. Segmentation decisions are made based on the multi-scale features, and their parameters are optimised using segmentation loss from both the basal and target domains.

As depicted in Figure 2, the network architecture contains three parts: a feature extractor $F_{\theta_F}$, a segmentation predictor $S_{\theta_S}$, and a domain discriminator $D_{\theta_D}$. The feature extractor aims to obtain common features from the input basal or target data that are difficult for the domain discriminator to differentiate. At the same time, the parameters of both the feature extractor and the segmentation predictor are optimised to minimise the segmentation loss in both the basal and target domains.

Figure 2. The network architecture of domain-adversarial neural networks.

In the training stage, the number of basal input and label pairs $\{(x_i^S, y_i^S)\}_i^N$ is very large, whereas the available target pairs $\{(x_i^T, y_i^T)\}_i^{N'}$ are considerably fewer. To load the same amount of basal and target data in each batch, the target data are augmented to match the amount of basal data using rotation, scale, and shift operations.
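The balancing step described above can be sketched as follows. This is only an illustration, assuming that balancing is done by cycling through the target slices and drawing random rotation, scale, and shift parameters for each copy. The names `AugmentParams` and `balance_target`, and the parameter ranges, are hypothetical; the source does not report them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentParams:
    """Random geometric augmentation for one target slice (hypothetical container)."""
    rotation_deg: float
    scale: float
    shift_px: tuple


def balance_target(n_basal, target_ids, rng,
                   max_rot=10.0, scale_range=(0.9, 1.1), max_shift=8):
    """Build an augmentation plan with exactly n_basal target samples.

    Target slices are reused cyclically; each reuse gets fresh random
    rotation/scale/shift parameters. All ranges are assumed values.
    """
    if not target_ids:
        raise ValueError("at least one target slice is required")
    plan = []
    for k in range(n_basal):
        tid = target_ids[k % len(target_ids)]  # cycle through the few target slices
        params = AugmentParams(
            rotation_deg=rng.uniform(-max_rot, max_rot),
            scale=rng.uniform(*scale_range),
            shift_px=(rng.randint(-max_shift, max_shift),
                      rng.randint(-max_shift, max_shift)),
        )
        plan.append((tid, params))
    return plan


if __name__ == "__main__":
    demo = balance_target(6, ["x01", "x02"], random.Random(0))
    for tid, params in demo:
        print(tid, params)
```

Each entry of the plan would then be applied to the image and its label mask with the same geometric transform, so that the CTV contour stays aligned with the CT slice.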

The loss function of the segmentation transfer learning model consists of three parts: the segmentation dice loss of target samples, the segmentation dice loss of basal samples, and the binary cross entropy of domain discriminator results.
$$L\left(x^S, y^S, x^T, y^T\right) = \left(1 - \mathrm{Dice}\left(x^S, y^S\right)\right) + \left(1 - \mathrm{Dice}\left(x^T, y^T\right)\right) + \alpha \cdot \mathrm{BCE}\left(x^S, 0, x^T, 1\right)$$
$$\mathrm{Dice}\left(x^S, y^S\right) = \frac{1}{N}\sum_i \frac{2 \cdot \left| S_{\theta_S}\left(F_{\theta_F}\left(x_i^S\right)\right) \cap y_i^S \right|}{\left| S_{\theta_S}\left(F_{\theta_F}\left(x_i^S\right)\right) \right| + \left| y_i^S \right|}$$
$$\mathrm{Dice}\left(x^T, y^T\right) = \frac{1}{N}\sum_i \frac{2 \cdot \left| S_{\theta_S}\left(F_{\theta_F}\left(x_i^T\right)\right) \cap y_i^T \right|}{\left| S_{\theta_S}\left(F_{\theta_F}\left(x_i^T\right)\right) \right| + \left| y_i^T \right|}$$
$$\begin{aligned} \mathrm{BCE}\left(x^S, 0, x^T, 1\right) = {} & -\sum_i w_i \left( 0 \cdot \log D_{\theta_D}\left(F_{\theta_F}\left(x_i^S\right)\right) + \left(1 - 0\right) \cdot \log\left(1 - D_{\theta_D}\left(F_{\theta_F}\left(x_i^S\right)\right)\right) \right) \\ & -\sum_i w_i \left( 1 \cdot \log D_{\theta_D}\left(F_{\theta_F}\left(x_i^T\right)\right) + \left(1 - 1\right) \cdot \log\left(1 - D_{\theta_D}\left(F_{\theta_F}\left(x_i^T\right)\right)\right) \right) \\ = {} & -\sum_i w_i \left( \log\left(1 - D_{\theta_D}\left(F_{\theta_F}\left(x_i^S\right)\right)\right) + \log D_{\theta_D}\left(F_{\theta_F}\left(x_i^T\right)\right) \right) \end{aligned}$$
Here, $\alpha$ is a scalar that controls the weight of the discriminator loss in $L$, and the weight $w_i$ denotes the contribution of sample $i$ to the discriminator loss. Because the feature extractor aims to confuse the discriminator and thus maximise the discriminator loss, the multiplier $-\lambda$ is introduced when the discriminator loss is back-propagated beyond the discriminator parameters $\theta_D$ into the feature extractor. The parameter $\lambda$ was set to 0.2.
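To make the three loss terms concrete, the following is a minimal sketch that evaluates them on flattened masks and discriminator outputs. It assumes soft predictions in [0, 1], a small stabilising constant `eps` (not part of the formulas in the text), and equal numbers of basal and target samples per batch. The function names are illustrative, and the value of `alpha` is left to the caller because the source does not report it. Gradient reversal itself is not shown, since it needs an automatic differentiation framework.

```python
import math


def dice(pred, label, eps=1e-7):
    """Dice overlap between one flattened prediction and its label mask."""
    intersection = sum(p * y for p, y in zip(pred, label))
    return (2.0 * intersection + eps) / (sum(pred) + sum(label) + eps)


def mean_dice(preds, labels):
    """Average Dice over all samples of one domain."""
    return sum(dice(p, y) for p, y in zip(preds, labels)) / len(preds)


def domain_bce(d_basal, d_target, weights=None, eps=1e-7):
    """Binary cross entropy of the discriminator (basal label 0, target label 1).

    d_basal and d_target are discriminator outputs in (0, 1) for the
    basal and target samples of a balanced batch.
    """
    if len(d_basal) != len(d_target):
        raise ValueError("balanced batches are assumed")
    if weights is None:
        weights = [1.0] * len(d_basal)
    total = 0.0
    for w, ds, dt in zip(weights, d_basal, d_target):
        total -= w * (math.log(1.0 - ds + eps) + math.log(dt + eps))
    return total


def total_loss(pred_basal, label_basal, pred_target, label_target,
               d_basal, d_target, alpha, weights=None):
    """Sum of both Dice losses and the alpha-weighted discriminator loss."""
    return ((1.0 - mean_dice(pred_basal, label_basal))
            + (1.0 - mean_dice(pred_target, label_target))
            + alpha * domain_bce(d_basal, d_target, weights))


if __name__ == "__main__":
    mask = [[1, 1, 0, 0]]
    print(total_loss(mask, mask, mask, mask, [0.5], [0.5], alpha=0.5))
```

With perfect segmentations, both Dice terms vanish and only the discriminator term remains; a discriminator that outputs 0.5 for every sample cannot separate the two domains, which is the state the feature extractor is trained towards.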

2.3 Quantitative evaluation metrics

The commonly used Dice similarity coefficient (DSC) and 95th percentile Hausdorff distance (95HD) were used to objectively quantify the contouring accuracy. The DSC measures the relative volumetric overlap between automated and manual contours; a higher value indicates greater overlap, and the ideal DSC value is 1. The 95HD (in mm) reflects the boundary distance between the two contours; a lower value indicates a smaller difference, and the ideal 95HD value is 0 mm. The medians (25–75% interquartile ranges [IQRs]) were calculated for both the DSC and 95HD.
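As an illustration of the two metrics, the sketch below computes them for masks given as sets of (row, column) pixel coordinates. Several definitions of the 95HD are in use; this version assumes the larger of the two directed 95th-percentile distances, a nearest-rank percentile, and distances taken over all mask pixels rather than boundary pixels only. The source does not state which variant was used, and the function names are hypothetical.

```python
import math


def dsc(a, b):
    """Dice similarity coefficient between two sets of pixel coordinates."""
    if not a and not b:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _directed_distances(a, b, spacing):
    """Distance (in mm) from every pixel of a to its nearest pixel of b."""
    sy, sx = spacing
    return [min(math.hypot((ya - yb) * sy, (xa - xb) * sx) for yb, xb in b)
            for ya, xa in a]


def _percentile(values, q):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hd95(a, b, spacing=(1.0, 1.0)):
    """Symmetric 95th percentile Hausdorff distance in mm (brute force)."""
    if not a or not b:
        raise ValueError("both masks must be non-empty")
    return max(_percentile(_directed_distances(a, b, spacing), 95),
               _percentile(_directed_distances(b, a, spacing), 95))


if __name__ == "__main__":
    square = {(y, x) for y in range(4) for x in range(4)}
    shifted = {(y, x + 1) for y, x in square}  # same square moved one pixel
    print(dsc(square, shifted), hd95(square, shifted))
```

The brute-force nearest-pixel search is quadratic in the number of pixels, which is acceptable for a single small contour but would normally be replaced by a distance transform for whole volumes.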

2.4 Clinical evaluation

Five slices from each patient included in the testing dataset were randomly selected using Fisher-Yates shuffling, and the basal model, DANN model, and GT contours on these slices were evaluated. A total of 300 slices were obtained (5 × 20 = 100 slices in each of the basal, DANN, and GT groups).
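The slice selection can be sketched as a Fisher-Yates shuffle followed by taking the first five entries per patient. The helper names and the synthetic patient identifiers below are illustrative only; the source does not describe its implementation.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a uniformly shuffled copy of items (Fisher-Yates / Knuth shuffle)."""
    shuffled = list(items)
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def pick_slices(slices_by_patient, k, rng):
    """Pick k distinct slices per patient by shuffling and truncating."""
    picked = {}
    for patient_id, slices in slices_by_patient.items():
        if len(slices) < k:
            raise ValueError(f"patient {patient_id} has fewer than {k} slices")
        picked[patient_id] = fisher_yates_shuffle(slices, rng)[:k]
    return picked


if __name__ == "__main__":
    # Synthetic example: 20 patients with 30 slice indices each.
    demo = {f"patient_{i:02d}": list(range(30)) for i in range(20)}
    selection = pick_slices(demo, 5, random.Random(2023))
    print(sum(len(v) for v in selection.values()))  # 5 slices x 20 patients
```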

Three radiation oncologists from three different centres with broad experience in VBT were invited to perform the subjective evaluation on a four-point scale.8 Clinician A was from our centre, while the other two came from outside institutions. The scores represent the degree of modification required for the auto-segmentation results: 0 = rejection, 1 = major revision, 2 = minor revision, and 3 = no revision. A score of ≥2 indicates that the segmentation is acceptable for clinical application.
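The acceptance criterion above amounts to a simple proportion, sketched here for one clinician's scores. The function name and the example scores are made up for illustration and are not study data.

```python
# Four-point scale as defined in the text.
SCALE = {0: "rejection", 1: "major revision", 2: "minor revision", 3: "no revision"}


def acceptance_rate(scores, threshold=2):
    """Fraction of slices whose score meets the clinical acceptance threshold."""
    if not scores:
        raise ValueError("no scores given")
    for s in scores:
        if s not in SCALE:
            raise ValueError(f"score {s} is outside the four-point scale")
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    example_scores = [3, 3, 2, 1, 0, 2, 3, 1]  # illustrative only
    print(f"{acceptance_rate(example_scores):.1%}")
```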

2.5 Consistency test

The consistency test was supplementary to the graded scoring evaluation performed by the clinicians. One hundred CT slices in the testing dataset were randomly selected to show the delineation results of two groups simultaneously, with the contour colour of each group on each CT slice randomly set to red or green. Three clinicians who were blinded to the source of each contour chose the more suitable contour. The tests were performed for the basal model versus GT contours, DANN model versus GT contours, and DANN model versus basal contours. A total of 300 slices were evaluated as part of the consistency test.

2.6 Statistical analysis

The Wilcoxon signed-rank test was used to compare the DSC and 95HD values of the two auto-segmentation models. The Kruskal-Wallis test was performed to compare the evaluation scores of the basal model, DANN model, and GT contours from each clinician. The Friedman test for K-related samples was performed to analyse the differences in the scores of the three clinicians for each model. Post hoc pairwise comparisons were performed with Bonferroni correction. The χ2 and consistency tests were used to check the evaluation consistency between the three clinicians. Statistical significance was set at a two-tailed p-value of <0.05.
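The Bonferroni step for the post hoc pairwise comparisons can be sketched as below. This assumes the common form in which each raw p-value is multiplied by the number of comparisons and capped at 1; the source does not name the software or the exact procedure, and the p-values in the example are placeholders, not study results.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a mapping of comparison name -> raw p-value."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


if __name__ == "__main__":
    raw = {  # placeholder values for three pairwise comparisons
        "basal vs DANN": 0.010,
        "basal vs GT": 0.020,
        "DANN vs GT": 0.500,
    }
    for name, p_adj in bonferroni(raw).items():
        verdict = "significant" if p_adj < 0.05 else "not significant"
        print(f"{name}: adjusted p = {p_adj:.3f} ({verdict})")
```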

3 RESULTS

3.1 Quantitative evaluation

All CT slices of the 20 patients in the test dataset were evaluated for the basal and DANN models with quantitative performance metrics. The median DSCs (interquartile ranges) of the basal and DANN models were 0.94 (0.93, 0.95) versus 0.97 (0.94, 0.98) (p<0.001), while the median 95HD values were 5.61 (4.86, 7.26) versus 3.68 (2.91, 4.50) mm (p = 0.006). These data indicate that both the DSC and 95HD of the DANN model were significantly better than those of the basal model (Figure 3).

Figure 3. The Dice similarity coefficients (DSC) and 95th percentile Hausdorff distances (95HD) of the auto-segmentation results for the basal model and the domain-adversarial neural network model in all slices obtained from 20 patients in the validation group.

3.2 Clinicians’ evaluations

Examples of the clinicians’ evaluations of the auto-segmented slices are presented in Figure 4. As summarised in Supplementary Table 1, the proportions of evaluation scores ≥2 assigned by the three clinicians were 73%, 77%, and 57% for the basal group, 99%, 98%, and 81% for the DANN group, and 100%, 98%, and 95% for the GT group. Each clinician scored the three groups significantly differently, except that clinician B rated the contours of the DANN model as comparable to manual delineation (p = 0.05) (Figure 5A, B, and C). For each group, the evaluation scores showed no significant difference between clinicians A and B, whereas clinician C's scores were significantly lower than those of the other two (all p>0.05, Figure 5D, E, and F).

Figure 4. Computed tomography images showing examples of clinicians' evaluations of auto-segmented slices. (A) Score = 3; (B) score = 2; (C) score = 1; and (D) score = 0. Auto-segmentation contours are in blue, while the ground truth is shown in red.

Figure 5. Violin plots showing the subjective scoring of the three groups by three clinicians (n = 100 computed tomography slices in each group). DANN, domain-adversarial neural networks; GT, ground truth.

3.3 Consistency evaluation

The consistency evaluation results using two contours on the CT slice simultaneously are shown in Figure 6 and Supplementary Tables 2–7. No significant differences were found between the three clinicians in terms of selecting the more suitable contour for the basal versus GT groups (χ2 = 2.829, p = 0.243, kappa index 0.390–0.491), DANN versus GT groups (χ2 = 4.062, p = 0.131, kappa index 0.391–0.565), and DANN versus basal groups (χ2 = 0.034, p = 0.983, kappa index 0.543–0.674). Supplementary Figure 1 shows examples of the evaluations in the DANN versus basal groups as performed by the three clinicians.
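For readers unfamiliar with the kappa index, the sketch below computes Cohen's kappa for one pair of raters. It assumes that the reported ranges are pairwise Cohen's kappa values between clinicians, which the source does not state explicitly; the function name and the example choices are illustrative and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_1)
    # Observed agreement: share of items with identical choices.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement from each rater's marginal choice frequencies.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(counts_1[c] * counts_2[c]
                   for c in set(counts_1) | set(counts_2)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always chose the same single category
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Illustrative choices of the preferred contour on ten slices.
    clinician_1 = ["DANN"] * 6 + ["basal"] * 4
    clinician_2 = ["DANN"] * 5 + ["basal"] * 5
    print(round(cohen_kappa(clinician_1, clinician_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so two clinicians who both prefer the same group on most slices can still obtain only a moderate kappa.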

Figure 6. The consistency evaluation results with contours from two groups shown simultaneously on the same slice in 100 CT slices. CT, computed tomography; DANN, domain-adversarial neural networks.

4 DISCUSSION

With the development of POVBT techniques, a rich variety of applicators with different shapes (e.g., cylindrical, irregular, or spherical) and materials (polyphenylsulfone plastic, photopolymer, acrylic, or saline-filled)9-12 have emerged. The diversity of applicators makes it difficult to create an automatic segmentation model for POVBT. Using the DANN, we needed only 10 cases treated with the new applicator to build a clinically applicable auto-segmentation model for postoperative patients with gynaecological cancer, demonstrating that the DANN could address sample size limitations.

The DANN model yielded good quantitative data, as a DSC value ≥0.70 is generally considered to represent good agreement.13, 14 While the basal model yielded a satisfactory median DSC value of 0.94, the DANN model had a significantly higher DSC value of 0.97. There is no recommended 95HD value, although the lowest achievable is desired. For patients with cervical cancer who underwent radical radiotherapy, the CTV 95HD values were reported to be 5.34 mm in external beam radiotherapy15 and 8.1 mm for VBT.16 For patients with breast cancer, the 95HD values were reportedly 5.65 mm for post-modified radical mastectomy8 and 10.5 mm for breast-conserving radiotherapy.2 The 95HD values in all these models are higher than those achieved using our DANN model (3.68 mm). This may be partly related to the small volume of the CTV.

Although the CTV for POVBT is generally defined as a 5 mm expansion from the applicator surface,17 some contouring details that should be considered in clinical practice include air pockets between the applicator surface and vaginal mucosa18 as well as the outer walls of the bladder and rectum that should be carefully avoided.19 These subtle but critical details are not reflected in quantitative indicators; therefore, clinicians’ evaluations of auto-segmentation results are necessary before clinical use.

The DANN model showed satisfactory performance in terms of the clinicians’ evaluation. Subjective evaluation by three clinicians from different centres showed that 99%, 98%, and 81% of CT slices contoured by the DANN model had scores ≥2, rendering them acceptable for clinical practice; these rates are comparable to the acceptable rates of 89.1–97.9% for patients with breast cancer post-mastectomy and 94.6% for rectal cancer patients receiving neoadjuvant radiotherapy.8, 20 In this study, clinicians A and C concluded that the performance of the DANN model was slightly inferior to that of GT, while clinician B determined that the DANN model was comparable to GT. Clinician C found that only 81% of CTV slices generated by the DANN model were acceptable, and that the mean score of this model was only 1.93 (Supplementary Table 1). Moreover, the scores of clinician C for all three groups were significantly lower than those of clinicians A and B (Figure 5D, E, and F). This indicates that clinician C had stricter expectations of the delineation results, which may also explain the inter-observer variability. When choosing the more suitable of two contours presented simultaneously on a CT slice, all three clinicians came to consistent judgments (Figure 6), suggesting no large inter-observer differences. The differences in scores between clinician C and the other evaluators may be reduced if the precision of the grading system is sufficiently improved.

Kandalan et al.21 built a source model for dose prediction and then built an external target model through transfer learning with minimal input data reflecting the planning style of an external institution. The external target model predicted well, with DSC values for isodose volumes of 0.92, 0.93 and 0.96 for the low, intermediate and high dose levels, respectively. Their research shows that it is feasible to use a deep transfer learning model to localise a source model. In addition to sharing automatic segmentation models among institutions with different practice styles, this approach may also be applicable to other similar problems, for example, constructing auto-segmentation models for patients with cervical cancer undergoing radical brachytherapy with different applicators, and even sharing segmentation model data for organs-at-risk in applications ranging from external radiotherapy to brachytherapy.

There were some limitations to this study. First, although multicentre evaluation was used, both the training and testing data came from the same centre, so the robustness of the DANN model was not tested. Second, the scoring system may not have been sufficiently refined; incorporating additional measures, such as the time spent on modification, ought to improve the subjective evaluation. Third, we did not compare the dosimetry parameters based on DANN and GT delineations. Therefore, additional research is needed to further explore the role of DANN in auto-segmentation for radiotherapy.

5 CONCLUSION

For clinical settings where large amounts of data are not readily available, the DANN potentially allows for the use of small datasets to construct auto-segmentation models with acceptable performance.

DECLARATIONS

CONFLICT OF INTERESTS

Authors Shaobin Wang and Lu Bai were employed by MedMind Technology Co., Ltd., Beijing. The remaining authors declare that this research was conducted without any financial or commercial relationships that could be construed as a potential conflict of interest.

ETHICS APPROVAL

This study was approved by the Institutional Review Board of Peking Union Medical College Hospital.
