Yuseong Chu
orcid.org/0000-0003-0930-4628
Department of Biomedical Engineering, Yonsei University, Wonju, Republic of Korea

Seung-Won Jung
orcid.org/0000-0003-4874-1513
Department of Dermatology, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea

Solam Lee
Department of Dermatology, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea

Sang Gyun Lee
Department of Dermatology, Gangnam Severance Hospital, Cutaneous Biology Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea

Yeon-Woo Heo
Department of Dermatology, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea

Sang-Hoon Lee
Department of Dermatology, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea

Hang-Seok Chang
Department of Surgery, Thyroid Cancer Center, Gangnam Severance Hospital, Institute of Refractory Thyroid Cancer, Yonsei University College of Medicine, Seoul, Republic of Korea

Yong Sang Lee
Department of Surgery, Thyroid Cancer Center, Gangnam Severance Hospital, Institute of Refractory Thyroid Cancer, Yonsei University College of Medicine, Seoul, Republic of Korea

Seok-Mo Kim
Department of Surgery, Thyroid Cancer Center, Gangnam Severance Hospital, Institute of Refractory Thyroid Cancer, Yonsei University College of Medicine, Seoul, Republic of Korea

Sang Eun Lee
orcid.org/0000-0003-4720-9955
Department of Dermatology, Gangnam Severance Hospital, Cutaneous Biology Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea

Byungho Oh
Department of Dermatology, Severance Hospital, Cutaneous Biology Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea

Mi Ryung Roh (Corresponding Author)
orcid.org/0000-0002-6285-2490
Department of Dermatology, Gangnam Severance Hospital, Cutaneous Biology Research Institute, Yonsei University College of Medicine, Seoul, Republic of Korea

Sejung Yang (Corresponding Author)
orcid.org/0000-0002-5841-851X
Department of Precision Medicine, Yonsei University Wonju College of Medicine, 20 Ilsan-ro, Wonju-si, Gangwon-do, Republic of Korea
Department of Medical Informatics and Biostatistics, Graduate School, Yonsei University, 20 Ilsan-ro, Wonju-si, Gangwon-do, Republic of Korea

First published: 11 January 2025

https://doi.org/10.1155/dth/4636142

Academic Editor: Tak-Wah Wong


Abstract

The rising global incidence of thyroid cancer is increasing the number of thyroidectomies, which leave visible scars that can greatly affect quality of life through their cosmetic, psychological, and social impacts. In this study, we explored the application of deep learning algorithms to objectively assess post-thyroidectomy scar morphology using computer-aided diagnosis. This study was approved by the Institutional Review Board of Yonsei University College of Medicine (approval no. 3-2021-051). A dataset comprising 7524 clinical photographs from 3565 patients with post-thyroidectomy scars was used. We developed a deep learning model based on a convolutional neural network (CNN), specifically ResNet 50, and introduced a multiple clinical photography learning (MCPL) method. The MCPL method aims to enhance the model's understanding by considering the characteristics of multiple images of the same lesion per patient. The primary outcome, the area under the receiver operating characteristic curve (AUROC), showed that the MCPL model classified scar subtypes better than a baseline model. Confidence variation analysis showed smaller discrepancies across images of the same lesion with the MCPL model, emphasizing its robustness. Furthermore, we conducted a decision study involving five physicians to evaluate the MCPL model's impact on diagnostic accuracy and agreement. The decision study showed that the accuracy and reliability of scar subtype determination improved when the confidence scores of the MCPL model were integrated into decision-making. Our findings suggest that deep learning, particularly the MCPL method, is an effective and reliable tool for objectively classifying post-thyroidectomy scar subtypes. This approach holds promise for helping professionals improve diagnostic precision, aiding therapeutic planning, and ultimately enhancing patient outcomes in the management of post-thyroidectomy scars.

1. Introduction

The incidence of thyroid cancer is increasing rapidly worldwide. Traditional thyroidectomy involves a transverse incision in the neck, which leaves a visible scar in an exposed area. Consequently, post-thyroidectomy scars can cause considerable cosmetic problems and are associated with poor quality of life [1, 2]. Because these scars remain visible in an exposed anatomical region, they can also have negative psychological and social consequences, including low self-esteem, anxiety, depression, and stigmatization [3, 4]. In addition to esthetic issues, scar-related symptoms such as pruritus, tightness, pain, and restricted mobility can significantly impair patients' quality of life [1, 4].

Various modalities have been used to treat post-thyroidectomy scars, including surgical revision, intralesional steroid injections, topical therapies, antimitotic agents, and laser treatment. However, scar management remains a challenge for dermatologists, as many cases require long-term, repeated, and combined treatment tailored to the unique characteristics of each scar [5–8]. In addition, individual patients may want specific features of their scars treated, such as size, shape, protrusion, recession, adhesion, and color difference relative to the surrounding normal skin [9]. Because these features vary by scar subtype, accurately assessing a scar's morphological subtype is necessary to establish a suitable, personalized therapeutic strategy for each patient.

Various scar evaluation tools for assessing both diagnostic and cosmetic outcomes are widely used and are known to reflect patients' therapeutic needs; however, a standardized tool is still lacking, which remains a significant limitation in the field [10–12]. Recently, many deep learning studies have been conducted in dermatology [13–17]. In a 2017 study, a deep learning model achieved skin cancer diagnostic accuracy comparable to that of dermatologists [18]. By 2021, deep learning had become mainstream, with 85% of skin disease diagnosis algorithms being developed using deep learning methods [19]. Additionally, deep learning–based computer-aided diagnosis systems have proven beneficial to physicians beyond simple skin diagnosis (Table S1) [20–22].

Multiple images of the same skin lesion can be obtained, but a deep learning model's predictions may not remain consistent across them [23]. The post-thyroidectomy scar dataset we collected also contains multiple images of the same scar (Figure S1). To address this problem, a recent study developed a new learning method that uses the characteristics of skin data obtained from multiple clinical photographs of a single lesion [24]. We therefore hypothesized that deep learning models could classify post-thyroidectomy scar subtypes and that using multiple images of the same scar could yield more robust models.

Considering the limitations of current scar evaluation methods, particularly their reliance on subjective assessment, we propose a deep learning model that classifies post-thyroidectomy scar types by analyzing their morphological characteristics. With this model, we aimed to minimize variability in scar assessment while improving diagnostic accuracy. We developed a deep learning model trained with multiple clinical photography learning (MCPL), which draws on multiple clinical photographs of the same lesion. To validate its effectiveness, we compared a conventional deep learning model with a model employing MCPL. In addition, we conducted a decision study using the MCPL model to evaluate improvements in diagnostic accuracy and physician agreement.

2. Materials and Methods

2.1. Data Source and Study Approval

This study included patients with post-thyroidectomy scars who visited the Department of Dermatology at Yonsei University Gangnam Severance Hospital between 2009 and 2019. Clinical data and photographs of their post-thyroidectomy scars were obtained. The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Yonsei University College of Medicine (approval no. 3-2021-051). The requirement for written informed consent was waived owing to the retrospective nature of the study and the use of deidentified data. Only photographs captured during the initial visits were analyzed. Three board-certified dermatologists (M.R., J.K., and S.L.) independently assessed each photograph to establish the reference standard (gold standard) scar type. When the reviewers disagreed, the final label was decided by majority rule.
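The majority-rule step can be made concrete with a small helper. This is a minimal sketch with a hypothetical function name; it assumes labels are stored as strings and, because the text does not say how a three-way split among three raters and four scar types was resolved, it returns `None` in that case so such photographs could be adjudicated separately.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(ratings: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of raters, else None.

    Assumption (not from the study): a three-way disagreement has no
    majority and is returned as None for separate adjudication.
    """
    label, count = Counter(ratings).most_common(1)[0]
    return label if count > len(ratings) / 2 else None
```

For example, `majority_label(["hypertrophic", "hypertrophic", "linear flat"])` gives `"hypertrophic"`.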

3. Data Preparation

We developed and validated deep learning models to classify four scar types (Figure 1(a)). We excluded missing and inadequate-quality photographs (e.g., images of scars that were not from horizontal incisions or were not taken in frontal view). Among the follow-up data for each patient, only first-visit data were used; however, because adhesive scars were rare, all available photographs of adhesive scars were used. A total of 7524 photographs from 3565 patients were included in the study. The number of images of the same lesion per patient ranged from one to seven. The most common scar type was linear flat (n = 1740 patients; 48.81%), followed by hypertrophic (n = 1288; 36.13%), linear bulging (n = 517; 14.50%), and adhesive (n = 20; 0.56%). Most patients were women (3283 [92.09%]), and the mean age was 38.55 ± 9.41 years. A history of keloids and of hypertrophic scarring after cesarean section (C-section) was found in 38 (1.07%) and 48 (1.35%) patients, respectively. We split the data into training, validation, and test datasets for each subtype at a ratio of 7:1:2 on a patient-by-patient basis. This process is summarized in Figure 1(b), and information regarding the dataset is presented in Table 1.
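A patient-level split keeps every image of a lesion in one subset, so near-duplicate views of the same scar cannot leak between training and test data. The sketch below illustrates the idea under stated assumptions: one scar type per patient, a per-type shuffle, and rounding of the 7:1:2 ratio. The record layout and names are hypothetical, and small classes may not reproduce the exact counts in Table 1 (the adhesive class, for instance, has no validation patients there).

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (image_id, patient_id, scar_type): hypothetical layout


def patient_level_split(records: List[Record],
                        ratios: Tuple[int, int, int] = (7, 1, 2),
                        seed: int = 0) -> Dict[str, List[str]]:
    """Split image IDs into train/valid/test so that all images of a patient
    land in the same subset, stratifying patients by scar type."""
    patients_by_type: Dict[str, set] = defaultdict(set)
    images_by_patient: Dict[str, List[str]] = defaultdict(list)
    for image_id, patient_id, scar_type in records:
        patients_by_type[scar_type].add(patient_id)  # assumes one scar type per patient
        images_by_patient[patient_id].append(image_id)

    rng = random.Random(seed)
    total = sum(ratios)
    split: Dict[str, List[str]] = {"train": [], "valid": [], "test": []}
    for scar_type in sorted(patients_by_type):
        patients = sorted(patients_by_type[scar_type])
        rng.shuffle(patients)
        n_train = round(len(patients) * ratios[0] / total)
        n_valid = round(len(patients) * ratios[1] / total)
        groups = {
            "train": patients[:n_train],
            "valid": patients[n_train:n_train + n_valid],
            "test": patients[n_train + n_valid:],
        }
        # Move whole patients, never individual images.
        for name, group in groups.items():
            for patient_id in group:
                split[name].extend(images_by_patient[patient_id])
    return split
```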

Figure 1. Study profile. (a) Four post-thyroidectomy scar subtypes. (b) Dataset preparation.

Table 1. Data characteristics.

Scar type (no. of patients, %)	Linear flat scar (1740, 48.81%)	Linear bulging scar (517, 14.50%)	Hypertrophic scar (1288, 36.13%)	Adhesive scar (20, 0.56%)	Total (3565, 100.00%)
Sex (female), no. of patients (%)	1635 (93.97%)	481 (93.04%)	1148 (89.13%)	19 (95.00%)	3283 (92.09%)
Age (years), mean (SD)	38.37 (8.90)	43.68 (10.37)	36.57 (8.57)	49.05 (11.98)	38.55 (9.41)
Height (cm), mean (SD)	162.14 (5.94)	161.15 (6.18)	162.71 (7.25)	160.01 (4.1)	162.19 (6.49)
Weight (kg), mean (SD)	58.43 (10.07)	60.59 (10.71)	62.4 (12.54)	57.9 (5.51)	60.18 (11.24)
BMI (kg/m²), mean (SD)	22.18 (3.25)	23.31 (3.73)	23.5 (3.89)	22.65 (2.38)	22.58 (3.62)
History of keloid, no. of patients (%)	14 (0.80%)	5 (0.97%)	19 (1.48%)	0 (0.00%)	38 (1.07%)
History of c-sec hypertrophy, no. of patients (%)	28 (1.61%)	4 (0.77%)	16 (1.24%)	0 (0.00%)	48 (1.35%)
No. of images in train/valid/test (no. of patients in train/valid/test)	2480/354/708 (1218/172/350)	755/106/229 (361/51/105)	1836/257/514 (901/127/260)	185/0/100 (12/0/8)	5256/717/1551 (2492/350/723)

Note: History of keloid: Previous occurrence of keloids. History of c-sec hypertrophy: Previous occurrence of hypertrophic scarring from a cesarean section. No. of images in train/valid/test: The images were split into training, validation, and test datasets in a 7:1:2 ratio on a patient-by-patient basis.
Abbreviations: BMI, body mass index; SD, standard deviation.

3.1. Convolutional Neural Network (CNN)

CNNs are widely used in computer vision, driven by recent advances in hardware and the availability of large datasets. Unlike traditional machine-learning methods, which rely on separately engineered features, CNNs employ an integrated, end-to-end structure [25] that learns to extract relevant features and classify them simultaneously during training. Figure S2 illustrates an example of the CNN inference process.

The ResNet 50 CNN model was used, with the final layer set to four output nodes, one for each scar type [26]. The ImageNet-pretrained CNN was fine-tuned, and all images were resized to 224 × 224 pixels for training. The Adam optimizer was used for all tasks (learning rate, 1 × 10⁻⁵). To prevent the CNN from overfitting the training data, early stopping was applied when the validation loss did not decrease for 10 epochs. Given the GPU memory limitations, the batch size was set to 64.
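The early-stopping rule can be sketched without a deep learning framework. The helper below is hypothetical, not taken from the authors' code: it signals a stop once the validation loss has gone 10 consecutive epochs without decreasing. In a PyTorch loop, it could sit alongside a model built with `torchvision.models.resnet50`, whose `fc` layer is replaced by `torch.nn.Linear(model.fc.in_features, 4)`, and an optimizer `torch.optim.Adam(model.parameters(), lr=1e-5)`. Those wiring details are assumptions about how the described setup could be reproduced.

```python
import math


class EarlyStopping:
    """Stop training when the validation loss has not decreased for
    `patience` consecutive epochs (patience=10 mirrors the text above)."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_loss = math.inf
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would checkpoint the weights here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

Restoring the weights saved at `best_epoch`, rather than keeping the final ones, is the usual companion to this rule.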

For deep learning, the PyTorch framework version 2.0.1 (https://pytorch.org/) with torchvision 0.15.2 and CUDA 11.8 was used. The hardware consisted of an Intel i9-13900K CPU (CPU Mark: 59107), 64 GB of DDR4 RAM, a 1-TB solid-state drive, and one NVIDIA GeForce RTX 4090TI 24-GB GPU.

3.2. Training Mechanism

Figure S1 (supporting information) shows examples of the multiple images captured per patient for each post-thyroidectomy scar subtype in the dataset. We developed an MCPL method that uses an instance-based similarity loss to learn, on a per-patient basis, the characteristics shared by multiple images of a single lesion. As shown in Figure S3 (supporting information), the loss function has three parts. First, the same patient instance loss (L_SPI) assumes that multiple images of the same lesion from the same patient have very similar features. Second, the positive similarity loss (L_PS) assumes that images of the same scar type have similar features. Third, the negative similarity loss (L_NS) ensures that feature maps extracted from images of different scar types are distinct. In the example on the left of Figure S3, feature maps are extracted by the deep learning model from a batch of 10 images; feature maps 1–3, 4–7, and 8–10 (from the left) belong to patients 1 (blue), 2 (yellow), and 3 (red), respectively. All three losses are calculated from cosine similarity. Finally, training adds a weighted cross-entropy loss that accounts for class imbalance.
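The text specifies only that the three terms are based on cosine similarity and are combined with a class-weighted cross-entropy. The plain-Python sketch below is one hypothetical way to write such a loss. Pairwise cosine similarities within a batch are split into same-patient pairs (L_SPI), same-class pairs from different patients (L_PS), and different-class pairs (L_NS). The 1 − s and max(0, s) penalties, the inverse-frequency class weights, and the equal lambda weights are all assumptions, not the published formulation; the authors' repository listed under Data Availability holds the actual code. The weighted average mirrors how PyTorch's `CrossEntropyLoss` averages when class weights are given.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def similarity_losses(features: List[Vector], patient_ids: List[str],
                      labels: List[int]) -> Tuple[float, float, float]:
    """Hypothetical (L_SPI, L_PS, L_NS) over all image pairs in one batch."""
    spi, ps, ns = [], [], []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            s = cosine(features[i], features[j])
            if patient_ids[i] == patient_ids[j]:
                spi.append(1.0 - s)     # same lesion: pull features together
            elif labels[i] == labels[j]:
                ps.append(1.0 - s)      # same scar type, other patient: pull together
            else:
                ns.append(max(0.0, s))  # different scar type: push apart
    return _mean(spi), _mean(ps), _mean(ns)


def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """Class weights N / (K * n_c); classes absent from `labels` get weight 0."""
    counts = [labels.count(c) for c in range(num_classes)]
    return [len(labels) / (num_classes * n) if n else 0.0 for n in counts]


def weighted_cross_entropy(logits: List[Vector], labels: List[int],
                           class_weights: List[float]) -> float:
    """Class-weighted cross-entropy, normalized by the sum of sample weights."""
    total, weight_sum = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += class_weights[y] * (log_sum_exp - z[y])
        weight_sum += class_weights[y]
    return total / weight_sum


def mcpl_loss(features, logits, patient_ids, labels, class_weights,
              lambdas=(1.0, 1.0, 1.0)) -> float:
    """Weighted cross-entropy plus the three similarity terms (lambdas are hypothetical)."""
    l_spi, l_ps, l_ns = similarity_losses(features, patient_ids, labels)
    ce = weighted_cross_entropy(logits, labels, class_weights)
    return ce + lambdas[0] * l_spi + lambdas[1] * l_ps + lambdas[2] * l_ns
```

In a real training loop, the same terms would be computed on framework tensors so that gradients can flow through them. The list-based version only makes the pairing logic explicit.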

4. Primary Outcome and Measurement

The area under the receiver operating characteristic curve (AUROC) was the primary outcome for measuring model performance. Sensitivity, specificity, precision, F1-score, and accuracy were also evaluated. To assess confidence variation across multi-instance images, we compared the confidence scores that each model assigned to the multiple images of the same lesion from each patient. All statistics are reported as point estimates with 95% confidence intervals (CIs). Data were analyzed and visualized using Python 3.7.0 (Python Software Foundation).
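The text does not state how the 95% CIs were obtained, so the following is an illustration only. The sketch computes a one-vs-rest AUROC for a single scar type using the Mann–Whitney formulation, where positives are images of that type and scores are the model's confidence for it. It then derives a percentile bootstrap CI by resampling images with replacement. Resampling patients instead would respect the multiple-images-per-patient structure. The function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """One-vs-rest AUROC: probability that a random positive outscores a random
    negative, counting ties as 0.5 (O(P*N) pairs, fine for illustration)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores: Sequence[float], positives: Sequence[bool],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC, resampling images with replacement."""
    if all(positives) or not any(positives):
        raise ValueError("need both classes to compute AUROC")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [positives[i] for i in idx]
        if all(labels) or not any(labels):
            continue  # a resample with only one class has no defined AUROC
        stats.append(auroc([scores[i] for i in idx], labels))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```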

4.1. Explanatory Deep Learning Model

We employed a class activation map (CAM) to explain the predictions of our deep learning model [27]. A CAM highlights the image regions that contribute most to the model's prediction for a particular class, which makes the model's decision-making process interpretable and helps increase its reliability. Using CAMs, we analyzed how the MCPL model's confidence differed across multiple images of the same lesion.
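In the standard CAM formulation, the map for class c is a weighted sum of the last convolutional layer's feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y). The weights w_k^c come from the final fully connected layer. For ResNet 50 with 224 × 224 inputs, there are 2048 such maps of 7 × 7. These would then be upsampled to the image size and overlaid as the red-to-blue heatmap. The pure-Python sketch below uses a hypothetical function name and omits upsampling; it shows only the weighted sum and the min-max scaling used for display.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # indexed [channel][row][col]


def class_activation_map(feature_maps: FeatureMaps,
                         class_weights: List[float]) -> List[List[float]]:
    """Weighted sum of feature maps using one class's fully connected weights,
    min-max scaled to [0, 1] (1 = most important region for that class)."""
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[sum(w * fmap[r][c] for w, fmap in zip(class_weights, feature_maps))
            for c in range(cols)]
           for r in range(rows)]
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in cam]
```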

4.2. Decision Study

We conducted a decision study to determine whether the MCPL model helps physicians determine morphological scar subtypes. Figure S4 shows the decision-making process for the decision study. Five physicians were recruited, and 100 images (human set) were extracted from the test dataset. The physicians first determined the scar type by viewing each image alone (Phase 1); after a 2-week interval, they repeated the assessment with access to the confidence scores of the MCPL model (Phase 2). Figure S5 shows an example of the Google survey distributed to the physicians for the decision study.

5. Results

Figure 2 shows the receiver operating characteristic curve for each scar type. Both the MCPL and baseline models performed best in classifying hypertrophic scars (AUROC: 0.915 vs. 0.912; 95% CI: 0.906–0.923 vs. 0.911–0.913). For the remaining subtypes, the AUROCs of the MCPL and baseline models were 0.880 vs. 0.841 for linear flat scars (95% CI: 0.874–0.886 vs. 0.838–0.844), 0.879 vs. 0.884 for linear bulging scars (95% CI: 0.867–0.891 vs. 0.879–0.888), and 0.869 vs. 0.787 for adhesive scars (95% CI: 0.847–0.891 vs. 0.769–0.805). In terms of mean AUROC and accuracy, the MCPL model (mean AUROC: 0.886; accuracy: 76.312%) outperformed the baseline model (mean AUROC: 0.856; accuracy: 72.624%). Other metrics, such as precision, recall, specificity, and F1-score, are listed in Table S2. We also used the Inception V3 model [28], which performed similarly to the ResNet 50 model (Figure S6, supporting information).
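The reported mean AUROCs match an unweighted (macro) average of the four per-subtype values, which can be checked directly:

```latex
\begin{aligned}
\overline{\mathrm{AUROC}}_{\mathrm{MCPL}} &= \frac{0.915 + 0.880 + 0.879 + 0.869}{4} = \frac{3.543}{4} \approx 0.886 \\
\overline{\mathrm{AUROC}}_{\mathrm{baseline}} &= \frac{0.912 + 0.841 + 0.884 + 0.787}{4} = \frac{3.424}{4} = 0.856
\end{aligned}
```

Because this average weights every subtype equally, the large gain on the rare adhesive class (0.869 vs. 0.787) moves the mean as much as a comparable gain on the common linear flat class would.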

6. Confidence Variation of Multi-Instance Images

Figure 3(a) shows that both the MCPL and baseline models exhibited variation in confidence across instances of the same lesion. For adhesive scars, the variation was 0.049 (95% CI: 0.015–0.084) in the MCPL model and 0.193 (95% CI: 0.124–0.261) in the baseline model. For linear bulging scars, the variation was 0.063 (95% CI: 0.033–0.093) in the MCPL model and 0.103 (95% CI: 0.077–0.129) in the baseline model. For hypertrophic scars, the variation was 0.019 (95% CI: 0.008–0.030) in the MCPL model and 0.079 (95% CI: 0.063–0.095) in the baseline model. For linear flat scars, the variation was 0.031 (95% CI: 0.021–0.041) in the MCPL model and 0.097 (95% CI: 0.086–0.108) in the baseline model. For every scar type, the variation was lower with the MCPL model than with the baseline model.
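The text reports a per-subtype "variation" without defining it. This is a minimal sketch under one plausible assumption. For each patient with at least two images, it takes the range (maximum minus minimum) of the model's confidence for the true scar type across those images, then averages the ranges within each subtype. A standard deviation would be an equally reasonable choice. The row layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (patient_id, scar_type, confidence for the true type)


def confidence_variation(rows: List[Row]) -> Dict[str, float]:
    """Mean within-patient confidence range per scar type, over patients
    with at least two images of the same lesion."""
    confs_by_patient: Dict[str, List[float]] = defaultdict(list)
    type_of: Dict[str, str] = {}
    for patient_id, scar_type, conf in rows:
        confs_by_patient[patient_id].append(conf)
        type_of[patient_id] = scar_type
    ranges_by_type: Dict[str, List[float]] = defaultdict(list)
    for patient_id, confs in confs_by_patient.items():
        if len(confs) >= 2:  # a single image has no variation to measure
            ranges_by_type[type_of[patient_id]].append(max(confs) - min(confs))
    return {scar_type: mean(r) for scar_type, r in ranges_by_type.items()}
```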

7. Performance Based on Decision Methods

The bar chart in Figure 3(b) compares accuracy on data from patients with two or more images. The instance-based prediction method measures accuracy on individual images, whereas the patient-based prediction method determines each patient's prediction from the arithmetic mean of the confidence scores across that patient's images. Switching from instance-based to patient-based prediction improved the accuracy of the baseline model from 72.519% (95% CI: 72.380–72.657) to 74.882% (95% CI: 73.933–75.831) and that of the MCPL model from 76.334% (95% CI: 75.848–76.820) to 78.817% (95% CI: 77.427–80.206).
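The patient-based rule is specified in the text: average the confidence scores over a patient's images, then choose the highest-scoring scar type. It can therefore be sketched directly. In this plain-Python sketch, the row layout and function names are hypothetical. As in the comparison above, only patients with two or more images are kept, and argmax ties go to the first class.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int, List[float]]  # (patient_id, true_label, per-class confidences)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def instance_and_patient_accuracy(rows: List[Row]) -> Tuple[float, float]:
    """Return (instance-based, patient-based) accuracy in percent, using only
    patients with two or more images."""
    n_images: Dict[str, int] = defaultdict(int)
    for pid, _, _ in rows:
        n_images[pid] += 1
    rows = [r for r in rows if n_images[r[0]] >= 2]

    # Instance-based: every image is scored on its own.
    instance_hits = [argmax(conf) == y for _, y, conf in rows]

    # Patient-based: average each class's confidence over the patient's images.
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    label_of: Dict[str, int] = {}
    for pid, y, conf in rows:
        grouped[pid].append(conf)
        label_of[pid] = y
    patient_hits = []
    for pid, confs in grouped.items():
        mean_conf = [sum(col) / len(col) for col in zip(*confs)]
        patient_hits.append(argmax(mean_conf) == label_of[pid])

    return (100 * sum(instance_hits) / len(instance_hits),
            100 * sum(patient_hits) / len(patient_hits))
```

Averaging lets a confident, correct image outweigh a borderline, wrong one from the same lesion. This is the effect the patient-based accuracies above reflect.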

7.1. Explanatory Deep Learning Model

Figure 3(c) shows the CAM results obtained using both the MCPL and baseline models. The color CAMs use red to highlight areas important in the decision-making process of the model and blue to highlight the areas considered less significant.

7.2. Augmented Decision Making

Figure 4 summarizes the physicians' results in each phase. In Phase 1, the mean accuracies of the physicians were 0.600 (95% CI: 0.485–0.715), 0.250 (95% CI: 0.031–0.469), 0.580 (95% CI: 0.309–0.851), and 0.765 (95% CI: 0.646–0.884) for adhesive, linear bulging, hypertrophic, and linear flat scars, respectively. In Phase 2, with the confidence scores of the MCPL model available, the mean accuracies improved to 0.812 (95% CI: 0.751–0.873), 0.750 (95% CI: 0.674–0.826), 0.813 (95% CI: 0.718–0.909), and 0.850 (95% CI: 0.796–0.904) for adhesive, linear bulging, hypertrophic, and linear flat scars, respectively, demonstrating a significant improvement in accuracy. Fleiss' kappa and an agreement heatmap (Figure S7) were used to assess agreement among the five physicians, yielding kappa values of 0.349 (95% CI: 0.207–0.492) for Phase 1 and 0.778 (95% CI: 0.674–0.881) for Phase 2.
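Fleiss' kappa extends chance-corrected agreement to more than two raters. The sketch below follows the standard formula, κ = (P̄ − P̄_e) / (1 − P̄_e). Here P̄ is the mean per-item agreement, and P̄_e is the agreement expected from the overall category proportions. In this study, each item would hold the five physicians' labels for one of the 100 images. The sketch is an illustration, not the authors' analysis code. The CIs reported above would require an additional procedure, such as bootstrapping, that is not shown.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa, where ratings[i] lists the label each rater gave item i.
    Every item must have the same number of raters (at least two)."""
    categories = sorted({label for item in ratings for label in item})
    n_items, n_raters = len(ratings), len(ratings[0])
    # counts[i][j]: number of raters who assigned item i to category j
    counts = [[list(item).count(c) for c in categories] for item in ratings]
    # Overall proportion of all assignments that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)
```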

8. Discussion

This study showed that the models performed reasonably well in classifying post-thyroidectomy scar subtypes compared with previous dermatologic applications [29]. Multiple medical images of the same pathology in a patient, such as retinal fundus images or mammograms, often show similar features [24], even when the images are taken from different perspectives of the same lesion. Consequently, the features that a deep learning model extracts from these images should also be similar. Specifically, we hypothesized that the MCPL method, which uses multiple images of the same lesion, could reduce confidence variation when a deep learning model assesses post-thyroidectomy scars. Although the AUROC for linear bulging scars decreased slightly (0.879 vs. 0.884; 95% CI: 0.867–0.891 vs. 0.879–0.888), performance improved for the other three subtypes, most markedly for adhesive scars (AUROC: 0.869 vs. 0.787; 95% CI: 0.847–0.891 vs. 0.769–0.805). The MCPL model performed best at predicting hypertrophic scars (AUROC: 0.915; 95% CI: 0.906–0.923), probably owing to the distinct red-to-pinkish coloration of these scars. The predictive performance for linear flat scars (AUROC: 0.880; 95% CI: 0.874–0.886) and linear bulging scars (AUROC: 0.879; 95% CI: 0.867–0.891) was moderate, with discrepancies observed in cases of mild hypertrophic change or bulging. These discrepancies were also noted among the human experts and may have contributed to the variance. In the decision study involving five physicians, diagnostic accuracy improved significantly in Phase 2, when predictions were made with reference to the MCPL model (mean accuracy: 0.812; 95% CI: 0.751–0.873), compared with Phase 1, when predictions were based only on the images (mean accuracy: 0.600; 95% CI: 0.485–0.715; p < 0.05). The MCPL method significantly reduced confidence variation across multiple images of the same scar for all scar types. The CAM results confirmed that the MCPL model consistently identified the same lesion areas across multiple images. Moreover, when determining a patient's scar subtype, patient-based prediction improved the accuracy of the MCPL model compared with instance-based prediction (from 76.334% [95% CI: 75.848–76.820] to 78.817% [95% CI: 77.427–80.206]).

Classifying the subtypes of post-thyroidectomy scars is crucial in dermatology because each patient's treatment needs depend on the scar subtype. For example, patients with hypertrophic scars, the subtype for which this model performed best, tended to be treated for color in addition to protrusion, because this subtype shows a distinct erythematous color compared with the other subtypes. Intralesional corticosteroid injection, particularly with triamcinolone acetonide (TCA), is a common treatment for hypertrophic scars. TCA is effective because of its antimitotic effects on keratinocytes and fibroblasts, which contribute to scar protrusion [30]. It also has a vasoconstrictive effect on scars, limiting the delivery of oxygen and nutrients that are known to promote scar proliferation [31]. TCA injection is considered the main treatment for hypertrophic scars and has shown good results when combined with other therapeutic options such as copper bromide laser and ablative carbon dioxide laser [32, 33]. However, intralesional TCA injection can induce side effects such as atrophy, pigmentary changes, and telangiectasia, which can accentuate other features that patients want treated; these patients may therefore need additional options such as laser treatment [34]. For scars characterized by contraction and retraction, fractional ablative carbon dioxide laser or subcision is preferred [35–37]. These therapeutic needs are well represented in the Vancouver Scar Scale (VSS), the most widely used tool for assessing scar quality. The VSS rates four scar characteristics (height, pliability, pigmentation, and vascularity) to yield a semiquantitative score ranging from 0 to 13 points. In a previous study, VSS scores differed significantly among thyroid scar subtypes, although the specific characteristics that scored differently for each subtype were not identified; this suggests that each subtype has different characteristics that require treatment [1].
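As a worked illustration of the 0 to 13 range, the sketch below sums the four VSS items using the item maxima of the commonly used VSS form: vascularity 0–3, height 0–3, pliability 0–5, and pigmentation 0–2. These ranges are background knowledge about the scale, not values reported in this study, and the function is a hypothetical helper.

```python
# Item maxima in the commonly used VSS form; they sum to the 13-point maximum.
VSS_MAX = {"vascularity": 3, "height": 3, "pliability": 5, "pigmentation": 2}


def vss_total(scores: dict) -> int:
    """Validate the four VSS item scores and return their sum (0 to 13)."""
    if set(scores) != set(VSS_MAX):
        raise ValueError(f"expected exactly these items: {sorted(VSS_MAX)}")
    for item, value in scores.items():
        if not 0 <= value <= VSS_MAX[item]:
            raise ValueError(f"{item} must be between 0 and {VSS_MAX[item]}")
    return sum(scores.values())
```

For example, a pink, slightly raised, supple scar with normal pigmentation might score `vss_total({"vascularity": 1, "height": 1, "pliability": 1, "pigmentation": 0})`, which is 3.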

Artificial intelligence has recently been increasingly integrated into dermatology across various medical modalities, demonstrating diagnostic and classification capabilities comparable to or surpassing those of dermatologists [38–41]. A recent study showed that a CNN model classifying postoperative scars by severity achieved expert-level performance [42]. Because dermatologists in real-world clinics see many patients with post-thyroidectomy scars and often classify scar subtypes ambiguously, they need a more objective assessment of scars beyond visual and physical examination. Our deep learning model reliably assessed the morphological characteristics of post-thyroidectomy scars, and physicians classified scar subtypes with improved accuracy and sensitivity when aided by the model. We believe that these findings can contribute significantly to medical practice for post-thyroidectomy scars, from diagnosis to therapeutic planning.

This study had some limitations. The clinical photographs were obtained at a single tertiary institution, and the models were not validated using external data. Additionally, most of the study population had Fitzpatrick skin type III or IV; hence, the model's performance may be suboptimal for patients with other skin types.

9. Conclusion

In this study, we developed a deep learning model to distinguish between the morphological subtypes of post-thyroidectomy scars. A total of 7524 scar images from 3565 patients were used, and the datasets were organized by the four scar subtypes. In particular, we used the MCPL method to take advantage of the fact that clinical photographs often include multiple images of the same lesion. The MCPL method not only improved on the performance of the baseline model but also significantly reduced confidence variation across multiple images from the same patient. In addition, incorporating the deep learning model's output into physicians' decision-making markedly improved diagnostic accuracy and interrater agreement. We expect that these deep learning models will be instrumental in evaluating and planning the treatment of post-thyroidectomy scars, thereby enhancing both diagnostic precision and therapeutic outcomes.

Disclosure

This manuscript was presented at the 18th Congress of the Asian Association of Endocrine Surgeons—AsAES 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Yuseong Chu: formal analysis, methodology, software, visualization, writing–original draft.

Seung-Won Jung: data curation, investigation, methodology, writing–original draft.

Solam Lee: conceptualization, data curation, writing–original draft.

Sang Gyun Lee: data curation, resources.

Yeon-Woo Heo: investigation, resources.

Sang-Hoon Lee: investigation, resources.

Hang-Seok Chang: data curation, resources.

Yong Sang Lee: data curation, resources.

Seok-Mo Kim: data curation, resources.

Sang Eun Lee: data curation, resources.

Byungho Oh: methodology, writing–review and editing.

Mi Ryung Roh: conceptualization, funding acquisition, project administration, writing–review and editing.

Sejung Yang: conceptualization, funding acquisition, supervision, writing–review and editing.

Yuseong Chu and Seung-Won Jung contributed equally to this work.

Funding

This study was funded by the National Research Foundation of Korea (NRF) through grants from the Ministry of Science and ICT (NRF-2022R1A2C2091160, NRF-2021R1A2C1094638) and supported by the “Regional Innovation Strategy (RIS)” program, funded by the Ministry of Education (MOE) and managed through the NRF (2022RIS-005).

Supporting Information

Figure S1. Post-thyroidectomy scar subtypes: multiple images per patient.

Figure S2. Example of the convolutional neural network (CNN) process for classifying the subtypes of post-thyroidectomy scar images. The feature extractor first captures low-level features and, in deeper layers, progressively higher-level features. The fully connected layer then weights the extracted features and sums them to produce the prediction (output).

Figure S3. Multiple clinical photography learning (MCPL) process.

Figure S4. Schematic flow of the decision study. The study was divided into two phases to assess physicians’ diagnostic performance. In Phase 1, the physician determined the morphological type from the scar image alone; in Phase 2, the physician decided by referring to both the scar image and the AI results. An example of the Google survey used in the decision study is shown in Figure S5.

Figure S5. Example of the decision study distributed to physicians. (a) Example of Phase 1. (b) Example of Phase 2.

Figure S6. Receiver operating characteristic curves for post-thyroidectomy scar prediction using the Inception V3 model. (a) MCPL model. (b) Baseline model.

Figure S7. Agreement heatmaps between physicians at each phase. (a) Agreement heatmap of Phase 1. (b) Agreement heatmap of Phase 2.

Table S1. Comparison of diagnostic sensitivity and specificity of physicians with and without AI assistance in skin cancer detection.

Table S2. Performance comparison between the baseline and MCPL models.
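Figure S7 summarizes agreement between physicians as heatmaps. The agreement statistic is not named in this section; the sketch below assumes pairwise Cohen's kappa over the four subtype labels (encoded as integers 0 to 3, also an assumption) and builds the symmetric rater-by-rater matrix that such a heatmap would display. Both function names are hypothetical.

```python
import numpy as np


def cohens_kappa(a, b, n_classes):
    """Cohen's kappa between two raters' integer labels in [0, n_classes)."""
    confusion = np.zeros((n_classes, n_classes))
    for x, y in zip(a, b):
        confusion[x, y] += 1
    n = confusion.sum()
    p_observed = np.trace(confusion) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_expected = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / n**2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every image
    return (p_observed - p_expected) / (1.0 - p_expected)


def agreement_matrix(ratings, n_classes=4):
    """Symmetric matrix of pairwise kappas; ratings has shape (n_raters, n_images)."""
    ratings = np.asarray(ratings)
    k = len(ratings)
    m = np.eye(k)  # each rater agrees perfectly with themself
    for i in range(k):
        for j in range(i + 1, k):
            m[i, j] = m[j, i] = cohens_kappa(ratings[i], ratings[j], n_classes)
    return m


# Hypothetical ratings: three physicians labeling the same six images.
ratings = [
    [0, 1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0, 2],
    [0, 1, 1, 3, 0, 1],
]
print(np.round(agreement_matrix(ratings), 2))
```

Computing one matrix from Phase 1 answers and another from Phase 2 answers would give the two panels; a plotting call such as `matplotlib.pyplot.imshow` could then render each matrix as a heatmap.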

Open Research

Data Availability Statement

The data reported in this study are available from the corresponding author upon reasonable request. The source code for this work is available at https://github.com/wormschu/multiple-clinical-photography-learning-/.

References

1 Choi Y., Lee J. H., Kim Y. H. et al., Impact of Postthyroidectomy Scar on the Quality of Life of Thyroid Cancer Patients, Annals of Dermatology. (2014) 26, no. 6, 693–699.
10.5021/ad.2014.26.6.693
2 Kurumety S. K., Helenowski I. B., Goswami S., Peipert B. J., Yount S. E., and Sturgeon C., Post-Thyroidectomy Neck Appearance and Impact on Quality of Life in Thyroid Cancer Survivors, Surgery. (2019) 165, no. 6, 1217–1221.
10.1016/j.surg.2019.03.006
3 Balci D. D., Inandi T., Dogramaci C. A., and Celik E., DLQI Scores in Patients With Keloids and Hypertrophic Scars: A Prospective Case Control Study, JDDG: Journal der Deutschen Dermatologischen Gesellschaft. (2009) 7, no. 8, 688–691.
10.1111/j.1610-0387.2009.07034_supp.x
4 Bock O., Schmid-Ott G., Malewski P., and Mrowietz U., Quality of Life of Patients With Keloid and Hypertrophic Scarring, Archives of Dermatological Research. (2006) 297, 433–438.
10.1007/s00403-006-0651-7
5 Campagnoli M., Dell’Era V., Rosa M. S. et al., Patient’s Scar Satisfaction After Conventional Thyroidectomy for Differentiated Thyroid Cancer, Journal of Personalized Medicine. (2023) 13, no. 7.
10.3390/jpm13071066
6 Hong N., Sheng B., and Yu P., Early Postoperative Interventions in the Prevention and Management of Thyroidectomy Scars, Frontiers in Physiology. (2024) 15.
10.3389/fphys.2024.1341287
7 Huang C.-Y., Yen Y.-H., Lin C.-H. et al., Comparative Efficacy of Fractional CO2 Laser Combined With Topical Steroid Cream Versus Solution for Post-Thyroidectomy Scar Treatment: A Prospective Study, Healthcare. (2024).
10.3390/healthcare12161605
8 Tziotzios C., Profyris C., and Sterling J., Cutaneous Scarring: Pathophysiology, Molecular Mechanisms, and Scar Reduction Therapeutics Part II. Strategies to Reduce Scar Formation After Dermatologic Procedures, Journal of the American Academy of Dermatology. (2012) 66, no. 1, 13–24.
10.1016/j.jaad.2011.08.035
9 Monstrey S., Middelkoop E., Vranckx J. J. et al., Updated Scar Management Practical Guidelines: Non-Invasive and Invasive Measures, Journal of Plastic, Reconstructive & Aesthetic Surgery. (2014) 67, no. 8, 1017–1025.
10.1016/j.bjps.2014.04.011
10 Choi W. K., Shin H. Y., Park Y. J., Lee S. H., Lee A.-Y., and Hong J. S., Analysis of Trends and Status of Evaluation Methods in Thyroid Scar, Heliyon. (2024) 10, no. 9.
10.1016/j.heliyon.2024.e29301
11 Choo A. M. H., Ong Y. S., and Issa F., Scar Assessment Tools: How Do They Compare?, Frontiers in Surgery. (2021) 8.
10.3389/fsurg.2021.643098
12 Nedelec B., Shankowsky H., and Tredget E., Rating the Resolving Hypertrophic Scar: Comparison of the Vancouver Scar Scale and Scar Volume, Journal of Burn Care & Rehabilitation. (2000) 21, no. 3, 205–212.
10.1097/00004630-200021030-00005
13 Campanella G., Navarrete-Dechent C., Liopyris K. et al., Deep Learning for Basal Cell Carcinoma Detection for Reflectance Confocal Microscopy, Journal of Investigative Dermatology. (2022) 142, no. 1, 97–103.
10.1016/j.jid.2021.06.015
14 Chu Y. S., Lee S., Lee S. G. et al., Deep Learning Algorithms for Predicting Breslow Thickness From Dermoscopic Images of Acral Lentiginous Melanomas, Journal of Investigative Dermatology. (2022).
15 Junayed M. S., Islam M. B., Jeny A. A., Sadeghzadeh A., Biswas T., and Shah A. S., ScarNet: Development and Validation of a Novel Deep CNN Model for Acne Scar Classification With a New Dataset, IEEE Access. (2021) 10, 1245–1258.
10.1109/ACCESS.2021.3138021
16 Pyun S. H., Min W., Goo B. et al., Real-Time, In Vivo Skin Cancer Triage by Laser-Induced Plasma Spectroscopy Combined With a Deep Learning-Based Diagnostic Algorithm, Journal of the American Academy of Dermatology. (2023) 89, no. 1, 99–105.
10.1016/j.jaad.2022.06.1166
17 Wang Y., Wang Y., Cai J., Lee T. K., Miao C., and Wang Z. J., SSD-KD: A Self-Supervised Diverse Knowledge Distillation Method for Lightweight Skin Lesion Classification Using Dermoscopic Images, Medical Image Analysis. (2023) 84.
10.1016/j.media.2022.102693
18 Esteva A., Kuprel B., Novoa R. A. et al., Dermatologist-Level Classification of Skin Cancer With Deep Neural Networks, Nature. (2017) 542, no. 7639, 115–118.
10.1038/nature21056
19 Choy S. P., Kim B. J., Paolino A. et al., Systematic Review of Deep Learning Image Analyses for the Diagnosis and Monitoring of Skin Disease, NPJ Digital Medicine. (2023) 6, no. 1.
10.1038/s41746-023-00914-8
20 Cho S., Sun S., Mun J. H. et al., Dermatologist-Level Classification of Malignant Lip Diseases Using a Deep Convolutional Neural Network, British Journal of Dermatology. (2020) 182, no. 6, 1388–1394.
10.1111/bjd.18459
21 Han S. S., Kim Y. J., Moon I. J. et al., Evaluation of Artificial Intelligence–Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial, Journal of Investigative Dermatology. (2022) 142, no. 9, 2353–2362.
10.1016/j.jid.2022.02.003
22 Lee S., Chu Y., Yoo S. et al., Augmented Decision-Making for Acral Lentiginous Melanoma Detection Using Deep Convolutional Neural Networks, Journal of the European Academy of Dermatology and Venereology. (2020) 34, no. 8, 1842–1850.
10.1111/jdv.16185
23 Goessinger E., Cerminara S., Mueller A. et al., Consistency of Convolutional Neural Networks in Dermoscopic Melanoma Recognition: A Prospective Real-World Study About the Pitfalls of Augmented Intelligence, Journal of the European Academy of Dermatology and Venereology. (2024) 38, no. 5, 945–953.
10.1111/jdv.19777
24 Azizi S., Mustafa B., Ryan F. et al., Big Self-Supervised Models Advance Medical Image Classification, Proceedings of the IEEE/CVF International Conference on Computer Vision. (2021).
25 LeCun Y., Bengio Y., and Hinton G., Deep Learning, Nature. (2015) 521, no. 7553, 436–444.
10.1038/nature14539
26 He K., Zhang X., Ren S., and Sun J., Deep Residual Learning for Image Recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2016).
27 Zhou B., Khosla A., Lapedriza A., Oliva A., and Torralba A., Learning Deep Features for Discriminative Localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2016, Las Vegas, NV, USA.
28 Szegedy C., Vanhoucke V., Ioffe S., Shlens J., and Wojna Z., Rethinking the Inception Architecture for Computer Vision, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2016, Las Vegas, NV, USA.
29 Puri P., Comfere N., Drage L. A. et al., Deep Learning for Dermatologists: Part II. Current Applications, Journal of the American Academy of Dermatology. (2022) 87, no. 6, 1352–1360.
10.1016/j.jaad.2020.05.053
30 Disphanurat W., Sivapornpan N., Srisantithum B., and Leelawattanachai J., Efficacy of a Triamcinolone Acetonide-Loaded Dissolving Microneedle Patch for the Treatment of Hypertrophic Scars and Keloids: A Randomized, Double-Blinded, Placebo-Controlled Split-Scar Study, Archives of Dermatological Research. (2023) 315, no. 4, 989–997.
10.1007/s00403-022-02473-6
31 Wu W.-S., Wang F.-S., Yang K. D., Huang C.-C., and Kuo Y.-R., Dexamethasone Induction of Keloid Regression Through Effective Suppression of VEGF Expression and Keloid Fibroblast Proliferation, Journal of Investigative Dermatology. (2006) 126, no. 6, 1264–1271.
10.1038/sj.jid.5700274
32 Kim J., Kim H., Kim Y. et al., The Combination of Copper Bromide Laser, a 10 600 nm Ablative Carbon Dioxide Laser and Intralesional Triamcinolone for the Treatment of Hypertrophic Thyroidectomy Scars, Journal of the European Academy of Dermatology and Venereology. (2012) 26, no. 1, 125–126.
10.1111/j.1468-3083.2011.04025.x
33 On H. R., Lee S. H., Lee Y. S., Chang H. S., Park C., and Roh M. R., Evaluating Hypertrophic Thyroidectomy Scar Outcomes After Treatment With Triamcinolone Injections and Copper Bromide Laser Therapy, Lasers in Surgery and Medicine. (2015) 47, no. 6, 479–484.
10.1002/lsm.22375
34 Schetman D., Hambrick G. W., and Wilson C. E., Cutaneous Changes Following Local Injection of Triamcinolone, Archives of Dermatology. (1963) 88, no. 6, 820–828.
10.1001/archderm.1963.01590240144024
35 Belmontesi M., Polydeoxyribonucleotide for the Improvement of a Hypertrophic Retracting Scar—An Interesting Case Report, Journal of Cosmetic Dermatology. (2020) 19, no. 11, 2982–2986.
10.1111/jocd.13710
36 Lee J. H., Kim T. H., Lee Y. S., Chang H.-S., Park C. S., and Roh M. R., Combination of Surgical Subcision and Intralesional Corticosteroid Injection as a Cost-Effective and Minimally Invasive Treatment for Postoperative Adhesive Thyroidectomy Scars, Dermatologic Surgery. (2013) 39, no. 12, 1822–1826.
10.1111/dsu.12361
37 Staubach R., Glosse H., Fennell S., and Loff S., A Single-Institution Experience About 10 Years With Children Undergoing Fractional Ablative Carbon Dioxide Laser Treatment After Burns: Measurement of Air Pressure-Induced Skin Elevation and Retraction Time (Dermalab) Including Standardized Subjective and Objective Scar Evaluation, Journal of Burn Care and Research. (2023) 44, no. 3, 655–669.
10.1093/jbcr/irac125
38 Fujisawa Y., Otomo Y., Ogata Y. et al., Deep-Learning-Based, Computer-Aided Classifier Developed With a Small Dataset of Clinical Images Surpasses Board-Certified Dermatologists in Skin Tumour Diagnosis, British Journal of Dermatology. (2019) 180, no. 2, 373–381.
10.1111/bjd.16924
39 Han S. S., Kim M. S., Lim W., Park G. H., Park I., and Chang S. E., Classification of the Clinical Images for Benign and Malignant Cutaneous Tumors Using a Deep Learning Algorithm, Journal of Investigative Dermatology. (2018) 138, no. 7, 1529–1538.
10.1016/j.jid.2018.01.028
40 Han S. S., Park G. H., Lim W. et al., Deep Neural Networks Show an Equivalent and Often Superior Performance to Dermatologists in Onychomycosis Diagnosis: Automatic Construction of Onychomycosis Datasets by Region-Based Convolutional Deep Neural Network, PLoS One. (2018) 13, no. 1.
41 Litjens G., Kooi T., Bejnordi B. E. et al., A Survey on Deep Learning in Medical Image Analysis, Medical Image Analysis. (2017) 42, 60–88.
10.1016/j.media.2017.07.005
42 Kim J., Oh I., Lee Y. N. et al., Predicting the Severity of Postoperative Scars Using Artificial Intelligence Based on Images and Clinical Data, Scientific Reports. (2023) 13, no. 1.

