Volume 2025, Issue 1 8511049

Research Article

Open Access

Diagnosis of Benign and Malignant Newly Developed Nodules on the Surgical Side After Breast Cancer Surgery Based on Machine Learning

Zhixiang Wang,

Zhixiang Wang

orcid.org/0000-0002-4778-7539

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Qingqing Li,

Qingqing Li

orcid.org/0000-0002-3166-3195

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Yiran Wang,

Yiran Wang

orcid.org/0009-0004-8351-3510

Honors College , Nanjing Normal University , Nanjing , 210023 , China , njnu.edu.cn

Search for more papers by this author

Linxue Qian,

Linxue Qian

orcid.org/0000-0001-7116-0608

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Xiangdong Hu,

Xiangdong Hu

orcid.org/0000-0003-4841-4264

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Dong Liu,

Corresponding Author

Dong Liu

[email protected]

orcid.org/0000-0001-8285-2394

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Zhixiang Wang,

Zhixiang Wang

orcid.org/0000-0002-4778-7539

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Qingqing Li,

Qingqing Li

orcid.org/0000-0002-3166-3195

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Yiran Wang,

Yiran Wang

orcid.org/0009-0004-8351-3510

Honors College , Nanjing Normal University , Nanjing , 210023 , China , njnu.edu.cn

Search for more papers by this author

Linxue Qian,

Linxue Qian

orcid.org/0000-0001-7116-0608

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Xiangdong Hu,

Xiangdong Hu

orcid.org/0000-0003-4841-4264

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

Dong Liu,

Corresponding Author

Dong Liu

[email protected]

orcid.org/0000-0001-8285-2394

Department of Ultrasound , Beijing Friendship Hospital , Capital Medical University , Beijing , China , ccmu.edu.cn

Search for more papers by this author

First published: 17 February 2025

https://doi.org/10.1155/tbj/8511049

Academic Editor: Imtiaz Wani

Share a link

Email
Wechat
Bluesky

Abstract

Objective: To enhance the diagnostic accuracy of new nodules on the surgical side after breast cancer surgery using machine learning techniques and to explore the role of multifeature fusion.

Methods: Data from 137 breast cancer postoperative patients with new nodules from January 2016 to April 2024 were analyzed. Clinical, ultrasound, immunohistochemistry, and surgical features were combined. Multiple machine learning models, including support vector machine (SVM), random forest, gradient boosting, AdaBoost, and XGBoost, were trained and tested. Model performance was evaluated using stratified ten-fold cross-validation. Ablation experiments assessed the impact of different feature combinations on diagnostic performance.

Results: The SVM model performed best, with an AUC of 0.8664, an accuracy of 0.8099, a sensitivity of 0.565, and a specificity of 0.9267. Ablation experiments indicated that multifeature fusion significantly improved diagnostic performance, especially when combining clinical, ultrasound, immunohistochemistry, and surgical features. Gradient boosting and random forest models showed slightly inferior performance, while AdaBoost had balanced but lower effectiveness.

Conclusion: Machine learning, particularly the multifeature fusion SVM model, shows significant potential in diagnosing new nodules after breast cancer surgery. It can assist doctors in developing more effective treatment plans, improving patient outcomes. Future studies should expand sample sizes, include multicenter data, and explore advanced algorithms to further enhance diagnostic performance.

1. Introduction

Breast cancer remains a leading cause of morbidity and mortality among women worldwide. According to the World Health Organization, breast cancer has surpassed lung cancer to become the most frequently diagnosed cancer in women globally, with 2.3 million new cases each year, accounting for 11.7% of all cancer cases [1]. In China, breast cancer not only has the highest incidence rate among women but also ranks as a leading cause of cancer-related deaths among women under the age of 70 [2]. Despite advances in breast cancer treatment, postoperative recurrence and distant metastasis continue to pose significant challenges in clinical management [3].

The diagnosis of postoperative recurrent nodules remains a significant challenge in the follow-up of breast cancer. Newly developed nodules after breast cancer surgery may be benign postoperative reactions such as seromas [4], hematomas [5], or scar tissue [6], but they could also indicate malignant tumor recurrence [7]. The nature of these nodules is difficult to accurately differentiate using conventional ultrasound [8], leading to uncertainty in subsequent patient treatment. Existing follow-up methods, such as ultrasound, mammography, and magnetic resonance imaging (MRI) [9–11], each have their advantages but also certain limitations [12]. Ultrasound, due to its noninvasiveness, low cost, and convenience, is widely used in routine postoperative follow-up of breast cancer [13]. However, due to the overlapping ultrasound appearances of postoperative reactions and recurrent nodules [14], the accuracy (ACC) of ultrasound in distinguishing between benign and malignant newly developed nodules still needs improvement [15].

In recent years, machine learning technology has shown tremendous potential in medical imaging diagnosis [16]. By learning from large volumes of medical data, machine learning models can identify and classify complex imaging features, enhancing diagnostic ACC and efficiency [17, 18]. Existing studies have demonstrated that machine learning algorithms can achieve excellent results in the imaging diagnosis of various cancers, such as lung cancer [19] and brain tumors [20]. Especially for breast nodules, Zheng X et al. demonstrated that deep learning radiomics can be applied to predict axillary lymph node status in early-stage breast cancer. Meanwhile, Zheng et al. [27] utilized machine learning-based breast tumor ultrasound radiomics to preoperatively predict the axillary sentinel lymph node metastasis burden in early-stage invasive breast cancer [28]. However, research on the diagnosis of newly developed nodules in breast cancer using machine learning remains relatively limited.

This study aims to utilize machine learning technology, combined with conventional ultrasound and clinical indicators, to construct an efficient diagnostic model for newly developed nodules after breast cancer surgery. The study evaluates the application value of the machine learning model in distinguishing between benign and malignant newly developed nodules. In addition, it explores the diagnostic capabilities of different feature combinations, leveraging machine learning technology to identify more valuable diagnostic features.

2. Method

2.1. Data Collection

This study selected postoperative breast cancer patients who underwent routine ultrasound follow-up at the Beijing Friendship Hospital affiliated with Capital Medical University from January 2016 to April 2024. The subjects included patients who developed new local nodules in the surgical area and had obtained pathological results, totaling 137 cases. Inclusion criteria were newly developed local nodules after breast cancer surgery detected by ultrasound, clear pathological results for the new nodules, and informed consent from both the patients and their families. Exclusion criteria included poor quality of ultrasound images, absence of new nodules, presence of new nodules without pathological results, nodules not on the surgical side, new nodules in the surgical area accompanied by multiple organ metastases, incomplete clinical data, and severe diseases in other systems or major organs.

The color Doppler ultrasound diagnostic equipment used included Philips iu22/iUElite (Royal Philips, Netherlands), HI VISION Ascendus (Hitachi Aloka, Japan), and Mindray Resona R9 (Mindray Bio-Medical Electronics, Shenzhen), with probe frequencies of 5–14 MHz. The patients were examined in a supine or lateral position to fully expose the breast/chest wall, and a comprehensive scan of the breast area was performed. Ultrasound characteristics of the new nodules in the surgical area were recorded, including maximum diameter (cm), cystic-solid nature (purely solid, purely cystic, and cystic-solid mixed), echogenicity (hyperechoic, isoechoic, and hypoechoic), homogeneity (homogeneous/heterogeneous), margin (clear/unclear), shape (regular/irregular), posterior echo (enhanced/unchanged/attenuated), growth direction (horizontal/vertical), calcification (none/coarse/punctate), and blood flow signal (none/low/moderate/rich).

Blood flow signals were graded according to Alder’s classification: Grade 0 indicated no blood flow signal within the lesion, Grade I indicated a small amount of blood flow signal with 1-2 punctate or rod-like blood flow signals, Grade II indicated a moderate amount of blood flow signal with 3-4 punctate or 1 longer vessel, and Grade III indicated abundant blood flow signals with ≥ 5 punctate or ≥ 2 longer vessels. Clinical data recorded for the enrolled subjects included gender, age at the time of new nodule development, preoperative pathological stage of breast cancer, type of surgery (total mastectomy/partial mastectomy), follow-up time at the appearance of new nodules (months), tumor markers at the time of nodule appearance, and pathological results of the new nodules.

2.2. Data Preprocessing, Model Construction, and Testing

To ensure data completeness and consistency, the following preprocessing steps were performed. First, the dates of breast cancer surgery and the appearance of new nodules were converted to a standard string format. Missing values were then filled in with “None” as the default value to avoid the impact of missing data on the analysis. In addition, all numerical columns were converted to string format to prevent the presence of nonnumeric characters in the data. The preprocessed data ensured high-quality input for subsequent modeling and analysis.

To evaluate the performance of different machine learning models in diagnosing newly developed nodules after breast cancer surgery, the study selected several commonly used machine learning models for comparison: random forest, support vector machine (SVM), gradient boosting, AdaBoost, and XGBoost [21–24]. These models were trained and tested on standardized data. Feature data were standardized using StandardScaler to improve model training performance and stability. The models were evaluated using stratified K-fold cross-validation [25], and the following evaluation metrics were computed: accuracy (ACC), area under the receiver operating characteristic curve (AUC), sensitivity (SEN), and specificity (SPE). For each cross-validation iteration, the model’s prediction results were recorded, and confusion matrices were calculated to obtain ACC, SEN, and SPE. The receiver operating characteristic (ROC) curve and its AUC were used to assess the model’s classification performance. Multiple cross-validations [26] for each model and feature combination provided average model performance to ensure the stability and reliability of the results.

2.3. Ablation Study

To further explore the impact of different feature combinations on the performance of diagnostic models, this study conducted an ablation study to assess the diagnostic efficacy of the following feature combinations: Clinical features, Ultrasound features, Immunohistochemistry features, Surgical features, Clinical features + Ultrasound features, Clinical features + Immunohistochemistry features, Clinical features + Surgical features, Ultrasound features + Immunohistochemistry features, Ultrasound features + Surgical features, Immunohistochemistry features + Surgical features, Clinical features + Ultrasound features + Immunohistochemistry features, Clinical features + Ultrasound features + Surgical features, Clinical features + Immunohistochemistry features + Surgical features, Ultrasound features + Immunohistochemistry features + Surgical features, and Clinical features + Ultrasound features + Immunohistochemistry features + Surgical features.

SVM was used as the core model to build and evaluate these feature combinations, and the performance metrics of ACC, AUC, SEN, and SPE were computed for each combination.

3. Result

3.1. Patient Information

The patient information is shown in Table 1.

Table 1. Comparison of ultrasound characteristics and clinical indicators between pathologically benign and malignant groups.

Clinical and ultrasonographic Indicators (n = 137)	Pathologically benign nodule (93)	Pathologic malignant nodules (44)	Statistic	p
Clinical indicators
1. Age (means ± standard deviation)	59.95 ± 10.72	57.95 ± 13.79	t	0.401
2. Sex (m/f)	0/93	1/43	χ²	0.701
3. Tumor-marker (normal/elevated)	77/16	33/11	χ²	0.284
4. Surgery			χ²	< 0.01
General cut	16 (17.2%)	26 (59.1%)
Partial cut	77 (82.8%)	18 (40.9%)
5. Preoperative staging(I/II/III/IV)	55/31/5/2	21/20/2/1	χ²	0.593
6. Follow-up time (months, median).	21	52	Mann–Whitney U	< 0.01
Ultrasound indicators
1. Maximum diameter line (cm, median)	1.4	1.6	Mann–Whitney U	0.027
2. Echoes			χ²	0.206
High	2a (2.2%)	1a (2.3%)
Equal	4a (4.3%)	0a (0.0%)
Low	87a (93.5%)	43a (97.7%)
3. Ingredient			χ²	1.000
cystic solid	9 (9.7%)	4 (9.1%)
Solid	84 (90.3%)	40 (90.9%)
4. Homogeneity			χ²	0.320
Homogenized	53 (57.0%)	29 (65.9%)
Inhomogeneous	40 (43.0%)	15 (34.1%)
5. Boundaries			χ²	0.831
Clear	27 (29.0%)	12 (27.3%)
Unclear	66 (71.0%)	32 (72.7%)
6. Size			χ²	0.127
Even	18 (19.4%)	4 (9.1%)
Uneven	75 (80.6%)	40 (90.9%)
7. Growth pattern			χ²	0.826
Horizontal	79 (84.9%)	38 (86.4%)
Vertical	14 (15.1%)	6 (13.6%)
8. Rear echo			χ²	< 0.01
Reinforce	10a (10.8%)	19a (43.2%)
Invariably	25a (26.9%)	17a (38.6%)
attenuation	58b (62.4%)	8b (18.2%)
9. Blood flow			χ²	< 0.01
None	71a (76.3%)	17a (38.6%)
A smidgen	10a,b (10.8%)	9a,b (20.5%)
Medium	10b (10.8%)	11b (25.0%)
Enrichment	2b (2.2%)	7b (15.9%)
10. Strong echo			χ²	0.007
None	61a (65.6%)	33a (75.0%)
Loud	12b (12.9%)	0b (0.0%)
Punctiform	20a,b (21.5%)	11a,b (25.0%)

Note: There is no difference between groups when the footnote a/b/(a, b) is the same, and there is a difference between groups when it is different.

3.2. Model Performance

The study evaluated the performance of various machine learning models in diagnosing newly developed nodules after breast cancer surgery, with the results summarized in Figure 1 and Table 2. The SVM model demonstrated the best performance, with an AUC of 0.8664, ACC of 0.8099, SEN of 0.565, and SPE of 0.9267. The gradient boosting model also performed relatively well, with an AUC of 0.8533, ACC of 0.7659, SEN of 0.61, and SPE of 0.8389. The random forest model had an AUC of 0.8224, ACC of 0.7934, SEN of 0.57, and SPE of 0.9033. The XGBoost model showed balanced performance, with an AUC of 0.8189, ACC of 0.7791, SEN of 0.64, and SPE of 0.8478. The AdaBoost model performed slightly worse, with an AUC of 0.7506, ACC of 0.7225, SEN of 0.58, and SPE of 0.7844.

Details are in the caption following the image — **Figure 1**
Open in figure viewer PowerPoint

ROC curves of different models.

Table 2. Different models’ performances.

Model	ACC	AUC	SEN	SPE
Random forest	0.793406593	0.822361111	0.57	0.903333333
Gradient boosting	0.765934066	0.853333333	0.61	0.838888889
AdaBoost	0.722527473	0.750555556	0.58	0.784444444
XGBoost	0.779120879	0.818888889	0.64	0.847777778
SVM	0.80989011	0.866388889	0.565	0.926666667

3.3. Ablation Study

The results of the ablation study are presented in Table 3 and Figure 2, showing the impact of different feature combinations on the performance of the logistic regression model. When using individual feature types—clinical features, ultrasound features, immunohistochemistry features, and surgical features—the model’s performance was relatively weak. Specifically, with clinical features alone, the ACC was 0.7005, AUC was 0.7347, SEN was 0.2, and SPE was 0.9378. Using immunohistochemistry features alone resulted in the lowest SEN, at only 0.065, though the SPE reached 0.9778.

Table 3. Performances of different models and feature sets.

Feature set	ACC	AUC	SEN	SPE
Clinical	0.700549451	0.734722222	0.2	0.937777778
Ultrasound	0.686813187	0.695555556	0.295	0.87
Immunohistochemistry	0.686263736	0.593194444	0.065	0.977777778
Surgery	0.751648352	0.732777778	0.6	0.828888889
Clinical + ultrasound	0.7	0.789444444	0.285	0.893333333
Clinical + immunohistochemistry	0.765384615	0.760555556	0.285	0.99
Clinical + surgery	0.744505495	0.764444444	0.57	0.827777778
Ultrasound + immunohistochemistry	0.700549451	0.753611111	0.2	0.935555556
Ultrasound + surgery	0.781318681	0.800555556	0.555	0.893333333
Immunohistochemistry + surgery	0.714285714	0.699027778	0.45	0.84
Clinical + ultrasound + immunohistochemistry	0.751098901	0.803611111	0.305	0.957777778
Clinical + ultrasound + surgery	0.774175824	0.847777778	0.5	0.904444444
Clinical + immunohistochemistry + surgery	0.77967033	0.804444444	0.54	0.892222222
Ultrasound + immunohistochemistry + surgery	0.773626374	0.809166667	0.44	0.935555556
Clinical + ultrasound + immunohistochemistry + surgery	0.80989011	0.866388889	0.565	0.926666667

Combining multiple feature sets as input significantly improved model performance. Combining clinical and ultrasound features yielded an ACC of 0.7, AUC of 0.7894, SEN of 0.285, and SPE of 0.8933. Combining clinical, ultrasound, and surgical features resulted in better performance, with an ACC of 0.7742, AUC of 0.8478, SEN of 0.5, and SPE of 0.9044. The model performed best when all feature types were combined, achieving an ACC of 0.8099, AUC of 0.8664, SEN of 0.565, and SPE of 0.9267.

4. Discussion

Breast cancer remains the most common malignant tumor among women worldwide, and accurate diagnosis of postoperative recurrent nodules is crucial for improving patient prognosis. This study aimed to enhance the diagnostic ACC of newly developed nodules on the surgical side after breast cancer surgery using machine learning techniques. The data we utilized can mirror the current state of breast cancer comprehensively. It encompasses new trends in risk factors, treatment responses, and recurrence patterns. For example, novel treatment modalities and their associated side-effects that might impact the manifestation of postoperative nodules are more likely to be incorporated into our dataset. Given that alterations in treatment paradigms can lead to diverse presentations of recurrent nodules, both in terms of clinical manifestations and underlying biological characteristics, having such data is of great significance.

The results indicate that the SVM model performed the best when combining all features, achieving an AUC of 0.8664. This demonstrates that multifeature fusion plays a significant role in diagnosing postoperative recurrent nodules in breast cancer. By integrating clinical, ultrasound, immunohistochemistry, and surgical features, the model can capture more detailed information related to recurrence, thus improving diagnostic SEN and SPE. This multifeature fusion approach provides comprehensive information coverage, reducing bias from single features and enhancing model stability and ACC.

In contrast, while gradient boosting and random forest models showed good performance with certain feature combinations, their overall performance was slightly inferior to that of the SVM model. This may be due to these models’ SEN to feature redundancy and correlations when handling high-dimensional data, leading to performance fluctuations. The AdaBoost model showed relatively balanced performance, but its overall effectiveness still fell short of the SVM.

From a theoretical perspective, the superior performance of SVM in diagnosing recurrent nodules in breast cancer can be attributed to specific reasons. First, SVM maximizes the margin between classes by finding the optimal hyperplane, which enhances the model’s ability to generalize to unseen data and handle complex high-dimensional data. Second, SVM uses kernel functions to map the original feature space to higher-dimensional spaces, effectively capturing nonlinear relationships, which is crucial for dealing with complex interactions in medical data. In additionally, SVM is robust to feature redundancy and noise, maintaining good performance even with a large number of features. In contrast, random forest and gradient boosting models, although robust to some extent, are more susceptible to feature redundancy and noise, leading to unstable performance. KNN models, on the other hand, are highly dependent on the local structure of the data and are sensitive to uneven data distribution, resulting in lower SEN. Overall, SVM demonstrates significant advantages in handling high-dimensional, multifeature fused data for breast cancer postoperative recurrent nodule diagnosis, providing higher diagnostic ACC and stability.

The outstanding performance of the SVM model, which integrates clinical, ultrasound, immunohistochemistry, and surgical features, further emphasizes the significance of these features in diagnosing breast cancer postoperative recurrent nodules. Clinical features provide a comprehensive understanding of a patient’s overall health and medical history. Ultrasound features can directly reflect the pathological characteristics of nodules. Immunohistochemistry features offer in-depth details about the biological behavior of the primary tumor, and surgical features supply crucial information regarding the surgical process and outcomes. The amalgamation of these diverse data sources enables a more accurate portrayal of the nodules, thus significantly enhancing the diagnostic ACC.

However, diagnosing postoperative breast nodules is an arduous task due to their complex nature. Postoperative changes such as scarring, inflammation, and tissue remodeling can closely imitate cancer recurrence. Benign conditions, for instance, fat necrosis, often exhibit similar imaging features to malignant nodules, frequently resulting in diagnostic errors. Clinicians currently face great challenges in accurately identifying recurrent nodules using existing diagnostic methods.

Adding to the complexity, breast cancer itself has highly variable biological behaviors across different subtypes. Some subtypes are more aggressive, making them relatively easier to detect, while others are more indolent and are extremely difficult to differentiate from normal tissue. This variability further exacerbates the challenge of achieving high SEN in the diagnosis of postoperative recurrent nodules.

In light of these challenges, the 0.565 SEN of our SVM model is in line with the difficulties commonly encountered in clinical practice. For the model to be immediately applicable in a clinical setting, an ideal SEN would be close to 1. Nevertheless, realistically, improving this metric necessitates a combination of refined diagnostic criteria, more advanced imaging techniques, and potentially new biomarkers. Despite the current SEN level, when the SVM model is used in conjunction with other clinical information and diagnostic tools, it can still play a substantial role in assisting clinicians in making well-informed decisions. It helps in shortlisting cases that require further investigation, thereby reducing unnecessary invasive procedures and enabling more targeted follow-up strategies.

Undoubtedly, machine learning models hold great promise in the diagnosis of breast cancer postoperative recurrent nodules. They can assist physicians in enhancing diagnostic ACC and reducing misdiagnosis rates. By improving diagnostic precision, these models contribute to the formulation of more effective treatment plans, ultimately improving patient prognosis. This not only increases survival rates but also elevates the quality of life by reducing unnecessary treatments and follow-up expenses. For example, accurately identifying malignant nodules can prevent treatment delays, and precisely detecting benign nodules can avoid unnecessary invasive examinations and surgeries.

5. Limitations and Future Work

This study has several limitations. First, it is a single-center research with a relatively small sample size, as the data are sourced from only one medical institution. This single-center characteristic severely restricts the generalizability of the model. Patient demographics, such as age, gender distribution, and underlying health conditions, can vary significantly across different centers. In the future, we plan to incorporate multicenter research to enhance the generalizability of our findings. This will allow us to collect a more diverse range of data, covering a broader spectrum of patient characteristics and healthcare settings, thus strengthening the validity and applicability of the model. Second, the sample size is relatively small, which may limit the generalizability of the model. Third, while the study incorporated multiple features, some potentially valuable factors, such as genetic information and patient lifestyle, were not included. These factors may have significant implications for recurrence risk. In addition, the study primarily employed traditional machine learning algorithms and did not fully utilize more advanced techniques like deep learning, which could limit the model’s optimal.

Future research should focus on expanding the sample size and including data from multiple centers to enhance the model’s generalizability and applicability. Multicenter datasets will allow the model to better accommodate different populations and medical practices, improving its widespread clinical utility and reliability. Moreover, exploring and integrating additional features such as genetic information, detailed medical history, and lifestyle factors could provide a more comprehensive diagnostic basis. This would help capture more potential feature associations, further enhancing diagnostic performance. In addition, future studies should incorporate advanced machine learning algorithms, such as deep learning, which excel in handling complex and high-dimensional data and can detect subtle features that traditional methods might miss. These improvements could extend the application of machine learning technology in clinical diagnosis of postoperative recurrent nodules in breast cancer, ultimately improving patient treatment outcomes and prognosis.

6. Conclusion

This study utilized machine learning techniques to enhance the diagnostic ACC of newly developed nodules after breast cancer surgery. The results indicated that a multifeature fusion logistic regression model performed the best, significantly improving diagnostic SEN and SPE. The SVM model demonstrated considerable potential in diagnosis, assisting physicians in developing more effective treatment plans and improving patient prognosis. Despite limitations such as small sample size and single-center data, future research should aim to increase sample size, incorporate multicenter data, and explore additional features and advanced algorithms to further improve diagnostic performance.

Ethics Statement

The study was approved by our institutional review board. Informed consent was waived due to the retrospective and anonymous nature of our study. The study received approval from our institutional review board. Due to its retrospective and anonymous design, the requirement for informed consent was waived.

Consent

Before submitting our manuscript, we obtained consent to publish from all study participants. This consent covers the publication of anonymized data and any other potentially identifiable information included in the manuscript. As authors, we understand our responsibility to ensure this consent, and we have taken all necessary measures to protect the privacy and rights of the participants. We have also informed all participants about the nature of the publication and the potential use of the data in the public domain.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Zhixiang Wang and Qingqing Li are co-first authors.

Funding

The authors have nothing to report.

Open Research

Data Availability Statement

The datasets generated and/or analyzed during this study are not publicly available due to privacy and ethical considerations but can be obtained from the corresponding author upon reasonable request. All requests for data access will be reviewed by the author to ensure appropriate use. For more information on the data and how to request access, please contact the corresponding author.

References

1 Sung H., Ferlay J., Siegel R. L. et al., Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries, CA: A Cancer Journal for Clinicians. (2021) 71, no. 3, 209–249, https://doi.org/10.3322/caac.21660.
10.3322/caac.21660
PubMed Web of Science® Google Scholar
2 Lei S., Zheng R., Zhang S. et al., Breast Cancer Incidence and Mortality in Women in China: Temporal Trends and Projections to 2030, Cancer Biology and Medicine. (2021) 18, no. 3, 900–909, https://doi.org/10.20892/j.issn.2095-3941.2020.0523.
10.20892/j.issn.2095-3941.2020.0523
PubMed Web of Science® Google Scholar
3 Tung N. M., Boughey J. C., Pierce L. J. et al., Management of Hereditary Breast Cancer: American Society of Clinical Oncology, American Society for Radiation Oncology, and Society of Surgical Oncology Guideline, Journal of Clinical Oncology. (2020) 38, no. 18, 2080–2106, https://doi.org/10.1200/jco.20.00299.
10.1200/JCO.20.00299
CAS PubMed Web of Science® Google Scholar
4 Milam E. C., Rangel L. K., and Pomeranz M. K., Dermatologic Sequelae of Breast Cancer: From Disease, Surgery, and Radiation, International Journal of Dermatology. (2021) 60, no. 4, 394–406, https://doi.org/10.1111/ijd.15303.
10.1111/ijd.15303
PubMed Web of Science® Google Scholar
5 Zheng J., Cai S., Song H. et al., Prediction of Postoperative Hematoma Occurrence after Ultrasound-Guided Vacuum-Assisted Breast Biopsy in Minimally Invasive Surgery for Percutaneous Removal of Benign Breast Lesions, Gland Surgery. (2020) 9, no. 5, 1346–1353, https://doi.org/10.21037/gs-20-344.
10.21037/gs-20-344
PubMed Web of Science® Google Scholar
6 Early A. P. and Moon W., Breast Cancer and Secondary Cancer Recurrences after Autologous Tissue Reconstruction, Clinical Breast Cancer. (2021) 21, no. 1, e96–e101, https://doi.org/10.1016/j.clbc.2020.07.015.
10.1016/j.clbc.2020.07.015
PubMed Web of Science® Google Scholar
7 Zhang B., Shi H., and Wang H., Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: a Critical Approach, Journal of Multidisciplinary Healthcare. (2023) 16, 1779–1791, https://doi.org/10.2147/jmdh.s410301.
10.2147/JMDH.S410301
PubMed Web of Science® Google Scholar
8 Afrin H., Larson N. B., Fatemi M., and Alizad A., Deep Learning in Different Ultrasound Methods for Breast Cancer, from Diagnosis to Prognosis: Current Trends, Challenges, and an Analysis, Cancers. (2023) 15, no. 12, https://doi.org/10.3390/cancers15123139.
10.3390/cancers15123139
Web of Science® Google Scholar
9 Wang Q., Li B., Liu Z. et al., Prediction Model of Axillary Lymph Node Status Using Automated Breast Ultrasound (ABUS) and Ki-67 Status in Early-Stage Breast Cancer, BMC Cancer. (2022) 22, no. 1, https://doi.org/10.1186/s12885-022-10034-3.
10.1186/s12885-022-10034-3
Web of Science® Google Scholar
10 Lee J., Kang B. J., and Kim S. H., Usefulness of Postoperative Surveillance MR for Women after Breast-Conservation Therapy: Focusing on MR Features of Early and Late Recurrent Breast Cancer, PLoS One. (2021) 16, no. 6, https://doi.org/10.1371/journal.pone.0252476.
10.1371/journal.pone.0252476
Web of Science® Google Scholar
11 Lee J., Kang B. J., Park G. E., and Kim S. H., The Usefulness of Magnetic Resonance Imaging (MRI) for the Detection of Local Recurrence after Mastectomy with Reconstructive Surgery in Breast Cancer Patients, Diagnostics. (2022) 12, no. 9, https://doi.org/10.3390/diagnostics12092203.
10.3390/diagnostics12092203
Web of Science® Google Scholar
12 Dan Q., Zheng T., Liu L., Sun D., and Chen Y., Ultrasound for Breast Cancer Screening in Resource-Limited Settings: Current Practice and Future Directions, Cancers. (2023) 15, no. 7, https://doi.org/10.3390/cancers15072112.
10.3390/cancers15072112
PubMed Web of Science® Google Scholar
13 Sahu A., Das P. K., and Meher S., An Efficient Deep Learning Scheme to Detect Breast Cancer Using Mammogram and Ultrasound Breast Images, Biomedical Signal Processing and Control. (2024) 87, https://doi.org/10.1016/j.bspc.2023.105377.
10.1016/j.bspc.2023.105377
Web of Science® Google Scholar
14 Singhal A., Kataria B., and Sharma S., Imaging of Tubercular Mastitis Presenting as Recurrent Breast Nodules and Abscesses: a Rare Case Report, Egyptian Journal of Radiology and Nuclear Medicine. (2023) 54, no. 1, https://doi.org/10.1186/s43055-023-00954-w.
10.1186/s43055-023-00954-w
Web of Science® Google Scholar
15 Al-Khalili R., Alzeer A., Nguyen G.-K. et al., Palpable Lumps after Mastectomy: Radiologic-Pathologic Review of Benign and Malignant Masses, RadioGraphics. (2021) 41, no. 4, 967–989, https://doi.org/10.1148/rg.2021200161.
10.1148/rg.2021200161
PubMed Web of Science® Google Scholar
16 Rana M. and Bhushan M., Machine Learning and Deep Learning Approach for Medical Image Analysis: Diagnosis to Detection, Multimedia Tools and Applications. (2023) 82, no. 17, 26731–26769, https://doi.org/10.1007/s11042-022-14305-w.
10.1007/s11042-022-14305-w
Web of Science® Google Scholar
17 Jiang X., Hu Z., Wang S., and Zhang Y., Deep Learning for Medical Image-Based Cancer Diagnosis, Cancers. (2023) 15, no. 14, https://doi.org/10.3390/cancers15143608.
10.3390/cancers15143608
Web of Science® Google Scholar
18 Richens J. G., Lee C. M., and Johri S., Improving the Accuracy of Medical Diagnosis with Causal Machine Learning, Nature Communications. (2020) 11, no. 1, https://doi.org/10.1038/s41467-020-17419-7.
10.1038/s41467-020-17419-7
Web of Science® Google Scholar
19 Bhuiyan M. S., Chowdhury I. K., Haider M. et al., Advancements in Early Detection of Lung Cancer in Public Health: a Comprehensive Study Utilizing Machine Learning Algorithms and Predictive Models, Journal of Computer Science and Technology Studies. (2024) 6, no. 1, 113–121, https://doi.org/10.32996/jcsts.2024.6.1.12.
10.32996/jcsts.2024.6.1.12
Google Scholar
20 Taha A. M., Ariffin D., and Abu-Naser S. S., A Systematic Literature Review of Deep and Machine Learning Algorithms in Brain Tumor and Meta-Analysis, Journal of Theoretical and Applied Information Technology. (2023) 101, no. 1, 21–36.
Google Scholar
21 Sharifani K. and Amini M., Machine Learning and Deep Learning: A Review of Methods and Applications, World Information Technology and Engineering Journal. (2023) 10, no. 07, 3897–3904.
Google Scholar
22 Haug C. J. and Drazen J. M., Artificial Intelligence and Machine Learning in Clinical Medicine, New England Journal of Medicine. (2023) 388, no. 13, 1201–1208, https://doi.org/10.1056/nejmra2302038.
10.1056/NEJMra2302038
CAS PubMed Web of Science® Google Scholar
23 Ding Y., Sun Y., Liu C., Jiang Q. Y., Chen F., and Cao Y., SeRS-Based Biosensors Combined with Machine Learning for Medical Application, ChemistryOpen. (2023) 12, no. 1, https://doi.org/10.1002/open.202200192.
10.1002/open.202200192
Web of Science® Google Scholar
24 Naeem S., Ali A., Anam S., and Ahmed M. M., An Unsupervised Machine Learning Algorithms: Comprehensive Review, International Journal of Computing and Digital Systems. (2023) 13, no. 1, 911–921, https://doi.org/10.12785/ijcds/130172.
10.12785/ijcds/130172
Google Scholar
25 Zhang X. and Liu C.-A., Model Averaging Prediction by K-fold Cross-Validation, Journal of Econometrics. (2023) 235, no. 1, 280–301, https://doi.org/10.1016/j.jeconom.2022.04.007.
10.1016/j.jeconom.2022.04.007
Web of Science® Google Scholar
26 Valente G., Castellanos A. L., Hausfeld L., De Martino F., and Formisano E., Cross-validation and Permutations in MVPA: Validity of Permutation Strategies and Power of Cross-Validation Schemes, NeuroImage. (2021) 238, https://doi.org/10.1016/j.neuroimage.2021.118145.
10.1016/j.neuroimage.2021.118145
PubMed Web of Science® Google Scholar
27 Zheng X., Yao Z., Huang Y. et al., Deep Learning Radiomics can Predict Axillary Lymph Node Status in Early-Stage Breast Cancer, Nature Communications. (2020) 11, no. 1, https://doi.org/10.1038/s41467-020-15027-z.
10.1038/s41467-020-15027-z
Web of Science® Google Scholar
28 Zhou J., Chen X., and Zhan W., Machine Learning-Based Breast Tumor Ultrasound Radiomics for Pre-operative Prediction of Axillary Sentinel Lymph Node Metastasis Burden in Early-Stage Invasive Breast Cancer, Ultrasound in Medicine and Biology. (2024) 50, no. 2, 229–236, https://doi.org/10.1016/j.ultrasmedbio.2023.10.004.
10.1016/j.ultrasmedbio.2023.10.004
PubMed Web of Science® Google Scholar

All articles

Diagnosis of Benign and Malignant Newly Developed Nodules on the Surgical Side After Breast Cancer Surgery Based on Machine Learning

Abstract

1. Introduction