International Journal of Imaging Systems and Technology

Volume 32, Issue 5 pp. 1577-1587

RESEARCH ARTICLE

Open Access

Machine learning based on automated breast volume scanner (ABVS) radiomics for differential diagnosis of benign and malignant BI-RADS 4 lesions

Shi-jie Wang, Department of Medical Ultrasonics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

Hua-qing Liu, Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Guangzhou, China

Tao Yang, Department of Ultrasound, The Affiliated Hospital of Southwest Medical University, Sichuan, China

Ming-quan Huang, Department of Breast Surgery, The Affiliated Hospital of Southwest Medical University, Sichuan, China

Bo-wen Zheng, Department of Medical Ultrasonics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

Tao Wu, Department of Medical Ultrasonics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

Lan-qing Han, Artificial Intelligence Innovation Center, Research Institute of Tsinghua, Guangzhou, China

Yong Zhang, Department of Nuclear Medicine, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China

Jie Ren (Corresponding Author), Department of Medical Ultrasonics, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China; orcid.org/0000-0003-2599-9001

Correspondence

Jie Ren, Department of Medical Ultrasonics, The Third Affiliated Hospital of Sun Yat-sen University, 600 Tianhe Road, Guangzhou 510630, China.

Email: [email protected]

First published: 08 March 2022

https://doi.org/10.1002/ima.22724

Citations: 1

Shi-jie Wang and Hua-qing Liu contributed equally to this work.

Funding information: Guangdong Province Key Field Research and Development Project, Grant/Award Number: 2018B030332001; the Sun Yat-sen University 5010 Program Cultivation Project, Grant/Award Number: 2016016

Abstract

BI-RADS category 4 represents possibly malignant lesions, and biopsy is recommended to distinguish benign from malignant lesions. However, studies have shown that 67%–78% of BI-RADS 4 lesions prove to be benign, so many biopsies are unnecessary; they may cause anxiety and discomfort to patients and increase the burden on the healthcare system. In this prospective study, machine learning (ML) models based on an emerging breast ultrasound technology, the automated breast volume scanner (ABVS), were constructed to distinguish benign from malignant BI-RADS 4 lesions and were compared with radiologists of different experience levels. A total of 223 pathologically confirmed BI-RADS 4 lesions were enrolled and divided into training and testing cohorts. Radiomics features were extracted from axial, sagittal, and coronal ABVS images of each lesion. Seven feature selection methods and 13 ML algorithms were paired to construct different ML pipelines, of which DNN-RFE (the combination of recursive feature elimination and deep neural networks) had the best performance in both the training and testing cohorts. The AUC of DNN-RFE was significantly higher than that of the less experienced radiologist by DeLong's test (0.954 vs. 0.776, p = 0.004). Additionally, the accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of DNN-RFE were 88.9%, 95.2%, 83.3%, 83.3%, and 95.2%, respectively, and its readings differed significantly from those of the less experienced radiologist by McNemar's test (p = 0.043). Therefore, ML based on ABVS radiomics may be a potential method for noninvasively distinguishing benign from malignant BI-RADS 4 lesions.

1 INTRODUCTION

With the growing use of ultrasound (US) in routine breast examination,¹ the American College of Radiology (ACR) updated the Breast Imaging Reporting and Data System (BI-RADS) US lexicon in 2013 to standardize lesion descriptions and reports.² Breast lesions detected by US are classified into seven categories (categories 0–6) under the ACR BI-RADS. Among them, BI-RADS US category 4 (herein referred to as BI-RADS 4) lesions are suspicious lesions with a likelihood of malignancy from 2% to 95%, and biopsy is recommended for this category to confirm the pathology.³ However, previous studies have shown that 67%–78% of BI-RADS 4 lesions are confirmed as benign,^4-6 so these lesions undergo unnecessary biopsies, which may cause anxiety and discomfort to patients and increase the burden on the health care system.⁷ Therefore, improving the assessment of BI-RADS 4 lesions and avoiding unnecessary biopsies is a clinical problem that needs to be resolved.

The automated breast volume scanner (ABVS) is an emerging US technology that automatically scans the breast using a dedicated high-frequency broadband transducer.⁸ It not only overcomes the operator dependency and poor reproducibility of conventional US but also provides a three-dimensional representation of breast tissue and allows image reformatting in three planes (axial, sagittal, and coronal).^{9, 10} Recently, several studies have shown that some unique features of ABVS may provide additional information for distinguishing benign and malignant breast lesions.¹¹ Specifically, the retraction phenomenon, which appears as a stellate pattern around the lesion, has high sensitivity (80%–89%) and specificity (96%–100%) for breast cancer.^12-14 However, visual assessment of these ABVS image features depends heavily on the experience of radiologists and shows poor agreement among readers.¹⁵ Therefore, it remains difficult to distinguish between benign and malignant BI-RADS 4 lesions by visual interpretation of ABVS images.

Radiomics is an image processing and analysis technique that converts routine medical images into quantitative data and then mines these high-dimensional data, which may reflect both macroscopic and pathophysiological characteristics of tissues.^{16, 17} Radiomics is usually combined with machine learning (ML) methods, such as support vector machine (SVM), random forest (RF), and deep neural networks (DNN), to select features and build decision support models. This strategy has proven useful for analyzing breast magnetic resonance imaging (MRI).^{18, 19} Its application to ABVS is still rarely reported,^{20, 21} although ABVS images share the standardized, reproducible, and high-resolution characteristics of MRI images.²² Additionally, previous studies focused on distinguishing benign from malignant breast lesions in general (including BI-RADS 2–5 lesions) and were limited by relatively small numbers of patients.^{20, 21} Whether and how an ABVS-based radiomics method can be used to distinguish between benign and malignant BI-RADS 4 lesions has not been explored.

Therefore, the purpose of this study was to investigate whether an ML method based on ABVS radiomics can improve the assessment and differential diagnosis of BI-RADS 4 lesions.

2 METHODS AND MATERIALS

2.1 Patient enrollment

This prospective study was approved by the Institutional Review Board of our hospital (KY2020163), and written informed consent was obtained from all participants. Between April 1 and August 31, 2020, consecutive women with BI-RADS 4 lesions detected by US were invited to participate in the study.

The inclusion criteria were as follows: (1) each BI-RADS 4 lesion was confirmed by a senior radiologist (with 7 years of breast US experience) and assigned to a subcategory (4A, 4B, or 4C) according to the second edition of the ACR BI-RADS US atlas; (2) the patient underwent US-guided core needle biopsy (CNB); and (3) the ABVS examination was performed within 1 week before the biopsy. The exclusion criteria were as follows: (1) women who were not suitable for ABVS examination, such as those who were pregnant or breastfeeding or who had breast implants; (2) poor ABVS image quality; and (3) absence of a definitive pathological diagnosis.

2.2 ABVS images and clinicopathological information acquisition

The ABVS examinations were performed using the ACUSON S2000 Automated Breast Volume Scanner (Siemens Medical Solutions, Inc.) by one of two well-trained technologists (each with at least 6 months of previous ABVS training). Further details of the ABVS examination are described by Kim et al.⁸ After the examination, axial ABVS images were sent to a dedicated workstation, where the sagittal and coronal images were reconstructed automatically. A radiologist with 8 years of experience in US-based breast diagnosis and 1 year of experience in ABVS imaging selected the ABVS images that showed the maximum size of the target breast lesion on the axial, sagittal, and coronal planes. These ABVS images were exported in DICOM format, and all annotations and marks were removed before radiomics analysis.

US-guided core needle biopsy was performed by experienced US interventional doctors. According to the standard biopsy procedure, four to eight samples per lesion were acquired using an automatic biopsy gun with a 14G or 16G needle.²³ The final pathological diagnoses were divided into benign and malignant; malignancy was defined as infiltrating carcinoma or ductal carcinoma in situ, and all other diagnoses were considered benign. Information about age, menopausal status, and family history of breast cancer was obtained directly from the patients. Breast density was assessed on digital mammography and classified into categories A to D according to the BI-RADS classification. Lesion size was measured as the largest diameter on the coronal plane of ABVS.

2.3 Lesion segmentation and radiomics feature extraction

Breast lesions were manually segmented using free open-source software (MaZda, version 4.6, www.eletel.p.lodz.pl/mazad/) by one investigator (with 7 years of experience in breast US) who was blinded to the pathology of the breast lesions. Before segmentation, the ABVS images were normalized using MaZda's built-in image normalization method to minimize the influence of contrast and brightness variation. The region of interest (ROI) covered the whole lesion and the adjacent tissue within 1–2 mm of the lesion margin. Seven common feature groups were automatically extracted with the MaZda software (Table 1).

TABLE 1. List of radiomics feature classes with descriptions from the MaZda software

Radiomics features	Description
Histogram	Mean, variance, skewness, kurtosis, and 1st, 10th, 50th, 90th, and 99th percentiles
Geometry	Descriptors of the two-dimensional size and shape of the ROI
Absolute gradient	Mean, variance, skewness, kurtosis, and percentage of pixels with nonzero gradient
GLCM	Angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, and difference entropy; parameters are computed up to 20 times for (d, 0), (0, d), (d, d), (d, −d), and the d can take values of 1, 2, 3, 4, and 5
RLM	Run-length nonuniformity, gray-level nonuniformity, long-run emphasis, short-run emphasis, and fraction of image in runs; parameters are computed 4 times for horizontal, 45°, vertical, and 135° directions
AM	Assumes a local interaction between image pixels in that pixel intensity is a weighted sum of neighborhood pixel intensities and has 5 unknown model parameters – the standard deviation of the driving noise e_s and the model parameter vector θ = [θ1, θ2, θ3, θ4]
Wavelet transform	The discrete wavelet transform is a linear transformation that operates on a data vector whose length is an integer power of two, transforming it into a numerically different vector of the same length

Note: e_s denotes an independent and identically distributed noise; θ is a vector of model parameters.
Abbreviations: AM, autoregressive model; GLCM, gray-level co-occurrence matrix; RLM, run-length matrix; ROI, region of interest.
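To make the GLCM row of Table 1 concrete, the following minimal sketch computes a co-occurrence matrix for a single offset and derives two of the listed descriptors (angular second moment and contrast). It is an illustration only: the toy image, the function names, the symmetric counting, and the reading of the offset as (dx, dy) are assumptions and may differ from MaZda's internal conventions.

```python
from collections import Counter


def glcm(image, dx, dy, levels):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1  # pair in the forward direction
                counts[(b, a)] += 1  # mirrored pair keeps the matrix symmetric
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def angular_second_moment(p):
    """Sum of squared co-occurrence probabilities (uniformity)."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Co-occurrence probabilities weighted by squared gray-level difference."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


# Toy image with 4 gray levels (not ABVS data).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy, dx=1, dy=0, levels=4)  # horizontal neighbours at distance d = 1
print(round(contrast(p), 4), round(angular_second_moment(p), 4))
```

Repeating the call for the other offsets and distances named in Table 1 yields one value of each descriptor per (direction, distance) pair.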

To determine the intra- and interobserver reproducibility of radiomics feature extraction, the intra- and interclass correlation coefficients (ICCs) were calculated. Thirty BI-RADS 4 lesions were randomly selected for ROI segmentation by two radiologists (R1 and R2, with 3 and 6 years of experience in breast US, respectively) to evaluate the interobserver ICC. Two weeks later, R1 repeated the ROI segmentation to evaluate the intraobserver ICC. An ICC greater than 0.80 was considered to indicate satisfactory agreement.²⁴
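The agreement check described above can be sketched as follows for a single radiomics feature. The text does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1); the function name and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: one row per lesion, one column per observer (or per session).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical values of one feature measured by R1 and R2 on five lesions.
feature_by_observer = [
    [0.52, 0.55],
    [0.31, 0.30],
    [0.78, 0.74],
    [0.44, 0.47],
    [0.90, 0.88],
]
icc = icc_2_1(feature_by_observer)
print(f"ICC = {icc:.3f}, satisfactory = {icc > 0.80}")
```

In practice the function would be applied to every feature, and the 0.80 threshold from the text would be checked per feature.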

2.4 Machine learning based on ABVS radiomics

The flow chart of the study is outlined in Figure 1. All BI-RADS 4 lesions enrolled in the study were divided into two cohorts, the training cohort (80% of cases) and the testing cohort (20% of cases). To maintain a consistent percentage of malignant BI-RADS 4 lesions between the training and testing cohorts, stratified sampling was used to match the two cohorts.
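A stratified 80/20 split of this kind could be sketched with scikit-learn's `train_test_split`, as below. The feature matrix is random placeholder data, the random seed is arbitrary, and only the class counts (120 benign, 103 malignant) are taken from the study; the authors' actual splitting code is not given.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the study's class counts: 120 benign (0), 103 malignant (1).
rng = np.random.default_rng(0)
y = np.array([0] * 120 + [1] * 103)
X = rng.normal(size=(len(y), 10))  # 10 dummy feature columns, not real radiomics

# stratify=y keeps the malignant fraction nearly equal in both cohorts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(len(y_train), len(y_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))  # malignant fraction per cohort
```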

FIGURE 1. The flow chart of the machine learning (ML) based on ABVS radiomics. (I) In the training cohort, seven feature selection methods and 13 ML algorithms are paired into different ML pipelines, and the 6 ML pipelines with the best performance were selected. (III) The performance of the selected ML pipelines was tested in the testing cohort and compared with that of radiologists with different levels of experience. 5-CV, fivefold cross-validation; ABVS, automated breast volume scanner; ML, machine learning

Seven feature selection methods [mutual information and maximal information coefficient (MIC), random forest (RF), recursive feature elimination (RFE), minimum redundancy maximum relevance (mRMR), linear support vector classification (LSVC), logistic regression (LR), and embedding tree] and 13 ML algorithms [logistic regression (LR), support vector machine (SVM), decision tree (DT), K-nearest neighbor (KNN), extra tree (ET), random forest (RF), Gaussian naive Bayesian (Gaussian NB), linear discriminant analysis (LDA), gradient boost (GB), adaptive boosting (AdaBoost), multilayer perceptron (MLP), deep neural networks (DNN), and Bagging] were combined in pairs to construct different ML pipelines.
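As a small illustration of the pairing, the sketch below enumerates every selector and classifier combination by name. The string labels are shorthand chosen for this example (following the "classifier-selector" naming used for DNN-RFE); no models are built here.

```python
from itertools import product

# Shorthand labels for the seven feature selection methods and 13 ML algorithms.
selectors = ["MIC", "RF", "RFE", "mRMR", "LSVC", "LR", "EmbeddingTree"]
classifiers = [
    "LR", "SVM", "DT", "KNN", "ET", "RF", "GaussianNB",
    "LDA", "GB", "AdaBoost", "MLP", "DNN", "Bagging",
]

# One pipeline per (selector, classifier) pair, named "<classifier>-<selector>".
pipelines = [f"{clf}-{sel}" for sel, clf in product(selectors, classifiers)]

print(len(pipelines))  # 7 x 13 = 91 candidate pipelines
print(pipelines[:3])
```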

2.5 Training and testing of machine learning models

In the training cohort, the performance of each ML pipeline was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), based on three repetitions of fivefold cross-validation (5-CV). In each 5-CV run, the training cohort was randomly divided into five folds of approximately equal size; four folds were used to develop the ML model and the remaining fold was used to calculate the performance metrics. After five iterations, each fold had been used as the validation set exactly once. To select the best hyperparameter configuration for each ML pipeline, we performed a grid search with 5-CV in the training cohort. The hyperparameter ranges for each ML model are shown in Table 2. To avoid overfitting, the top 6 ML pipelines in the training cohort were selected for further verification in the testing cohort.
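The fold bookkeeping described above can be sketched in plain Python as follows. The function names and seeds are hypothetical, and the sketch only produces index splits; it does not stratify by class or train any model.

```python
import random


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx); each sample is in the validation fold exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # near-equal fold sizes
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def repeated_kfold(n_samples, n_folds=5, n_repeats=3):
    """Repeat the k-fold partition with a fresh shuffle for each repetition."""
    for rep in range(n_repeats):
        yield from kfold_indices(n_samples, n_folds, seed=rep)


splits = list(repeated_kfold(178))  # 178 lesions in the training cohort
print(len(splits))                  # 3 repetitions x 5 folds
print(sorted({len(val) for _, val in splits}))  # distinct validation fold sizes
```

Averaging the validation AUC over all 15 splits gives one mean AUC per pipeline, which is the quantity used to rank the pipelines.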

TABLE 2. Hyperparameter ranges used for grid search

Methods	Hyper-parameters	Range
LR	Class-weight	['balanced', None]
LDA	None	None
SVM	None	None
DT	Max-depth	[5, 10, 20]
	Min-samples-leaf	[2, 4, 8, 16]
	Min-samples-split	[2, 4, 8, 16]
	Class-weight	['balanced', None]
KNN	Weights	['uniform', 'distance']
	p	[1, 2]
ET	n-estimators	[10, 100, 1000]
RF	Max-depth	Uniform (loc = 5, scale = 10)
	Min-samples-leaf	Uniform (loc = 0, scale = 0.1)
	Min-samples-split	Uniform (loc = 0, scale = 0.1)
	n-estimators	[10, 20, 35, 50]
Gaussian NB	None	None
GB	n-estimators	[5, 10, 100]
	Max-depth	Uniform (loc = 2, scale = 10)
AdaBoost	Base-estimator	[Logistic, SVM, Gaussian NB, DT-clf-best]
	n-estimators	[10, 30, 100]
MLP	Activation	['identity', 'logistic', 'relu']
	Learning-rate	['constant', 'invscaling', 'adaptive']
	Hidden-layer-sizes	[(32, 32), (64, 64), (128, 128)]
DNN	None	None
Bagging	Base-estimator	[Logistic, SVM, Gaussian NB, DT-clf-best]
	n-estimators	[10, 100, 1000]

Abbreviations: AdaBoost, adaptive boosting; clf, classifier; DNN, deep neural networks; DT, decision tree; ET, extra tree; Gaussian NB, Gaussian Naive Bayesian; GB, gradient boost; invscaling, inverse scaling; KNN, K-nearest neighbor; LDA, linear discriminant analysis; LR, logistic regression; MLP, multilayer perceptron; relu, rectified linear unit; RF, random forest; SVM, support vector machine.
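One way such a grid search might look with scikit-learn is sketched below for a decision tree preceded by RFE. Several choices are assumptions made for illustration and are not stated in the text: the synthetic data, the standardization step, logistic regression as the RFE ranking estimator, the elimination step size, and stratified folds. The grid uses three of the four decision-tree ranges from Table 2 to keep the run short, and the target of 15 features follows the number reported for RFE in Section 3.3.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training cohort: 178 samples, 30 dummy features.
X, y = make_classification(
    n_samples=178, n_features=30, n_informative=8, random_state=0
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # RFE refits the ranking estimator and drops the weakest features each round.
    ("rfe", RFE(LogisticRegression(max_iter=1000), n_features_to_select=15, step=5)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Subset of the decision-tree ranges listed in Table 2.
param_grid = {
    "clf__max_depth": [5, 10, 20],
    "clf__min_samples_leaf": [2, 4, 8, 16],
    "clf__class_weight": ["balanced", None],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))  # mean cross-validated AUC of the best setting
```

Because feature selection sits inside the pipeline, it is refit on the training folds of every split, so the validation fold never influences which features are kept.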

In the testing cohort, the output of each ML model was the malignancy probability of each BI-RADS 4 lesion (ranging from 0% to 100%), and the performance metrics were calculated, including AUC, sensitivity, specificity, accuracy, negative predictive value (NPV), and positive predictive value (PPV). To compare the diagnostic performance of ML and radiologists, three radiologists (R1, R2, and R3, with 3, 6, and 10 years of experience in breast ultrasound, respectively) independently estimated the malignancy probability of each BI-RADS 4 lesion on a scale of 2%–95%, according to their “best guess” after reviewing the axial, sagittal, and coronal ABVS images.²⁵ The malignancy probability scale for BI-RADS 4 lesions follows the second edition of the ACR BI-RADS US atlas.
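Because both the ML output and the radiologists' estimates are malignancy probabilities, the AUC can be computed from either in the same way. The sketch below uses the rank-based definition of the AUC (the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign one); the function name and the scores are hypothetical.

```python
def auc_from_scores(malignant_scores, benign_scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Hypothetical malignancy probabilities for three malignant and three benign lesions.
malignant = [0.90, 0.80, 0.40]
benign = [0.70, 0.30, 0.02]
print(round(auc_from_scores(malignant, benign), 3))
```

Only the ordering of the scores matters, so a reader's 2%–95% scale and a model's 0%–100% output can be compared directly on this metric.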

2.6 Statistical analysis

Statistical analyses were performed using IBM SPSS Statistics software (version 24.0, SPSS Inc.) and Python (version 3.6.8; https://www.python.org/). Differences in characteristics between the training and testing cohorts, and between the benign and malignant groups within each cohort, were analyzed using SPSS. Differences in age and tumor size were assessed using the independent-samples t-test, and the chi-square test was used to evaluate differences in breast density, subcategory of BI-RADS 4, and family history of breast cancer. The “Keras” package (version 2.4.3) was used for DNN modeling, and the “sklearn” package (version 0.23.1) was used for the other ML modeling and for feature selection. Feature importance was ranked using the “eli5” package (version 0.10.1). ROC curve analysis was performed to determine the performance of the ML pipelines and radiologists, and accuracy, sensitivity, specificity, NPV, PPV, and AUC were calculated with the “sklearn” package. AUCs were compared using DeLong's test, and McNemar's test was performed to assess differences in performance between ML and radiologists. A two-sided p value less than 0.05 was considered to indicate statistical significance.
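McNemar's test compares two readers on the same lesions using only the discordant cases. The text does not say whether the exact or the chi-square version was used, so the sketch below assumes the exact binomial form; the function name and the discordant counts are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value.

    b: lesions classified correctly by method A only
    c: lesions classified correctly by method B only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Under H0 each discordant pair favours either method with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts between a model and a reader.
p_value = mcnemar_exact(b=2, c=12)
print(round(p_value, 4), p_value < 0.05)
```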

3 RESULTS

3.1 Clinicopathologic characteristics of breast lesions

A total of 223 BI-RADS 4 lesions from 193 patients (mean age, 49.4 ± 12.3 years; range, 25–79 years) were enrolled in this study, of which 103 were malignant and 120 were benign. The average size of the lesions was 19.5 ± 10.2 mm (range, 5–59 mm). The subcategories of BI-RADS 4 lesions were as follows: BI-RADS 4A (n = 104, 46.6%), BI-RADS 4B (n = 43, 19.3%), and BI-RADS 4C (n = 76, 34.1%). The malignancy rates of categories 4A, 4B, and 4C were 11.5% (12/104), 44.2% (19/43), and 94.7% (72/76), respectively. Finally, 178 lesions (97 benign and 81 malignant) were assigned to the training cohort and 45 lesions (23 benign and 22 malignant) to the testing cohort. There were no significant differences between the two cohorts in age, lesion size, location of lesions, subcategory of BI-RADS 4, breast density, or family history of breast cancer (p = 0.229, 0.549, 0.992, 0.593, 0.878, and 0.153, respectively). We also compared these characteristics between malignant and benign lesions within each cohort (Table 3).

TABLE 3. Clinical basic characteristics in training and testing cohorts for benign and malignant lesions

	Training cohort (n = 178)			Testing cohort (n = 45)
	Malignant (n = 81)	Benign (n = 97)	p	Malignant (n = 22)	Benign (n = 23)	p
Age (year)	54.6 ± 12.5	44.1 ± 9.9	<0.001	57.5 ± 9.4	45.4 ± 11.7	<0.001
Lesion size (cm)	2.3 ± 0.9	1.6 ± 0.9	<0.001	2.5 ± 1.2	1.6 ± 0.7	0.005
Subcategory of BI-RADS 4
4A	11 (13.6%)	73 (75.3%)	<0.001	1 (4.6%)	19 (82.6%)	<0.001
4B	16 (19.8%)	20 (20.6%)		3 (13.6%)	4 (17.4%)
4C	54 (66.6%)	4 (4.1%)		18 (81.8%)	0 (0.0%)
Breast density
A	11 (13.6%)	4 (4.1%)	<0.001	3 (13.6%)	0 (0.0%)	0.319
B	38 (46.9%)	26 (26.8%)		8 (36.4%)	9 (39.2%)
C	26 (32.1%)	42 (43.3%)		8 (36.4%)	11 (47.8%)
D	6 (7.4%)	25 (25.8%)		3 (13.6%)	3 (13.0%)
Menopausal
Pre-	33 (40.7%)	73 (75.3%)	<0.001	5 (22.7%)	14 (60.9%)	0.010
Post-	48 (59.3%)	24 (24.7%)		17 (77.3%)	9 (39.1%)
Family history
Yes	8 (9.9%)	8 (8.2%)	0.705	5 (22.7%)	6 (26.1%)	0.793
No	73 (90.1%)	89 (91.8%)		17 (77.3%)	17 (73.9%)
Location of lesions
Left	51 (63.0%)	52 (53.6%)	0.208	11 (50.0%)	15 (65.2%)	0.302
Right	30 (37.0%)	45 (46.4%)		11 (50.0%)	8 (34.8%)

Note: The lesion size was defined as the maximum diameter on ABVS images.
Family history refers to breast and/or ovarian cancer in first-degree relatives.
Differences in age and lesion size between the benign and malignant groups were compared by the two-sample t test, whereas chi-square tests were used for the other variables. p < 0.05 was considered statistically significant.
Abbreviation: BI-RADS, Breast Imaging Reporting and Data System.

3.2 Feature extraction and the intra- and interobserver agreement

In this study, a total of 1101 (367 × 3) features were extracted for each lesion from the axial, sagittal, and coronal ABVS images. The mean ICCs of intra- and interobserver agreement were 0.96 (range from 0.81 to 0.99) and 0.95 (range from 0.55 to 0.98), respectively, indicating satisfactory intra- and interobserver reproducibility for the radiomics features extracted from the ABVS images.
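The per-lesion feature vector can be pictured as the concatenation of the three per-plane feature sets. The sketch below shows one possible layout; the function name, the feature naming scheme, and the zero-valued placeholders are hypothetical.

```python
PLANES = ("axial", "sagittal", "coronal")
N_PER_PLANE = 367  # features extracted from each plane


def combine_planes(per_plane):
    """Flatten {plane: [feature values]} into one named feature dict for a lesion."""
    combined = {}
    for plane in PLANES:
        values = per_plane[plane]
        if len(values) != N_PER_PLANE:
            raise ValueError(f"{plane}: expected {N_PER_PLANE} features, got {len(values)}")
        for i, value in enumerate(values):
            combined[f"{plane}_f{i:03d}"] = value  # plane prefix keeps names unique
    return combined


# Placeholder values standing in for one lesion's extracted features.
lesion = {plane: [0.0] * N_PER_PLANE for plane in PLANES}
features = combine_planes(lesion)
print(len(features))  # 367 x 3 = 1101
```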

3.3 Predictive performance of multiple ML pipelines

The predictive performance of the different ML pipelines (pairwise combinations of 7 feature selection methods and 13 ML algorithms) is shown in Figure 2. The top 6 ML pipelines were DNN-RFE (mean AUC: 0.972), AdaBoost-RFE (mean AUC: 0.969), LR-RFE (mean AUC: 0.968), LDA-RFE (mean AUC: 0.968), Bagging-RFE (mean AUC: 0.967), and SVM-RFE (mean AUC: 0.962). Combined with RFE feature selection, the 13 ML algorithms showed stable and satisfactory performance, with mean AUCs ranging from 0.809 to 0.972 (Figure 3A). Fifteen important radiomics features for predicting the malignancy probability of BI-RADS 4 lesions were selected by RFE (Figure 3B).

3.4 Testing the predictive performance of the selected ML models

In the testing cohort, the performance of the six selected ML models (DNN-RFE, AdaBoost-RFE, LR-RFE, LDA-RFE, Bagging-RFE, and SVM-RFE) and the three radiologists (R1, R2, and R3) is shown in Figure 4. DNN-RFE again obtained the highest AUC (0.954), followed by LR-RFE (0.948), LDA-RFE (0.942), Bagging-RFE (0.942), AdaBoost-RFE (0.940), and SVM-RFE (0.921), while the AUCs of the three radiologists were 0.776, 0.917, and 0.928, respectively. The AUC of DNN-RFE was significantly higher than that of R1 (0.954 vs. 0.776, p = 0.004) and nonsignificantly higher than those of R2 and R3 (0.954 vs. 0.917 and 0.928, p = 0.246 and 0.322, respectively). In addition, the sensitivity, specificity, accuracy, NPV, and PPV of the six ML pipelines and three radiologists are summarized in Table 4. DNN-RFE had the highest accuracy (88.9%), with a sensitivity, specificity, NPV, and PPV of 95.2%, 83.3%, 95.2%, and 83.3%, respectively, while R1 had the lowest accuracy (64.4%), with a sensitivity, specificity, NPV, and PPV of 85.7%, 45.8%, 78.6%, and 58.1%, respectively. By McNemar's test, the readings of DNN-RFE differed significantly from those of the less experienced radiologist (R1; p = 0.043) but not from those of R2 and R3 (p = 0.343 and 0.773, respectively). Two representative cases in which ML outperformed the radiologists in predicting benign and malignant BI-RADS 4 lesions are shown in Figure 5.

TABLE 4. Performance metrics of 6 machine learning pipelines and 3 radiologists in the testing cohort

	AUC	Accuracy	Specificity	Sensitivity	NPV	PPV	TP	FP	TN	FN
DNN-RFE	0.954	88.9%	83.3%	95.2%	95.2%	83.3%	20	4	20	1
LR-RFE	0.948	86.7%	83.3%	90.5%	90.9%	82.6%	19	4	20	2
Bagging-RFE	0.942	86.7%	83.3%	90.5%	90.9%	82.6%	19	4	20	2
LDA-RFE	0.942	86.7%	83.3%	90.5%	90.9%	82.6%	19	4	20	2
AdaBoost-RFE	0.940	84.4%	83.3%	85.7%	87.0%	81.8%	18	4	20	3
SVM-RFE	0.921	82.2%	83.3%	81.0%	83.3%	81.0%	17	4	20	4
Reader 1	0.776	64.4%	45.8%	85.7%	78.6%	58.1%	18	13	11	3
Reader 2	0.917	77.8%	66.7%	90.5%	88.9%	70.4%	19	8	16	2
Reader 3	0.928	86.7%	83.3%	90.5%	90.9%	82.6%	19	4	20	2

Note: Reader 1, Reader 2, Reader 3 with 5, 8, and 10 years of experience in breast ultrasound, respectively.
Abbreviations: AdaBoost, adaptive boosting; DNN, deep neural networks; FN, false negative; FP, false positive; LDA, linear discriminant analysis; LR, logistic regression; NPV, negative predictive value; PPV, positive predictive value; RFE, recursive feature elimination; SVM, support vector machine; TN, true negative; TP, true positive.
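The percentage columns of Table 4 follow directly from the TP, FP, TN, and FN counts. As a check, the short sketch below recomputes the DNN-RFE row from its counts (TP = 20, FP = 4, TN = 20, FN = 1); the function name is chosen for this example.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts (malignant = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # malignant lesions correctly flagged
        "specificity": tn / (tn + fp),  # benign lesions correctly cleared
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts for DNN-RFE taken from Table 4.
dnn_rfe = diagnostic_metrics(tp=20, fp=4, tn=20, fn=1)
for name, value in dnn_rfe.items():
    print(f"{name}: {value:.1%}")
```

The same function applied to any other row reproduces that row's percentages from its four counts.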

4 DISCUSSION

It is well known that BI-RADS 4 lesions span a wide range of malignancy probabilities (2%–95%) and that their US characteristics overlap to some degree, which may lead to a high false-positive rate and unnecessary biopsies.²⁶ In our study, only 46.2% (103/223) of BI-RADS 4 lesions were pathologically confirmed to be malignant, meaning that 53.8% of the lesions underwent an invasive procedure that proved unnecessary. Recently, as supplements to conventional US, shear wave elastography (SWE), contrast-enhanced ultrasound (CEUS), and MRI have provided more diagnostic information for BI-RADS 4 lesions, and the AUCs of these multi-mode methods range from 0.78 to 0.93.^27-29 However, these multi-mode methods require specialized medical equipment and specially trained radiologists, which increases both the workload of radiologists and the financial burden on patients. In our study, we used an ML method based on ABVS radiomics, an objective, convenient, and low-cost approach that can distinguish between benign and malignant BI-RADS 4 lesions, and its AUC (0.954) was higher than those reported for the multi-mode methods. Thus, our research supports, to some extent, the potential of ML based on ABVS radiomics for distinguishing benign and malignant BI-RADS 4 lesions.

According to the radiomics quality score (RQS) proposed by Lambin et al.,²² ABVS images, with their standardized, repeatable, and high-resolution characteristics, should be suitable for radiomics analysis, but few studies have investigated ABVS-based radiomics. Marcon et al. used an ML algorithm (SVM) based on ABVS radiomics features to distinguish benign and malignant breast lesions, with a maximum AUC of 0.98 and a maximum accuracy of 90.7%.²⁰ Another study used a novel ML algorithm (DNN) based on ABVS radiomics to detect and classify breast nodules, and the sensitivity, specificity, and accuracy of classification were 87.0%, 88.0%, and 87.5%, respectively.²¹ In our study, the selected ML pipeline (DNN-RFE) based on ABVS radiomics also showed satisfactory discrimination in the testing cohort, with an AUC, sensitivity, specificity, and accuracy of 0.954, 95.2%, 83.3%, and 88.9%, respectively. Interestingly, Romeo et al.³⁰ used an ML algorithm (RF) based on US radiomics to distinguish benign and malignant breast lesions with an accuracy of 82% and an AUC of 0.82, both lower than in our study (88.9% and 0.954), although our study focused on more challenging lesions (BI-RADS 4). A possible reason is that the three ABVS images (axial, sagittal, and coronal) provide more radiomic features and better represent tumor heterogeneity than a single US image. Thus, ABVS images appear suitable for radiomics and ML methods and may provide clinical decision support for the management and treatment of breast cancer. Besides, applying radiomics and ML to ABVS neither depends on the experience of radiologists nor requires specific training of radiologists, and only three ABVS images of a lesion are needed for diagnosis, which greatly simplifies the ABVS diagnostic workflow and may promote the clinical application of ABVS.

With the development of ML technology, more and more dimensionality reduction methods have been proposed, and the choice of method may affect the performance of ML models.³¹ In our study, when combined with the mRMR feature selection method, the prediction performance of the 13 ML algorithms was generally low (average AUC: 0.762–0.840). In contrast, when combined with the RFE feature selection method, the 13 ML algorithms showed relatively stable and satisfactory performance (mean AUCs: 0.809–0.972). More importantly, the top six ML combinations for predicting malignant BI-RADS 4 lesions were all based on RFE. Therefore, RFE may be the most appropriate method for selecting the radiomics features of ABVS images in our study. Similar to previous ML studies,^{32, 33} we also applied a variety of ML algorithms and found that the DNN had the best predictive performance in both the training cohort (AUC: 0.972) and the testing cohort (AUC: 0.954). A DNN is a neural network that has three or more “hidden layers” between the input layer and the output layer.³⁴ DNNs have achieved great success in various fields of medicine, often obtaining higher accuracy than traditional ML methods and performance comparable to that of trained human specialists.³⁵ Therefore, it was not surprising that the DNN model had the highest AUC in our study. Interestingly, we noticed that the AdaBoost model ranked second only to the DNN in the training cohort but second to last, ahead of only SVM, in the testing cohort. This may be due to the poor robustness of the AdaBoost algorithm in our study, and this is why we selected the top six ML models for further verification in the testing cohort.

In our study, the ML model (DNN-RFE) was superior to radiologists, especially the less experienced radiologist (R1), in distinguishing benign from malignant BI-RADS 4 lesions. The main reason is that ML methods are based on radiomics features of ABVS images, which may not be identifiable by visual interpretation but may be associated with important clinical outcomes.³⁶ In addition, the performance of the radiologists may have been underestimated. In clinical practice, radiologists often evaluate breast lesions based on comprehensive information, including not only dynamic real-time US images but also other examination results and medical histories.³⁷ In this study, to ensure consistency between the radiologists and the ML model, only three static ABVS images were provided to the radiologists, which undoubtedly made their task more difficult. Nevertheless, this also indirectly supports the potential advantage of ML in distinguishing benign from malignant BI-RADS 4 lesions.

There were several limitations in the present study. First, the number of lesions, all from a single center, was relatively small, and the lack of external validation may have led to incomplete or biased results. Second, radiomics features were extracted from a two-dimensional (2D) ROI of the ABVS images instead of a 3D ROI. However, 3D ROIs would significantly increase the workload and time required of radiologists, and a study revealed that 2D single-slice texture analysis affords results fairly comparable to those of 3D whole-tumor analyses.³⁸ Third, the value of the ICC analysis is limited because all radiologists performed segmentation on ABVS images selected by a single radiologist rather than selecting the images independently. Last, manual segmentation of lesions may introduce inter-operator variability. With the development of artificial intelligence, semiautomatic or automatic segmentation may become feasible in the foreseeable future.

5 CONCLUSIONS

ML based on ABVS radiomics may be a useful tool for distinguishing between benign and malignant BI-RADS 4 lesions and for reducing unnecessary biopsies. Considering its noninvasiveness, objectivity, and convenience, this method is worthy of further validation in large-scale, multicenter studies.

ACKNOWLEDGMENTS

The authors gratefully acknowledge the financial support by the Guangdong Province Key Field Research and Development Project (2018B030332001), as well as the Sun Yat-sen University 5010 Program Cultivation Project (2016016).

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

AUTHOR CONTRIBUTIONS

Conceptualization: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Methodology: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Software: Shi-jie Wang, Hua-qing Liu, Lan-qing Han, and Jie Ren; Formal Analysis: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Investigation: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Resources: Shi-jie Wang, Tao Yang, Ming-quan Huang, and Jie Ren; Data Curation: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Writing—Original Draft Preparation: Shi-jie Wang, Hua-qing Liu, Bo-wen Zheng, Tao Wu, Yong Zhang, and Jie Ren; Writing—Review and Editing: Shi-jie Wang, Hua-qing Liu, Tao Yang, Ming-quan Huang, Bo-wen Zheng, Tao Wu, Lan-qing Han, Yong Zhang, and Jie Ren; Supervision: Yong Zhang and Jie Ren; Funding Acquisition: Jie Ren. All authors have read and agreed to the published version of the manuscript.

Open Research

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

1 Berg WA, Bandos AI, Mendelson EB, Lehrer D, Jong RA, Pisano ED. Ultrasound as the primary screening test for breast cancer: analysis from ACRIN 6666. J Natl Cancer Inst. 2016; 108: djv367. doi:10.1093/jnci/djv367
2 D'Orsi CJ, Mendelson EB, Morris E. Breast Imaging Reporting and Data System: ACR BI-RADS Atlas. American College of Radiology; 2013.
3 Mercado CL. BI-RADS update. Radiol Clin N Am. 2014; 52: 481-487. doi:10.1016/j.rcl.2014.02.008
4 Bent CK, Bassett LW, D'Orsi CJ, Sayre JW. The positive predictive value of BI-RADS microcalcification descriptors and final assessment categories. AJR Am J Roentgenol. 2010; 194: 1378-1383. doi:10.2214/AJR.09.3423
5 Kerlikowske K, Hubbard RA, Miglioretti DL, et al. Comparative effectiveness of digital versus film-screen mammography in community practice in the United States: a cohort study. Ann Intern Med. 2011; 155: 493-502. doi:10.7326/0003-4819-155-8-201110180-00005
6 Elezaby M, Li G, Bhargavan-Chatfield M, Burnside ES, DeMartini WB. ACR BI-RADS assessment category 4 subdivisions in diagnostic mammography: utilization and outcomes in the National Mammography Database. Radiology. 2018; 287: 416-422. doi:10.1148/radiol.2017170770
7 Calhoun BC. Core needle biopsy of the breast: an evaluation of contemporary data. Surg Pathol Clin. 2018; 11: 1-16. doi:10.1016/j.path.2017.09.001
8 Kim SH, Kim HH, Moon WK. Automated breast ultrasound screening for dense breasts. Korean J Radiol. 2020; 21: 15. doi:10.3348/kjr.2019.0176
9 Rella R, Belli P, Giuliani M, et al. Automated breast ultrasonography (ABUS) in the screening and diagnostic setting. Acad Radiol. 2018; 25: 1457-1470. doi:10.1016/j.acra.2018.02.014
10 Zanotel M, Bednarova I, Londero V, et al. Automated breast ultrasound: basic principles and emerging clinical applications. Radiol Med. 2018; 123: 1-12. doi:10.1007/s11547-017-0805-z
11 Zhang X, Lin X, Tan Y, et al. A multicenter hospital-based diagnosis study of automated breast ultrasound system in detecting breast cancer among Chinese women. Chinese J Cancer Res. 2018; 30: 231-239. doi:10.21147/j.issn.1000-9604.2018.02.06
12 Zheng FY, Yan LX, Huang BJ, et al. Comparison of retraction phenomenon and BI-RADS-US descriptors in differentiating benign and malignant breast masses using an automated breast volume scanner. Eur J Radiol. 2015; 84: 2123-2129. doi:10.1016/j.ejrad.2015.07.028
13 van Zelst J, Mann RM. Automated three-dimensional breast US for screening: technique, artifacts, and lesion characterization. Radiographics. 2018; 38: 663-683. doi:10.1148/rg.2018170162
14 Lin X, Wang J, Han F, Fu J, Li A. Analysis of eighty-one cases with breast lesions using automated breast volume scanner and comparison with handheld ultrasound. Eur J Radiol. 2012; 81: 873-878. doi:10.1016/j.ejrad.2011.02.038
15 Tang G, An X, Xiang H, Liu L, Li A, Lin X. Automated breast ultrasound: interobserver agreement, diagnostic value, and associated clinical factors of coronal-plane image features. Korean J Radiol. 2020; 21: 550. doi:10.3348/kjr.2019.0525
16 Hou C, Zhong X, He P, et al. Predicting breast cancer in Chinese women using machine learning techniques: algorithm development. JMIR Med Inform. 2020; 8: e17364. doi:10.2196/17364
17 Lee S, Park H, Ko ES. Radiomics in breast imaging from techniques to clinical applications: a review. Korean J Radiol. 2020; 21: 779. doi:10.3348/kjr.2019.0855
18 Zheng X, Yao Z, Huang Y, et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun. 2020; 11: 1236. doi:10.1038/s41467-020-15027-z
19 Chen S, Shu Z, Li Y, et al. Machine learning-based radiomics nomogram using magnetic resonance images for prediction of neoadjuvant chemotherapy efficacy in breast cancer patients. Front Oncol. 2020; 10: 1410. doi:10.3389/fonc.2020.01410
20 Marcon M, Ciritsis A, Rossi C, et al. Diagnostic performance of machine learning applied to texture analysis-derived features for breast lesion characterisation at automated breast ultrasound: a pilot study. Euro Radiol Experim. 2019; 3: 44. doi:10.1186/s41747-019-0121-6
21 Wang F, Liu X, Yuan N, et al. Study on automatic detection and classification of breast nodule using deep convolutional neural network system. J Thorac Dis. 2020; 12: 4690-4701. doi:10.21037/jtd-19-3013
22 Lambin P, Leijenaar R, Deist TM, et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat Rev Clin Oncol. 2017; 14: 749-762. doi:10.1038/nrclinonc.2017.141
23 Collins LC. Precision pathology as applied to breast core needle biopsy evaluation: implications for management. Mod Pathol. 2021; 34: 48-61. doi:10.1038/s41379-020-00666-w
24 Liu Z, Li Z, Qu J, et al. Radiomics of multiparametric MRI for pretreatment prediction of pathologic complete response to neoadjuvant chemotherapy in breast cancer: a multicenter study. Clin Cancer Res. 2019; 25: 3538-3547. doi:10.1158/1078-0432.CCR-18-3190
25 Nakagawa M, Nakaura T, Namimoto T, et al. Machine learning to differentiate T2-weighted hyperintense uterine leiomyomas from uterine sarcomas by utilizing multiparametric magnetic resonance quantitative imaging features. Acad Radiol. 2019; 26: 1390-1399. doi:10.1016/j.acra.2018.11.014
26 Liu G, Zhang M, He Y, Liu Y, Li X, Wang Z. BI-RADS 4 breast lesions: could multi-mode ultrasound be helpful for their diagnosis? Gland Surg. 2019; 8: 258-270. doi:10.21037/gs.2019.05.01
27 Park SY, Kang BJ. Combination of shear-wave elastography with ultrasonography for detection of breast cancer and reduction of unnecessary biopsies: a systematic review and meta-analysis. Ultrasonography. 2021; 40: 318-332. doi:10.14366/usg.20058
28 Liang YC, Jia CM, Xue Y, Lu Q, Chen F, Wang JJ. Diagnostic value of contrast-enhanced ultrasound in breast lesions of BI-RADS 4. Zhonghua Yi Xue Za Zhi. 2018; 98: 1498-1502.
29 Clauser P, Krug B, Bickel H, et al. Diffusion-weighted imaging allows for downgrading MR BI-RADS 4 lesions in contrast-enhanced MRI of the breast to avoid unnecessary biopsy. Clin Cancer Res. 2021; 27: 1941-1948. doi:10.1158/1078-0432.CCR-20-3037
30 Romeo V, Cuocolo R, Apolito R, et al. Clinical value of radiomics and machine learning in breast ultrasound: a multicenter study for differential diagnosis of benign and malignant lesions. Eur Radiol. 2021; 31: 9511-9519. doi:10.1007/s00330-021-08009-2
31 Remeseiro B, Bolon-Canedo V. A review of feature selection methods in medical applications. Comput Biol Med. 2019; 112: 103375. doi:10.1016/j.compbiomed.2019.103375
32 Wang H, Song B, Ye N, et al. Machine learning-based multiparametric MRI radiomics for predicting the aggressiveness of papillary thyroid carcinoma. Eur J Radiol. 2020; 122: 108755. doi:10.1016/j.ejrad.2019.108755
33 Sun W, Jiang M, Dang J, Chang P, Yin FF. Effect of machine learning methods on predicting NSCLC overall survival time based on radiomics analysis. Radiat Oncol. 2018; 13: 197. doi:10.1186/s13014-018-1140-9
34 Sheu YH. Illuminating the black box: interpreting deep neural network models for psychiatric research. Front Psych. 2020; 11: 551299. doi:10.3389/fpsyt.2020.551299
35 Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016; 316: 2402-2410. doi:10.1001/jama.2016.17216
36 Luo W, Huang Q, Huang X, Hu H, Zeng F, Wang W. Predicting breast cancer in breast imaging reporting and data system (BI-RADS) ultrasound category 4 or 5 lesions: a nomogram combining radiomics and BI-RADS. Sci Rep. 2019; 9: 11921. doi:10.1038/s41598-019-48488-4
37 Zhao C, Xiao M, Liu H, et al. Reducing the number of unnecessary biopsies of US-BI-RADS 4a lesions through a deep learning method for residents-in-training: a cross-sectional study. BMJ Open. 2020; 10: e35757. doi:10.1136/bmjopen-2019-035757
38 Lubner MG, Stabo N, Lubner SJ, et al. CT textural analysis of hepatic metastatic colorectal cancer: pre-treatment tumor heterogeneity correlates with pathology and clinical outcomes. Abdom Imaging. 2015; 40: 2331-2337. doi:10.1007/s00261-015-0438-4

Volume 32, Issue 5

September 2022

Pages 1577-1587
