Could Ultrasound-Based Radiomics Noninvasively Predict Axillary Lymph Node Metastasis in Breast Cancer?
Abstract
Objectives
This work aimed to investigate whether quantitative radiomics imaging features extracted from ultrasound (US) can noninvasively predict breast cancer (BC) metastasis to axillary lymph nodes (ALNs).
Methods
Presurgical B-mode US data of 196 patients with BC were retrospectively studied. The cases were divided into the training and validation cohorts (n = 141 versus 55). The elastic net regression technique was used for selecting features and building a signature in the training cohort. A linear combination of the selected features weighted by their respective coefficients produced a radiomics signature for each individual. A radiomics nomogram was established based on the radiomics signature and US-reported ALN status. In a receiver operating characteristic curve analysis, areas under the curves (AUCs) were determined for assessing the accuracy of the prediction model in predicting ALN metastasis in both cohorts. The clinical value was assessed by a decision curve analysis.
Results
In all, 843 radiomics features per case were obtained from expert-delineated lesions on US imaging in this study. Through radiomics feature selection, 21 features were selected to constitute the radiomics signature for predicting ALN metastasis. Area under the curve values of 0.778 and 0.725 were obtained in the training and validation cohorts, respectively, indicating moderate predictive ability. The radiomics nomogram comprising the radiomics signature and US-reported ALN status showed the best performance for ALN detection in the training cohort (AUC, 0.816) but moderate performance in the validation cohort (AUC, 0.759). The decision curve showed that both the radiomics signature and nomogram displayed good clinical utility.
Conclusions
This pilot radiomics study provided a noninvasive method for predicting presurgical ALN metastasis status in BC.
Abbreviations
- ALN: axillary lymph node
- ALND: axillary lymph node dissection
- AUC: area under the curve
- BC: breast cancer
- CI: confidence interval
- HER2: human epidermal growth factor receptor 2
- LNM: lymph node metastasis
- ROC: receiver operating characteristic
- US: ultrasound
Breast cancer (BC) represents one of the most common malignancies and the deadliest cancer in women.1 About 98.6% of women diagnosed with localized BC survive for 5 years. However, this rate is reduced to 84.4% in cases of regional lymph node metastasis (LNM).2 The axillary lymph node (ALN) status constitutes the most important independent prognostic factor for predicting disease-free survival as well as overall survival in patients with BC.3
Clinically, the diagnosis of ALN involvement in BC is determined by sentinel lymph node biopsy and axillary lymph node dissection (ALND), which is an important treatment option for invasive BC. It is noteworthy that both techniques are invasive and may result in a range of complications of varying severity. Indeed, ALND can cause lymphedema, infection, shoulder motion restriction, and major vessel and nerve injuries because of the aggressiveness and extensiveness of surgery.4, 5 Meanwhile, as an alternative to conventional ALND, sentinel lymph node biopsy is associated with less-extensive complications, including abnormal function, nerve injury, lymphedema, and upper arm numbness.6, 7 Furthermore, whether lymph node–negative patients could benefit from ALND remains largely unclear.8, 9 Therefore, accurate preoperative identification of the ALN status could facilitate clinical decision making, and noninvasive assessment of the ALN status in BC is attracting growing attention.
Axillary ultrasound (US) is a routine imaging modality for preoperatively evaluating the status of ALNs, but its value in detecting ALN metastasis is limited. It has been reported that there is a certain relationship between the morphologic characteristics of primary BC lesions and tumor biological behavior such as LNM. These features mainly focus on morphologic changes, such as the tumor dimension, nonsmooth margins, calcification, and blood flow classification.10 However, these imaging features are assessed visually and depend on the subjective experience of physicians. As a result, diagnostic efficiency is limited, and false-negative and false-positive diagnoses occur.
Radiomics has been an active research topic in recent years. It uses advanced feature analysis algorithms to extract high-dimensional information such as image shape, intensity, and texture features from medical images.11 It has been applied to establish prediction models that provide potential noninvasive biomarkers for clinical decision making, especially in the field of oncology.11-13 In BC, radiomics plays a role in differentiating between malignant and benign lesions,14, 15 analyzing molecular subtypes,16-19 and predicting the prognosis and chemotherapy response.20-27 However, previous works mainly focused on computed tomography and magnetic resonance imaging; computed tomography uses ionizing radiation, and magnetic resonance imaging is expensive, so neither can be used as a routine evaluation method. Compared with these imaging technologies, US has the advantages of being radiation free, simple to operate, fast, and inexpensive. It is widely used in clinical diagnosis in China, especially in thyroid and breast examinations. Many scholars have extended radiomics to US imaging. Therefore, in this study, we aimed to investigate whether quantitative radiomics imaging features extracted from US could noninvasively predict the ALN status in BC.
Materials and Methods
Patients
This study was approved by the Institutional Review Board of the First Affiliated Hospital, College of Medicine, Zhejiang University Hospital, which waived the requirement for informed consent. A total of 256 Chinese women with invasive BC (confirmed by pathology) were recruited from January 2017 to March 2018 in our center. Inclusion criteria were as follows: (1) overt lesion on preoperative US images; (2) histopathologically confirmed BC in the surgical specimen; and (3) lymph node status assessed by sentinel lymph node biopsy and ALND. Exclusion criteria were as follows: (1) preoperative radiotherapy, chemotherapy, or endocrine therapy; (2) neoadjuvant chemotherapy before surgery; and (3) contralateral or multifocal breast tumors. Ultimately, 196 lesions were analyzed, including 21 (10.7%) triple-negative, 22 (11.2%) human epidermal growth factor receptor 2 (HER2)–enriched, 41 (20.9%) luminal A, and 112 (57.7%) luminal B lesions.
Ultrasound Examination and US-Reported ALN Status
Patients were examined with a MyLab Twice scanner (Esaote SpA, Genoa, Italy) using a 5–13-MHz LA523 transducer, a LOGIQ E9 scanner (GE Healthcare, Chicago, IL) using a 6–13-MHz linear transducer, an iU22 scanner (Philips Healthcare, Andover, MA) using a 5–12-MHz L12-5 transducer, or an Acuson S2000 scanner (Siemens Medical Solutions, Mountain View, CA) using a 5.5–18-MHz 18L6 HD transducer. All images were stored in a picture archiving and communication system and independently interpreted by 2 breast radiology experts (10 years of experience). Any discrepancy was resolved by consensus.
Each lesion was assigned a category (4A, 4B, 4C, or 5) according to the second edition of the American College of Radiology Breast Imaging Reporting and Data System for US.28 An ALN was believed to be positive for suspicious metastasis when it had one of the following characteristics: irregular cortical thickness of greater than 3 mm, longest-to-shortest axis ratio of less than 2, or absence of a fatty hilum.29
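The US-reported ALN rule described above can be sketched as a small function. This is an illustrative sketch only: the class and field names are hypothetical, and the code checks cortical thickness alone, so it does not capture the "irregular" qualifier, which requires a reader's judgment.

```python
from dataclasses import dataclass


@dataclass
class NodeFindings:
    """Hypothetical container for one lymph node's US measurements."""
    cortical_thickness_mm: float
    long_axis_mm: float
    short_axis_mm: float
    fatty_hilum_present: bool


def us_reported_aln_positive(node: NodeFindings) -> bool:
    """Return True when any one of the three suspicious criteria is met."""
    thick_cortex = node.cortical_thickness_mm > 3.0                 # cortex > 3 mm
    rounded_shape = node.long_axis_mm / node.short_axis_mm < 2.0    # axis ratio < 2
    hilum_absent = not node.fatty_hilum_present                     # no fatty hilum
    return thick_cortex or rounded_shape or hilum_absent


# Example: thin cortex, elongated shape, preserved hilum -> not suspicious.
benign_looking = NodeFindings(2.0, 20.0, 8.0, True)
print(us_reported_aln_positive(benign_looking))
```

Because the rule is a logical OR, a single abnormal finding is enough to label the node as suspicious.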
Feature Extraction
All US images in the Digital Imaging and Communications in Medicine format were imported one by one into 3D Slicer version 4.8.1 software (Brigham and Women's Hospital, Boston, MA). The region of interest for each lesion was delineated on the section with the largest lesion area by a radiologist with 10 years of clinical experience using the Segment Editor module integrated in 3D Slicer. An example of BC lesion delineation is shown in Figure 1.

The image texture features were extracted on the PyRadiomics platform (https://github.com/Radiomics/pyradiomics) integrated in 3D Slicer. Wavelet transform was used to analyze the spatial time-frequency of the obtained 2-dimensional images, which were divided into 8 subimages: HHH, HLL, HLH, HHL, LLH, LHL, LHH, and LLL. Finally, there were 843 radiomics features for each case from the expert-delineated lesions on the original US images and 8 wavelet transform subimages. The feature categories included shape, first-order, gray-level co-occurrence matrix, gray-level size zone matrix, gray-level dependence matrix, neighborhood gray-tone difference matrix, and gray-level run length matrix.
Feature Selection and Radiomics Model Generation
Forty-six nodules were randomly selected from the total sample, and region of interest segmentation and feature extraction were repeated by the same radiologist and by a different one. Intraclass correlation coefficients were used to assess intraobserver and interobserver agreement for feature extraction. The cases were then randomly divided into training and validation sets (ratio approximating 7:3). The training cohort was used to select features and generate the prediction model. The elastic net30 logistic regression algorithm, a combination of the lasso and ridge regression approaches, was used to select the most predictive features, those with nonzero coefficients, from among the texture features with good reproducibility by tuning the parameters λ and α. The model generated via elastic net regression was the linear weighted sum of the chosen radiomics features, and the retained features were used to build a logistic regression model for predicting ALN metastasis in patients with BC. The associations between ALN metastasis and the BC lesion Breast Imaging Reporting and Data System category, clinical factors (including age, childbearing history, menopause history, and family history), and US-reported ALN status were assessed by univariate analyses. The factors with P < .05 in the univariate analyses were incorporated with the radiomics signature to build a radiomics nomogram as a quantitative tool to predict ALN metastasis using a stepwise selection logistic regression analysis. In a receiver operating characteristic (ROC) curve analysis, the predictive accuracy was assessed by determining the area under the curve (AUC) in the training cohort.
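For reference, the penalized objective minimized by elastic net logistic regression can be written as below, using the parametrization of the glmnet package. The symbols N, x_i, y_i, β_0, and β are notation introduced here for illustration, and the tuned values of λ and α are not reported in this study. Setting α = 1 gives the lasso and α = 0 gives ridge regression; the second line is the resulting radiomics signature as a weighted sum of the features with nonzero coefficients.

```latex
\[
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
\]
\[
\text{Rad-score}_i = \hat{\beta}_0 + \sum_{j:\,\hat{\beta}_j \neq 0} \hat{\beta}_j\, x_{ij}
\]
```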
Validation and Clinical Utility of the Prediction Model
The generated prediction model was further verified in the validation cohort on the basis of the thresholds found in the training cohort. The relevant AUC was also calculated. A decision curve analysis was performed to assess the clinical utility of the newly developed model, and net benefits at various threshold probabilities were quantified in the training set.
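One way to carry a training-derived threshold into the validation cohort is sketched below. The text does not state how the thresholds were chosen, so the use of the Youden index here is an assumption, and all scores and labels are made-up toy values rather than study data.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Cutoff maximizing sensitivity + specificity - 1 (first maximum wins)."""
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, c)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff


# Toy training data (hypothetical predicted probabilities and ALN labels).
train_scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff = youden_cutoff(train_scores, train_labels)

# The cutoff is fixed on the training set and only applied to validation data.
val_scores = [0.15, 0.45, 0.65, 0.85]
val_labels = [0, 0, 1, 1]
val_sens, val_spec = sens_spec(val_scores, val_labels, cutoff)
print(cutoff, val_sens, val_spec)
```

Fixing the cutoff before looking at the validation cohort keeps the validation estimates free of threshold tuning on the same data.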
Statistical Analysis
R version 3.5.1 statistical software (R Foundation for Statistical Computing, Vienna, Austria) was used for the statistical analysis. An elastic net logistic regression analysis was performed with the glmnet package. Nomogram construction was performed with the rms package. Receiver operating characteristic curves were generated with the pROC package and compared with the DeLong test. The decision curve analysis was performed with the rmda package. Two-sided P < .05 indicated statistical significance.
Results
Patient Features
Table 1 summarizes the features of BC cases in both training and validation sets. The training cohort (n = 141; mean age ± SD, 53.35 ± 10.97 years) included 18 (12.8%) triple-negative, 13 (9.2%) HER2-enriched, 26 (18.4%) luminal A, and 84 (59.6%) luminal B lesions. The validation cohort (n = 55; mean age ± SD, 53.35 ± 12.15 years) included 3 (5.5%) triple-negative, 9 (16.4%) HER2-enriched, 14 (25.5%) luminal A, and 29 (52.7%) luminal B lesions. Axillary lymph node metastasis positivity rates were 31.9% and 45.5% in the training and validation cohorts, respectively, and the difference was not statistically significant (P = .107).
Characteristic | Training Cohort (n = 141) | Validation Cohort (n = 55) | P |
---|---|---|---|
Age, y | 53.35 ± 10.97 | 53.35 ± 12.15 | .996 |
Luminal A | 26 (18.4) | 14 (25.5) | .274 |
Luminal B | 84 (59.6) | 29 (52.7) | .477 |
HER2-enriched | 13 (9.2) | 9 (16.4) | .241 |
Triple-negative | 18 (12.8) | 3 (5.5) | .219 |
Estrogen receptor | 109 (77.3) | 42 (76.4) | >.999 |
Progesterone receptor | 96 (68.1) | 41 (74.5) | .476 |
HER2 | 32 (22.7) | 13 (23.6) | >.999 |
Childbearing history | 137 (97.2) | 54 (98.2) | >.999 |
Menopause history | 69 (48.9) | 32 (58.2) | .315 |
Family history | 6 (4.3) | 0 (0.0) | .275 |
ALN metastasis positive | 45 (31.9) | 25 (45.5) | .107 |
- Data are presented as mean ± SD and number (percent) where applicable.
Radiomics Features
In all, 843 features were extracted from the delineated lesion regions on the US images. Among these features, 626 with an intraclass correlation coefficient of greater than 0.75 were used for the subsequent analysis. A total of 21 features (Table 2) with nonzero coefficients were selected by the elastic net logistic regression model in the training cohort (Figure 2).
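The reproducibility filter can be illustrated with a short sketch. The form of the intraclass correlation coefficient is not specified in the text, so the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is assumed here; the feature names and values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per lesion and one column per reading."""
    n = len(ratings)          # number of lesions
    k = len(ratings[0])       # number of readings per lesion
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                 # between-lesion mean square
    msc = ss_cols / (k - 1)                 # between-reading mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical repeated measurements of two features (two readings per lesion).
features = {
    "stable_feature": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]],
    "unstable_feature": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],
}
reproducible = [name for name, r in features.items() if icc_2_1(r) > 0.75]
print(reproducible)
```

Only features whose coefficient exceeds the 0.75 cutoff used in this study would be passed on to feature selection.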
Image Type | Feature Class | Feature Name | Coefficient |
---|---|---|---|
Wavelet.HHH | GLCM | JointEntropy | 6.606913619 |
Wavelet.HHH | GLCM | JointEnergy | −10.61526552 |
Wavelet.HLL | GLSZM | GrayLevelVariance | 0.091019557 |
Wavelet.HLL | GLSZM | GrayLevelNonUniformityNormalized | −0.043373295 |
Wavelet.HLL | GLSZM | SizeZoneNonUniformityNormalized | −0.028913255 |
Wavelet.HLL | GLSZM | GrayLevelNonUniformity | 0.002582036 |
Wavelet.HLL | GLSZM | HighGrayLevelZoneEmphasis | 0.115264983 |
Wavelet.HLL | GLSZM | LowGrayLevelZoneEmphasis | −0.452924866 |
Wavelet.LHL | GLDM | DependenceVariance | 0.126561699 |
Wavelet.LHL | GLSZM | SizeZoneNonUniformity | 0.001372282 |
Wavelet.LHL | GLSZM | GrayLevelNonUniformity | 0.001874068 |
Wavelet.LHL | NGTDM | Strength | −0.190068297 |
Wavelet.LHH | GLDM | DependenceEntropy | −0.281946732 |
Wavelet.LLH | GLDM | SmallDependenceEmphasis | 4.763295426 |
Wavelet.LLH | GLCM | Imc1 | 0.787816784 |
Wavelet.LLH | GLRLM | ShortRunEmphasis | 1.713537119 |
Wavelet.LLL | GLDM | SmallDependenceHighGrayLevelEmphasis | 0.010883966 |
Wavelet.LLL | GLDM | LargeDependenceLowGrayLevelEmphasis | 0.036722449 |
Wavelet.LLL | GLRLM | LongRunLowGrayLevelEmphasis | 0.073736589 |
Wavelet.LLL | GLSZM | SizeZoneNonUniformity | 0.000651089 |
Wavelet.HHL | First-order | Skewness | −0.108771155 |
- GLCM indicates gray-level co-occurrence matrix; GLDM, gray-level dependence matrix; GLRLM, gray-level run length matrix; GLSZM, gray-level size zone matrix; and NGTDM, neighborhood gray-tone difference matrix.
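To make the weighted-sum construction concrete, the sketch below applies three of the Table 2 coefficients to hypothetical feature values. The dictionary keys, the feature values, and the function name are illustrative assumptions; the model intercept and any feature scaling are not reported here, so the result is only a partial score and is not converted to a probability.

```python
# Three coefficients copied from Table 2 (an excerpt, not the full 21-feature signature).
TABLE2_EXCERPT = {
    "Wavelet.HHH_GLCM_JointEntropy": 6.606913619,
    "Wavelet.HHH_GLCM_JointEnergy": -10.61526552,
    "Wavelet.HHL_FirstOrder_Skewness": -0.108771155,
}


def partial_rad_score(feature_values, coefficients):
    """Weighted sum of feature values; the intercept is omitted because it is not reported."""
    return sum(coefficients[name] * feature_values[name] for name in coefficients)


# Hypothetical feature values for one lesion (not study data).
example_lesion = {
    "Wavelet.HHH_GLCM_JointEntropy": 1.0,
    "Wavelet.HHH_GLCM_JointEnergy": 0.5,
    "Wavelet.HHL_FirstOrder_Skewness": 2.0,
}
score = partial_rad_score(example_lesion, TABLE2_EXCERPT)
print(round(score, 6))
```

The full signature would sum all 21 terms in the same way and add the fitted intercept.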

Performance of the Radiomics Classifier
A nomogram was constructed with the US-reported ALN status and radiomics signature (Figure 3). In the training cohort, the AUC values for predicting ALN metastasis were higher for both the radiomics nomogram and the radiomics signature (AUC, 0.816 [95% confidence interval (CI), 0.740–0.891]; and 0.778 [95% CI, 0.659–0.861], respectively) than for the US-reported ALN status alone (AUC, 0.626 [95% CI, 0.549–0.703]; P < .001). However, there was no significant difference between the radiomics signature and nomogram in the training cohort (P > .05), and there was no statistically significant difference in accuracy among the 3 models in the validation cohort (nomogram, radiomics signature, and US-reported ALN status: AUC, 0.759 [95% CI, 0.629–0.889]; 0.725 [95% CI, 0.588–0.863]; and 0.687 [95% CI, 0.577–0.796], respectively; P > .05). The predictive performances for discriminating ALN metastasis, as determined by ROC curves, are shown in Figure 4.
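As a reminder of what the AUC values measure, the sketch below computes the AUC as the probability that a randomly chosen metastasis-positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This pairwise form is equivalent to the area under the empirical ROC curve; the scores and labels are toy values, not study data, and the analysis in this study was done with the pROC package rather than this code.

```python
def auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive-negative pairs is ranked incorrectly.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of positive and negative cases.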


Clinical Value of the Prediction Model
The decision curve analyses based on the US-reported ALN status, radiomics signature, and radiomics nomogram are depicted in Figure 5. The net benefit of applying the newly developed radiomics nomogram for predicting ALN metastasis achieved the most clinical utility when the threshold probability for a patient was greater than 0.03.
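The net benefit plotted in a decision curve is conventionally defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The sketch below implements that standard definition together with the treat-all reference; the function names and the toy probabilities are assumptions for illustration, not output of the rmda package or study data.

```python
def net_benefit(probs, labels, pt):
    """Net benefit of treating patients whose predicted probability is >= pt."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(labels, pt):
    """Net benefit of the treat-all strategy at threshold probability pt."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Toy predicted probabilities and ALN labels (hypothetical).
toy_probs = [0.90, 0.70, 0.40, 0.60, 0.20, 0.10, 0.30, 0.05]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
nb_model = net_benefit(toy_probs, toy_labels, pt=0.5)
nb_all = net_benefit_treat_all(toy_labels, pt=0.5)
print(nb_model, nb_all)  # the treat-none strategy has a net benefit of 0 by definition
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all and treat-none references.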

Discussion
This work used radiomics based on US, which is used in BC screening and diagnosis more widely than radiography and magnetic resonance imaging in China, to assess associations of BC ALN metastasis with radiomics features. The outcomes reported in this study are promising. We identified 21 texture features derived from US of primary tumors for predicting ALN metastasis in BC, with AUC values of 0.778 and 0.725 in the training and validation sets, respectively. When the signature was integrated with the US-reported ALN status, the resulting radiomics nomogram did not significantly improve predictive accuracy for ALN metastasis, with AUC values of 0.816 and 0.759 in the training and validation sets, respectively. Patients could obtain a pronounced net benefit from these 3 models compared with the treat-all or treat-none scheme. Among them, the radiomics nomogram achieved the most clinical utility for predicting ALN metastasis when the threshold probability for a patient was greater than 0.03.
The ALN status is of great importance to staging, treatment, and prognosis in BC and also represents one of the most important reference indicators for postoperative radiotherapy and chemotherapy.31 It plays a critical role in the development of personalized treatment plans for BC cases. Klar et al32 designed the Memorial Sloan-Kettering Cancer Center nomogram based on 3786 cases undergoing lymph node biopsy, with an AUC of 0.754 in determining the odds of LNM. Additionally, many works33-36 have generated multivariate models to predict LNM referring to clinicopathologic data. However, clinicopathologic data can only be obtained after surgery and an immunohistochemical examination. Clinically, there is a great and increasing need for accurate and noninvasive methods for predicting preoperative ALN metastasis. Compared with previous clinicopathologic reports, this work had the advantage that ALN metastasis could be assessed preoperatively and in a noninvasive fashion.
Some related research has shown that pathologic features of primary tumors can predict ALN metastasis in BC.8, 33 Thus, the quantitative image features of primary tumors may also be useful for predicting ALN metastasis. Previously, a few studies predicted the LNM status in BC using radiomics features obtained from primary tumors, with good accuracy (AUC values ranging from 70.99% to 89.54%).37-42 In some of these studies, multivariate models were generated to predict LNM on the basis of the radiomics signature and clinical data. In this study, a radiomics signature was generated solely on the basis of features extracted from the US images of primary tumors. As shown above, the radiomics signature was moderately effective in predicting ALN metastasis. Moreover, this method relied only on US, which is widely used for BC diagnosis; therefore, it may serve as a routine checkup tool for BC.
Yu et al42 also used the radiomics features extracted from US images to construct a nomogram to predict axillary LNM in early-stage invasive BC with a larger data set (426 patients). The radiomics signature, which consisted of 14 selected ALN status–related features, achieved moderate prediction efficacy, with AUCs of 0.78 and 0.71 in the primary and validation cohorts, respectively, and the radiomics nomogram comprising the tumor size, US-reported ALN status, and radiomics signature achieved better predictive efficacy (AUC, 0.84 and 0.81 in the primary and validation cohorts). The results were similar to our results. In their study, only early tumor cases were selected, and 96 radiomics features were extracted, whereas in our study, cases included advanced tumors, which expanded the range of the tumor stage. Wavelet transform was used to analyze the spatial time-frequency of the obtained 2-dimensional images, and finally, a total of 843 radiomics features were extracted, which greatly expanded the feature dimension.
Radiomics has important prospects for clinical application and brings new opportunities for the prediction of ALN metastasis in BC. The number of features extracted from medical images usually ranges from dozens to thousands. Features derived from different image types are often correlated, so redundancy needs to be removed. However, there is no unified standard for eliminating redundancy, so the view that more parameters are always better may be misleading. Radiomics data are usually characterized by a small sample size and a large number of variables, which increases the risk of overfitting: a classifier may fit the training data well yet fail to generalize to new test data. At present, the research on ALN metastasis prediction by radiomics based on US images is still in its infancy. Its prediction efficiency has not reached the ideal state, so the model needs to be further optimized.
The limitations of this study should be mentioned. First, the sample size was relatively small. Second, the features extracted in this study were based on a single section per case, and their values were used to represent the whole 3-dimensional lesion. Third, this was a retrospective analysis of breast US images from a single center. There is still a long way to go before ALN metastasis can be predicted by radiomics analysis based on breast US. Further studies assessing large multicenter cohorts with standardized 3-dimensional image acquisition tools are needed.
In conclusion, this pilot radiomics study provided an accurate and noninvasive method for predicting the presurgical ALN metastasis status in BC, which is very important in guiding treatment and avoiding needless invasive removal of lymph nodes.