Establishment and verification of a radiomics nomogram to predict distant metastasis in patients with descending type of nasopharyngeal carcinoma
Qin Yang and Yu Chen contributed equally to this study.
Abstract
Distant metastasis is one of the main reasons for the failure of nasopharyngeal carcinoma (NPC) treatment, and the descending type of nasopharyngeal carcinoma (type D NPC) is more prone to distant metastasis. Few studies have explored the relationship between the radiomics characteristics of lymph nodes and distant metastasis in type D NPC. Therefore, we established a nomogram based on radiomics risk factors to predict distant metastasis in patients with type D NPC. This study retrospectively included 144 patients with type D NPC (T1-2N2-3M0, AJCC 8th). A total of 2600 features were extracted from each of the CT and MRI examinations conducted before treatment. Feature selection was performed by least absolute shrinkage and selection operator (LASSO) regression. A binary logistic regression model was used to construct a nomogram, and the C-index and calibration curve were used to evaluate the discrimination and accuracy of the nomogram. For the multimodal radiomics model combining CT and MRI radiomics features, the average area under curve on the synthetic minority oversampling technique (SMOTE) data set was 0.873 (95% confidence interval [CI]: 0.797–0.949). The C-indices in the training and validation sets of the original data set were 0.91 (95% CI: 0.848–0.972) and 0.815 (95% CI: 0.664–0.967), respectively; the sensitivities were 0.75 and 0.545, the specificities were 0.932 and 0.903, and the accuracies were 0.882 and 0.81. We therefore concluded that the multimodal radiomics model performed well in predicting distant metastasis in patients with type D NPC. The proposed model can provide a reference for precise treatment and prognosis prediction.
Graphical Abstract
We established a nomogram based on lymph nodes radiomic risk factors to predict distant metastasis in type D nasopharyngeal carcinoma patients. The area under curve values, sensitivity, specificity, and accuracy of the three models of CT, MRI, and multimodal radiomics were compared and analyzed. The multimodal radiomics model was significantly better than the CT or MRI single model in these aspects.
1 INTRODUCTION
Nasopharyngeal carcinoma (NPC) is a common malignant tumor of the head and neck. There were approximately 129,000 new cases in 2018, and more than 70% of the new cases were distributed in East Asia and Southeast Asia. The age-standardized rate of NPC in the Chinese population is approximately 3/100,000,1 and more men than women are affected. Literature published in 2015 reported that the mortality rate of NPC patients in China was 34.1/1000.2 The main treatment method is intensity-modulated radiotherapy, which can be combined with chemotherapy based on the clinical stage of the patient; for most recurrent and metastatic NPCs, chemotherapy is still the first choice. According to the characteristics of natural disease progression, NPC is classified into the following three categories: (1) T3-4N0-1, defined as the ascending type (type A); (2) T1-2N2-3, defined as the descending type (type D); and (3) T3-4N2-3, defined as the mixed type.3, 4 A number of studies have shown that patients with type D NPC are more likely to develop distant metastases than patients with type A NPC, and N stage is an independent risk factor for type D NPC distant metastasis.5, 6 With the development of modern medical technology, the prognosis of NPC has gradually improved, but 15%–30% of NPC patients still have distant metastasis after radical treatment,7, 8 and it is one of the main reasons for treatment failure.
As radiotherapy and imaging technologies continue to advance, radiomics has begun to be recognized for its value in studying the clinical characteristics and survival outcomes of patients with different NPC subtypes. A number of studies have shown that multiparameter MRI radiomics nomogram prediction models can better predict treatment response and prognosis in NPC.9-11 Due to the biological characteristics of the tumor itself, the boundary between the tumor and normal tissue is not clear, and there is no consensus regarding the most suitable method of segmentation.12 Patients with locally advanced NPC have complex skull base structures and many confounding factors. There is currently no radiomic signature of lymph nodes to predict distant metastasis of type D NPC. In this study, a multimodal CT and MRI approach combined with radiomics technology was used to extract the imaging features of positive lymph nodes in patients with type D NPC using artificial intelligence methods, and a nomogram model was constructed to study the value of a multimodal combination of CT and MRI.13 The intrinsic relationship found between radiomics features and NPC prognosis provides a scientific basis for the precise and individualized treatment of NPC.
2 RESULTS
2.1 Screening of radiomics features
Because patients with distant metastasis were the minority class, a model trained on the raw data would be biased towards non-metastatic samples and would tend to ignore the few metastatic samples. To prevent this, the training set (102 of 144 cases) was balanced using SMOTE before model building, resulting in a 196-case SMOTE data set. Model parameters were determined by five repeats of 10-fold cross-validation on the SMOTE data set, and features were ranked by importance in the model with the optimal lambda. LASSO regression was then used to further filter the features. Finally, the top five features were selected from the CT and MRI radiomic features, respectively; features with a p-value > 0.05 were excluded, and the remaining features were included in the logistic regression model. See Figures S1-3 for the specific function names and model parameters.
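The core idea of SMOTE can be illustrated with a minimal sketch. It assumes the standard formulation (a synthetic sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours); the function name `smote`, the parameter defaults, and the toy data are hypothetical and are not taken from the study, which does not state which implementation was used.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class (illustrative values, not study data).
events = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_events = smote(events, n_new=8, k=2)
balanced_events = events + new_events
```

Because every synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region already occupied by the minority class rather than duplicating existing cases.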
2.2 Construction and verification of the model
We used the selected features to build a logistic regression model. For type D NPC, the model was trained on the SMOTE data set (196 cases) and internally validated by five repeats of 10-fold cross-validation. The training set of the original data set (102 of 144 cases) and the test set (42 of 144 cases) were then used to test the generalization ability and portability of the model.
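The resampling scheme described here, five repeats of 10-fold cross-validation, can be sketched as an index generator. This is an illustration only: it assumes plain (unstratified) shuffling, and the function name `repeated_kfold` and the seed are hypothetical choices, not details reported by the study.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            test_idx = indices[fold::n_splits]  # every n_splits-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield repeat, fold, train_idx, test_idx

# 196 cases, matching the size of the SMOTE data set.
splits = list(repeated_kfold(196))
```

Each repeat partitions all 196 cases into 10 disjoint test folds, so the procedure yields 50 train/test splits whose AUCs can be averaged.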
2.2.1 CT prediction model
To determine the model parameters of the CT radiomics prediction model and test the reproducibility of the model, five repeats of 10-fold cross-validation were performed using 196 samples in the SMOTE data set, and the average receiver operating characteristic (ROC) curve was constructed. The average area under curve (AUC) was 0.859 (95% confidence interval [CI]: 0.809–0.910) (Figure 1A). The C-index of the trained model on the original data set (102 of 144 cases) was 0.845 (95% CI: 0.76–0.93), the sensitivity was 0.679, the specificity was 0.824, and the accuracy was 0.784 (Figure 1B,D). The model was then verified on the remaining 42 of 144 cases in the original data set. The C-index was 0.771 (95% CI: 0.605–0.938), the sensitivity was 0.545, the specificity was 0.742, and the accuracy was 0.69 (Figure 1C,D).
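For a binary outcome, the C-index coincides with the AUC: it is the fraction of (event, non-event) pairs in which the event receives the higher predicted score. The sketch below shows this pairwise definition; the function name `c_index` and the scores are hypothetical and serve only to illustrate the calculation.

```python
def c_index(scores, labels):
    """Concordance for a binary outcome: fraction of (event, non-event) pairs
    in which the event has the higher score; ties count as one half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Hypothetical predicted probabilities (illustrative values, not study data).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = c_index(scores, labels)  # 8 of the 9 pairs are concordant
```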

2.2.2 MRI prediction model
With the above method, the SMOTE data set underwent five repeats of 10-fold cross-validation with an average AUC of 0.841 (95% CI: 0.786–0.896) (Figure 2A). The C-index on the original data training set was 0.815 (95% CI: 0.73–0.9), with a sensitivity of 0.714, a specificity of 0.73, and an accuracy of 0.725 (Figure 2B,D). The model was then validated with the original data validation set. The C-index was 0.73 (95% CI: 0.578–0.883), the sensitivity was 0.364, the specificity was 0.71, and the accuracy was 0.619 (Figure 2C,D).

2.3 Construction and verification of a multimodal radiomics model
After feature screening, a multimodal radiomics model was established combining CT and MRI features. Using the statistical methods described above, the following results were obtained. Average ROC curves were constructed from five repeats of 10-fold cross-validation on the 196 samples of the SMOTE data set. The average AUC was 0.873 (95% CI: 0.797–0.949) (Figure 3A). The C-index for the training set was 0.91 (95% CI: 0.848–0.972), with a sensitivity of 0.75, a specificity of 0.932, and an accuracy of 0.882 (Figure 3B,D). The C-index for the validation set was 0.815 (95% CI: 0.664–0.967), with a sensitivity of 0.545, a specificity of 0.903, and an accuracy of 0.81 (Figure 3C,E).

In conclusion, the AUC values, sensitivity, specificity and accuracy of the three models were compared and analyzed. The multimodal radiomics model was better than the CT or MRI single model on these measures, with the exception of validation-set sensitivity, where it equaled the CT model (Table 1).
Model | SMOTE data set: average AUC | Training set: C-index | Training set: sensitivity | Training set: specificity | Training set: accuracy | Validation set: C-index | Validation set: sensitivity | Validation set: specificity | Validation set: accuracy |
---|---|---|---|---|---|---|---|---|---|
CT Model | 0.859 | 0.845 | 0.679 | 0.824 | 0.784 | 0.771 | 0.545 | 0.742 | 0.69 |
MRI Model | 0.841 | 0.815 | 0.714 | 0.73 | 0.725 | 0.73 | 0.364 | 0.71 | 0.619 |
Multimodal model | 0.873 | 0.91 | 0.75 | 0.932 | 0.882 | 0.815 | 0.545 | 0.903 | 0.81 |
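The sensitivity, specificity, and accuracy reported in Table 1 follow from a confusion matrix in the usual way. The sketch below shows the arithmetic. The counts are hypothetical: the study does not report its confusion matrices, and these values were chosen only because, for a 42-case set, they happen to reproduce the multimodal validation figures.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # events correctly flagged
    specificity = tn / (tn + fp)  # non-events correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts for a 42-case validation set (assumed, not reported).
sens, spec, acc = classification_metrics(tp=6, fn=5, tn=28, fp=3)
```

With few events in a validation set, a single reclassified patient shifts sensitivity by several percentage points, which is one reason the validation confidence intervals are wide.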
2.4 Construction and verification of a radiomics nomogram
Finally, a nomogram was established, as shown in Figure 4A. The agreement between the predicted and observed probabilities of the nomogram model was assessed using the calibration curves of the training and validation sets (Figure 4B,C). The decision curve indicated that the application of this model in clinical decision-making could lead to greater benefit for the patient (Figure 4D).
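A nomogram is a graphical form of the logistic regression model: each predictor contributes points in proportion to its effect on the linear predictor, and the total points map to a predicted probability. The sketch below shows one common way such point scales are constructed. The coefficients, ranges, and function names are hypothetical and do not reproduce the fitted model in Figure 4A.

```python
import math

def nomogram_points(coefs, ranges, values):
    """Points per predictor, scaled so the most influential predictor spans 0-100."""
    # Largest possible change in the linear predictor caused by any one predictor.
    max_effect = max(abs(b) * (hi - lo) for b, (lo, hi) in zip(coefs, ranges))
    points = []
    for b, (lo, hi), x in zip(coefs, ranges, values):
        # Measure from the end of the range that gives the lowest risk.
        start = lo if b >= 0 else hi
        points.append(100.0 * b * (x - start) / max_effect)
    return points

def predicted_risk(intercept, coefs, values):
    """Logistic model: probability = 1 / (1 + exp(-linear predictor))."""
    lp = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical two-feature model (assumed values, not the study's coefficients).
coefs = [2.0, -1.0]
ranges = [(0.0, 1.0), (0.0, 4.0)]
values = [0.5, 1.0]
points = nomogram_points(coefs, ranges, values)
risk = predicted_risk(-1.0, coefs, values)
```

Here the second feature has the larger possible effect (1.0 × 4.0), so it defines the 100-point scale, and a feature with a negative coefficient earns more points at lower values.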

3 DISCUSSION
In this study, the median follow-up time for type D NPC was 81.9 months, and the proportion of patients with distant metastasis was 27.1%. Among the patients with type D NPC who developed distant metastases, 71.8% did so within 5 years of the end of treatment. Compared with other head and neck cancers, the incidence of distant metastasis from NPC is higher.14 Approximately 10% of patients have distant metastases at the time of diagnosis,15 and approximately 15%–20% of patients with NPC have distant metastases after radical treatment.7, 8 Zhang et al. retrospectively analyzed 105 patients with NPC and found that the 3-year metastasis rate was 25%.16 Chen et al. retrospectively analyzed 717 patients with NPC over a median follow-up time of 31 months and found that 11.9% of patients had distant metastasis after treatment.17 Yao et al. retrospectively analyzed 5194 patients with stage III-IVA NPC. The median follow-up time of type D NPC patients was 51.9 months. The results showed that the 5-year distant metastasis-free survival (DMFS) of type D NPC was 82.2%.5 Li et al. retrospectively analyzed 520 nonmetastatic NPC patients over a median follow-up time of 88.4 months and found that the 5-year distant metastasis rate was 15.1%.18 Fan et al. retrospectively analyzed 628 patients with stage III-IV NPC at Sichuan Cancer Hospital over a median follow-up time of 57.4 months. Their results showed that the 5-year DMFS of type D NPC patients was 77.7%.6 The results of the present study are similar to those of Fan et al., which may be related to the fact that all patients in the study had lymph node metastases, as many previous studies have shown that lymph node positivity is an independent risk factor for distant metastases.5-8, 19
Distant metastasis is one of the main reasons for treatment failure in NPC. The most common sites of NPC metastasis are the bone (20%), lung (13%), and liver (9%).20 The majority of patients with distant metastases after treatment in this study had no obvious clinical manifestations, with imaging abnormalities being the main reason for treatment initiation. The bone, liver, and lungs are the most common metastatic sites, which is consistent with the results reported in the literature by Abdel et al.20 If the risk of distant metastasis can be effectively predicted in the early stage, individualized and precise treatment of these patients may reduce the incidence of distant metastasis after treatment. Moreover, patient follow-up adherence could be improved so that distant metastases can be detected as soon as possible and timely treatment can be provided with the hopes of improving prognosis.
Imaging modalities have the characteristics of repeatability, reproducibility, and noninvasiveness. A large amount of information can be extracted from CT, MRI, PET, and other imaging approaches, and the use of high-throughput technology via radiomics can help physicians to perform deeper data mining analyses of massive amounts of imaging data to make more accurate predictions of diagnosis and prognosis.21, 22 Yao et al. retrospectively analyzed 217 NPC patients, randomly dividing 153 patients into the training set and 64 patients into the validation set, and six features were selected from 1300 CT radiomics features: hglre, srhgle, inverse variance, energy, roundness, and compactness; these features could be used as an effective method to distinguish type A from type D NPC.23 Moreover, the descending type is more prone to distant metastasis.6 Therefore, this study explored the relationship between radiomics and the distant metastasis of type D NPC.
For type D NPC patients, a total of six features (normalize_glcm_ClusterProminence_CT, speclenoise_glszm_ZoneEntropy_CT, laplaciansharpening_ngtdm_Busyness_CT, normalize_firstorder_Maximum_T1, normalize_glszm_SmallAreaLowGrayLevelEmphasis_T2, curvatureflow_glrlm_ShortRunLowGrayLevelEmphasis_T1) were selected via radiomics and found to be clearly related to distant metastasis. Peng et al. retrospectively included 85 patients with stage III-IV NPC who received standard treatment and analyzed the PET/CT images of the patients before treatment. The results suggested that SGE_GLGLM is a predictor of recurrence and metastasis.24 Zhang et al. retrospectively analyzed 176 NPC patients, integrated radiomics and clinical data, and constructed a distant metastasis MRI-based model (DMMM) for NPC.25 The results showed that the 5-year overall survival rate of the high-risk group of patients who received concurrent radiotherapy and chemotherapy was lower (p < 0.001) than that of the low-risk group, and that treatment decisions could be improved by distinguishing between patients at high and low risk of distant metastasis.25 In addition, five repeats of 10-fold cross-validation were used in the training set of the type D NPC radiomics model, and the results showed good model repeatability. Internal validation on the training set and the validation set of the original data set both showed that the model has good portability and generalization ability.
In summary, the C-index, sensitivity, specificity, and accuracy of the combined CT and MRI radiomics model were better than those of the single CT or MRI radiomics models; the multimodal model showed a significant improvement in the ability to predict the risk of distant metastasis, and its predictions were more accurate. However, the sample size of this study was relatively small, and the selected population was limited to patients treated at Sichuan Cancer Hospital, so partial selection bias may exist; the effectiveness needs to be further verified by large-scale multicenter prospective clinical trials.
The limitations of this study were as follows. First, this study was a retrospective case study. The data were collected through the electronic medical record system of the hospital, and the imaging data were collected through the PACS system of our hospital. However, the data in this study were complete, and the same CT and MRI machines were used for all procedures. The scanning data were complete, authentic and reliable, which was also one of the advantages of this research. Second, the sample size of this study was relatively small, so there may have been selection bias in the statistical process. Thus, the effectiveness of this model needs to be verified further in large-scale prospective studies. Finally, the median follow-up time in this study was long, which may also be one of the reasons why there was a high proportion of patients with distant metastases reported in this study, which makes the experimental results more reliable.
In conclusion, the proposed prediction model, which was developed based on a multimodal radiomics approach, showed good performance in predicting distant metastasis from type D NPC and can be used as a reference to guide clinical treatment and predict prognosis.
4 MATERIALS
4.1 Research object
This study included patients who were diagnosed with type D NPC at Sichuan Cancer Hospital from January 2009 to December 2014. The inclusion criteria for this study were as follows: (1) age 18–70 years; (2) clear pathological diagnosis of NPC; (3) staging according to the eighth edition of the American Joint Committee on Cancer (AJCC) staging system; (4) Eastern Cooperative Oncology Group (ECOG) score of 0–2 points; (5) no previous history of head and neck surgery; and (6) complete medical records and clear CT and MRI sequence images. The exclusion criteria were as follows: (1) patients with a second primary head and neck tumor and (2) patients who could not undergo examinations and complete follow-up on schedule.
According to the above inclusion and exclusion criteria, a total of 144 patients with type D NPC were included (number of nonevents: number of events = 105:39). The median follow-up time was 81.9 months; 39 patients developed distant metastasis, with a median time to metastasis of 33 months. The original data set was randomly divided into training and validation groups (102:42). Clinical baseline values are shown in Table 2. The SMOTE data set was used to train the model and determine model parameters (type D NPC nonevent: event = 112:84).
Characteristic | D-NPC |
---|---|
Age (years) (mean ± SD) | 46.42 ± 10.305 |
Gender (male/female) | 96/48 |
Cigarette smoking (yes/no) | 50/94 |
Alcohol consumption (yes/no) | 29/115 |
Family history of cancer (yes/no) | 17/127 |
T stage, n (%) | |
1 | 21 (14.6%) |
2 | 123 (85.4%) |
N stage, n (%) | |
2 | 103 (71.5%) |
3 | 41 (28.5%) |
TNM stage | |
Ⅲ | 103 (71.5%) |
ⅣA | 41 (28.5%) |
Treatment modality | |
CCRT | 26 (18.1%) |
IC + CCRT | 83 (57.6%) |
CCRT + AC | 16 (11.1%) |
IC + CCRT + AC | 18 (12.5%) |
RT | 1 (0.7%) |
Distant metastasis (yes/no) | 39/105 |
- Abbreviations: AC, adjuvant chemotherapy; CCRT, concurrent chemoradiotherapy; D-NPC, type D NPC; IC, induction chemotherapy; NPC, nasopharyngeal carcinoma; RT, radiotherapy.
4.2 Data collection
For the CT examination, a Brilliance Big Bore 16-row CT simulation scanner (Philips, the Netherlands) was used. The patients did not need to fast before the examination, and each patient was placed in the supine position. The scanning parameters were as follows: slice thickness 3 mm, reconstruction slice thickness 3 mm, slice spacing 3 mm, tube voltage 120 kV, and tube current 200 mAs. Iopamidol was used as the contrast agent and was administered via forearm intravenous injection; the injection dose was 1.5 ml/kg, and the rate was controlled in the range of 1.5–2.0 ml/s. A head and neck scan was performed during the arterial phase after the injection of the contrast agent. The scanning range extended from the top of the skull to 2 cm below the clavicle.
For the MRI examination, a MAGNETOM Avanto 1.5T MRI scanner (Siemens, Germany) was used. The patients did not need to fast before the examination, and each patient was placed in the supine position. Gadopentetate meglumine injection was used as the contrast agent for enhanced sequence scanning and was administered through a forearm vein. The total injection dose was 18 ml, and the injection rate was 1.5 ml/s. The scanning parameters are shown in Table S1. The scanning range extended from the top of the skull to 2 cm below the clavicle.
Multiparameter MRI and enhanced localization CT scans of each patient were performed within 2 weeks before any antitumor treatment. Enhanced localization CT, T1-weighted imaging (T1WI), T2-weighted imaging fat suppression (T2WI FS), and contrast-enhanced T1WI fat suppression (CE T1WI FS) sequences were retrieved from the PACS system of our hospital, exported, combined, and imported into MIM Maestro (www.mimsoftware.com, version 7.0.6) to register the enhanced localization CT and MRI images. Positive lymph nodes (gross tumor volume of node, GTVnd) of NPC were selected as the regions of interest (ROIs). During image registration, enhanced localization CT was used as the reference, and a soft-tissue window width and window level were selected. For all enrolled patients, image registration and the slice-by-slice delineation of tumor-positive lymph nodes on the fused CT and MRI images were completed by at least two experienced radiologists.
4.3 Imaging diagnostic criteria for positive lymph nodes
The imaging diagnostic criteria for tumor-positive lymph nodes were as follows: (1) a lateral retropharyngeal lymph node with a shortest diameter ≥0.5 cm; (2) a submandibular or submental lymph node with a shortest cross-sectional diameter ≥1.1 cm, or any other cervical lymph node with a shortest cross-sectional diameter ≥1 cm; (3) extracapsular invasion; (4) central necrosis or ring enhancement; (5) a medial retropharyngeal lymph node of any size; and (6) three or more lymph nodes, or asymmetrically prominent clustered lymph nodes.26-29
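The criteria above can be read as a rule set, sketched here as a small classifier. Two points are assumptions rather than statements from the text: that meeting any single criterion is sufficient, and that criterion (6) can be represented by a per-node group count. The class, field, and region names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LymphNode:
    region: str  # e.g. "lateral_retropharyngeal", "medial_retropharyngeal",
                 # "submandibular", "submental", or "cervical"
    short_axis_cm: float
    extracapsular_invasion: bool = False
    necrosis_or_ring_enhancement: bool = False
    nodes_in_group: int = 1          # assumed encoding of criterion 6
    asymmetric_cluster: bool = False

def is_positive(node: LymphNode) -> bool:
    """Return True if the node meets at least one imaging criterion (assumed any-of)."""
    if node.region == "medial_retropharyngeal":
        return True  # criterion 5: any size
    if node.region == "lateral_retropharyngeal":
        size_ok = node.short_axis_cm >= 0.5  # criterion 1
    elif node.region in ("submandibular", "submental"):
        size_ok = node.short_axis_cm >= 1.1  # criterion 2
    else:
        size_ok = node.short_axis_cm >= 1.0  # criterion 2, other cervical nodes
    return (size_ok
            or node.extracapsular_invasion            # criterion 3
            or node.necrosis_or_ring_enhancement      # criterion 4
            or node.nodes_in_group >= 3               # criterion 6
            or node.asymmetric_cluster)               # criterion 6

example = is_positive(LymphNode("cervical", 0.8, necrosis_or_ring_enhancement=True))
```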
4.4 Feature extraction and analysis
The calculation and extraction of radiological features were carried out according to the guidelines of the Image Biomarker Standardization Initiative.30 After all the delineated images were processed, they were imported into the United Imaging Intelligence (uAI, Version: 430 sp1) Uploader for recognition, and the data were unified into the data set corresponding to uAI. Preliminary feature extraction was completed in uAI software using 15 image categories: Original, Box Mean, Additive Gaussian Noise, Binomial Blur Image, Curvature Flow, Boxsigma Image, LoG, wavelet, Normalize, Laplacian Sharpening, Discrete Gaussian, Mean, Speckle Noise, Recursive Gaussian, and Shot Noise. Seven types of features were extracted: first-order features (also called intensity features, first order), morphological features (shape), gray level co-occurrence matrix (GLCM) features, gray level run length matrix (GLRLM) features, gray level size zone matrix (GLSZM) features, gray level dependence matrix (GLDM) features and neighboring gray tone difference matrix (NGTDM) features. The full radiomic features are presented in Tables S2-3.
4.5 Follow-up
The main endpoint of this study was distant metastasis, which was defined as another lesion in addition to the primary tumor identified on CT, MRI, bone scan, PET-CT or other imaging, with or without pathological biopsy. After treatment, all patients were followed up every 3–6 months during years 1–3, every 6–12 months during years 4–5, and once a year after the 5th year. Routine consultation, physical examination, routine blood tests, biochemistry, peripheral blood EBV DNA copy number detection, enhanced MRI of the nasopharynx and neck, nasopharyngoscopy, chest CT scan, abdominal ultrasound, or upper abdominal CT were performed during each review along with a plain scan and whole-body bone scan. The date of the last follow-up visit for this study was December 31, 2020. In this study, the median follow-up time of type D NPC was 81.9 months.
4.6 Statistical methods
After feature extraction, features were filtered using methods such as LASSO regression, and a logistic regression model was built. The established model was repeatedly verified in the SMOTE data set (the data set obtained after SMOTE processing) and the original data set. The ROC curve and calibration curve were drawn, and the DeLong test was performed and confidence intervals were calculated.31 Finally, the nomogram prediction model was established, and decision curve analysis was used to judge the clinical applicability of the multimodal radiomics combined model. All statistical analyses in this study were completed in R software (https://www.r-project.org/, version 4.0.3), mainly using the “survival”, “rms”, “glmnet”, and “pROC” packages (p < 0.05 indicates a statistically significant difference).
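Decision curve analysis compares strategies by their net benefit at a given risk threshold: true positives are credited and false positives are penalized by the odds of the threshold. The sketch below uses the standard net-benefit formula. The function names and the toy predictions are hypothetical and are not the study's data or its R code.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical predictions for five patients (illustrative values, not study data).
probs = [0.9, 0.7, 0.4, 0.2, 0.1]
labels = [1, 1, 0, 0, 0]
nb_model = net_benefit(probs, labels, threshold=0.3)
nb_all = net_benefit_treat_all(labels, threshold=0.3)
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.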
AUTHOR CONTRIBUTIONS
Qin Yang: Data curation (lead); investigation (lead); methodology (lead); project administration (lead); writing – original draft (lead). Yu Chen: Formal analysis (equal); investigation (equal); methodology (equal); writing – original draft (equal). Rui Huang: resources (equal); writing – review and editing (equal). Wenya Yin: Data curation (equal); investigation (equal). Shuang Zhang: Data curation (equal); investigation (equal). Qianlong Tang: Data curation (equal). Xinyue Chen: Formal analysis (equal); methodology (equal); software (equal); writing – original draft (equal). Jinyi Lang: project administration (equal); resources (equal); writing – review and editing (equal). Gang Yin: Project administration (equal); resources (equal); software (equal); writing – review and editing (equal). Peng Zhang: Conceptualization (equal); project administration (equal); resources (equal); software (equal); writing – review and editing (equal). All authors have read and approved the manuscript.
ACKNOWLEDGMENTS
This work was supported by the Key R&D Program of Sichuan Science and Technology Department (No. 2019YFG0185), Key R&D Program of Liangshan Science and Technology Bureau (No. 18YYJS0094), Beijing Medical and Health Public Welfare Foundation (No. YWJKJJHKYJJ-B17483), Key R & D support Plan of Chengdu Science and Technology Bureau, item (No. 2021-YF05-02382-SN) and The Science and Technology Project of The Health Commission of Sichuan (No. 20PJ114).
CONFLICT OF INTEREST
Xinyue Chen is an employee of Siemens Healthineers but she has no potential relevant financial or nonfinancial interests to disclose. The remaining authors have no conflicts of interest to declare.
ETHICS STATEMENT
This is an observational study, so the Sichuan Cancer Hospital Research Ethics Committee decided to waive the requirement to get informed consent (SCCHEC-02-2022-160).
Open Research
DATA AVAILABILITY STATEMENT
The data supporting this study are available from the corresponding author upon reasonable request.