Noninvasive diagnosis of pulmonary nodules using a circulating tsRNA-based nomogram
Abstract
Accurate diagnosis of pulmonary nodules avoids repeated low-dose computed tomography (LDCT)/CT scans and invasive examination, yet remains a major clinical challenge, and new diagnostic tools are urgently needed. Herein, we established a nomogram based on a diagnostic signature of five circulating tsRNAs and CT information to predict malignant pulmonary nodules. In total, 249 blood samples from patients with pulmonary nodules were collected at three different lung cancer centers. Five tsRNAs (tRF-Ser-TGA-003, tRF-Val-CAC-005, tRF-Ala-AGC-060, tRF-Val-CAC-024, and tiRNA-Gln-TTG-001) were identified in the discovery and training cohorts, and the diagnostic signature was established with the randomForest algorithm. A nomogram was developed by combining the tsRNA signature and CT information. It achieved a high level of accuracy in an internal validation cohort (n = 83, area under the receiver operating characteristic curve [AUC] = 0.930, sensitivity 100.0%, specificity 73.8%) and an external validation cohort (n = 66, AUC = 0.943, sensitivity 100.0%, specificity 86.8%). Furthermore, we assessed the ability of the model to discriminate invasive malignant lesions from noninvasive lesions. Robust performance was achieved in the diagnosis of invasive malignant lesions in both training and validation cohorts (discovery cohort: AUC = 0.850, sensitivity 86.0%, specificity 81.4%; internal validation cohort: AUC = 0.784, sensitivity 78.8%, specificity 78.1%; and external validation cohort: AUC = 0.837, sensitivity 85.7%, specificity 84.0%). This novel circulating tsRNA-based diagnostic model has potential significance in predicting malignant pulmonary nodules. Application of the model could improve the accuracy of pulmonary nodule diagnosis and optimize surgical plans.
Abbreviations
- AIS: adenocarcinoma in situ
- AUC: area under ROC curve
- CI: confidence interval
- CT: computed tomography
- CTC: circulating tumor cell
- ctDNA: circulating tumor DNA
- FS: frozen section
- GGN: ground glass nodule
- GGO: ground glass opacity
- IA: invasive adenocarcinoma
- LDCT: low-dose computed tomography
- LUAD: lung adenocarcinoma
- Lung-DL: lung-deep learning
- MIA: minimally invasive adenocarcinoma
- miRNA: microRNA
- NSCLC: non-small-cell lung cancer
- OR: odds ratio
- qPCR: quantitative PCR
- ROC: receiver operating characteristic
- tiRNA: tRNA half
- tRF: tRNA-derived fragment
- tsRNA: tRNA-derived small RNA
1 INTRODUCTION
Currently, pulmonary nodules are usually detected by LDCT screening and are then subjected to invasive diagnostic procedures (e.g., surgical, percutaneous, and bronchoscopic biopsies).1 According to the Lung Imaging Reporting and Data System (Lung-RADS) classification and guidelines, nodules larger than 6 mm are considered positive. However, the high rate of false-positive results remains a major clinical challenge. At present, distinguishing the small percentage of malignant nodules from benign nodules relies mainly on clinical experience.2 Nevertheless, the rates of missed diagnosis and overdiagnosis differ significantly between hospitals of different levels, resulting in repeated LDCT/CT scans or delayed diagnosis of lung cancer. Current clinical tumor markers such as carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), and neuron-specific enolase (NSE) are not sensitive enough for the diagnosis of malignant nodules.3 Furthermore, although malignant nodules and tumor categories can be identified using invasive approaches, these approaches can cause complications such as infection and pneumothorax, and even death. Therefore, new methods to improve the diagnosis of malignant nodules are crucial.
Sublobar resection has been reported to be sufficient for AIS and MIA, while lobectomy is the preferred therapy for IA.4 Studies have shown that sublobar resection could achieve similar survival rates when compared with lobectomy in early-stage NSCLC with a tumor size ≤2 cm.5, 6 Choosing patients for sublobar resection is mainly dependent on intraoperative frozen section diagnosis or personal judgment on CT images. However, the diagnostic accuracy of the intraoperative frozen section is sometimes insufficient due to the difficulty in obtaining materials, especially for tumors with size ≤1 cm in maximum diameter.7 In addition, no feasible and precise method was applied to distinguish AIS and MIA from IA before operation. Previous studies have tried to predict IA by combining semantic-radiomic features and internal growth of LUAD.8, 9 However, novel methods are needed to optimize surgical decision-making.
Transfer RNA-derived small RNAs are fragments of precursor or mature tRNAs. Although tsRNAs were originally thought to be degradation debris, recent studies have revealed that tsRNAs are organ-specific, highly conserved, and structure-dependent.10, 11 Additionally, tsRNAs have been found dysregulated in various types of cancer and can be detected in 10 kinds of body fluids.12 Recent studies have indicated that individual tsRNA can serve as a biomarker in breast cancer, colorectal cancer, and gastric cancer.13-15 Jin et al. established a novel class of tsRNA signature for diagnosis and prognosis of pancreatic cancer.16 Based on this evidence, tsRNAs could have potential value in the diagnosis and treatment of NSCLC. Furthermore, compared with CTCs and ctDNA, which were widely investigated for noninvasive cancer early detection and survival prediction, tsRNAs have higher abundance and easier technical detection.17
To distinguish malignant pulmonary nodules from benign nodules and to predict tumor invasive components, we established a five-tsRNA diagnostic model through high-throughput sequencing and RT-qPCR. By combining the five tsRNAs with clinical characteristics and imaging data, we built a novel, systematic, and reliable diagnostic model for pulmonary nodules and validated its performance in predicting malignant nodules and tumor invasive components in two other cohorts. Overall, this circulating five-tsRNA-based model shows potential for predicting malignant pulmonary nodules through a cost-effective, noninvasive approach and could help optimize surgical plans.
2 MATERIALS AND METHODS
2.1 Ethics approval and consent to participate
This study was approved by the Ethics Committee of Jiangsu Cancer Hospital (No. 2020129), Xuzhou Central Hospital (XYFM2020002), and Taixing People's Hospital (KD2020069). All participants provided written informed consent before phlebotomy and surgery. The study was carried out in accordance with the Declaration of Helsinki.
2.2 Study design and participants
The flowchart of the study is shown in Figure 1. A multicenter, retrospective cohort was enrolled using plasma samples collected from patients with malignant and benign pulmonary nodules from three thoracic surgery departments in China between May 2020 and August 2021. Samples for the discovery cohort (n = 100) and internal validation cohort (n = 83) were collected from The Affiliated Cancer Hospital of Nanjing Medical University, while specimens collected between March 2021 and January 2022 from Xuzhou Central Hospital and Taixing People's Hospital served as the external validation cohort (n = 66). Our research did not interfere with the treatment of patients. The characteristics of the 249 participants are shown in Table S1.

Inclusion criteria were: age 18–85 years; a single pulmonary nodule detected by LDCT with a maximum diameter ≤2 cm; presentation as a solid nodule, GGN, or pure GGN; surgical treatment; and a postoperative pathological diagnosis. Exclusion criteria were: pregnancy or lactation; history of other cancer; multiple pulmonary nodules; metastatic nodules; neoadjuvant chemotherapy or radiotherapy; or transfusion within 30 days prior to enrollment. In addition, clinical information including LDCT imaging reports and pathology reports was collected. Blood samples were centrifuged within 1 h of collection for 10 min at 3500 rpm at 4°C to harvest plasma, which was stored at −80°C until analysis.
2.3 High-throughput sequencing
According to inclusion criteria, five temporarily untreated LUAD patients were selected and their tumor tissues and adjacent tissues were collected for tRF and tiRNA sequencing. An Agilent Bioanalyzer 2100 (Agilent Technologies) was used to quantify the sequencing libraries. The sequencing analysis was carried out using the Illumina NextSeq 500 platform according to the manufacturer's protocol.
2.4 RNA extraction
Total RNAs of blood samples and tissues were extracted using TRIzol LS Reagent (Invitrogen) according to the manufacturer's protocol, and 25 fmol of Caenorhabditis elegans cel-miR-39-3p RNA was added to each sample as a spike-in control.14, 15 Total RNAs were extracted from 200 μL of each plasma sample and eluted with 10 μL RNase-free water. RNA concentration and quality were measured on an Agilent Bioanalyzer 2100. All total RNAs were stored at −80°C within 3 days before analysis.
2.5 Reverse transcription-qPCR
Total RNA (100 ng) was used for reverse transcription using Bulge-Loop MicroRNA qRT-PCR Primer Sets (one RT primer and a pair of qPCR primers for each set) specially designed for tsRNAs (RiboBio). According to the protocol, the reverse transcription reactions were carried out at 42°C for 60 min followed by 70°C for 10 min. Quantitative real-time PCR was then carried out to analyze tsRNA expression on a Q6 RT-qPCR machine. In brief, the RT-qPCR assay was run in 96-well and 384-well plates at 95°C for 1 min, followed by 40 cycles of 95°C for 10 s and 60°C for 30 s, and finally 95°C for 15 s, 60°C for 60 s, and 95°C for 15 s. The expression of each tsRNA was normalized to cel-miR-39-3p. Relative expression levels were analyzed by the −ΔCt method.
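The −ΔCt normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and the example Ct values are hypothetical, and averaging technical replicates before normalization is an assumption that the text does not state.

```python
from statistics import mean


def neg_delta_ct(ct_target, ct_spike_in):
    """Return -dCt = -(Ct_target - Ct_spike_in).

    A larger value means the target tsRNA is more abundant
    relative to the cel-miR-39-3p spike-in control.
    """
    return -(ct_target - ct_spike_in)


def relative_expression(target_replicates, spike_replicates):
    """Average technical replicates (an assumption), then normalize."""
    return neg_delta_ct(mean(target_replicates), mean(spike_replicates))


# Hypothetical Ct values for one plasma sample (not taken from the study).
example = relative_expression([27.5, 27.5], [22.0, 22.0])
print(example)
```

Because a lower Ct means earlier amplification, negating ΔCt makes higher values correspond to higher expression, which keeps the direction of the relative expression values intuitive in downstream scoring.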
2.6 Intraoperative FS diagnosis
Two pathologists with different levels of clinical experience (senior pathologist has worked as a cancer subspecialty pathologist for over 20 years, junior pathologist has worked as a cancer subspecialty pathologist for 2 years) were instructed to independently diagnose IA by intraoperative FS diagnosis during operations. Neither pathologist was involved in previous evaluation of patients.
2.7 Comparison with predictive models
The Mayo Clinic model for malignancy in pulmonary nodules calculates the probability of malignancy as follows18:

$$\text{Probability of malignancy} = \frac{e^{x}}{1 + e^{x}},$$

$$x = -6.8272 + (0.0391 \times \text{age}) + (0.7917 \times \text{smoking}) + (1.3388 \times \text{cancer}) + (0.1274 \times \text{nodule diameter}) + (1.0407 \times \text{spiculation}) + (0.7838 \times \text{upper lobe}).$$
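As an illustration of how the Mayo Clinic equation is evaluated, the following minimal sketch uses the coefficients quoted above. The function and parameter names are hypothetical, the binary predictors are assumed to be coded 0/1, and the nodule diameter is assumed to be given in millimetres; none of these coding details are stated in this section.

```python
import math


def mayo_probability(age, smoking, cancer, diameter_mm, spiculation, upper_lobe):
    """Probability of malignancy, e^x / (1 + e^x), for the Mayo Clinic model.

    smoking, cancer, spiculation, and upper_lobe are assumed to be 0/1
    indicators; diameter_mm is assumed to be in millimetres.
    """
    x = (-6.8272
         + 0.0391 * age
         + 0.7917 * smoking
         + 1.3388 * cancer
         + 0.1274 * diameter_mm
         + 1.0407 * spiculation
         + 0.7838 * upper_lobe)
    # 1 / (1 + e^-x) is algebraically identical to e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical patient: 60 years old, never smoker, 10 mm smooth lower-lobe nodule.
print(round(mayo_probability(60, 0, 0, 10, 0, 0), 4))
```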
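The Brock University cancer prediction equation is as follows19:

$$\text{Cancer probability} = 100 \times \frac{e^{\text{Log\_odds}}}{1 + e^{\text{Log\_odds}}},$$

$$\text{Log\_odds} = 0.0287 \times (\text{age} - 62) + \text{sex} + \text{family\_history\_lung\_Ca} + \text{emphysema} - 5.3854 \times \left[\left(\frac{\text{nodule\_size}}{10}\right)^{-0.5} - 1.58113883\right] + \text{nodule\_type} + \text{nodule\_upper\_lung} - 0.0824 \times (\text{nodule\_count} - 4) + \text{spiculation} - 6.7892,$$

where sex = 0.6011 for female and 0 for male, and family_history_lung_Ca = 0.2961, emphysema = 0.2953, nodule_upper_lung = 0.6581, and spiculation = 0.7729 when the respective feature is present (0 otherwise).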
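The Brock equation can be evaluated in the same way. The sketch below is illustrative only: the function and parameter names are hypothetical, binary predictors are assumed to be coded 0/1, the nodule size is assumed to be in millimetres, and the nodule-type term is passed in directly by the caller because its coefficient values are not listed in this section.

```python
import math


def brock_probability(age, female, family_history, emphysema, nodule_size_mm,
                      nodule_type_term, upper_lung, nodule_count, spiculation):
    """Cancer probability (in percent) from the Brock log-odds equation.

    nodule_type_term is the log-odds contribution of the nodule type;
    its value must be supplied by the caller (not given in this section).
    """
    log_odds = (0.0287 * (age - 62)
                + 0.6011 * female
                + 0.2961 * family_history
                + 0.2953 * emphysema
                - 5.3854 * ((nodule_size_mm / 10) ** -0.5 - 1.58113883)
                + nodule_type_term
                + 0.6581 * upper_lung
                - 0.0824 * (nodule_count - 4)
                + 0.7729 * spiculation
                - 6.7892)
    return 100.0 / (1.0 + math.exp(-log_odds))


# Hypothetical reference patient: 62-year-old male, 4 mm nodule, 4 nodules,
# no other risk factors, nodule-type term set to 0.
print(round(brock_probability(62, 0, 0, 0, 4, 0.0, 0, 4, 0), 3))
```

At the reference values (age 62, nodule size 4 mm, nodule count 4) the centered terms vanish, so the log-odds reduce to the intercept, which is a convenient sanity check on the transcription of the equation.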
Our previous study constructed the CT-based Lung-DL model20 based on CT information to predict MIA and IA. To further evaluate the diagnostic performance of our nomograms, the Lung-DL model was also used for comparison.
2.8 Statistical analysis and in silico bioinformatics analysis
Statistical analysis was performed using R (version 4.1.2) and GraphPad Prism 8. Most graphs show each data point together with the mean ± SD. Significance was tested with a t-test, and p values are denoted by asterisks. Point-biserial correlation was used to measure the correlation between continuous variables and binary categorical variables. Differential expression of tsRNAs between five tumor tissues and five adjacent tissues was analyzed with the DESeq2 R package. The circulating tsRNA score was constructed with the randomForest R package.21 The ROC analyses were carried out using the pROC R package. The malignant nodules probability nomogram and the invasive adenocarcinoma probability nomogram, as well as their calibration curves, were constructed using the rms R package.22
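For readers unfamiliar with the ROC metrics reported below, the following minimal sketch shows how an AUC, a sensitivity, and a specificity can be computed from two groups of scores. It is a plain-Python stand-in, not the pROC workflow used in the study; the function names and toy scores are hypothetical, and choosing the cut-off by the Youden index is an assumption, since the text does not state how cut-offs were selected.

```python
def auc(scores_pos, scores_neg):
    """Mann-Whitney estimate of the AUC: P(pos > neg) + 0.5 * P(tie)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, cutoff):
    """Call a case positive when its score is >= cutoff."""
    sens = sum(p >= cutoff for p in scores_pos) / len(scores_pos)
    spec = sum(n < cutoff for n in scores_neg) / len(scores_neg)
    return sens, spec


def best_cutoff_youden(scores_pos, scores_neg):
    """Pick the observed score maximizing sensitivity + specificity - 1."""
    candidates = sorted(set(scores_pos) | set(scores_neg))
    return max(candidates,
               key=lambda c: sum(sensitivity_specificity(scores_pos, scores_neg, c)) - 1)


# Toy scores (hypothetical): malignant vs. benign nodules.
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2]
cutoff = best_cutoff_youden(malignant, benign)
print(auc(malignant, benign), cutoff,
      sensitivity_specificity(malignant, benign, cutoff))
```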
3 RESULTS
3.1 Development and validation of circulating tsRNA score for malignant nodule diagnosis
The tRF and tiRNA sequencing was carried out on five tumor samples and five adjacent tissues from five temporarily untreated early-stage LUAD patients, who were strictly selected according to the inclusion criteria, to explore tsRNAs specifically expressed in early-stage LUAD. Differential expression analysis of tumor tissues versus adjacent tissues identified a total of nine dysregulated tsRNAs based on fold changes and p values (p value < 0.001 and fold-change > 10; Figure 2A). To examine their diagnostic performance for malignant nodules, the expression levels of the nine dysregulated tsRNAs were validated using RT-qPCR in the discovery cohort, which included 86 cases of malignant nodules and 14 cases of benign nodules, as confirmed by subsequent pathological diagnosis assessed independently by two senior pathologists. Each RT-qPCR product of the five tsRNAs was confirmed by cloning sequences that contained the full-length sequence (Figure 2B). Among the nine tsRNA candidates, tRF-Ser-TGA-003, tRF-Val-CAC-005, tRF-Ala-AGC-060, tRF-Val-CAC-024, and tiRNA-Gln-TTG-001 were found to be significantly upregulated (p < 0.05, Student's t-test) in the plasma of the malignant nodules group (Figure 2C).

To enhance the capability of circulating tsRNAs to detect malignant pulmonary nodules, we focused on the five tsRNAs upregulated in malignant pulmonary nodules and established a five-tsRNA diagnostic model for detection of malignant nodules. Combining the five upregulated tsRNAs, the circulating tsRNA score was derived with the randomForest R package and calculated using the following formula: Circulating tsRNA score = 10.631148 × relative expression of Ser-TGA-003 + 2.310571 × relative expression of Val-CAC-005 + 6.588001 × relative expression of Ala-AGC-060 + 6.588001 × relative expression of Val-CAC-024 + 2.242256 × relative expression of Gln-TTG-001 (Figure 2D). The results indicated that circulating tsRNA scores were significantly different between patients with benign and malignant pulmonary nodules (Figure 3A) and were associated with malignant nodules (Tables 1 and 2; OR 17.00; 95% CI, 7.348–36.300). To further evaluate the detection accuracy of the circulating tsRNA score, ROC curves were generated and AUCs were calculated. The AUC for the circulating tsRNA score was 0.890, with a sensitivity of 79.1% and a specificity of 92.9%, which was better than that of any individual tsRNA. The AUCs were 0.873, 0.767, 0.887, 0.682, and 0.686 for tRF-Ser-TGA-003, tRF-Val-CAC-005, tRF-Ala-AGC-060, tRF-Val-CAC-024, and tiRNA-Gln-TTG-001, respectively (Figure 3B).
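The score formula above is a weighted sum of the five relative expression values and can be sketched as follows. The weights are copied from the formula as printed; the dictionary layout, function name, and example input are hypothetical, and no high/low cut-off is applied because none is given in the text.

```python
# Weights as printed in the score formula (two tsRNAs share the same value).
WEIGHTS = {
    "tRF-Ser-TGA-003": 10.631148,
    "tRF-Val-CAC-005": 2.310571,
    "tRF-Ala-AGC-060": 6.588001,
    "tRF-Val-CAC-024": 6.588001,
    "tiRNA-Gln-TTG-001": 2.242256,
}


def circulating_tsrna_score(relative_expression):
    """Weighted sum of relative expression values, one per tsRNA."""
    missing = WEIGHTS.keys() - relative_expression.keys()
    if missing:
        raise KeyError(f"missing tsRNA measurements: {sorted(missing)}")
    return sum(weight * relative_expression[name]
               for name, weight in WEIGHTS.items())


# Hypothetical sample in which every relative expression value equals 1.0,
# so the score is simply the sum of the weights.
sample = {name: 1.0 for name in WEIGHTS}
print(round(circulating_tsrna_score(sample), 6))
```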

Characteristic | All patients | Malignant nodules | Benign nodules | p value |
---|---|---|---|---|
Age (years) | ||||
<65 | 63 (63.0) | 51 (59.3) | 12 (85.7) | 0.0577 |
≥65 | 37 (37.0) | 35 (40.7) | 2 (14.3) | |
Gender | ||||
Male | 59 (59.0) | 52 (60.5) | 7 (50.0) | 0.5451 |
Female | 41 (41.0) | 34 (39.5) | 7 (50.0) | |
Smoking | ||||
Ever | 38 (38.0) | 33 (38.4) | 5 (35.7) | 0.8493 |
Never | 62 (62.0) | 53 (61.6) | 9 (64.3) | |
Maximum diameter of tumor (cm) | ||||
<1.2 | 67 (67.0) | 55 (64.0) | 12 (85.7) | 0.1083 |
≥1.2, <2 | 33 (33.0) | 31 (36.0) | 2 (14.3) | |
GGO component | ||||
Pure | 45 (45.0) | 40 (46.5) | 5 (35.7) | 0.0307* |
Part-solid | 37 (37.0) | 34 (39.5) | 3 (21.4) | |
Solid | 18 (18.0) | 12 (13.9) | 6 (42.9) | |
Lesion location | ||||
Right upper | 32 (32.0) | 27 (31.4) | 5 (35.7) | 0.8344 |
Right middle | 20 (20.0) | 17 (19.8) | 3 (21.4) | |
Right lower | 8 (8.0) | 6 (7.0) | 2 (14.3) | |
Left upper | 18 (18.0) | 16 (18.6) | 2 (14.3) | |
Left lower | 22 (22.0) | 20 (23.2) | 2 (14.3) | |
Circulating tRF score | ||||
High | 24 (24.0) | 12 (14.0) | 12 (85.7) | <0.0001* |
Low | 76 (76.0) | 74 (86.0) | 2 (14.3) |
- Note: p values were calculated by Fisher's exact test.
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 0.243 | 0.051–1.153 | 0.075 |
Gender | 1.529 | 0.0492–4.750 | 0.462 |
Smoking | 1.121 | 0.346–3.634 | 0.849 |
Maximum diameter | 0.296 | 0.062–1.408 | 0.126 |
GGO component | |||
Presence | |||
Pure | 0.706 | 0.1571–3.1718 | 0.935 |
Absence | 0.176 | 0.0381–0.8184 | 0.027* |
Lesion location | |||
Right upper | |||
Right middle | 1.049 | 0.222–4.967 | 0.952 |
Right lower | 0.556 | 0.086–3.580 | 0.536 |
Left upper | 1.053 | 0.844–1.314 | 0.644 |
Left lower | 1.852 | 0.325–10.538 | 0.487 |
Circulating tRF score | 17.000 | 7.348–36.300 | <0.001* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
To validate the value of the circulating tsRNA score for malignant lung nodule detection, we prospectively collected clinical data and peripheral blood samples from 83 patients with pulmonary nodules in The Affiliated Cancer Hospital of Nanjing Medical University as an internal validation cohort. The enrollment criteria were the same as for the discovery cohort. The specific expression levels of the five tsRNAs are shown in Figure S1. Consistently, the circulating tsRNA score achieved a robust performance in the internal validation cohort (Figure 3C,D, Tables 3 and 4; AUC = 0.801; OR 4.909; 95% CI, 1.588–15.177; sensitivity 69.2%, specificity 83.3%). Furthermore, an external validation cohort was collected from Xuzhou Central Hospital and Taixing People's Hospital, containing clinical data and peripheral blood samples of 66 patients with pulmonary nodules. The enrollment criteria were the same as for the discovery cohort. As expected, the external validation cohort also revealed robust detection ability (Figure 3E,F, Tables 5 and 6; AUC = 0.765; OR 9.778; 95% CI, 2.416–39.576; sensitivity 83.0%, specificity 76.9%).
Characteristic | All patients | Malignant nodules | Benign nodules | p value |
---|---|---|---|---|
Age (years) | ||||
<65 | 44 (53.0) | 34 (52.3) | 10 (55.6) | 0.8070 |
≥65 | 39 (47.0) | 31 (47.7) | 8 (44.4) | |
Gender | ||||
Male | 57 (68.7) | 48 (73.8) | 9 (50.0) | 0.0536 |
Female | 26 (31.3) | 17 (26.2) | 9 (50.0) | |
Smoking | ||||
Ever | 32 (38.6) | 27 (41.5) | 5 (27.8) | 0.2885 |
Never | 51 (61.4) | 38 (58.5) | 13 (72.2) | |
Maximum diameter (cm) | ||||
<1.2 | 48 (57.8) | 35 (53.8) | 13 (72.2) | 0.1624 |
≥1.2, <2 | 35 (42.2) | 30 (46.2) | 5 (27.8) | |
GGO component | ||||
Pure | 37 (44.6) | 30 (46.2) | 7 (38.9) | 0.0219* |
Presence | 28 (33.7) | 25 (38.5) | 3 (16.7) | |
Absence | 18 (21.7) | 10 (15.3) | 8 (44.4) | |
Lesion location | ||||
Right upper | 24 (28.9) | 19 (29.2) | 5 (27.6) | 0.7521 |
Right middle | 12 (14.5) | 9 (13.8) | 3 (16.7) | |
Right lower | 8 (9.6) | 6 (9.2) | 2 (11.1) | |
Left upper | 21 (25.3) | 15 (23.2) | 6 (33.3) | |
Left lower | 18 (21.7) | 16 (24.6) | 2 (11.1) | |
Circulating tRF score | ||||
High | 63 (75.9) | 54 (83.1) | 9 (50.0) | 0.0037* |
Low | 20 (24.1) | 11 (16.9) | 9 (50.0) |
- Note: p values were calculated by Fisher's exact test.
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 1.140 | 0.399–3.255 | 0.807 |
Gender | 0.350 | 0.121–1.040 | 0.060 |
Smoking | 1.847 | 0.589–5.795 | 0.293 |
Maximum diameter | 2.229 | 0.712–6.974 | 0.169 |
GGO component | |||
Presence | |||
Pure | 0.514 | 0.120–2.199 | 0.370 |
Absence | 0.150 | 0.033–0.683 | 0.014* |
Lesion location | |||
Right upper | |||
Right middle | 0.789 | 0.154–4.055 | 0.777 |
Right lower | 0.789 | 0.121–5.170 | 0.805 |
Left upper | 0.658 | 0.168–2.580 | 0.548 |
Left lower | 2.105 | 0.359–12.354 | 0.410 |
Circulating tRF score | 4.909 | 1.588–15.177 | 0.006* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Characteristic | All patients | Malignant nodules | Benign nodules | p value |
---|---|---|---|---|
Age (years) | ||||
<65 | 29 (43.9) | 25 (47.2) | 4 (30.8) | 0.2857 |
≥65 | 37 (56.1) | 28 (52.8) | 9 (69.2) | |
Gender | ||||
Male | 45 (68.2) | 34 (64.2) | 11 (84.6) | 0.1557 |
Female | 21 (31.8) | 19 (35.2) | 2 (15.4) | |
Smoking | ||||
Ever | 39 (59.1) | 30 (56.6) | 9 (69.2) | 0.4067 |
Never | 27 (40.9) | 23 (43.4) | 4 (30.8) | |
Maximum diameter (cm) | ||||
<1.2 | 48 (72.7) | 37 (69.8) | 11 (84.6) | 0.2828 |
≥1.2, <2 | 18 (27.3) | 16 (30.2) | 2 (15.4) | |
GGO component | ||||
Pure | 20 (30.3) | 16 (30.2) | 4 (30.8) | 0.0467* |
Presence | 35 (53.0) | 31 (58.5) | 4 (30.8) | |
Absence | 11 (16.7) | 6 (11.3) | 5 (38.4) | |
Lesion location | ||||
Right upper | 21 (31.8) | 16 (30.2) | 5 (38.4) | 0.5902 |
Right middle | 10 (15.1) | 7 (13.2) | 3 (23.1) | |
Right lower | 13 (19.7) | 10 (18.9) | 3 (23.1) | |
Left upper | 4 (6.1) | 4 (7.5) | 0 (0.0) | |
Left lower | 18 (27.3) | 16 (30.2) | 2 (15.4) | |
Circulating tRF score | ||||
High | 49 (74.2) | 44 (83.0) | 5 (38.4) | <0.0001* |
Low | 17 (25.8) | 9 (17.0) | 8 (61.6) |
- Note: p values were calculated by Fisher's exact test.
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 2.009 | 0.550–7.337 | 0.291 |
Gender | 0.325 | 0.065–1.624 | 0.171 |
Smoking | 0.580 | 0.158–2.121 | 0.410 |
Maximum diameter | 2.378 | 0.472–11.979 | 0.294 |
GGO component | |||
Presence | |||
Pure | 0.516 | 0.114–2.340 | 0.391 |
Absence | 0.155 | 0.032–0.751 | 0.021* |
Lesion location | |||
Right upper | |||
Right middle | 0.729 | 0.135–3.930 | 0.713 |
Right lower | 1.042 | 0.203–5.343 | 0.961 |
Left upper | 3.000 | 0.138–65.079 | 0.484 |
Left lower | 2.500 | 0.422–14.828 | 0.313 |
Circulating tRF score | 9.778 | 2.416–39.576 | 0.001* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Additionally, the AUCs for the circulating tsRNA score were both superior to any single tsRNA in the internal validation and external validation cohorts (Figure 3D,F). These results indicated that the circulating tsRNA score had potential translational value in distinguishing malignant nodules from benign nodules by liquid biopsy.
Chest CT is the most commonly used noninvasive method to determine the type of pulmonary nodules in preoperative examination of pulmonary nodules with a maximum diameter <2 cm.23, 24 In the assessment of risk factors for malignant pulmonary nodules in the discovery cohort, internal validation cohort, and external validation cohort, we found that GGO components in pulmonary nodules, especially mixtures (with both solid and GGO components), were associated with the risk of malignant pulmonary nodules. We constructed a Malignant Nodules Probability nomogram by integrating tsRNA and CT feature vectors to improve the accuracy of malignant lung nodule detection (Figure 4A). The calibration curve showed that malignant nodules were predicted with good accuracy in the three cohorts (Figure 4B). The results of the ROC analyses (Figure 4C) indicated that our nomogram could accurately predict malignant nodules in the discovery cohort (AUC = 0.953, sensitivity 88.4%, specificity 100.0%), internal cohort (AUC = 0.930, sensitivity 100.0%, specificity 73.8%), and external cohort (AUC = 0.943, sensitivity 100.0%, specificity 86.8%).

When compared with the Mayo Clinic Model and Brock University cancer prediction equation, our Malignant Nodules Probability nomogram outperformed both clinical models in the three cohorts. The AUCs of the Mayo Clinic model were 0.637, 0.634, and 0.813 and the AUCs of the Brock University cancer prediction equation were 0.660, 0.673, and 0.745, respectively (Figure 4C). The Malignant Nodules Probability nomogram enhances the detection accuracy of the circulating tsRNA score for malignant lung nodules by integrating CT feature vectors.
3.2 Prediction of invasive adenocarcinoma
According to the International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society (IASLC/ATS/ERS) classification, LUAD is classified as AIS, MIA, or IA.5 Sublobar resection is currently considered to be sufficient for AIS and MIA, while lobectomy is the preferred therapy for IA. However, surgeons can only determine the subsequent operation method and the scope of resection according to the solid component in CT images and intraoperative FS diagnosis, which is widely used to distinguish MIA from IA during surgery and is considered the gold standard in clinical practice. A previous study showed that the diagnostic accuracy of intraoperative FS for tumors ≤1 cm in diameter was 79.6%,8 and this accuracy was even lower for junior pathologists or medical centers with less experience. Thus, it is necessary to develop a new, noninvasive method that provides a reference for the degree of invasiveness before surgery, so as to reduce inappropriate surgical plans and optimize the distribution of medical resources. Next, we evaluated the ability of our circulating tsRNA score to identify tumor categories in order to aid the diagnosis of NSCLC. The specific expression of each tsRNA is shown in Figure 5A. Interestingly, most selected tsRNAs were highly expressed in the plasma of IA patients. We then tested whether the panel was able to identify patients with IA. As expected, the circulating tsRNA score was significantly higher in the IA group than in the MIA group in the discovery cohort (Figure 5B) and was associated with IA (Table 7; OR 4.253; 95% CI, 1.502–17.864). In addition, point-biserial correlation analysis indicated that a high circulating tsRNA score was highly correlated with IA (p = 0.001542; Figure 5C). To further confirm the performance of the circulating tsRNA score in IA prediction, the internal validation cohort and external validation cohort were also analyzed.
Consistently, the circulating tsRNA score showed good potential in the prediction of IA (Figure 5E,F,H,I, Tables 8 and 9). The AUCs of the preoperative circulating tsRNA score for detecting invasiveness of NSCLC were 0.6816 (discovery cohort), 0.6638 (internal validation cohort), and 0.7186 (external validation cohort), better than the performance of any individual tsRNA (Figure 5D,G,J).
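The point-biserial correlation used above to relate the continuous tsRNA score to the binary IA status can be sketched as follows. This is a generic textbook implementation, not the authors' R code; the function name and the toy data are hypothetical.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a continuous variable and 0/1 labels.

    r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the
    population standard deviation of all values. This equals the Pearson
    correlation computed with the labels coded as 0 and 1.
    """
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group0 = [v for v, label in zip(values, labels) if label == 0]
    n = len(values)
    overall_mean = sum(values) / n
    s_n = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    mean_diff = sum(group1) / len(group1) - sum(group0) / len(group0)
    return mean_diff / s_n * math.sqrt(len(group1) * len(group0) / n ** 2)


# Toy data (hypothetical): scores for two MIA (0) and two IA (1) patients.
print(round(point_biserial([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]), 4))
```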

Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 1.539 | 0.527–3.446 | 0.678 |
Gender | 0.832 | 0.245–2.533 | 0.235 |
Smoking | 1.295 | 0.567–2.546 | 0.325 |
Maximum diameter | 3.453 | 1.762–8.409 | <0.001* |
GGO component | |||
Presence | |||
Pure | 0.264 | 0.084–0.890 | 0.025* |
Absence | 0.128 | 0.010–0.674 | 0.013* |
Lesion location | |||
Right upper | |||
Right middle | 1.204 | 0.764–2.412 | 0.665 |
Right lower | 0.945 | 0.443–4.620 | 0.840 |
Left upper | 1.953 | 0.608–14.564 | 0.586 |
Left lower | 1.745 | 0.453–8.402 | 0.456 |
Circulating tRF score | 4.253 | 1.502–17.864 | 0.002* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 1.027 | 0.209–3.452 | 0.807 |
Gender | 1.442 | 0.121–4.135 | 0.060 |
Smoking | 1.068 | 0.343–3.005 | 0.293 |
Maximum diameter | 3.576 | 1.827–9.424 | <0.001* |
GGO component | |||
Presence | |||
Pure | 0.365 | 0.178–0.942 | 0.038* |
Absence | 0.341 | 0.127–0.806 | 0.033* |
Lesion location | |||
Right upper | |||
Right middle | 2.080 | 0.525–3.951 | 0.277 |
Right lower | 1.151 | 0.428–4.710 | 0.612 |
Left upper | 0.668 | 0.325–2.862 | 0.395 |
Left lower | 1.130 | 0.280–3.006 | 0.870 |
Circulating tRF score | 2.171 | 1.298–5.085 | 0.014* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
Variable | Odds ratio | 95% confidence interval | p value |
---|---|---|---|
Age | 1.519 | 0.712–3.002 | 0.531 |
Gender | 2.270 | 0.162–3.888 | 0.618 |
Smoking | 1.319 | 0.213–2.538 | 0.505 |
Maximum diameter | 3.576 | 1.827–9.424 | <0.009* |
GGO component | |||
Presence | |||
Pure | 0.365 | 0.056–0.845 | 0.031* |
Absence | 0.341 | 0.142–0.744 | 0.014* |
Lesion location | |||
Right upper | |||
Right middle | 2.270 | 0.752–3.838 | 0.484 |
Right lower | 1.821 | 0.620–3.956 | 0.696 |
Left upper | 2.501 | 0.778–2.336 | 0.119 |
Left lower | 2.543 | 0.340–3.767 | 0.311 |
Circulating tRF score | 2.728 | 1.298–6.104 | 0.041* |
- Abbreviations: GGO, ground glass opacity; tRF, tRNA-derived fragment.
- *p < 0.05.
However, the circulating tsRNA score alone is insufficient to detect the invasiveness of NSCLC. With the postoperative pathological examination (majority opinion of pathologists) as the reference, the AUCs of preoperative circulating tsRNA scores for detecting invasiveness of NSCLC were 0.682 (discovery cohort), 0.664 (internal validation cohort), and 0.719 (external validation cohort). Additionally, the AUCs for intraoperative FS diagnosis were 0.860, 0.862, and 0.773 for the senior pathologist and 0.534, 0.569, and 0.621 for the junior pathologist, respectively (Figure 6A–C). These results showed that the diagnostic accuracy of the circulating tsRNA score was slightly higher than that of the junior pathologist but significantly lower than that of the senior pathologist.

To improve the accuracy of IA prediction, CT image data of the participants were incorporated. As maximum diameter and GGO composition were correlated with NSCLC invasiveness (Tables 7–9), a nomogram was established to predict invasive adenocarcinoma probability more precisely by controlling for the GGO component and maximum diameter (Figure 6D). The calibration curve showed that the nomogram achieved a good performance in predicting invasive adenocarcinoma (Figure 6E). Once again, ROC analysis was carried out. The five-tsRNA signature robustly distinguished IA patients from MIA patients in the discovery cohort (Figure 6F; AUC = 0.850, sensitivity 86.0%, specificity 81.4%), internal validation cohort (Figure 6G; AUC = 0.784, sensitivity 78.8%, specificity 78.1%), and external validation cohort (Figure 6H; AUC = 0.837, sensitivity 85.7%, specificity 84.0%). To further evaluate the diagnostic performance of the Invasive Adenocarcinoma Nodules Probability nomogram, the CT-based Lung-DL model, which was constructed based on CT information in our previous study, was used for comparison. The AUCs of the Lung-DL model were 0.840 (sensitivity 87.8%, specificity 77.1%, discovery cohort), 0.786 (sensitivity 48.9%, specificity 96.9%, internal validation cohort), and 0.759 (sensitivity 50.0%, specificity 96.0%, external validation cohort). In this comparison, the diagnostic accuracy of the Invasive Adenocarcinoma Nodules Probability nomogram was similar to that of the Lung-DL model; its specificity was slightly lower than that of the Lung-DL model, but its sensitivity was better.
In general, our five-tsRNA diagnostic model showed prospective capacity in predicting invasive components in malignant lesions, which could aid surgical decision-making and provide hints in FS diagnosis.
4 DISCUSSION
Recent CT screening trials including The National Lung Screening Trial (NLST), The European Multicentric Italian Lung Detection (MILD) study, and the Nederlands-Leuvens Longkanker Screenings ONderzoek (NELSON) have confirmed the significance of LDCT in high-risk patients.25-27 The current judgment of pulmonary nodules is mainly dependent on the experience of pathologists or assessment of clinical risk models such as the Mayo Clinic Model. High-risk patients are recommended for biopsy or surgical resection, and low-risk patients are monitored with follow-up CT.28 However, the current clinical management of pulmonary nodules produces a high rate of diagnostic errors, which drives the patients toward unnecessary invasive biopsy or repeated CT scanning, especially in the nonsmoking population.29 Thus, a highly accurate noninvasive risk assessment strategy is critical.
For the first time, this investigation integrated blood samples and CT information into a risk assessment model for the clinical management of pulmonary nodules. The model used a novel biomarker, circulating tsRNAs, which have shown promising predictive capacity in cancer diagnosis and prognosis. Notably, the combined biomarker model improved performance across different lesions in discovery and validation cohorts from three different thoracic surgery centers. Additionally, we found that our five-tsRNA-based model showed reliable capacity in distinguishing noninvasive from invasive lesions.
Originally, tRNA halves, called “tRNA-derived stress-induced RNA” or tiRNA, were identified as products of cellular stress.30 With the development of next-generation sequencing, sequencing data showed that tRNA halves as well as shorter tRFs are also produced constitutively. Transfer RNA-derived fragments can be further divided into tRF-1, 3′tRF, 5′tRF, and i-tRF, while tiRNAs can be divided into 5′tiRNA and 3′tiRNA.31 Given their presence in many conditions and in circulation, tRFs have been actively investigated as noninvasive biomarkers that reflect tissue state, disease type, and personal attributes.32, 33 Owing to their high conservation and stability, tsRNAs can be detected feasibly and cost-effectively in various biofluids compared with other liquid biopsy analytes, such as ctDNA. In addition, sequencing-based methods such as ctDNA or CTC analysis are costly for initial screening, have limited throughput, and yield high numbers of false positives.34 Furthermore, although tsRNAs are similar in length to miRNAs, their expression patterns across 10 kinds of body fluids differ significantly from those of miRNAs.11 Accordingly, it has been reported that the diagnostic value of tsRNAs might be better than that of miRNAs in CRC.35 Wu et al. reported that 5′-tRF-GlyGCC can serve as a novel biomarker for colorectal cancer diagnosis.14 However, the diagnostic capability of individual circulating tsRNA markers is limited. In our study, five tsRNAs were specifically associated with malignant lesions and improved the accuracy of the diagnostic model, which offers potential translational value.
Liquid biopsy based on cell-free DNA, miRNA, and CTC biomarkers has drawn lasting attention in the diagnosis of pulmonary nodules.36 Given the difficulty of tissue biopsy for small pulmonary nodules, it has become a promising clinical strategy with the advantages of being noninvasive and repeatable. However, it also has significant limitations. Redundant features that may cause overfitting, the inability of external research groups to replicate results, and the absence of clinical information such as nodule size or solid component result in insufficient sensitivity and specificity for clinical application.37 Radiomic features have been considered feasible and reliable indicators for pulmonary nodules based on abundant clinical experience. However, radiomic features cannot be directly linked to relevant biological features, and thus ignore the genetic and epigenetic heterogeneity of pulmonary nodules. Based on these considerations, we combined the five-tsRNA-based signature with radiomics into the Malignant Nodules Probability model, which builds a bridge between radiomics and biological biomarkers. This multi-omics model takes advantage of both clinical image features and biological biomarkers to achieve precise and sensitive prediction.
Sublobar resection has been considered for noninvasive, peripheral, small-sized cancers, depending on the accuracy of intraoperative FS diagnosis.7 The results of clinical trials JCOG0802 and CALGB0503 also confirmed that segmentectomy should be the standard surgical procedure for patients with small peripheral NSCLC.38 Thus, the surgical choice depends mainly on the solid component ratio and nodule size based on CT images or on intraoperative FS diagnosis. Pathological sampling bias reduces the accuracy of FS diagnosis by junior pathologists. Surgeons are sometimes misguided by CT scans and perform inadequate or excessive surgical procedures. Therefore, our model provides a feasible and precise approach for distinguishing noninvasive from invasive lesions before operation, which would benefit surgical choice and could inform intraoperative FS pathologic diagnosis.
To further evaluate the accuracy of our model in predicting malignant pulmonary nodules, we compared it with the Mayo Clinic Model and the Brock University cancer prediction equation in the three cohorts. Our Malignant Nodules Probability Nomogram showed superior performance compared to both clinical models across all three cohorts, indicating its high accuracy. One possible explanation for this discrepancy could be related to differences in patient recruitment methods. Specifically, the Mayo Clinic Model identified patients based on visible lung nodules on chest radiographs,16 whereas our cohort was identified through CT scans that likely included a larger proportion of smaller nodules. Additionally, as our cohorts only included patients with a single pulmonary nodule, we cannot fully assess the accuracy of the Brock University cancer prediction equation, which incorporates variables such as nodule count.17 However, it should be noted that the inclusion of only patients with a single pulmonary nodule is a limitation of our study. Furthermore, as a retrospective study, our study lacks prospective data support; thus, a prospective clinical cohort is necessary to further validate our model's accuracy. Finally, additional validation with a larger sample size is warranted.
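For orientation, the Mayo Clinic Model used for comparison is a logistic regression equation. The sketch below encodes it in Python using the coefficients commonly cited for that model (age, smoking history, history of extrathoracic cancer, nodule diameter, spiculation, and upper lobe location). The function and parameter names are hypothetical, and the coefficients are reproduced from memory of the published equation, so they should be verified against reference 16 before any use. This is not the code used in this study.

```python
import math


def mayo_probability(
    age_years: float,
    ever_smoker: bool,
    prior_extrathoracic_cancer: bool,
    diameter_mm: float,
    spiculated: bool,
    upper_lobe: bool,
) -> float:
    """Estimated probability of malignancy from a logistic equation.

    Coefficients are those commonly cited for the Mayo Clinic Model;
    treat them as an assumption and check them against the original report.
    """
    x = (
        -6.8272
        + 0.0391 * age_years
        + 0.7917 * int(ever_smoker)
        + 1.3388 * int(prior_extrathoracic_cancer)
        + 0.1274 * diameter_mm
        + 1.0407 * int(spiculated)
        + 0.7838 * int(upper_lobe)
    )
    # Logistic transform maps the linear predictor to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical patient, for illustration only.
example_probability = mayo_probability(
    age_years=60,
    ever_smoker=True,
    prior_extrathoracic_cancer=False,
    diameter_mm=10,
    spiculated=False,
    upper_lobe=False,
)
print(f"Estimated probability of malignancy: {example_probability:.3f}")
```

Because the equation takes no input for nodule count, it can be applied directly to single-nodule cohorts such as ours, whereas the Brock equation's nodule-count term is uninformative in that setting.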
In conclusion, our study evaluated the diagnostic value of tsRNAs and established a tsRNA-based diagnostic model for patients with pulmonary nodules. Notably, the Malignant Nodules Probability nomogram has potential diagnostic value and translational utility in the detection of malignant nodules and invasive lesions, which could aid the clinical decision-making process and reduce unnecessary clinical procedures.
AUTHOR CONTRIBUTIONS
W.Q.L. and S.X.M. were responsible for the experimental section, manuscript writing, and data analysis. Q.C. was responsible for pathological biopsy interpretation and language editing. Z.F., C.Q., D.G.C., and X.W.J. provided tissues and plasma. X.L., J.F., and M.Q.X. provided financial and technical support. All authors read and approved the final manuscript.
ACKNOWLEDGMENTS
None.
FUNDING INFORMATION
This study was supported by grants from the National Natural Science Foundation of China (Grant Nos. 82073211, 82002434, and 82003106), The Project of Invigorating Health Care through Science, Technology and Education, Jiangsu Provincial Medical Innovation Team (CXTDA2017002), The Project of Invigorating Health Care through Science, Technology and Education, Jiangsu Provincial Medical Outstanding Talent (JCRCA2016001), and the Young Talents Program of Jiangsu Cancer Hospital (23).
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no competing interests.
ETHICS STATEMENTS
Approval of the research protocol by an institutional review board: This study was approved by the Ethics Committee of Jiangsu Cancer Hospital (No. 2020129), Xuzhou Central Hospital (XYFM2020002), and Taixing People's Hospital (KD2020069).
Informed consent: All participants provided written informed consent before phlebotomy and surgery.
Registry and the registration no. of the study/trial: N/A.
Animal studies: N/A.
Open Research
DATA AVAILABILITY STATEMENT
Please contact the authors for data requests.