Volume 96, Issue 11 pp. 1385-1395
RESEARCH ARTICLE

A novel 4-mRNA signature predicts the overall survival in acute myeloid leukemia

Zizhen Chen

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

Junzhe Song

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

Wenjun Wang

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

Jiaojiao Bai

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

Yuhui Zhang

Department of Hematology, The Second Affiliated Hospital of Tianjin Medical University, Tianjin, China

Jun Shi

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

Jie Bai

Corresponding Author

Department of Hematology, The Second Affiliated Hospital of Tianjin Medical University, Tianjin, China

Correspondence

Yuan Zhou, State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin 300020, China.

Email: [email protected]

Jie Bai, Department of Hematology, The Second Affiliated Hospital of Tianjin Medical University, Tianjin 300211, China.

Email: [email protected]

Yuan Zhou

Corresponding Author

State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, China

First published: 02 August 2021

Zizhen Chen and Junzhe Song contributed equally

Funding information: CAMS Innovation Fund for Medical Sciences, Grant/Award Number: 2017-I2M-3-018; Data Center of Management Science, National Natural Science Foundation of China - Peking University, Grant/Award Numbers: 81890990, 81970120; National Key Research and Development Program of China, Grant/Award Number: 2020YFE0203000

Abstract

Acute myeloid leukemia (AML) is an aggressive cancer of myeloid cells with high levels of heterogeneity and great variability in prognostic behaviors. Cytogenetic abnormalities and genetic mutations have been widely used in the prognostic stratification of AML to assign patients into different risk categories. Nevertheless, nearly half of AML patients assigned to intermediate risk need more precise prognostic schemes. Here, 336 differentially expressed genes (DEGs) between AML and control samples and 206 genes representing the intratumor heterogeneity of AML were identified. By applying a LASSO Cox regression model, we generated a 4-mRNA prognostic signature comprising KLF9, ENPP4, TUBA4A and CD247. Higher risk scores were significantly associated with shorter overall survival, complex karyotype, and adverse mutations. We then validated the prognostic value of this 4-mRNA signature in two independent cohorts. We also showed that incorporating the 4-mRNA-based signature into the 2017 European LeukemiaNet (ELN) risk classification could enhance the predictive accuracy of survival in patients with AML. Univariate and multivariate analyses showed that this signature was independent of traditional prognostic factors such as age, WBC count, and unfavorable cytogenetics. Finally, the molecular mechanisms underlying disparate outcomes in high-risk and low-risk AML patients were explored. Together, our findings suggest that the 4-mRNA signature refines the risk stratification and prognostic prediction of AML patients.

1 INTRODUCTION

Acute myeloid leukemia (AML) is a heterogeneous hematological malignancy characterized by the accumulation of leukemic cells because of abnormalities in the development and differentiation of myeloid precursors.1 Acute myeloid leukemia is the most common acute leukemia in adults, and the clinical features of AML include anemia, recurrent infections, and easy bruising or bleeding due to the inhibition of normal hematopoiesis by malignant clones.2 According to data from 2010 to 2016, the 5-year survival rate of patients with AML is 28.7%.3 The major challenge in finding a cure for AML stems from disease heterogeneity, which complicates the development of effective diagnostic criteria and risk stratification schemes.

Until now, the classification and prognosis of AML have been mainly based on genomic events, including gene mutations, fusion genes, and chromosomal abnormalities.4-6 According to the 2017 European LeukemiaNet recommendations for the genetic risk stratification of AML (ELN 2017), non-APL AMLs are classified into three subgroups (favorable, intermediate, and poor/adverse) based on cytogenetic and molecular parameters. However, nearly half of patients with AML are stratified as “intermediate”; these patients lack significant genetic predictors and face treatment failure and relapse,7 and there is enormous heterogeneity among them at the transcriptional level. With the development of next-generation sequencing (NGS) technology, The Cancer Genome Atlas (TCGA) Research Network7 and the International Microarray Innovations in Leukemia Study Group8 have shown that AML patients can be effectively partitioned into distinct prognostic categories according to transcriptome data. The use of gene expression profiles to determine risk factors and decipher the molecular pathogenesis of AML can provide new insight into individualized prognosis and treatment. With advances in data mining technology, efforts to establish potent predictive models from large mRNA expression datasets are ongoing. Stanley W. K. Ng et al. proposed a prognostic model comprising 17 leukemia stem cell (LSC)-related genes based on the least absolute shrinkage and selection operator (LASSO) technique to elucidate the link between risk scores and the survival probabilities of AML patients.9 They also discussed the correlation between the prognostic score and sensitivity to standard chemotherapy in AML patients. Sarah Wagner et al. applied an artificial neural network (ANN) to build a three-gene risk model and validated its correlation with event-free survival (EFS) and overall survival (OS).10

In this study, we developed a scoring system that is strongly correlated with OS to predict the survival outcomes of AML patients. The differentially expressed genes (DEGs) between AML and control samples were evaluated first. Then, given the heterogeneity of AML, we applied nonnegative matrix factorization (NMF) to the mRNA expression data of 232 samples from the Beat AML cohort to classify patients into three subgroups and illustrated the distinct mRNA signatures between them. Finally, we proposed a 4-mRNA risk formula and validated it in two independent cohorts. Moreover, the ELN 2017 classification refined with the 4-mRNA signature better segregates OS in AML. We hope this model can provide an improved risk stratification option for patients with AML and shed new light on potential targeted therapeutic strategies.

2 METHODS

2.1 Patients

This study included 1187 AML samples from four cohorts: the GSE13159 microarray dataset (n = 573),8 the Beat AML dataset (RNA-seq, n = 232),11 the TCGA LAML dataset (RNA-seq, n = 132),7 and the AMLCG dataset (RNA-seq, n = 250).12 The RNA-seq and clinical data of the Beat AML and TCGA LAML cohorts were downloaded from the GDC Data Portal (https://portal.gdc.cancer.gov/). The microarray dataset GSE13159 and the AMLCG dataset GSE106291 were downloaded from the GEO database.

2.2 Cytogenetic analysis

Chromosomal abnormalities were analyzed based on the United Kingdom Medical Research Council (MRC) cytogenetic risk category as previously described.6

2.3 Gene expression profiling

To identify AML-specific genes, a total of 573 bone marrow samples (501 AML samples vs 72 control samples) in the GSE13159 dataset were selected. This dataset was obtained from the MILE Study (Microarray Innovations in LEukemia) program using Affymetrix HG-U133 Plus 2.0 GeneChips. The DEGs were identified by using the “limma” package, with an adjusted p value <0.05 and an absolute log2 fold change >0.15.13
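
The thresholding step described above can be sketched in a few lines of Python. This is only an illustration of the filter on adjusted p value and absolute log2 fold change, not the "limma" workflow used in the study; the `GeneResult` type, the `select_degs` function, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fc: float  # log2 fold change, AML vs control
    adj_p: float    # multiple-testing-adjusted p value


def select_degs(results, lfc_cutoff=0.15, p_cutoff=0.05):
    """Return (up, down) gene lists that pass both thresholds."""
    up, down = [], []
    for r in results:
        if r.adj_p < p_cutoff and abs(r.log2_fc) > lfc_cutoff:
            (up if r.log2_fc > 0 else down).append(r.gene)
    return up, down


# Hypothetical values for illustration only; not taken from GSE13159.
example = [
    GeneResult("GENE_A", 0.40, 0.001),
    GeneResult("GENE_B", -0.30, 0.010),
    GeneResult("GENE_C", 0.10, 0.001),   # fold change too small
    GeneResult("GENE_D", -0.50, 0.200),  # not significant
]
up, down = select_degs(example)
```

Here only GENE_A (up-regulated) and GENE_B (down-regulated) pass the filter.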

2.4 Unsupervised consensus clustering

To investigate the heterogeneity of AML, 232 patients in the Beat AML dataset whose transcriptional profiles were obtained from bone marrow were included. We generated a quantile-normalized and log2-transformed FPKM matrix. Next, we performed unsupervised consensus clustering by using the “NMF” package in R with the default Brunet algorithm and 50 and 100 iterations for the rank survey and clustering, respectively.14 Given NMF outputs, we identified the DEGs between subgroups using the R package “limma”.13

2.5 Generating the mRNA-based prognostic signature

We merged AML-specific genes and DEGs between AML subgroups. To assess the association between expression and overall survival, we performed univariable Cox proportional hazards regression.15 In the Beat AML cohort, 198 patients with OS information were included in a training cohort to generate the prognostic signature. Among them, 181 patients with whole exome sequencing profiles were included for analysis of genetic mutations. RNA-seq data were normalized with the VST algorithm in the “DESeq2” package in R (version 4.0.3).16 To build a predictive model, we performed linear regression based on the LASSO algorithm using the “glmnet” package in the training cohort.17 Selected genes and their regression coefficients were used to generate a Cox regression model. Further, patients were ranked according to the 4-mRNA score and dichotomized into high-risk and low-risk groups using a cut-off point calculated by the maximally selected rank test from the R package MaxStat.18 The overall study design and implementation are shown in Figure S1.
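
To make the dichotomization step more concrete, the following Python sketch scans candidate cut-off points and keeps the one with the largest two-group log-rank statistic. It is a simplified stand-in for the idea behind a maximally selected rank test and is not the MaxStat implementation: it applies no p value adjustment for the multiple cut-offs tested, and the function names, the `min_group` parameter, and the toy data are assumptions made for illustration.

```python
def logrank_statistic(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk overall
        n1 = sum(1 for x, h in zip(times, in_high) if x >= t and h)  # at risk, high group
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, h in zip(times, events, in_high)
                 if x == t and e == 1 and h)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=2):
    """Return (cutoff, statistic); patients with score > cutoff are 'high'."""
    best = None
    for c in sorted(set(scores)):
        in_high = [s > c for s in scores]
        n_high = sum(in_high)
        if n_high < min_group or len(scores) - n_high < min_group:
            continue  # skip splits that leave a group too small
        stat = logrank_statistic(times, events, in_high)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best


# Toy data (hypothetical): higher scores go with shorter survival.
scores = [1, 2, 3, 4, 5, 6]
times = [10, 9, 8, 3, 2, 1]
events = [1, 1, 1, 1, 1, 1]
cutoff, stat = best_cutoff(scores, times, events)
```

Because many cut-offs are tried, the statistic at the selected cut-off is optimistically biased, which is why dedicated procedures adjust the resulting p value.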

2.6 Statistical analysis

All statistical analyses were performed in R (version 4.0.3). Survival analyses were performed using the “survival” and “survminer” packages. All genes were assessed via univariate analysis using the “survival” package in R, and variables with a p value <0.05 were selected for further analysis. The OS probabilities were estimated using the Kaplan–Meier method. Comparisons of the discrete variables between patients with high-risk and low-risk scores were performed using the chi-square test or Fisher's exact test. Student's t test or the Wilcoxon rank-sum test was used to compare continuous variables. Univariate and multivariate Cox regression analyses were used to confirm the independent prognostic factors. In analyses assessing goodness of fit for OS, the R package “rms” was used for the estimation of the concordance index (c-index).
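
As an illustration of the Kaplan–Meier method named above, the following minimal Python sketch computes the product-limit survival estimate from follow-up times and event indicators. It is not the code used in this study, which relied on the R “survival” package; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Toy data (hypothetical): five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

For these toy data the estimated survival drops to 0.8, 0.6, and 0.3 at times 2, 3, and 5; the censored observations only reduce the number at risk.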

3 RESULTS

3.1 Identification of AML-specific genes

To delineate the underlying molecular changes in AML, we analyzed the microarray dataset GSE13159. Filtered specimens from the bone marrow of 72 control samples (those with nonleukemia and healthy donors) and 501 AML samples were subjected to gene expression analysis. The “limma” package was used to compare the gene expression profiles between AML and control bone marrow specimens. A list of DEGs was obtained, and 41 up-regulated and 295 down-regulated genes were found in the AML group (|log2FC| ≥ 0.15, adjusted p value <0.05). The heatmap and volcano plots are shown in Figure 1A and B.

FIGURE 1 Identifying AML-specific genes. (A), Heatmap displaying DEGs between AML and control bone marrow samples (those with nonleukemia and healthy donors). The differentially expressed genes with an absolute log2 fold change >0.15 and an adjusted p value <0.05 were considered significant. (B), Volcano plot of the DEGs. Each point represents the average value of one transcript. (C), KEGG enrichment results of the differentially expressed genes between AML and control samples. (D), TRRUST analysis revealing the potential upstream transcription factors of the differentially expressed genes between AML and control samples

The KEGG enrichment analysis was performed to define the pathways affected specifically in the bone marrow of AML patients, and the altered genes were enriched in the biological processes “Hematopoietic cell lineage”, “Transcriptional misregulation in cancer,” “Acute myeloid leukemia”, and “Cytokine-cytokine receptor interaction” (Figure 1C), which is consistent with the characteristics of AML. Our results indicate that the DEGs we obtained represent the specific transcriptional signatures of AML samples vs control samples. Hematopoiesis is controlled by multiple transcription factors that activate lineage-specific genes. The dysregulation of a small group of crucial transcription factors was further verified to be involved in the process of AML.19 Here, we explored the potential transcription factor regulatory networks of AML-specific genes. Based on the TRRUST dataset, a total of 15 transcription factors, including GATA1, SPI1, MYB, RUNX1, IRF1, and NFKB1 (Figure 1D), which are frequently dysregulated in AML or other hematopoietic malignancies, were identified, indicating that the list of our DEGs is AML specific.

3.2 Identification of DEGs between AML subgroups

Notably, AML is a highly heterogeneous disease. There are at least 14 different subtypes of AML according to their genomic abnormalities.1 The deep sequencing results in the TCGA database revealed nearly 2000 mutated genes in 200 patients with AML. The DEGs between AML and control bone marrow samples defined the AML-specific genes but did not reflect the intragroup heterogeneity of AML. Therefore, it is meaningful to identify the transcriptional characteristics of AML subgroups. We performed NMF clustering using the mRNA expression profiles of 232 samples from the Beat AML cohort to determine the heterogeneity of mRNA expression across AML subgroups. We first assessed cluster stability using the R package “NMF”. We used the default “Brunet” algorithm, with the number of clusters k ranging from two to six and each k value repeated 50 times to obtain the consensus matrix (Figure S2A–E). The consensus matrix illustrates the stability of clustering. When k = 3, the magnitude of the cophenetic correlation coefficient begins to fall (Figure S2F). This analysis suggested an optimum of three subgroups. Then, we used gene expression data to divide the 232 samples into three subtypes, with 96 patients in subtype one, 71 patients in subtype two, and 65 patients in subtype three (Figure 2A). Next, we used the “limma” package to identify the genes specific to each subtype. The characteristic genes were defined as having a log2 fold change greater than two and an FDR less than 0.05. A heat map based on the expression values of a total of 206 characteristic genes in the three subtypes of the Beat AML cohort is illustrated in Figure 2B.
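
To illustrate how repeated clustering runs can be summarized for a stability check, the sketch below builds a co-clustering matrix in Python: entry (i, j) is the fraction of runs in which samples i and j fall in the same cluster. This reflects the general idea of a consensus matrix as we understand it and is not the “NMF” package's own code; the function name and the toy labels are hypothetical.

```python
def consensus_matrix(runs):
    """Average co-clustering over repeated runs.

    runs: list of label lists, one per clustering run, all of equal length.
    Returns an n x n matrix of fractions in [0, 1].
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Hypothetical labels from three runs over four samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, different label names
    [0, 0, 0, 1],
]
cm = consensus_matrix(runs)
```

Entries close to 0 or 1 indicate pairs that are consistently separated or grouped, so a matrix dominated by such values points to a stable choice of k.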

FIGURE 2 Unsupervised nonnegative matrix factorization (NMF) clustering of transcription profiles of samples from the Beat AML cohort. (A), NMF consensus map with a scale bar showing the consensus membership value. (B), Expression heatmap of mRNAs that were significantly differentially expressed between each AML subgroup. (C), Covariate tracks demonstrating the genetic features and FAB classifications of each subgroup. (D), Kaplan–Meier plot displaying the overall survival (OS) status of patients in each subgroup

Consensus clustering showed a noticeable difference in the molecular features of different subgroups. We used the chi-square goodness-of-fit test to identify significant associations (p < 0.05) between subgroups and covariates (Figure 2C). The chi-square test showed that subtype two was associated with FAB subtypes M4 and M5, and mutations in genes involved in RAS signaling. Subtype one was associated with FLT3, NPM1, CEBPA, and WT1 mutations. Subtype three was associated with RUNX1, TET2, SRSF2, BCOR, and ASXL1 mutations. The cytogenetic features listed in Table S1 demonstrated that patients with complex karyotype were mainly distributed in subtypes two and three, but not in subtype one. Furthermore, patients in subtype two had shorter OS than patients in subtypes one and three (Figure 2D).

3.3 AML predictive model by LASSO regression

A total of 542 genes, comprising 336 AML-specific genes that distinguished AML from control samples and 206 genes representing the intragroup heterogeneity of AML, were analyzed to identify mRNAs correlated with disease prognosis. Univariate Cox PH regression analysis was performed on the expression values of these genes in 198 patients with OS information from the Beat AML cohort. Ultimately, 28 genes (p < 0.05, univariable Cox PH regression), including 18 risk-associated genes (HR, 1.06–1.15) and 10 protective genes (HR, 0.86–0.95), were selected (Table S2). We next developed an mRNA expression-based AML prognostic signature with the 28 predictive genes by using the LASSO regression algorithm. The model was established with 4 mRNAs (Figure 3A), three of which are risk genes and one of which is a protective gene (Figure 3B): Risk Score = (0.153 × ENPP4) + (0.2 × KLF9) + (0.104 × TUBA4A) − (0.3 × CD247).
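
The risk formula above is a weighted sum and can be evaluated directly. The Python sketch below uses the four published coefficients; the expression values and the cut-off are hypothetical placeholders (the optimal cut-off is not stated in the text), and treating scores strictly above the cut-off as high risk is an assumption.

```python
# Coefficients from the 4-mRNA risk formula; CD247 is the protective gene.
COEFFICIENTS = {"ENPP4": 0.153, "KLF9": 0.2, "TUBA4A": 0.104, "CD247": -0.3}


def risk_score(expression):
    """Weighted sum of normalized expression values for the four genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Assumption: scores strictly above the cut-off are labeled high risk."""
    return "high" if score > cutoff else "low"


# Hypothetical normalized expression values for one patient.
patient = {"ENPP4": 10.0, "KLF9": 10.0, "TUBA4A": 10.0, "CD247": 5.0}
score = risk_score(patient)        # 1.53 + 2.0 + 1.04 - 1.5 = 3.07
group = risk_group(score, 2.0)     # 2.0 is a placeholder cut-off
```

Because CD247 enters with a negative coefficient, higher CD247 expression lowers the score, whereas higher ENPP4, KLF9, or TUBA4A expression raises it.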

FIGURE 3 Identification of a risk signature for OS by LASSO regression analysis in the Beat AML cohort. (A), Predictor equation with coefficients of the 4 mRNA features in the OS prognostic signature. (B), Heatmap of the expression levels of the 4 mRNAs across patients in the Beat AML cohort. (C), The risk score of each patient in the Beat AML cohort. (D), Genomic alterations in each patient; only genes that differed between the two groups are shown. (E–G), Kaplan–Meier plots displaying OS differences between patients in the high-risk and low-risk groups in the Beat AML training cohort, TCGA LAML validation cohort, and AMLCG validation cohort

We further calculated the risk score for each patient in the training cohort as the weighted sum of the expression of the four mRNAs to investigate the potential clinical utility of our model. As a continuous variable, the risk score ranges from 0.15 to 4.23 in the Beat AML cohort. Patients were ranked according to the 4-mRNA signature and dichotomized into high-risk and low-risk groups using a cut-off point calculated by the maximally selected rank test from the R package MaxStat (Figure 3C). Our model-based patient classification was significantly associated with OS. Patients with higher risk scores had inferior outcomes (HR, 1.8; 95% CI, 1.4–2.3; p < 0.01; univariable Cox PH regression). Furthermore, patients in the high-risk group in the Beat AML cohort had shorter OS (median OS = 11.7 months vs 54.4 months; p < 0.001; Figure 3E).

3.4 Validation of the 4-mRNA signature

To further validate the prognostic value of our model in other cohorts, we evaluated the association of the risk score with survival in two additional independent AML cohorts: TCGA and AMLCG (GEO accession GSE106291). In the TCGA LAML cohort (n = 132), patients with a high-risk score had a significantly poorer outcome than patients with a low score (HR, 1.6; 95% CI, 1.1–2.2; p < 0.01; univariable Cox PH regression). Similar results were observed in the AMLCG cohort (HR, 1.6; 95% CI, 1.2–2.0; p < 0.01; univariable Cox PH regression). Kaplan–Meier analysis showed that patients in the high-risk groups in the TCGA LAML cohort (median OS = 12.0 months vs 28.1 months; p = 0.004; Figure 3F) and the AMLCG cohort (median OS = 15.1 months vs not reached; p < 0.01; Figure 3G) had shorter OS than those in the low-risk groups.

Clinical features and molecular lesions have been implemented in the prognostic stratification of AML. In our 4-mRNA scoring system, patients in the high-risk and low-risk groups possessed distinct genetic signatures. Higher scores were more often associated with mutations in SRSF2 or TET2, which are independent poor prognosis markers in AML. AML patients with biallelic CEBPA mutations, which are associated with better survival and a higher complete remission rate, were mainly included in the low-risk group. Patients with PTPN11 mutation had lower scores, and frequently had mutated NPM1 (data not shown), which is consistent with the study by Metzeler et al.20 (Figures 3D and S3, Tables S3 and S4). Notably, AML with gene mutations in chromatin and/or RNA splicing regulators, as well as AML with TP53 mutations and/or aneuploidies, were shown to be heterogeneous genomic categories in addition to currently defined AML subgroups. Occurrence of ASXL1 and/or SRSF2 mutations was independently correlated with poor survival in AML, as was TP53 mutation and/or complex karyotype.1 Here we showed that high-risk scores were strongly associated with the occurrence of ASXL1 or SRSF2 mutations (p < 0.001), as well as mutated TP53 or complex karyotype (p < 0.01) (Table S5).

We also investigated the correlations of clinical characteristics with the risk score. The results showed that patients with high-risk scores were older in age (median 63 vs. 53, p < 0.001), had a higher incidence of FLT3 internal tandem duplication mutation (FLT3-ITD) (30.3%, 33 in 109 vs 10.3%, nine in 89, p = 0.001), and higher white blood cell counts in both the Beat AML cohort (median 24.85 vs. 13.74, p = 0.013) and TCGA LAML cohort (30.8 vs. 5.6, p < 0.001). Analysis of cytogenetic characteristics showed that the complex karyotype was significantly associated with high-risk score in the Beat AML cohort (p = 0.017), and tended to be associated in the TCGA LAML cohort (p = 0.18) (Tables S6 and S7).

Though ELN 2017 is widely used to risk-stratify patients with AML, patients classified as “intermediate” still have substantial prognostic heterogeneity. We then compared our model with ELN 2017 in the Beat AML cohort. Non-APL adult AML patients were selected and categorized based on the ELN 2017 classification (n = 182). The results indicated that 73% of patients in the ELN-adverse group were reclassified into our high-risk group, and 70% of patients in the ELN-favorable group were reclassified into our low-risk group (Figure 4A). By introducing the 4-mRNA signature, AML patients stratified by the ELN 2017 classification could be further categorized into six subgroups. Notably, in the ELN-favorable and ELN-intermediate groups, patients with high or low 4-mRNA risk scores had distinct outcomes (Figure S4). According to the OS status of patients classified by ELN 2017 and the 4-mRNA signature, we refined the ELN 2017 risk classification: patients with ELN 2017 favorable risk and a low 4-mRNA score were regrouped as the favorable group; those with ELN favorable risk and a high 4-mRNA score, or ELN intermediate risk and a low score, as the intermediate group; and the other three subgroups as the adverse group. Importantly, the refined ELN 2017 risk classification with the 4-mRNA signature assigned the patients into three distinct groups with significant differences in OS (Figure 4B).
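
The regrouping rule described above maps each of the six ELN-by-score combinations to one of three refined groups. A minimal Python sketch of that mapping follows; the function name and the string labels are hypothetical choices made for illustration.

```python
def refined_eln(eln_group, score_group):
    """Combine an ELN 2017 group with the 4-mRNA risk group.

    eln_group:   "favorable", "intermediate", or "adverse"
    score_group: "low" or "high"
    """
    if eln_group not in ("favorable", "intermediate", "adverse"):
        raise ValueError(f"unknown ELN group: {eln_group!r}")
    if score_group not in ("low", "high"):
        raise ValueError(f"unknown score group: {score_group!r}")

    if eln_group == "favorable" and score_group == "low":
        return "favorable"
    if (eln_group, score_group) in (("favorable", "high"), ("intermediate", "low")):
        return "intermediate"
    # Remaining three subgroups: intermediate/high, adverse/low, adverse/high.
    return "adverse"


# Enumerate all six combinations.
table = {
    (eln, score): refined_eln(eln, score)
    for eln in ("favorable", "intermediate", "adverse")
    for score in ("low", "high")
}
```

Under this rule, an ELN-adverse patient stays in the adverse group regardless of the 4-mRNA score, whereas a high score moves ELN-favorable and ELN-intermediate patients one category toward higher risk.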

FIGURE 4 Refining the ELN classification with the 4-mRNA signature. (A), Sankey plot showing the numbers of patients reclassified based on the 4-mRNA prognostic signature. (B), Kaplan–Meier plots displaying OS differences in patients stratified based on ELN 2017 classification (left) and refined ELN 2017 classification with the 4-mRNA signature (right). (C), Forest plots of the multivariable Cox proportional hazards model. The 4-mRNA signature is an independent prognostic factor in the Beat AML cohort when using the ELN 2017 risk classification (left) or MRC cytogenetics classification (right) as a covariable. In the forest diagram, the HR of each factor is represented by a black dot, and the 95% CI is represented by a horizontal line. The statistical p value is listed at the right

3.5 The 4-mRNA signature is an independent prognostic factor

To investigate whether the risk score in our model functions as an independent prognostic factor, univariate and multivariate Cox regression analyses of the risk score as well as several well-known clinical prognostic factors were performed. Considering the lack of clinical data on the AMLCG cohort, only the Beat AML cohort and the TCGA LAML cohort were included in the Cox regression analyses.

In the univariate analysis, the poor prognostic factors included older age (HR, 2.3; 95% CI, 1.4–3.8; p < 0.001), FLT3-ITD mutation (HR, 1.6; 95% CI, 1–2.5; p = 0.045), cytogenetic abnormalities (HR, 2.1; 95% CI, 1.4–3.3; p < 0.001) and sex (HR, 1.6; 95% CI, 1–2.4; p = 0.03) in the Beat AML cohort (Table S8) and older age (HR, 2; 95% CI, 1.2–3.2; p < 0.01) in the TCGA LAML cohort (Table S9). Notably, a high-risk score was associated with poor overall survival in both the Beat AML cohort (p < 0.001) and the TCGA LAML cohort (p < 0.01). The multivariate analysis indicated that the 4-mRNA risk score was an independent predictor of poor OS in both the Beat AML cohort (HR, 1.87; 95% CI, 1.36–2.6; p < 0.001) after considering age, WBC count, sex, FLT3-ITD, NPM1 mutation, and cytogenetics (Figure 4C) and the TCGA LAML cohort (HR, 1.51; 95% CI, 1.02–2.2; p < 0.05) after considering age, WBC count, sex, NPM1 mutation, and cytogenetics (Figure S5A). In addition, the 4-mRNA risk score was still an independent prognostic factor when considering ELN 2017 in the multivariate analysis (p = 0.004) (Figure 4C). To evaluate the prediction performance of the 4-mRNA signature, the c-index was estimated with the R package “rms”. In the Beat AML cohort and the TCGA LAML cohort, the 4-mRNA signature and ELN 2017 had similar c-indices. The inclusion of the 4-mRNA signature markedly improved the predictive ability of ELN 2017 (Figure S5B).
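
For readers unfamiliar with the c-index, the Python sketch below computes Harrell's concordance index from its general definition: among comparable patient pairs, the fraction in which the patient with the higher risk score has the shorter observed survival. It is not the “rms” implementation used in the study; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, scores):
    """Harrell's c-index; a higher score is expected to mean shorter survival.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. Tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (hypothetical): four patients, one censored at time 15.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [4, 3, 1, 2])
reversed_order = concordance_index(times, events, [1, 2, 4, 3])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, so two models with similar c-indices discriminate about equally well on that cohort.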

3.6 Correlation of the AML score with pathway activity

To determine the molecular mechanisms that are most altered between high-risk and low-risk patients, we compared the RNA-seq data of the 30 patients with the highest scores (high-risk group) with those of the 30 patients with the lowest scores (low-risk group) in the Beat AML cohort. GSEA showed that the TNFA signaling via NFκB, hypoxia, inflammatory response, and TGFβ pathways were upregulated in the high-risk group (Figure S6A). As Pathway Responsive Genes (PROGENy) analysis can display pathway alterations in each patient, we applied this method to uncover the most affected signaling pathways between high-risk and low-risk patients.21 We found that the hypoxia and TGFβ pathways were among the most dysregulated pathways in high-risk patients (Figure S6B, Table S10).

4 DISCUSSION

Cytogenetic and molecular genetic investigations provide crucial evidence for the risk stratification of AMLs. Nevertheless, 50% of patients remain poorly categorized owing to a deficiency in typical chromosome abnormalities or gene mutations.22 According to the central dogma of molecular biology, mRNA has a more direct impact on phenotype than the DNA sequence.23 While traditional sequencing methods are unable to determine genome-wide expression profiles at a given time, NGS allows access to massive amounts of transcriptional data, which can provide additional information to detect the underlying pathophysiology of AML and to optimize treatment decisions. In this study, we first determined DEGs between AML patients and healthy donors and verified that these genes were consistent with the biological properties of AML by performing functional analysis and upstream regulator analysis. Then, we used unsupervised learning methods to compartmentalize AMLs in the Beat AML cohort into three subgroups, each with distinct transcriptional signatures and clinical outcomes. Finally, we built a powerful 4-mRNA prognostic signature based on these two sets of mRNA candidates by LASSO regression and validated it in two independent cohorts.

In current transcriptome datasets, the number of genes is much greater than the number of samples, and only a few genes have a strong relationship with disease outcome. The LASSO algorithm provides good prediction accuracy while increasing model interpretability and reducing overfitting. The LASSO algorithm was also applied to develop a linear regression model composed of six membrane protein genes applicable for cytogenetically normal acute myeloid leukemia (CN-AML)24 (MPG6), and the LSC17 scoring system, which comprises 17 genes representing LSCs, to predict the survival probabilities of AML patients.9 In addition, Wagner et al. employed another machine learning method, ANN, for AML stratification with prognostic implications by building a three-gene prognostic index (ANN three-gene PI).10 We acknowledge that increasing the number of genes initially used for modeling has a positive effect on accuracy. However, starting with a large number of candidate genes is highly time consuming and may bring excessive genes into the prognostic panel, some of which are hard to interpret. We therefore established a compact 4-mRNA signature and validated its prognostic effect in two independent cohorts. We further compared the prognostic value of our model with that of three other models (MPG6, LSC17, ANN three-gene PI). In both the TCGA and Beat AML cohorts, patients predicted to be at high risk had significantly shorter OS than those predicted to be at low risk by our model and the three other formulas (data not shown).

The ELN 2017 risk classification based on certain cytogenetic and molecular aberrations is commonly used for adult AML. The ELN risk classification helps identify high-risk AML patients with poor overall survival. In our research, the 4-mRNA signature can further distinguish the heterogeneous subgroups in ELN-favorable and ELN-intermediate AML. Moreover, this study showed that the 4-mRNA signature was an independent prognostic factor after considering the ELN classification. Calculation of the c-index showed that the 4-mRNA signature and ELN 2017 have similar c-indices in the Beat AML cohort and the TCGA LAML cohort, and that prediction performance was improved by incorporating the 4-mRNA signature into the ELN classification. This highlights the limitation of stratification using only the ELN risk classification or only the 4-mRNA signature. Incorporating genomic and transcriptomic markers can improve the accuracy of the prognostic classification of AML.

Immune dysfunction plays a pivotal role in tumorigenesis. Our predictive model includes a critical immune factor, CD247. Numerous studies have indicated a link between CD247, which encodes the T-cell surface glycoprotein CD3 zeta chain, and the prognosis of certain cancers. Extensive research has shown that T cells are among the most important factors in the intrinsic antitumor mechanism. T cell exhaustion and senescence have been reported in patients with AML at diagnosis.25 CD247 participates in the TCR signaling pathway and contributes to T cell activation. High expression of “TCR signal-triggering module” genes, including CD247, was identified as a favorable prognostic factor (associated with prolonged OS) in several cancer types (breast invasive carcinoma, cervical squamous cell carcinoma and endocervical adenocarcinoma, head and neck squamous cell carcinoma, lung adenocarcinoma, and sarcoma).26 In this study, CD247 acted as a favorable predictor and was strongly associated with the low-risk group in AML.

Although evidence supporting the roles of ENPP4 and TUBA4A in myeloid malignancies is lacking, the ectonucleotide pyrophosphatase/phosphodiesterase (ENPP/NPP) family has been reported to promote tumor initiation, progression, and metastasis in breast cancer and glioblastoma.27, 28 ENPP4 is also recognized as a procoagulant enzyme that stimulates platelet degranulation and aggregation.29 TUBA4A, a tubulin family member, was found to be enriched in exosomes from non-small cell lung cancer cell lines.30 It was also included as an adverse factor in the IPSLUAD prognostic signature in lung adenocarcinoma.31 The excessive production of reactive oxygen species (ROS) is frequently observed in cancer and strongly influences hematopoietic cell function. More than 60% of primary AML blasts constitutively produce elevated levels of ROS, which promote the proliferation of AML cell lines and primary AML blasts.32 In addition, Zhou et al. reported an increase in several oxidative stress markers in samples from AML patients at relapse compared with samples at initial diagnosis, suggesting that ROS production may be an important factor in AML progression.33 In response to elevated intracellular ROS, the expression of Klf9 can be stimulated by Nrf2, resulting in further Klf9-dependent increases in ROS and amplifying oxidative stress.34 In our model, elevated KLF9, as an oxidant signal, was recognized as a prognostic risk gene, indicating that oxidative stress may drive disease progression in AML. However, the precise roles of ENPP4, TUBA4A, and KLF9 in AML have not yet been clarified. Whether they function only as prognostic biomarkers or actually contribute to the malignant phenotypes of AML remains to be elucidated.

Several studies have demonstrated that ROS mediate the induction of hypoxia-inducible factor-1α (HIF1α) and contribute to a switch from oxidative to glycolytic metabolism.35, 36 As the key regulator of glycolysis, HIF1α is also transcribed under hypoxic conditions. HIF1α-induced quiescence supports the chemoresistance of AML cells.37 In this study, we compared the transcriptional profiles of the 30 patients with the highest risk scores with those of the 30 patients with the lowest risk scores. We found that genes related to hypoxia were significantly upregulated in the highest-risk patients, indicating that abnormal oxidative metabolism is closely related to the progression of AML.
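The selection of extreme-risk patients for such a comparison can be sketched as follows. This is a minimal illustration that assumes each patient already has a risk score; the patient identifiers, the scores, and the function name `extreme_groups` are hypothetical, and the downstream differential expression analysis is not shown.

```python
def extreme_groups(risk_scores, k):
    """Return (highest_k, lowest_k) patient IDs ranked by risk score.

    risk_scores maps a patient ID to its risk score. The two groups must
    not overlap, so 2 * k may not exceed the number of patients.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if 2 * k > len(risk_scores):
        raise ValueError("groups would overlap")
    # Sort patient IDs from highest to lowest risk score.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


# Hypothetical scores for six patients; the study used k = 30 per group.
scores = {"P1": 0.2, "P2": 1.5, "P3": -0.7, "P4": 0.9, "P5": -1.1, "P6": 0.1}
high_risk, low_risk = extreme_groups(scores, k=2)
print(high_risk, low_risk)
```

Comparing only the two tails of the score distribution trades sample size for contrast: patients near the median, whose risk assignment is least certain, are left out.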

TGFβ signaling is an important pathway involved in carcinogenesis and cancer progression. In the hematopoietic compartment, the TGFβ pathway has been shown to be closely related to the pathophysiology of myelodysplastic syndrome (MDS).38 Consistent with the enhanced TGFβ signaling in the high-risk group, nearly half (46.7%) of the 30 patients with the highest risk scores had AML with myelodysplasia-related changes (AML-MRC), while only 3% of the 30 patients with the lowest risk scores had AML-MRC, suggesting that our model can sensitively distinguish AML-MRC patients, who generally have a poor prognosis.

AML is a highly heterogeneous hematologic malignancy. Although three independent cohorts were used to validate this 4-mRNA signature, the limited number of patients per cohort may still affect the accuracy of the model, and further validation and refinement in larger cohorts are warranted. Another limitation of this study is that we used only transcript-level data for the 4-mRNA predictor; thus, the results may not be closely related to all the clinical characteristics of AML. Although our 4-mRNA score was an independent factor in the Beat AML cohort when ELN 2017 classification or unfavorable cytogenetics was considered as a variable, other prognostic genetic features, such as TP53 and other high-risk mutations, were not included as individual variables in the multivariable analysis. These mutations were also imbalanced between the high-risk and low-risk mRNA signature groups, so we cannot determine whether the signature would remain independently significant if these mutations were considered individually.

In this article, we showed that incorporating the 4-mRNA-based signature into the ELN 2017 classification could enhance the accuracy of survival prediction in patients with AML. Whether including other established risk factors, such as older age, high white blood cell counts, and positive minimal residual disease, would further improve the predictive performance of the signature needs to be considered.

In summary, a robust 4-gene scoring signature was designed on the basis of distinct mRNA expression profiles and three AML subgroups. We observed different overall survival rates between the high- and low-risk groups. The application of our proposed model has great potential for optimal disease classification and provides a fresh perspective on therapeutic options.

ACKNOWLEDGMENTS

The authors thank Dr. Xin Gao for helpful discussions. This work was supported in part by the National Natural Science Foundation of China (81970120, 81890990), the National Key Research and Development Program of China (2020YFE0203000), and the CAMS Innovation Fund for Medical Sciences (2017-I2M-3-018).

CONFLICT OF INTEREST

The authors declare no conflicts of interest.

AUTHOR CONTRIBUTIONS

Yuan Zhou and Zizhen Chen designed the study. Zizhen Chen, Junzhe Song, Wenjun Wang, Jiaojiao Bai, and Yuhui Zhang performed the analyses. Zizhen Chen, Junzhe Song, Jie Bai, and Yuan Zhou discussed the data. Zizhen Chen, Junzhe Song, Jie Bai, and Yuan Zhou wrote the manuscript. All authors reviewed, edited, and approved the manuscript.

DATA AVAILABILITY STATEMENT

All data supporting the findings of this study are available within the article, and other supplementary information files are available from the corresponding author upon request.
