Volume 2025, Issue 1 8659536
Research Article
Open Access

Identification of the SUMOylation Gene Signature in Colon Cancer by Transcriptome and Mendelian Randomization Integration

Xiaobin Zhang

Department of Gastrointestinal Surgery, The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, No. 120 Guidan Road, Foshan, China, scut.edu.cn

Qingshui Yang

Department of Gastrointestinal Surgery, The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, No. 120 Guidan Road, Foshan, China, scut.edu.cn

Zongyu Liang

Department of Gastrointestinal Surgery, The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, No. 120 Guidan Road, Foshan, China, scut.edu.cn

Zhu Li

Department of Gastrointestinal Surgery, The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, No. 120 Guidan Road, Foshan, China, scut.edu.cn

Chunlei Lu

Corresponding Author

Digestive Disease Center, Wuxi Mingci Hospital, No. 599 Zhongnan Road, Wuxi, China, ddcofnj.com

First published: 27 March 2025
Academic Editor: Mohammad Reza Kalhori

Abstract

Background: SUMOylation is a posttranslational protein modification involved in tumorigenesis, aggression, metastasis, drug resistance, and prognosis, but the molecular characteristics and prognostic value of SUMOylation in colorectal cancer (CRC) remain unclear.

Methods: Transcriptomic data were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), and summary-level genome-wide association study (GWAS) and expression quantitative trait locus (eQTL) data from individuals of European ancestry were collected. The SUMOylation patterns of CRC patients, the immune cell infiltration characteristics of the tumor microenvironment (TME), biological functions, therapeutic responses, and prognostic signatures were identified. Mendelian randomization (MR) analysis was used to explore the causality between the prognostic signatures and CRC.

Results: Three SUMOylation-related clusters were identified. Cluster 2 showed the worst survival, the most abundant infiltrating immune cell populations, responses to anti-CTLA-4 and anti-PD-1 therapies, and sensitivity to chemotherapy. Nine SUMOylation-related genes (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, PIAS3, H4C4, CHD3, and SUMO2) were selected and validated as a prognostic signature. A predictive nomogram was constructed and validated. Finally, NUP54 was positively associated, and PPARGC1A negatively associated, with the risk of CRC.

Conclusion: This study first comprehensively explored the molecular characteristics and prognostic values of SUMOylation and identified the possible biomarkers for treatment in CRC.

1. Introduction

Colorectal cancer (CRC), which comprises colon and rectal cancer, is one of the most common and serious gastrointestinal tumors. It is the third most commonly diagnosed cancer and the second leading cause of cancer mortality, with more than 1.9 million new cases and 935,000 deaths in 2020 [1]. The risk factors for CRC include nonmodifiable factors (sex, age, race, history of adenomatous polyps, history of inflammatory bowel disease, and family history) and modifiable factors (carnivorous diet, sedentary lifestyle, decreased physical activity, overweight, smoking, and alcohol consumption) [1, 2]. Patients with early-stage disease can be treated surgically, but most patients with advanced-stage disease are treated with chemotherapy [3, 4], targeted therapy [4, 5], gene therapy [6, 7], immunotherapy [8–11], adoptive T-cell therapy [12], complement inhibition [13], cytokine therapy [14], and traditional natural products [15–17]. Despite advances in the screening, diagnosis, and treatment of CRC over the past decades, recurrence, metastasis, and drug resistance remain challenges for patients [2, 18]. Therefore, it is critical to explore evaluable biomarkers for the early identification and prevention of tumor aggression in order to improve prognosis.

Small ubiquitin-like modifier (SUMO)ylation is a reversible posttranslational modification (PTM) and an important regulatory mechanism of cellular proteins [19]. SUMO proteins are similar to ubiquitin in molecular structure and are highly conserved in evolution; mammalian SUMO proteins comprise three isoforms, SUMO-1, SUMO-2, and SUMO-3 [20, 21]. SUMOylation regulates several biological functions, including DNA damage repair, immune responses, carcinogenesis, cell cycle progression, cell death, mitochondrial division, ion channels, and biological rhythms [22]. In cancers, SUMOylation is widely involved in carcinogenesis, the DNA damage response, cancer cell proliferation, metastasis, and apoptosis [22, 23]. A growing number of studies have indicated that SUMO pathway components are possible cancer biomarkers [24, 25], and targeting SUMOylation has been tested in clinical trials [26]. Several SUMO biomarkers have been found in CRC. SUMOylation of METTL3 or IQGAP1 promotes CRC progression [27, 28]. SUMOylation of the KH3 domain of heterogeneous nuclear ribonucleoprotein K (hnRNPK) prevents the tumorigenesis of CRC [29]. Considering the specific roles of SUMOylation in tumorigenesis, aggression, and metastasis in CRC, the molecular characteristics and prognostic value of SUMOylation need to be explored.

Mendelian randomization (MR) is an epidemiological approach that uses genetic variants as instrumental variables (IVs) to explore the causality between exposures and outcomes [30]. MR can mitigate the confounding that arises in observational studies because the exposure variable cannot be randomized [31]. In recent years, several MR studies have uncovered causal roles of common biomarkers in cancers. For example, C-reactive protein (CRP) has been identified as a potential biomarker for assessing overall survival (OS) across cancer types [32]. GREM1, CLSTN3, CSF2RA, and CD86 have been identified as biomarkers for CRC, and POLR2F, CSF2RA, CD86, and MMP2 have been identified as possible drug targets for CRC [33].

In the present study, we performed, for the first time, a comprehensive analysis of the molecular characteristics of SUMOylation and identified prognostic biomarkers for CRC based on transcriptomic data and published genome-wide association study (GWAS) summary statistics.

2. Materials and Methods

2.1. Transcriptomic Data Obtained and Acquisition of the SUMOylation-Related Genes

RNA expression profile data (fragments per kilobase million [FPKM]) and the corresponding clinical information of CRC patients (The Cancer Genome Atlas [TCGA]-COAD cohort) were downloaded from the TCGA database (https://portal.gdc.cancer.gov/). Additionally, RNA expression profile data and the corresponding clinical information of CRC patients were downloaded from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) using the getGEO function in the GEOquery package. The GSE17538 and GSE29623 datasets were used for further validation.

A total of 187 SUMOylation-related genes were identified from the MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp). The gene lists are shown in Table S1.

2.2. Screening Differentially Expressed Genes (DEGs) Between CRC Tumor Tissues and Normal Tissues

The limma package was used to identify the DEGs between CRC tumor tissues and normal tissues, with thresholds of |log2 fold change (FC)| > 1 and false discovery rate (FDR) < 0.01. The ggplot2 and heatmap packages were used to draw the volcano plots and heatmap.
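
The thresholding step can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it assumes that per-gene log2 FC values and raw p values are already available, applies a Benjamini-Hochberg adjustment as one common way to obtain an FDR, and uses hypothetical gene names and values throughout.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """Label each gene 'up', 'down', or 'ns' using |log2 FC| and FDR cutoffs."""
    fdr = benjamini_hochberg(pvalues)
    calls = {}
    for gene, lfc, q in zip(genes, log2fc, fdr):
        if q < fdr_cutoff and lfc > lfc_cutoff:
            calls[gene] = "up"
        elif q < fdr_cutoff and lfc < -lfc_cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "ns"
    return calls


# Hypothetical toy input (not data from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.3, -1.8, 0.4, 1.5]
pvalues = [1e-6, 2e-5, 1e-4, 0.2]
calls = classify_degs(genes, log2fc, pvalues)
```

In this toy input, GENE_C passes the FDR cutoff but not the fold-change cutoff, and GENE_D passes the fold-change cutoff but not the FDR cutoff, so both are labeled "ns".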

2.3. Gene Set Enrichment Analysis (GSEA)

GSEA was performed with the SUMOylation-related gene sets from the “MsigDB/msigdb.v2023.1.Hs.symbols.gmt” using the GSVA package. Pathways with an adjusted p < 0.05 were considered significantly enriched.

2.4. Nonnegative Matrix Factorization (NMF) Clustering

Based on the expression of 187 SUMOylation-related genes, the NMF package was used to identify the SUMOylation clusters for CRC patients, the number of the clusters was determined by the k value, and the optimal k value was selected when the cophenetic correlation coefficient started to decrease. Then, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were performed to evaluate the clustering results. Meanwhile, the differences in Kaplan–Meier OS and clinical characteristics for separated clusters were investigated.
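
One reading of the rank-selection rule can be sketched as follows. The sketch assumes that "started to decrease" means choosing the last k before the cophenetic correlation coefficient first drops; the function name and the coefficient values are hypothetical and are not taken from the study.

```python
def pick_rank(cophenetic_by_k):
    """Return the last k before the cophenetic coefficient first decreases."""
    ks = sorted(cophenetic_by_k)
    for current, following in zip(ks, ks[1:]):
        if cophenetic_by_k[following] < cophenetic_by_k[current]:
            return current
    # No decrease observed over the tested range: fall back to the largest k.
    return ks[-1]


# Hypothetical cophenetic coefficients for k = 2..6 (not values from the study).
cophenetic = {2: 0.985, 3: 0.991, 4: 0.962, 5: 0.948, 6: 0.951}
best_k = pick_rank(cophenetic)
```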

2.5. Identification of the DEGs and Biological Functions Between SUMOylation Clusters

The DEGs among the three clusters were identified by comparing each cluster with the remaining clusters using thresholds of |log2 FC| > 0.585 and FDR < 0.05. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to evaluate the biological functions of each cluster using the clusterProfiler package.

2.6. Investigation of the Immune Characteristics Between SUMOylation Clusters

For immune characteristic analyses, the ESTIMATE package was used to evaluate the stromal score, immune score, ESTIMATE score, and tumor purity. Meanwhile, the single-sample GSEA (ssGSEA) method was used to estimate the infiltration of immune cells. The differences among clusters were tested using the Kruskal–Wallis test, and p < 0.05 was considered statistically significant. GSEA was performed with the HALLMARK gene sets from the “MsigDB/h.all.v2022.1.Hs.symbols.gmt” using the GSVA package. Pathways with an adjusted p < 0.05 were considered significantly enriched.
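
The three-cluster comparison can be illustrated with a small pure-Python sketch of the Kruskal–Wallis test. It is an illustration rather than the code used in the study: the function name and toy scores are hypothetical, the p value relies on the chi-square approximation (which for three groups, i.e., 2 degrees of freedom, reduces to exp(-H/2)), and the approximation is rough for samples as small as the toy example.

```python
import math


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H with tie correction; closed-form p-value for df = 2."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = sorted((v, g) for g, values in enumerate(groups) for v in values)
    n = len(pooled)
    rank_sums = [0.0] * 3
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1.0 - tie_term / (n**3 - n)  # tie correction (equals 1 without ties)
    p = math.exp(-h / 2.0)  # chi-square survival function with 2 df
    return h, p


# Hypothetical immune scores for three clusters (not data from the study).
h_stat, p_value = kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```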

2.7. Prediction of the Response to Chemotherapy and Immunotherapy Between SUMOylation Clusters

The differentially expressed HLA Class I and II genes, IFN-γ signaling signatures, and immune checkpoints among clusters were explored using the Kruskal–Wallis test, and p < 0.05 was considered statistically significant. The responses to cytotoxic T lymphocyte antigen-4 (CTLA-4) and programmed cell death protein 1 (PD-1) blockers were predicted using GenePattern (https://www.genepattern.org/). Half-maximal inhibitory concentration (IC50) values of antitumor drugs were calculated using the oncoPredict package based on the Genomics of Drug Sensitivity in Cancer database (GDSC2, https://www.cancerrxgene.org). The differences in drug responses were tested using the Kruskal–Wallis test, and p < 0.05 was considered statistically significant.

2.8. Construction and Validation of the Prognostic SUMOylation Score

The hub SUMOylation-related genes were selected by intersecting the DEGs among clusters with the 187 SUMOylation-related genes collected from the MSigDB database. The intersected genes were entered into a univariate Cox regression model to identify prognostic SUMOylation-related genes at p < 0.1. These prognostic genes were then narrowed down with a least absolute shrinkage and selection operator (LASSO) Cox regression model using the glmnet package. The SUMOylation risk score was calculated from the expression and LASSO coefficient of each gene. CRC patients were then divided into high- and low-risk groups according to the median SUMOylation risk score, and the differences in Kaplan–Meier OS between the two groups were investigated. Receiver operating characteristic (ROC) curves for 1, 3, and 5 years were drawn, and the area under the curve (AUC) was used to test the prediction efficiency of the SUMOylation risk score. Two external datasets (GSE17538 and GSE29623) were also used to validate the prediction efficiency of the SUMOylation risk score.
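
A risk score of this kind is typically a linear combination, that is, the sum over signature genes of the LASSO coefficient multiplied by the expression value. The sketch below assumes that form; the coefficients and expression values are invented for illustration and are not the fitted values from the study, and assigning patients exactly at the median to the low-risk group is also an assumption.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor: sum over genes of coefficient * expression."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Assign 'high' above the median score and 'low' otherwise."""
    cutoff = median(scores.values())
    return {p: ("high" if s > cutoff else "low") for p, s in scores.items()}


# Hypothetical coefficients and expression values (not from the study).
coefficients = {"NUP54": 0.5, "PPARGC1A": -0.25}
expression = {
    "P1": {"NUP54": 2.0, "PPARGC1A": 4.0},  # 1.0 - 1.0 = 0.0
    "P2": {"NUP54": 4.0, "PPARGC1A": 2.0},  # 2.0 - 0.5 = 1.5
    "P3": {"NUP54": 1.0, "PPARGC1A": 2.0},  # 0.5 - 0.5 = 0.0
    "P4": {"NUP54": 6.0, "PPARGC1A": 0.0},  # 3.0 - 0.0 = 3.0
}
scores = risk_scores(expression, coefficients)
groups = split_by_median(scores)
```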

2.9. Construction of a Nomogram

Univariate and multivariate Cox regression analyses that included clinical characteristics were performed to evaluate whether the SUMOylation risk score was an independent prognostic factor. The regplot package was used to develop a nomogram to predict the survival of CRC patients. ROC curves, decision curves, and calibration curves were drawn using the survivalROC, dcurves, and rms packages to determine the prediction efficiency, reliability, and sensitivity, respectively.

2.10. MR Analysis

Two-sample MR was used to assess the potential association between the SUMOylation gene signature and the risk of colon cancer. The expression of each signature gene was considered the exposure, CRC was considered the outcome, and cis-expression quantitative trait loci (cis-eQTLs) were used as the instrumental variables (IVs). The cis-eQTL summary data for the SUMOylation gene signature were collected from the eQTLGen Consortium (31,684 individuals) [34]. The GWAS summary statistics for CRC were obtained from the IEU OpenGWAS project (https://gwas.mrcieu.ac.uk/, 32,072 cases and 19,948 controls) [35]. All data were from individuals of European ancestry. The selected IVs had to satisfy three assumptions: (1) the IVs are strongly associated with the exposure; (2) the IVs are not associated with confounders; and (3) the IVs affect the outcome only through the exposure. The cis-eQTLs were selected as IVs with the following parameters: association with the exposure at genome-wide significance (p < 5 × 10⁻⁸), a linkage disequilibrium (LD) cutoff of r² < 0.3 within a 10,000-kb window to ensure independence between IVs, minor allele frequency > 0.01, and F-statistic > 10. The causality between SUMOylation and CRC was investigated using the Wald ratio or inverse-variance weighted (IVW) [36], weighted median (WM) estimator [37], MR-Egger regression [38], MR robust adjusted profile score (MR.RAPS) [39], and Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) methods [40]. MR analyses were performed with the “TwoSampleMR” and “mr.raps” packages in R (4.2.3), and a p value < 0.05 was considered a significant association. Additionally, the MR-Egger intercept test and Cochran’s Q statistic were used to test for pleiotropy and heterogeneity, respectively [41].
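
The two simplest estimators named above can be sketched in a few lines of Python. This is a textbook-style illustration, not the TwoSampleMR implementation: it assumes independent instruments, uses a first-order standard error for the Wald ratio, a fixed-effect IVW estimate, and the common approximation F ≈ (β/SE)² for per-SNP instrument strength. All effect sizes are hypothetical.

```python
import math


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP instrument strength: (beta / se)^2."""
    return (beta_exposure / se_exposure) ** 2


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate across independent SNPs."""
    weights = [1.0 / s**2 for s in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical SNP effects on gene expression (exposure) and CRC (outcome).
bx = [0.2, 0.4]
by = [0.1, 0.2]
se_y = [0.05, 0.05]
ivw_estimate, ivw_se = ivw(bx, by, se_y)
wald_estimate, wald_se = wald_ratio(bx[0], by[0], se_y[0])
```

When only one instrument is available for a gene, the Wald ratio is the natural choice; with several independent instruments, the IVW estimate combines the per-SNP ratios, weighting each by its precision.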

The Steiger filtering method was used to examine the causal direction of the IVs, and the leave-one-out method was used to assess the sensitivity of the MR results by sequentially removing one SNP at a time to investigate whether the MR estimate was driven by a single SNP. To account for multiple testing, a Bonferroni-corrected p value < 0.05/7 (the number of genes) = 0.00714 was considered a significant difference. A p value between 0.05/7 and 0.05 was considered a suggestive association.
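
The significance rule can be written as a small helper. The function name is hypothetical; the defaults follow the thresholds stated in the text (α = 0.05, seven tests).

```python
def classify_association(p_value, n_tests=7, alpha=0.05):
    """Label an MR p-value using the Bonferroni-corrected threshold alpha / n_tests."""
    threshold = alpha / n_tests  # 0.05 / 7 is approximately 0.00714
    if p_value < threshold:
        return "significant"
    if p_value < alpha:
        return "suggestive"
    return "not significant"


labels = [classify_association(p) for p in (0.001, 0.02, 0.3)]
```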

3. Results

3.1. Identification of the DEGs and the SUMOylation-Related Pathways in CRC

A total of 3444 DEGs (1885 upregulated and 1559 downregulated) were identified between CRC tumor tissues and normal tissues with |log2 FC| > 1 and FDR < 0.01 (Figures 1(a), 1(b), Table S2). Moreover, we conducted GSEA to explore the SUMOylation pathway enrichment in CRC compared to the normal tissues by ranking the 3444 DEGs. The results indicated that the SUMOylation-related pathways were significantly enriched in CRC, such as SUMOylation of DNA damage and repair proteins, DNA replication proteins, RNA binding proteins, ubiquitination proteins, SUMOylation proteins, chromatin organization proteins, SUMOylation, DNA methylation proteins, intracellular receptors, and transcription cofactors and factors (Figure 1(c), Table S3).

Figure 1: Identification of the DEGs and the SUMOylation-related pathways in CRC. (a) Volcano plot of the DEGs between CRC tumor tissues and normal tissues with |log2 FC| > 1 and FDR < 0.01. (b) Heatmap of the DEGs between CRC tumor tissues and normal tissues with |log2 FC| > 1 and FDR < 0.01. (c) GSEA of the enriched SUMOylation-related biological functions between tumor and normal tissues.

3.2. The Characteristics of the Three SUMOylation-Related Clusters in CRC

The above results suggested that SUMOylation plays a vital role in CRC progression. Therefore, the NMF algorithm was used to explore the relationship between SUMOylation-related gene expression and tumor classification. Based on the expression of the 187 SUMOylation-related genes, the patients were grouped into three clusters (Cluster 1, n = 57; Cluster 2, n = 121; and Cluster 3, n = 258) according to the k value, which was determined by the cophenetic correlation coefficient (Figures 2(a), 2(b), Table S4). Meanwhile, the PCA and t-SNE analyses indicated that the patients were distributed into three groups, which is consistent with the NMF results (Figures 2(c), 2(d)). The Kaplan–Meier survival curves revealed the worst survival for patients in Cluster 2 (Figure 2(e)). Moreover, we found a correlation between SUMOylation clustering and microsatellite instability (MSI) in CRC (Figure 2(f)).

Figure 2: The characteristics of the three SUMOylation-related clusters in CRC. (a)–(b) The NMF clustering heatmap of the mixture coefficient matrix and connectivity matrix with rank = 3. (c) PCA of the stratification into three SUMOylation-related clusters. (d) t-SNE analysis of the stratification into three SUMOylation-related clusters. (e) Kaplan–Meier overall survival curves of the patients in the three SUMOylation-related clusters. (f) Heatmap showing the correlation between SUMOylation clustering and clinical features.

3.3. Biological Functions of Each SUMOylation-Related Cluster in CRC

A total of 5511 DEGs (1094 upregulated and 4417 downregulated), 2711 DEGs (2331 upregulated and 380 downregulated), and 1526 DEGs (291 upregulated and 1235 downregulated) were screened out in Cluster 1, Cluster 2, and Cluster 3, respectively (Tables S5–S7). Biological function analyses indicated that the three clusters were involved in different biological processes. For example, the DEGs of Cluster 1 were involved in nucleosome assembly, protein localization to chromosome, and negative regulation of myeloid cell differentiation (Figures 3(a), 3(b), 3(c)). KEGG analysis indicated that Cluster 1 was associated with viral carcinogenesis, alcoholism, and neutrophil extracellular trap formation (Figure 3(d)). The DEGs of Cluster 2 were associated with several biological processes, such as the intracellular steroid hormone receptor signaling pathway and intracellular receptor signaling pathway (Figures 3(e), 3(f), 3(g)), and were involved in chemical carcinogenesis-receptor activation (Figure 3(h)). The DEGs of Cluster 3 were mainly enriched in nuclear transport, nucleocytoplasmic transport, protein SUMOylation, and nucleobase-containing compound transport (Figures 3(i), 3(j), 3(k)). KEGG analysis revealed that Cluster 3 was associated with platinum drug resistance and nucleocytoplasmic transport (Figure 3(l)).

Figure 3: Biological functions of each SUMOylation-related cluster in CRC. (a)–(d) The bubble plots of the GO (biological process, BP; cellular component, CC; molecular function, MF) and KEGG pathway enrichment in SUMOylation-related Cluster 1. (e)–(h) The bubble plots of the GO (BP, CC, MF) and KEGG pathway enrichment in SUMOylation-related Cluster 2. (i)–(l) The bubble plots of the GO (BP, CC, MF) and KEGG pathway enrichment in SUMOylation-related Cluster 3.

3.4. Immune Characteristics in Tumor Microenvironment (TME) in SUMOylation-Related Clusters

We further investigated the immune characteristics of the TME in each cluster. The ESTIMATE algorithm was used to compute the stromal score, immune score, ESTIMATE score, and tumor purity, which reflect the stromal cell content, the immune cell infiltration, the combined non-tumor content, and the tumor cell fraction in the TME, respectively. Cluster 2 had the highest ESTIMATE score, immune score, and stromal score but the lowest tumor purity (Figures 4(a), 4(b), 4(c), 4(d)). Conversely, Cluster 3 had the lowest ESTIMATE score, immune score, and stromal score but the highest tumor purity (Figures 4(a), 4(b), 4(c), 4(d)). We then used ssGSEA to estimate the fractions of infiltrating immune cells in the TME. Most immune cell types were most abundant in Cluster 2, including activated dendritic cells (DCs), CD56bright natural killer (NK) cells, CD56dim NK cells, eosinophils, immature DCs, myeloid-derived suppressor cells (MDSCs), macrophages, mast cells, NKT cells, neutrophils, plasmacytoid DCs, activated B cells, activated CD4 T cells, activated CD8 T cells, central memory CD4 T cells, central memory CD8 T cells, effector memory CD4 T cells, effector memory CD8 T cells, gamma delta T cells, immature B cells, memory B cells, regulatory T cells, T follicular helper cells, Type 1 T helper cells, and Type 2 T helper cells. Notably, the fraction of Type 17 T helper cells was the lowest in this cluster (Figures 4(e), 4(f)). The opposite pattern was observed in Cluster 3. Moreover, different HALLMARK pathways were enriched in different clusters; for example, DNA repair, MYC targets V1 and V2, oxidative phosphorylation, mTORC1 signaling, E2F targets, and G2M checkpoint were mainly enriched in Cluster 3 (Figure 4(g)).
Most other pathways, including several immune-related pathways, were enriched in Cluster 2, such as the P53 pathway, TNF-α signaling via NF-κB, inflammatory response, IL-6/JAK/STAT3 signaling, IL-2/STAT5 signaling, interferon-gamma (IFN-γ) response, interferon-alpha (IFN-α) response, TGF-β signaling, and Notch signaling (Figure 4(g)).
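The inverse relationship between the ESTIMATE score and tumor purity reported above follows from how ESTIMATE derives purity. A minimal Python sketch is given below, assuming the cosine fit published with the ESTIMATE method (Yoshihara et al., 2013), which was calibrated on Affymetrix data and is only an approximation for other platforms. The function names and the per-cluster scores are hypothetical and are not taken from this study.

```python
import math

# Constants of the published ESTIMATE purity fit (assumed here; calibrated on
# Affymetrix arrays, so treat the output as approximate for RNA-seq data).
A = 0.6049872018
B = 0.0001467884


def estimate_score(stromal_score: float, immune_score: float) -> float:
    """ESTIMATE score is the sum of the stromal and immune scores."""
    return stromal_score + immune_score


def tumor_purity(score: float) -> float:
    """Approximate tumor purity from an ESTIMATE score via the cosine fit."""
    return math.cos(A + B * score)


# Hypothetical (stromal, immune) scores, for illustration only.
example = {"Cluster 2": (800.0, 1500.0), "Cluster 3": (-900.0, 200.0)}

purity = {
    name: tumor_purity(estimate_score(stromal, immune))
    for name, (stromal, immune) in example.items()
}

for name, value in purity.items():
    print(f"{name}: purity ~ {value:.2f}")
```

Within the usual score range the cosine term decreases as the ESTIMATE score increases, so a cluster with higher stromal and immune scores is assigned a lower purity, in line with the pattern described for Clusters 2 and 3.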

Immune characteristics in tumor microenvironment (TME) in SUMOylation-related clusters. (a)–(d) The boxplots show the differences in ESTIMATE score, immune score, stromal score, and tumor purity in the SUMOylation-related clusters. (e)–(f) The boxplots show the differential fraction of 28 types of immune cells in the SUMOylation-related clusters. (g) GSEA of the HALLMARK pathways in the SUMOylation-related clusters.

3.5. Differential Response to Chemotherapy and Immunotherapy for SUMOylation Clusters

We further explored the differential response to therapy across the SUMOylation clusters. The boxplots indicated that the expression of HLA class I and II genes (HLA-B, HLA-C, HLA-DMA, HLA-DOA, HLA-DOB, HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA, HLA-DRB5, HLA-E, and HLA-F) was higher in Cluster 2 than in the other clusters, whereas the opposite was observed in Cluster 1 (Figure 5(a)). We also examined the expression of IFN-γ signaling signatures [42]; most of them, including IFNG, IFNGR1, JAK1, JAK2, and STAT1, were upregulated in Cluster 2 but downregulated in Cluster 3 (Figure 5(b)). We also analyzed the immune activation signatures and observed the highest expression of APAR, NLR, TCR, and TLR in Cluster 2 (Figure 5(d)). We then investigated the expression of immune checkpoint genes in the SUMOylation clusters and found increased expression of ADORA2A, BTLA, BTN2A1, CD160, CD209, CD226, CD27, CD274, CD276, CD28, CD40, CD40LG, CD47, CD80, CD86, CD96, CTLA4, HAVCR2, HLA-DRB1, ICOS, IDO1, LAG3, PDCD1, PDCD1LG2, TIGIT, TNFRSF9, TNFSF14, TNFSF18, and TNFSF4 in Cluster 2 compared with the other clusters (Figure 5(c)). Furthermore, we predicted the response to anti-CTLA-4 and anti-PD-1 therapies; the submap results indicated that Cluster 2 may respond to both immunotherapies (Figures 5(d), 5(e)). Meanwhile, we observed increased sensitivity to several antitumor drugs (AZD1332_1463, AZD8186_1918, BMS-754807_2171, JQ1_2172, NU7441_1038, and WZ4003_1614) in Cluster 2 compared with the other clusters (Figure 5(f)).

Differential response to chemotherapy and immunotherapy for SUMOylation clusters. (a)–(d) The boxplots show the differences of (a) HLA I and II genes, (b) IFN-γ signaling signatures, (c) immune checkpoint–related genes, and (d) immune activation signatures in the SUMOylation-related clusters. (e) Submap of the difference in response to anti-CTLA-4 and anti-PD-1 therapies in the SUMOylation-related clusters. (f) The boxplots show the differences in sensitivity of antitumor drugs (AZD1332_1463, AZD8186_1918, BMS-754807_2171, JQ1_2172, NU7441_1038, and WZ4003_1614) in the SUMOylation-related clusters.

3.6. Construction and Validation of a SUMOylation Score in CRC

To identify prognostic SUMOylation-related signatures for CRC, 77 differentially expressed SUMOylation-related genes were selected by intersecting the 7548 DEGs among the SUMOylation-related clusters (Table S8) with the 187 SUMOylation-related genes (Figure 6(a)). Univariate Cox analysis identified 11 survival-related genes among them (Figure 6(b)), and LASSO regression analysis then selected 9 SUMOylation-related signatures (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, PIAS3, H4C4, CHD3, and SUMO2) (Figures 6(c), 6(d), 6(e)). A SUMOylation risk score was calculated for each patient, and patients were divided into high- and low-risk score groups (Figures 6(f), 6(g)). The KM survival curve indicated that patients with high risk scores had worse overall survival (Figure 6(h)). ROC curves were drawn to test the predictive performance of the risk score, and the AUC values for 1-, 3-, and 5-year survival were 0.669, 0.666, and 0.756, respectively (Figure 6(i)). The external datasets GSE17538 and GSE17536 were used to validate the predictive ability of the prognostic SUMOylation-related signatures for CRC, and the results were consistent with those of the training dataset (Figures 6(j), 6(k), 6(l), 6(m), 6(n), 6(o), 6(p), 6(q)).
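A risk score of this kind is commonly computed as a weighted sum of signature gene expression, with the LASSO coefficients as weights. The Python sketch below illustrates that form under stated assumptions: the coefficients, the expression values, the patient IDs, and the median cutoff are all hypothetical, and only three of the nine signature genes are used to keep the example short. The text above reports neither the coefficients nor the cutoff.

```python
from statistics import median

# Hypothetical LASSO coefficients for three of the nine signature genes;
# the actual coefficients are not reported in the text.
coefs = {"NDC1": -0.12, "CDKN2A": 0.20, "SUMO2": -0.08}

# Hypothetical normalized expression values for four patients.
patients = {
    "P1": {"NDC1": 2.0, "CDKN2A": 1.0, "SUMO2": 3.0},
    "P2": {"NDC1": 1.0, "CDKN2A": 4.0, "SUMO2": 1.0},
    "P3": {"NDC1": 3.0, "CDKN2A": 0.5, "SUMO2": 2.0},
    "P4": {"NDC1": 1.5, "CDKN2A": 3.0, "SUMO2": 1.0},
}


def risk_score(expr: dict, weights: dict) -> float:
    """Weighted sum of expression values over the signature genes."""
    return sum(weights[gene] * expr[gene] for gene in weights)


scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}

# Assumed cutoff: the cohort median (the cutoff used in the study is not stated here).
cutoff = median(scores.values())
groups = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}

for pid in patients:
    print(pid, round(scores[pid], 2), groups[pid])
```

The resulting high- and low-risk labels are what a KM analysis or a time-dependent ROC analysis would then take as input. For external validation, the same coefficients would normally be applied to the validation cohort unchanged.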

Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.
Details are in the caption following the image
Construction and validation of a SUMOylation score in CRC. (a) Venn plot of the intersected SUMOylation-related genes by overlapping DEGs among SUMOylation-related clusters and 187 SUMOylation-related genes downloaded from the MSigDB database. (b) The forest plot of univariate Cox analysis indicated the survival-related genes. (c)–(d) The LASSO regression analysis and partial likelihood deviance on the prognostic genes. (e) The correlation between the prognostic gene and the coefficient of each gene. (f) Risk score and survival outcome of each case in the TCGA–COAD cohort. (g) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (h) Overall survival curve for the high-risk score and low-risk score groups in the TCGA–COAD cohort. (i) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the TCGA–COAD cohort. (j) Risk score and survival outcome of each case in the GSE17538 cohort. (k) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (l) Overall survival curve for the high-risk score and low-risk score groups in the GSE17538 cohort. (m) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17538 cohort. (n) Risk score and survival outcome of each case in the GSE17536 cohort. (o) Heatmap of the differential expression of prognostic signatures between high-risk and low-risk groups. (p) Overall survival curve for the high-risk score and low-risk score groups in the GSE17536 cohort. (q) Multi-index ROC curve of risk score for 1-, 3-, and 5-year overall survival of patients in the GSE17536 cohort.

3.7. Development of a Predictive Nomogram

The clinical features (age, gender, clinical stage, and MSI status) and the SUMOylation risk score were analyzed by univariate and multivariate Cox regression models, and the SUMOylation risk score and clinical stage were identified as independent risk factors for CRC (Figures 7(a), 7(b)). Therefore, a predictive nomogram was constructed based on the SUMOylation risk score and clinical stage (Figure 7(c)). The AUC values of the ROC curves for 1-, 3-, and 5-year OS were 0.695, 0.692, and 0.761, respectively (Figure 7(d)), indicating that the nomogram had acceptable predictive value for prognosis in CRC. The calibration curve revealed that the nomogram predictions agreed well with the observed outcomes (Figure 7(e)). The DCA indicated that the predictive nomogram model could be beneficial in clinical practice (Figure 7(f)).
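The AUC values reported above summarize how well a score separates patients who died before a given time point from those still alive at it. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `auc_at_horizon`, the toy data, and the simplification of dropping patients censored before the horizon are all assumptions made for illustration (dedicated time-dependent ROC methods handle censoring more carefully).

```python
def auc_at_horizon(scores, times, events, horizon):
    """AUC for predicting death by `horizon` from a risk score.

    Simplifying assumption: patients censored before the horizon have an
    unknown status at the horizon and are skipped.
    """
    cases, controls = [], []
    for score, time, event in zip(scores, times, events):
        if time <= horizon and event == 1:
            cases.append(score)      # died on or before the horizon
        elif time > horizon:
            controls.append(score)   # known to be alive at the horizon
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # Probability that a random case has a higher score than a random
    # control, counting ties as one half (Mann-Whitney formulation).
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy cohort: risk score, follow-up time in years, death flag.
toy_scores = [2.1, 1.7, 0.4, 1.9, 1.2]
toy_times = [0.8, 2.5, 6.0, 7.2, 1.0]
toy_events = [1, 1, 0, 0, 0]
toy_auc = auc_at_horizon(toy_scores, toy_times, toy_events, horizon=3.0)
print(f"toy 3-year AUC: {toy_auc:.2f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the 1-, 3-, and 5-year values above should be read.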

Development of a predictive nomogram. (a)–(b) Forest plots of univariate and multivariate Cox analyses including the risk score and clinical characteristics, which indicated the independent prognostic factors. (c) A nomogram was constructed based on the independent factors to predict the 1-, 3-, and 5-year overall survival of patients. (d)–(e) ROC curves and the calibration curve for 1-, 3-, and 5-year OS. (f) The DCA predicted the benefit of patients from the nomogram model.

3.8. Association of SUMOylation Gene Signature With CRC Risk

Finally, we assessed the potential association between SUMOylation signatures and the risk of CRC using a two-sample MR analysis. A total of 131 cis-eQTL SNPs for eight SUMOylation signatures (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, PIAS3, CHD3, and SUMO2) were selected as IVs according to the three MR assumptions (Table S9). However, the instruments for PIAS3 expression were not found in the outcome data; therefore, we focused on the association between the remaining seven SUMOylation signatures (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, CHD3, and SUMO2) and the risk of CRC. As shown in Figure 8(a), the IVW model indicated that high expression of NUP54 was causally associated with an increased risk of CRC (p = 0.00550, OR = 1.12, 95% CI: (1.03, 1.21)). Consistent results were observed with MR-Egger regression (p = 0.04275, OR = 1.22, 95% CI: (1.02, 1.47)), WM (p = 0.01762, OR = 1.15, 95% CI: (1.02, 1.29)), and MR.RAPS (p = 0.00435, OR = 1.13, 95% CI: (1.04, 1.22)). MR-PRESSO results were also consistent with the IVW analysis (p = 0.00535, OR = 1.12, 95% CI: (1.04, 1.20), Table S10). In addition, the IVW model showed that high expression of PPARGC1A was causally associated with a decreased risk of CRC (p = 0.00010, OR = 0.94, 95% CI: (0.91, 0.97)). Consistent results were also observed with MR-Egger regression (p = 0.01865, OR = 0.93, 95% CI: (0.87, 0.99)) and MR.RAPS (p = 0.00023, OR = 0.94, 95% CI: (0.91, 0.97)). MR-PRESSO results were also consistent with the IVW analysis (p = 0.00041, OR = 0.94, 95% CI: (0.91, 0.97), Table S10). No causal association between the other SUMOylation signatures and CRC was found. No pleiotropy or heterogeneity was observed in these results (Cochran Q test p > 0.05 and Pleiotropy_p > 0.05; Figure 8(a)). Furthermore, the leave-one-out analysis of the causal association between NUP54 expression and CRC, as well as between PPARGC1A expression and CRC, showed that the estimates did not change significantly when any single SNP was removed (Figures 8(b), 8(c)).
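To make the IVW estimate and the Cochran Q statistic reported above more concrete, the following is a minimal sketch of a fixed-effect IVW calculation from per-SNP summary statistics. It is an illustration under stated assumptions, not the code used in this study: the function names `ivw` and `odds_ratio_ci` and the toy SNP effects are hypothetical, and the weights use the common first-order approximation that ignores uncertainty in the SNP-exposure effects.

```python
import math


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance-weighted MR estimate.

    beta_exp: SNP effects on the exposure (e.g., gene expression)
    beta_out: SNP effects on the outcome (log odds of disease)
    se_out:   standard errors of the SNP-outcome effects
    Returns (causal log-odds estimate, its standard error, Cochran Q).
    """
    # Per-SNP Wald ratio and its first-order inverse-variance weight.
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exp, se_out)]
    total_weight = sum(weights)

    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Cochran Q: weighted spread of the per-SNP ratios around the pooled
    # estimate; large values suggest heterogeneity between instruments.
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return beta, se, q


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into an OR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three instruments.
toy_beta, toy_se, toy_q = ivw(
    beta_exp=[0.10, 0.20, 0.40],
    beta_out=[0.012, 0.020, 0.046],
    se_out=[0.010, 0.012, 0.015],
)
toy_or, toy_lo, toy_hi = odds_ratio_ci(toy_beta, toy_se)
print(f"OR = {toy_or:.2f}, 95% CI: ({toy_lo:.2f}, {toy_hi:.2f}), Q = {toy_q:.2f}")
```

MR-Egger, the weighted median, MR.RAPS, and MR-PRESSO relax different assumptions of this estimator, which is why agreement across them is reported as a robustness check.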

Association of SUMOylation gene signature with CRC risk. (a) Forest plot for the relationship between SUMOylation gene signatures (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, CHD3, and SUMO2) expression and outcome of CRC using two-sample MR. (b)–(c) Leave-one-out permutation analysis of the causal association between NUP54 and the outcome of CRC, as well as between PPARGC1A and the outcome of CRC.

4. Discussion

An increasing number of studies have reported interactions between SUMOylation and tumorigenesis, metastasis, and progression [43, 44]. Many SUMOylation-related proteins/genes are overexpressed in tumor tissues, suggesting that they could be used as biomarkers or as drug targets for antitumor treatment through regulation of the SUMOylation process [45–47]. For example, SUMO-2/3 directly interacts with the 5-methylcytosine (m5C) RNA methyltransferase NSUN2 and promotes NSUN2 stability and expression to enhance its carcinogenic activity [48]. SUMO2 facilitates tumor cell proliferation and metastasis in nasopharyngeal carcinoma (NPC) [49]. The SUMOylation signatures SAE1 and UBA2 have been used to predict OS and as therapeutic targets in non–small-cell lung cancer (NSCLC) [50]. The SUMO deconjugase SENP6 acts as a tumor suppressor; loss of SENP6 promotes genomic instability and can be exploited as a therapeutic target of PARP inhibitors in B-cell lymphoma [51]. It has also been found that SUMO-2/3 promotes the progression and oxaliplatin resistance of CRC [52]. The above studies demonstrate that SUMOylation plays a vital role in tumorigenesis and progression.

Furthermore, recent research indicates that SUMOylation not only directly influences tumor growth, progression, metastasis, and survival but also plays a crucial role in regulating tumor heterogeneity and the TME, particularly the immune microenvironment, and affects the efficacy of immunotherapy [53–55]. PVR, a ligand of the activating receptor DNAM1, undergoes SUMOylation in multiple myeloma, which affects the recognition and killing of tumor cells by NK cells [56]. A SUMOylation inhibitor (TAK-981) can stimulate T-cell activation and enhance T-cell sensitivity and response to antigens in preclinical models [57]. SUMOylation is elevated in hepatocellular carcinoma (HCC), and inhibition of SUMOylation by the SUMOylation inhibitors TAK-981 and ML-792 promotes activation of innate immune signaling to restore antitumor immunity in HCC [58]. Moreover, SUMOylation is involved in immune cell infiltration, and the SUMOylation-related genes DNMT3B and NUP210 are promising targets in prostate cancer (PCa) [59]. However, the comprehensive role of SUMOylation in CRC remains unclear.

Although previous studies have highlighted the critical roles of SUMOylation in cancer initiation, progression, therapeutic response, and prognosis, the correlation between SUMOylation and patient heterogeneity remains inadequately understood. Therefore, in the present study, we first investigated the differences in activated SUMOylation-related pathways between tumor tissues and normal tissues. As expected, SUMOylation pathways were significantly enriched in tumor tissues compared to normal tissues. Based on these results, patients were classified into three SUMOylation-related clusters. We observed significant heterogeneity among the three clusters, with Cluster 2 exhibiting the poorest prognosis, the most pronounced immune cell infiltration, and the highest sensitivity to chemotherapy and immunotherapy compared to the other clusters. Consistent with these observations, SUMOylation has been implicated in the prognosis, tumor progression, radioresistance, and chemoresistance of CRC [60, 61]. In addition, several studies have found that SUMOylation is active in almost all types of immune cells. For example, SENP7 sustains CD8+ T-cell metabolic and functional states [62], and SENP3 facilitates STING-dependent dendritic cell function [63]. Therefore, SUMOylation might be associated with differences in the response to immunotherapy, and stratifying patients by SUMOylation cluster might help identify those who could benefit from immunotherapy.

Moreover, we investigated the prognostic signatures associated with SUMOylation through univariate Cox and LASSO regression analyses, and nine SUMOylation-related signatures (NDC1, PPARGC1A, CDKN2A, UHRF2, NUP54, PIAS3, H4C4, CHD3, and SUMO2) were identified and validated. Consistent with the previous study, patients with a high SUMOylation risk score showed worse survival. Despite observational and experimental studies, the association between the SUMOylation signature and the risk of CRC is not fully understood. Here, a two-sample MR analysis based on GWAS summary statistics was used to explore the relationship between the SUMOylation signature and the risk of CRC, and the results indicated that high expression of the SUMOylation signature NUP54 may increase the risk of CRC, while high expression of the SUMOylation signature PPARGC1A may decrease the risk of CRC. MR analysis has been used in a similar way for other cancers and exposures. A previous study identified 15 traits (age at menarche, age at menopause, body mass index, waist-to-hip ratio, height, physical activity, cigarette smoking, sleep duration, and morning-preference chronotype) and six blood biomarkers (estrogens, insulin-like growth factor-1, sex hormone-binding globulin [SHBG], telomere length, HDL-cholesterol, and fasting insulin) as risk factors for breast cancer by MR analysis [64]. Pooled cohort studies and MR analysis have demonstrated that alcohol consumption contributes to carcinogenesis and that two alcohol consumption-related genes, COLCA1 and COLCA2, increase CRC risk [65].

Nucleoporin 54 (Nup54), which has been identified as a CARM1-interacting protein, can promote the nuclear import of CARM1 to facilitate cell proliferation and tumorigenesis in gastric cancer [66]. To our knowledge, this is the first study to identify Nup54 as a potential oncogene in CRC. The transcription coactivator peroxisome proliferator–activated receptor gamma, coactivator 1 alpha (PPARGC1A), also known as PGC-1α, is widely found in tumors and is involved in mitochondrial biogenesis, oxidative phosphorylation, metastasis, aggression, and drug resistance [67–69]. PPARGC1A belongs to the PPARG coactivator 1 (PPARGC1) family, which is predicted to confer susceptibility to CRC [70]. PPARGC1A has pleiotropic roles and acts as a potential biomarker of the protective effect of physical activity on CRC [71]. Conversely, PPARGC1A has been reported to be highly expressed in CRC tissues and positively associated with lymph node and liver metastasis [72]. Moreover, PPARGC1A has been identified as a biomarker in CRC [73–75]. Our results indicated that Nup54 acts as an oncogene while PPARGC1A acts as a protective gene in CRC. In future studies, the roles of Nup54 and PPARGC1A as indicators for CRC patients need to be verified in a large number of samples to provide potential biomarkers for clinical practice.

5. Conclusion

Taken together, we explored the SUMOylation-related molecular characteristics, tumor immune microenvironment, and therapeutic responses of different SUMOylation clusters. We also identified and validated nine SUMOylation prognostic signatures and eventually selected two of them (Nup54 and PPARGC1A) that could be used as potential therapeutic targets for CRC. Further experimental studies with larger sample sizes are warranted to confirm these findings and provide molecular evidence.

Ethics Statement

Ethical approval was waived because this study used data from publicly available databases.

Consent

The authors have nothing to report.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Conceptualization: Chunlei Lu; formal analysis: Qingshui Yang; methodology: Qingshui Yang; resources: Zongyu Liang and Zhu Li; supervision: Zongyu Liang and Zhu Li; visualization: Zongyu Liang and Zhu Li; writing – original draft: Xiaobin Zhang; writing – review and editing: Chunlei Lu.

Funding

No funds were received.

Acknowledgments

We thank the participants and investigators of the TCGA and GEO databases, IEU Open GWAS Project, and GWAS data. We acknowledge the eQTLGen and IEU consortiums for their contributions.

    Supporting Information

    Table S1. SUMOylation-related genes in MSigDB.

    Table S2. DEGs between CRC tumor tissues and normal tissues.

    Table S3. SUMOylation-related pathways enriched in colorectal cancer.

    Table S4. SUMOylation-related clusters by NMF algorithm.

    Table S5. DEGs between Cluster 1 and rest clusters.

    Table S6. DEGs between Cluster 2 and rest clusters.

    Table S7. DEGs between Cluster 3 and rest clusters.

    Table S8. Combination of the DEGs among three clusters.

    Table S9. Details of instrumental variables of SUMOylation signatures expression in eQTLGen for MR analysis.

    Table S10. Details of instrumental variables of SUMOylation signatures expression in eQTLGen for MR analysis by MR-PRESSO method.

    Data Availability Statement

    The datasets presented in this study were obtained from open public sources. The mRNA expression profile data and corresponding clinical data for CRC patients were downloaded from The Cancer Genome Atlas (TCGA, https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/), including GSE17538 and GSE29623. The cis-eQTL summary data of the SUMOylation gene signature were collected from the eQTLGen Consortium (https://eqtlgen.org/). The GWAS summary statistics for colorectal cancer were obtained from the IEU OpenGWAS project (https://gwas.mrcieu.ac.uk/). The code relevant to the study is available on GitHub (https://github.com/Drluchunlei/Data-and-code-.git).
