Background: In recent years, the global incidence of Crohn’s disease (CD) has risen significantly, and numerous studies have linked CD to various cancers. This study aims to identify novel biomarkers for diagnosing CD and to explore their potential applications in pan-cancer analysis.

Methods: Gene expression profiles were retrieved from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified using the “limma” R package. Key biomarkers were selected through an integrative machine learning pipeline combining LASSO regression, neural network modeling, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). Six hub genes were identified and further validated using the independent dataset GSE169568. To assess the broader relevance of these biomarkers, a standardized pan-cancer dataset from the UCSC database was analyzed to evaluate their associations with 33 cancer types.

Results: Among the identified biomarkers, S100 calcium binding protein P (S100P) and S100 calcium binding protein A8 (S100A8) emerged as key candidates for CD diagnosis, with strong validation in the independent dataset. Notably, S100P displayed significant associations with immune cell infiltration and patient survival outcomes in both liver and lung cancers. These findings suggest that chronic inflammation and immune imbalances in CD may not only contribute to disease progression but also elevate cancer risk. As an inflammation-associated biomarker, S100P holds particular promise for both CD diagnosis and potential cancer risk stratification, especially in liver and lung cancers.

Conclusion: Our study highlights S100P and S100A8 as potential diagnostic biomarkers for CD. Moreover, the pan-cancer analysis underscores the broader clinical relevance of S100P, offering new insights into its role in immune modulation and cancer prognosis. These findings provide a valuable foundation for future research into the shared molecular pathways linking chronic inflammatory diseases and cancer development.

1. Introduction

Crohn’s disease (CD) is a chronic, granulomatous inflammatory disorder characterized by persistent, relapsing inflammation that can affect any segment of the gastrointestinal tract. Despite extensive research, the precise etiology of CD remains incompletely understood, and the disease typically follows a lifelong, progressive course. In addition to its impact on the digestive system, CD is increasingly recognized as a systemic disorder, often presenting with extraintestinal manifestations such as arthritis [1], bronchiectasis, cryptogenic organizing pneumonia [2], erythema nodosum, and aphthous stomatitis [3]. Alarmingly, the global prevalence and incidence of CD have risen significantly, particularly among individuals aged 20–60 years [4–6], placing a substantial burden on healthcare systems worldwide.

Although the exact pathogenic mechanisms remain unclear, current evidence highlights that CD arises from a dysregulated immune response triggered by complex interactions between host genetic susceptibility, gut microbiota imbalance, and environmental factors [7]. This intricate pathogenic network underscores the urgent need to identify reliable biomarkers that can facilitate early diagnosis and improve disease monitoring.

Moreover, mounting evidence suggests that chronic intestinal inflammation in CD not only drives local tissue damage but may also increase the risk of various malignancies, including rectal cancer, small bowel cancer [8], cholangiocarcinoma [9], anal cancer [10], and urinary bladder cancer [11]. Some studies report no direct association between CD and certain cancers but suggest that CD may raise cancer risk indirectly through other pathways. For example, CD has not been found to be directly related to gastric cancer, yet the long-term inflammatory process may promote its development [12]. In addition, medications used to treat CD, especially when overused, may increase the risk of blood cancers, skin cancers, and hepatosplenic T-cell lymphomas [13].

This link between inflammation and carcinogenesis raises important questions about the shared molecular mechanisms underlying CD and tumorigenesis. To address these gaps, we employed bioinformatics and machine learning techniques to systematically identify biomarkers associated with CD and conducted pan-cancer analyses to assess their relevance across 33 cancer types. By integrating these analyses, the study seeks to uncover novel biomarkers with dual diagnostic potential for CD and predictive significance for cancer risk, offering new insights into the inflammation-cancer continuum.

2. Materials and Methods

2.1. Data Acquisition

Relevant gene expression datasets were retrieved from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) and UCSC Xena (https://xenabrowser.net/) databases, as summarized in Table 1. The overall analysis workflow is illustrated in Figure 1.

Figure 1. Overview of the analytical framework and corresponding results. AUC, area under the curve; ROC, receiver operating characteristic.

Table 1. Summary of datasets used in this study.

Database	Dataset	Usage	Healthy samples	Disease samples	Other samples	Total samples
GEO	GSE112366	CD experimental group	26	362	—	388
	GSE207022	CD experimental group	23	125	—	148
	GSE169568	CD validation group	30	52	123	205

UCSC	A standardized pan cancer dataset of 33 types of cancer
UCSC	TCGA pan cancer (PANCAN, N = 12,591) dataset

Abbreviation: GEO, Gene Expression Omnibus.

2.2. Data Processing and Differential Expression Analysis

Datasets GSE112366 and GSE207022 were merged, with batch effects corrected using the “sva” package in R (v4.3.1). Differentially expressed genes (DEGs) between CD samples and healthy controls were identified using the “limma” package, with the cutoff criteria set at |log2 fold change (FC)| > 1 and p < 0.05. The DEGs were visualized using heatmaps and volcano plots to depict global expression changes.
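As a minimal illustration of the cutoff criteria above, the following Python sketch filters a small table of limma-style results by |log2 FC| > 1 and p < 0.05. The gene names and values are hypothetical, and the helper `filter_degs` is an illustrative name rather than part of limma or of the study's code.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fc: float
    p_value: float


def filter_degs(results, fc_cutoff=1.0, p_cutoff=0.05):
    """Keep genes with |log2 FC| > fc_cutoff and p < p_cutoff."""
    return [r for r in results
            if abs(r.log2_fc) > fc_cutoff and r.p_value < p_cutoff]


def direction(result):
    """Label a retained gene as up- or downregulated in CD."""
    return "up" if result.log2_fc > 0 else "down"


# Hypothetical values for illustration only (not results from this study).
example = [
    GeneResult("GENE_A", 2.3, 0.001),   # passes both cutoffs
    GeneResult("GENE_B", -1.4, 0.020),  # passes both cutoffs (downregulated)
    GeneResult("GENE_C", 0.6, 0.001),   # fold change too small
    GeneResult("GENE_D", 1.8, 0.200),   # not significant
]
degs = filter_degs(example)
print([(r.gene, direction(r)) for r in degs])
```

Both conditions must hold, so a gene with a large fold change but p ≥ 0.05, or a significant gene with a small fold change, is excluded.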

2.3. Enrichment Analysis

Functional enrichment analysis of DEGs was performed using the “clusterProfiler” package in R, covering Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Disease Ontology (DO) annotations. GO analysis included three categories: molecular function (MF), cellular component (CC), and biological process (BP). The KEGG pathway analysis provided insights into disease-related biological pathways, while DO analysis facilitated classification of DEGs within broader disease contexts, allowing comparative exploration of CD-associated gene signatures.

2.4. Hub Gene Selection

To identify potential hub genes, we applied an integrative machine learning framework combining LASSO regression, a neural network algorithm, and Support Vector Machine-Recursive Feature Elimination (SVM-RFE). LASSO regression (R package “glmnet”) applied L1 regularization to shrink irrelevant gene coefficients to zero [14, 15], retaining only the most informative DEGs. A neural network model, implemented in Python using “pandas”, “numpy”, “matplotlib.pyplot”, and “sklearn.neural_network”, automatically learned nonlinear patterns and complex gene interactions contributing to CD. SVM-RFE [16] (R packages “e1071”, “kernlab”, and “caret”) recursively eliminated weak predictors, ultimately selecting a refined panel of CD-relevant genes. The intersection of genes identified by all three methods was considered the final set of hub genes for further validation.
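To make the L1 shrinkage step concrete: in the special case of an orthonormal design with objective (1/2)‖y − Xb‖² + λ‖b‖₁, the LASSO solution reduces to soft-thresholding each least-squares coefficient by λ. The Python sketch below shows that operator on hypothetical coefficients. It illustrates why small coefficients become exactly zero; it is not the glmnet fit used in the study, and the value of `lam` is an assumption.

```python
def soft_threshold(coef, lam):
    """Shrink coef toward zero by lam; return exactly 0.0 if |coef| <= lam."""
    if coef > lam:
        return coef - lam
    if coef < -lam:
        return coef + lam
    return 0.0


# Hypothetical least-squares coefficients for four genes (illustration only).
ols_coefs = {"GENE_A": 1.20, "GENE_B": -0.80, "GENE_C": 0.15, "GENE_D": -0.05}
lam = 0.25  # assumed penalty strength

shrunk = {gene: soft_threshold(b, lam) for gene, b in ols_coefs.items()}
selected = [gene for gene, b in shrunk.items() if b != 0.0]
print(shrunk)
print(selected)
```

Genes whose coefficients fall within ±λ are dropped entirely, which is what allows LASSO to act as a feature selector rather than only a shrinkage method.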

2.5. Hub Gene Validation

The independent dataset GSE169568 was used to validate the expression and diagnostic relevance of the identified hub genes. Gene expression differences between CD and healthy samples were visualized using box plots, while diagnostic performance was assessed using receiver operating characteristic (ROC) curves and area under the curve (AUC) metrics, with AUC > 0.80 considered indicative of strong diagnostic power.
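For readers less familiar with the AUC, it equals the probability that a randomly chosen CD sample has a higher marker value than a randomly chosen healthy sample, with ties counted as one half. The following Python sketch computes it directly from that definition. The expression values are hypothetical, and the function assumes the marker is higher in cases.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly.

    Ties between a case and a control contribute 0.5.
    """
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical expression values (illustration only).
cd = [7.9, 8.4, 6.1, 9.0, 7.2]
healthy = [5.8, 6.5, 7.0, 5.1]
value = auc(cd, healthy)
print(round(value, 3), value > 0.80)
```

The pairwise form is quadratic in sample size, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently on large datasets.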

2.6. Immune Infiltration Analysis

To estimate immune cell composition within CD samples, we applied the CIBERSORT algorithm in R, which uses linear support vector regression to deconvolute expression profiles into proportions of 22 immune cell types (LM22). This allowed exploration of the immune microenvironment and its relationship with identified hub genes.

2.7. Pan-Cancer Analysis

To investigate the broader relevance of hub genes across malignancies, gene expression profiles and clinical data spanning 33 tumor types were retrieved from the TCGA Pan-Cancer (PANCAN) dataset via the UCSC Xena platform. For each cancer type, differential gene expression between tumor and adjacent normal tissues was computed in R. Additionally, survival analysis using the “survival” package assessed associations between hub gene expression and patient prognosis across cancers. Correlations between hub gene expression and clinical characteristics were further analyzed in selected cancers.
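The text names only the “survival” package, so the exact estimator is not stated; a common choice for comparing high- and low-expression groups is the Kaplan-Meier estimator. The Python sketch below implements it for a single group under that assumption, using hypothetical follow-up data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if death was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in months (illustration only).
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the curve steps down only at observed event times.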

2.8. Tumor Microenvironment Analysis

The tumor microenvironment (TME), particularly the relative abundance of immune and stromal components, plays a crucial role in cancer progression and prognosis. The ESTIMATE algorithm was used to compute immune and stromal scores from gene expression profiles, providing quantitative assessment of nontumor cell infiltration. To further characterize the immune landscape, CIBERSORT deconvolution was repeated (n = 1000 permutations) to ensure robustness [17]. Differences in immune functions and immune checkpoint expression between high and low hub gene expression groups were also examined to infer potential immunomodulatory roles of the identified biomarkers.

2.9. Statistical Analysis

All statistical analyses were conducted using R (v4.3.1). Differential expression analysis, correlation assessments, and diagnostic performance evaluations were performed using established statistical methods. Gene expression differences between CD and healthy samples were analyzed using limma’s linear modeling and empirical Bayes methods. ROC curves were employed to evaluate diagnostic performance, with AUC > 0.80 considered indicative of strong predictive ability. Group comparisons were conducted using the Wilcoxon rank-sum test, with p < 0.05 considered statistically significant.
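As an illustration of the group comparison, the Python sketch below computes the Wilcoxon rank-sum (Mann-Whitney U) statistic with average ranks for ties and a two-sided p value from the normal approximation. It omits the tie and continuity corrections and the exact small-sample p values that R's `wilcox.test` can apply, so its output may differ slightly from R. The input values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, z score, two-sided p value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank for tied values
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Hypothetical expression values (illustration only).
cd = [7.9, 8.4, 7.1, 9.0, 7.2]
healthy = [5.8, 6.5, 7.0, 5.1]
u, z, p = rank_sum_test(cd, healthy)
print(u, round(z, 3), p < 0.05)
```

Dividing U by the product of the two group sizes gives the AUC for the same marker, which is why the rank-sum test and ROC analysis tend to agree on which genes separate the groups.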

3. Results

3.1. Identification of DEGs

By integrating datasets GSE112366 and GSE207022 and performing differential gene expression analysis, we identified a total of 41 DEGs (Figure 2A). These results indicate that a substantial number of genes exhibit increased expression levels in CD patients compared to healthy controls. Figure 2B further illustrates the expression patterns of these DEGs across the merged dataset.

3.2. Enrichment Analysis of DEGs

To investigate the biological functions and pathway involvement of the identified DEGs, we conducted GO, KEGG, and DO enrichment analyses.

Gene Ontology enrichment analysis revealed that the DEGs are primarily enriched in BPs such as response to lipopolysaccharides, response to bacterial molecules, neutrophil chemotaxis, granulocyte migration, and leukocyte chemotaxis. These processes are closely linked to immune and inflammatory responses (Figure 3A).

KEGG pathway analysis further identified five significantly enriched pathways: the IL-17 signaling pathway, cytokine–cytokine receptor interaction, viral protein–cytokine interaction, TNF signaling pathway, and chemokine signaling pathway (Figure 3B). These pathways emphasize the key role of cytokines and chemokines in CD pathogenesis.

In addition, DO enrichment analysis highlighted that these DEGs are predominantly associated with diseases such as chronic obstructive pulmonary disease, bacterial infections, interstitial lung disease, and enteric diseases (Figure 3C).

3.3. Identification of Hub Genes

We employed three feature selection algorithms to screen hub genes from the DEGs. LASSO logistic regression (R package “glmnet”) identified 15 genes using the minimum lambda criterion (Figure 4A). A neural network model trained on the dataset identified 15 genes associated with CD (Figure 4B). SVM-RFE analysis (R package “e1071”) identified 16 genes relevant to CD diagnosis (Figure 4C). The genes selected by each model are shown in Table 2.

Table 2. Gene names selected by three models.

Model	Genes
LASSO	S100P, CXCL1, LCN2, GZMB, CCL11, C9orf71, IGHM, GPR109B, DHRS9, MMP1, NOS2, IGLV2-23, S100A8, LOC100293983, TFF1
NN	NOS2, DHRS9, IGHD, C9orf71, S100P, FCGR3A, CXCL11, DUOXA2, DUOX2, CCL11, LCN2, IGLV2-23, IGKC, LOC100293983, S100A8
SVM-RFE	S100P, PDZK1IP1, IGHM, S100A8, NOS2, DUOX2, LCN2, SLC6A14, PI3, DUOXA2, CCL11, DHRS9, TFF1, CXCL11, CXCL1, IGHD

Abbreviations: LASSO, least absolute shrinkage and selection operator; NN, neural network; SVM-RFE, support vector machine-recursive feature elimination.

By intersecting the results of these three approaches, six overlapping genes were identified: LCN2, S100P, S100A8, CCL11, DHRS9, and NOS2 (Figure 4D). The diagnostic performance of these six genes was further assessed using ROC curves (Figure 4E), which highlighted S100A8, S100P, and LCN2 as promising biomarkers for CD diagnosis.
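The intersection step can be reproduced from the gene lists in Table 2. The short Python sketch below uses set intersection; the variable names are illustrative, and the gene lists are transcribed from the table rather than taken from the authors' code.

```python
# Gene lists transcribed from Table 2.
lasso = {
    "S100P", "CXCL1", "LCN2", "GZMB", "CCL11", "C9orf71", "IGHM", "GPR109B",
    "DHRS9", "MMP1", "NOS2", "IGLV2-23", "S100A8", "LOC100293983", "TFF1",
}
nn = {
    "NOS2", "DHRS9", "IGHD", "C9orf71", "S100P", "FCGR3A", "CXCL11", "DUOXA2",
    "DUOX2", "CCL11", "LCN2", "IGLV2-23", "IGKC", "LOC100293983", "S100A8",
}
svm_rfe = {
    "S100P", "PDZK1IP1", "IGHM", "S100A8", "NOS2", "DUOX2", "LCN2", "SLC6A14",
    "PI3", "DUOXA2", "CCL11", "DHRS9", "TFF1", "CXCL11", "CXCL1", "IGHD",
}

# Genes selected by all three methods.
hub_genes = sorted(lasso & nn & svm_rfe)
print(hub_genes)
```

Requiring agreement across all three methods is a conservative choice: genes such as C9orf71 and IGLV2-23, which were selected by LASSO and the neural network but not by SVM-RFE, are excluded.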

3.4. Validation of Hub Genes

To validate the expression patterns and diagnostic utility of the hub genes, we analyzed the independent validation dataset GSE169568. Expression levels of S100P, S100A8, LCN2, and DHRS9 were significantly different between CD patients and healthy controls (Figure 5A). ROC analysis confirmed that S100P (AUC = 0.834) and S100A8 (AUC = 0.878) exhibited high diagnostic accuracy, while DHRS9 (AUC = 0.644) and LCN2 (AUC = 0.732) showed relatively lower diagnostic value (Figure 5B).

3.5. Immune Infiltration Analysis

We applied the CIBERSORT algorithm to estimate immune cell composition in CD and healthy samples (Figure 6A). Significant differences were observed between the groups (Figure 6B). In healthy controls, naive B cells, memory B cells, plasma cells, and follicular helper T cells exhibited higher abundance, indicating a role in immune surveillance and antibody production. In CD patients, there was a higher proportion of resting NK cells, M1 macrophages, eosinophils, and neutrophils, reflecting active inflammation and immune dysregulation.

Correlation analysis between hub gene expression and immune cell abundance revealed that S100P and S100A8 were positively correlated with neutrophil activation and mast cell activation, but negatively correlated with CD8+ T cells (Figure 6C). Additionally, S100A8 showed strong positive correlations with M0 and M1 macrophages, while S100P correlated positively with plasma cells but negatively with memory B cells.

3.6. Pan-Cancer Analysis

In pan-cancer analysis, S100P and S100A8 exhibited significantly altered expression in 8 and 12 tumor types, respectively (Figure 7A).

Survival analysis demonstrated that S100P expression significantly influenced patient survival in liver hepatocellular carcinoma (LIHC) and lung adenocarcinoma (LUAD) (Figure 7B). ROC analysis further confirmed the predictive power of S100P for 1-year, 3-year, and 5-year survival in these cancers (Figure 7C).

Analysis of clinical factors revealed that S100P expression was significantly associated with gender, tumor M stage, and race in LUAD, and with gender in LIHC (Figure 7D).

3.7. TME Analysis of S100P

We further investigated the role of S100P in shaping the TME in LUAD and LIHC. In LUAD, the S100P-high group had lower immune, stromal, and ESTIMATE scores (Figure 8A); consistently, S100P expression correlated negatively with these scores and positively with the stemness indices DNAss and RNAss (Figure 8B). In LIHC, high S100P expression was associated with higher immune and ESTIMATE scores, with no significant change in stromal score (Figure 8A).

S100P expression correlated negatively with resting dendritic cells, resting mast cells, and M2 macrophages but positively with M0 macrophages and T follicular helper cells (Figure 8C,D). Immune functions, chemokine receptor expression, and checkpoint expression differed significantly between S100P-high and S100P-low groups (Figure 8E,F).

4. Discussion

CD predominantly affects individuals under the age of 30, yet the incidence among elderly populations is steadily increasing [18]. Notably, Germany reports the highest prevalence, reaching 322 cases per 100,000 individuals [19]. Despite advancements in therapeutic strategies, over 80% of CD patients experience postoperative recurrence, posing a considerable challenge for clinical management [20, 21]. As treatment options continue to evolve, the identification of reliable biomarkers to predict therapeutic response is crucial. In this study, we applied machine learning algorithms to identify and validate S100P and S100A8 as potential biomarkers for CD, thereby offering novel insights into its diagnosis and management.

Although previous studies have identified various biomarkers for CD [22, 23], our study applied advanced machine learning approaches to uncover novel candidates, ultimately highlighting S100P and S100A8. The utility of these genes was subsequently confirmed using an independent dataset, further supporting their robustness as potential biomarkers.

The pathogenesis of CD remains incompletely understood, but it is widely recognized as a disorder driven by immune dysfunction and microbial imbalances [7]. In our enrichment analysis of DEGs, we observed significant enrichment in pathways related to responses to lipopolysaccharides and bacterial molecules, underscoring the critical role of microbial dysbiosis in CD development. Furthermore, we identified the involvement of the TNF signaling pathway and Toll-like receptor (TLR) signaling pathway, which have previously been linked to immune-related diseases such as idiopathic pulmonary fibrosis [24, 25].

Integrating enrichment analysis with immune infiltration profiling, we found that S100P and S100A8 overexpression in CD patients may lead to aberrant activation of the IL-17 signaling pathway, triggering excessive production of cytokines and chemokines. These inflammatory mediators promote neutrophil recruitment and prolong their lifespan [26–28], creating a self-sustaining inflammatory loop that exacerbates CD pathology. Therefore, targeting neutrophil activation may represent a promising therapeutic avenue for CD. This mechanistic link further strengthens the rationale for S100P and S100A8 as key biomarkers.

Previous studies have also suggested associations between CD and certain cancers, particularly colorectal cancer [29], anal cancer [10], oral cavity cancer, and breast cancer [30]. Expanding upon this, we conducted a pan-cancer analysis to examine the relevance of our identified biomarkers across 33 cancer types. This analysis revealed significantly altered expression of S100P in LIHC and LUAD, with strong correlations between S100P expression and patient survival outcomes. Moreover, S100P expression was associated with specific clinical features in both LIHC and LUAD. These findings are consistent with previous studies showing that patients with CD have an increased risk of developing lung cancer [31, 32]. In addition, chronic inflammation, steroid use, and hepatic steatosis in CD promote cirrhosis, which may increase liver cancer risk [33]. Our results thus offer valuable insights into the potential interplay between CD and cancers beyond the gastrointestinal tract, particularly in the lung and liver.

S100P and S100A8 both belong to the S100 protein family, which regulates diverse biological processes, including cell differentiation, proliferation, migration, and apoptosis, largely through interactions with key signaling proteins such as p53, β-catenin, and NF-κB [34, 35]. Numerous studies have linked dysregulated S100 protein expression to lung cancer initiation and progression [36–38]. High S100P expression in metastatic lung tumors is also associated with poorer survival [39–41]. Additionally, recent research highlights S100P as a ferroptosis inhibitor, promoting hepatocellular carcinoma by reprograming lipid metabolism [42].

In LIHC, our TME analysis revealed significant shifts in immune cell populations, though no substantial changes were observed in stromal components. High S100P expression suppressed dendritic cell and mast cell abundance while promoting M0 macrophage infiltration, contributing to an immunosuppressive environment conducive to tumor progression. In LUAD, low S100P expression correlated with elevated B cells, M0 macrophages, M1 macrophages, and dendritic cells, highlighting S100P’s potential role in modulating macrophage polarization and thus shaping the TME. Additionally, S100P expression influenced immune checkpoint levels, suggesting its role in immune evasion. As antigen-presenting cells, dendritic cells are central to T cell priming; however, tumor cells often suppress dendritic cell function to escape immune surveillance [43]. Together, these findings underscore S100P’s relevance as both a diagnostic and therapeutic target in LIHC and LUAD.

5. Limitations

This study has several limitations. First, the validation dataset used was relatively small compared to the training datasets, which may reduce the generalizability of our findings. Second, our pan-cancer analysis focused only on cancer types where hub genes exhibited statistically significant correlations with overall survival (p < 0.05). Consequently, we may have overlooked potential associations between S100P/S100A8 and other cancers, particularly those with smaller sample sizes. Existing literature suggests that CD is associated with increased risks of colorectal cancer, skin cancer, hematologic malignancies, and urinary tract cancer, which were not comprehensively explored in this study.

6. Conclusion

In summary, S100P and S100A8 emerged as promising biomarkers for CD, providing novel directions for its diagnosis and potential treatment stratification. Additionally, S100P demonstrated potential as a prognostic and therapeutic biomarker in liver and lung cancers, further expanding its clinical relevance. These findings not only enhance our understanding of the molecular mechanisms underlying CD but also offer a bridge to explore its relationship with oncogenesis, especially in LIHC and LUAD.

Abbreviations

ACC: adrenocortical carcinoma
AUC: area under the curve
BLCA: bladder urothelial carcinoma
BRCA: breast cancer
CD: Crohn’s disease
CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma
CHOL: cholangiocarcinoma
COAD: colon adenocarcinoma
DEGs: differentially expressed genes
DLBC: diffuse large B-cell lymphoma
DO: Disease Ontology
ESCA: esophageal carcinoma
GBM: glioblastoma multiforme
GEO: Gene Expression Omnibus
GO: Gene Ontology
HCs: healthy controls
HNSC: head and neck squamous cell carcinoma
KEGG: Kyoto Encyclopedia of Genes and Genomes
KICH: kidney chromophobe
KIRC: kidney renal clear cell carcinoma
KIRP: kidney renal papillary cell carcinoma
LAML: acute myeloid leukemia
LGG: low grade glioma
LIHC: liver hepatocellular carcinoma
LUAD: lung adenocarcinoma
LUSC: lung squamous cell carcinoma
MESO: malignant mesothelioma
OV: ovarian cancer
PAAD: pancreatic adenocarcinoma
PCPG: pheochromocytoma and paraganglioma
PRAD: prostate adenocarcinoma
READ: rectal adenocarcinoma
ROC: receiver operating characteristic
S100A8: S100 calcium binding protein A8
S100P: S100 calcium binding protein P
SARC: sarcoma
SKCM: cutaneous melanoma
STAD: stomach adenocarcinoma
SVM-RFE: support vector machine-recursive feature elimination
TGCT: testicular germ cell tumor
THCA: thyroid cancer
THYM: thymoma
TME: tumor microenvironment
UCEC: uterine corpus endometrial carcinoma
UCS: uterine carcinosarcoma
UVM: uveal melanoma

Data Availability Statement

All code is available on GitHub (https://github.com/YuanTangyu/Crohn-s-disease.git). The datasets used in this study are available in the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) under the accession numbers GSE112366, GSE207022, and GSE169568. The standardized pan-cancer dataset of 33 cancer types, which includes clinical information and gene expression data for the samples, can be obtained from the University of California Santa Cruz (UCSC) Xena database (https://xenabrowser.net/).

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Pengtao Liu and Tangyu Yuan: conceptualization. Jiayin Xing: methodology. Tangyu Yuan: data curation. Tangyu Yuan and Jiayin Xing: writing (original draft preparation). Pengtao Liu, Tangyu Yuan, and Jiayin Xing: writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by grants from the Natural Science Foundation of Shandong Province (ZR2023QC282 to Li Jiang) and the Shandong Medical and Health Science Technology Development Program (2017WS410).

Acknowledgments

We thank all the teachers and classmates at the School of Basic Medicine of Shandong Second Medical University who participated in this study. Special thanks go to Mr. Zimo Wang and Ms. Xinxia Song for their English language editing and proofreading, which greatly improved the language quality of the manuscript. We also thank Dr. Li Jiang for her assistance with the analysis and for funding support.

References

1 Rios Rodriguez V., Sonnenberg E., and Proft F., et al., Presence of Spondyloarthritis Associated to Higher Disease Activity and HLA-B27 Positivity in Patients With Early Crohn’s Disease: Clinical and MRI Results From a Prospective Inception Cohort, Joint Bone Spine. (2022) 89, no. 5, https://doi.org/10.1016/j.jbspin.2022.105367, 105367.
2 Zaman T., Watson J., and Zaman M., Cryptogenic Organizing Pneumonia With Lung Nodules Secondary to Pulmonary Manifestation of Crohn Disease, Clinical Medicine Insights: Case Reports. (2017) 10, https://doi.org/10.1177/1179547617710672, 2-s2.0-85031711895, 1179547617710672.
3 Karmiris K., Avgerinos A., and Tavernaraki A., et al., Prevalence and Characteristics of Extra-Intestinal Manifestations in a Large Cohort of Greek Patients With Inflammatory Bowel Disease, Journal of Crohn’s and Colitis. (2016) 10, no. 4, 429–436, https://doi.org/10.1093/ecco-jcc/jjv232, 2-s2.0-84966405194.
4 Kaplan G. G. and Windsor J. W., The Four Epidemiological Stages in the Global Evolution of Inflammatory Bowel Disease, Nature Reviews Gastroenterology & Hepatology. (2021) 18, no. 1, 56–66, https://doi.org/10.1038/s41575-020-00360-x.
5 da Silva Júnior R. T., Apolonio J. S., and de Souza Nascimento J. O., et al., Crohn’s Disease and Clinical Management Today: How it Does?, World Journal of Methodology. (2023) 13, no. 5, 399–413, https://doi.org/10.5662/wjm.v13.i5.399.
6 Cassol O. S., Zabot G. P., Saad-Hossne R., and Padoin A., Epidemiology of Inflammatory Bowel Diseases in the State of Rio Grande do Sul, Brazil, World Journal of Gastroenterology. (2022) 28, no. 30, 4174–4181, https://doi.org/10.3748/wjg.v28.i30.4174.
7 Zhao M., Gönczi L., Lakatos P. L., and Burisch J., The Burden of Inflammatory Bowel Disease in Europe in 2020, Journal of Crohn’s and Colitis. (2021) 15, no. 9, 1573–1587, https://doi.org/10.1093/ecco-jcc/jjab029.
8 Jess T., Gamborg M., Matzen P., Munkholm P., and Sørensen T. I., Increased Risk of Intestinal Cancer in Crohn’s Disease: A Meta-Analysis of Population-Based Cohort Studies, The American Journal of Gastroenterology. (2005) 100, no. 12, 2724–2729, https://doi.org/10.1111/j.1572-0241.2005.00287.x, 2-s2.0-33644820014.
9 Huai J. P., Ding J., Ye X. H., and Chen Y. P., Inflammatory Bowel Disease and Risk of Cholangiocarcinoma: Evidence From a Meta-Analysis of Population-Based Studies, Asian Pacific Journal of Cancer Prevention. (2014) 15, no. 8, 3477–3482, https://doi.org/10.7314/APJCP.2014.15.8.3477, 2-s2.0-84901992603.
10 Johansen M. P., Wewer M. D., Nordholm-Carstensen A., and Burisch J., Perianal Crohn’s Disease and the Development of Colorectal and Anal Cancer: A Systematic Review and Meta-Analysis, Journal of Crohn’s and Colitis. (2023) 17, no. 3, 361–368, https://doi.org/10.1093/ecco-jcc/jjac143.
11 Persson P. G., Karlén P., and Bernell O., et al., Crohn’s Disease and Cancer: A Population-Based Cohort Study, Gastroenterology. (1994) 107, no. 6, 1675–1679, https://doi.org/10.1016/0016-5085(94)90807-9, 2-s2.0-0028135110.
12 Wei Q., Wang Z., Liu X., Liang H., and Chen L., Association Between Gastric Cancer and 12 Autoimmune Diseases: A Mendelian Randomization Study, Genes (Basel). (2023) 14, no. 10, https://doi.org/10.3390/genes14101844.
13 Beaugerie L., Rahier J. F., and Kirchgesner J., Predicting, Preventing, and Managing Treatment-Related Complications in Patients With Inflammatory Bowel Diseases, Clinical Gastroenterology and Hepatology. (2020) 18, no. 6, 1324–1335.e2, https://doi.org/10.1016/j.cgh.2020.02.009.
14 Cheung-Lee W. L. and Link A. J., Genome Mining for Lasso Peptides: Past, Present, and Future, Journal of Industrial Microbiology and Biotechnology. (2019) 46, no. 9-10, 1371–1379, https://doi.org/10.1007/s10295-019-02197-z, 2-s2.0-85067000002.
15 Fernández-Delgado M., Sirsat M. S., Cernadas E., Alawadi S., Barro S., and Febrero-Bande M., An Extensive Experimental Survey of Regression Methods, Neural Networks. (2019) 111, 11–34, https://doi.org/10.1016/j.neunet.2018.12.010, 2-s2.0-85059940317.
16 Huang S., Cai N., Pacheco P. P., Narrandes S., Wang Y., and Xu W., Applications of Support Vector Machine (SVM) Learning in Cancer Genomics, Cancer Genomics & Proteomics. (2018) 15, no. 1, 41–51, https://doi.org/10.21873/cgp.20063, 2-s2.0-85040177449.
17 Newman A. M., Liu C. L., and Green M. R., et al., Robust Enumeration of Cell Subsets From Tissue Expression Profiles, Nature Methods. (2015) 12, no. 5, 453–457, https://doi.org/10.1038/nmeth.3337, 2-s2.0-84928927858.
18 Roda G., Chien Ng S., and Kotze P. G., et al., Crohn’s Disease, Nature Reviews Disease Primers. (2020) 6, no. 1, https://doi.org/10.1038/s41572-020-0156-2.
19 Ng S. C., Shi H. Y., and Hamidi N., et al., Worldwide Incidence and Prevalence of Inflammatory Bowel Disease in the 21st Century: A Systematic Review of Population-Based Studies, The Lancet. (2017) 390, no. 10114, 2769–2778, https://doi.org/10.1016/S0140-6736(17)32448-0, 2-s2.0-85031499214.
20 Olaison G., Smedh K., and Sjödahl R., Natural Course of Crohn’s Disease after Ileocolonic Resection: Endoscopically Visualized Ileal Ulcers Preceding Symptoms, Gut. (1992) 33, no. 3, 331–335, https://doi.org/10.1136/gut.33.3.331.
21 Rutgeerts P., Van Assche G., and Vermeire S., et al., Ornidazole for Prophylaxis of Postoperative Crohn’s Disease Recurrence: A Randomized, Double-Blind, Placebo-Controlled Trial, Gastroenterology. (2005) 128, no. 4, 856–861, https://doi.org/10.1053/j.gastro.2005.01.010, 2-s2.0-20244388832.
22 Tang D., Huang Y., and Che Y., et al., Identification of Platelet-Related Subtypes and Diagnostic Markers in Pediatric Crohn’s Disease Based on WGCNA and Machine Learning, Frontiers in Immunology. (2024) 15, https://doi.org/10.3389/fimmu.2024.1323418, 1323418.
23 Chen K. A., Nishiyama N. C., and Kennedy Ng M. M., et al., Linking Gene Expression to Clinical Outcomes in Pediatric Crohn’s Disease Using Machine Learning, Scientific Reports. (2024) 14, no. 1, https://doi.org/10.1038/s41598-024-52678-0.
24 Wang H., Xie Q., Ou-Yang W., and Zhang M., Integrative Analyses of Genes Associated With Idiopathic Pulmonary Fibrosis, Journal of Cellular Biochemistry. (2019) 120, no. 5, 8648–8660, https://doi.org/10.1002/jcb.28153, 2-s2.0-85057966312.
25 Li H., Zhou Q., Ding Z., and Wang Q., RTP4, a Biomarker Associated With Diagnosing Pulmonary Tuberculosis and Pan-Cancer Analysis, Mediators of Inflammation. (2023) 2023, 13, https://doi.org/10.1155/2023/2318473, 2318473.
26 Galli S. J., Borregaard N., and Wynn T. A., Phenotypic and Functional Plasticity of Cells of Innate Immunity: Macrophages, Mast Cells and Neutrophils, Nature Immunology. (2011) 12, no. 11, 1035–1044, https://doi.org/10.1038/ni.2109, 2-s2.0-80054899779.
27 Brannigan A. E., O’Connell P. R., and Hurley H., et al., Neutrophil Apoptosis Is Delayed in Patients with Inflammatory Bowel Disease, Shock. (2000) 13, no. 5, 361–366, https://doi.org/10.1097/00024382-200005000-00003, 2-s2.0-0034182004.
28 Colotta F., Re F., Polentarutti N., Sozzani S., and Mantovani A., Modulation of Granulocyte Survival and Programmed Cell Death by Cytokines and Bacterial Products, Blood. (1992) 80, no. 8, 2012–2020, https://doi.org/10.1182/blood.V80.8.2012.2012.
29 Gatenby G., Glyn T., Pearson J., Gearry R., and Eglinton T., The Long-Term Incidence of Dysplasia and Colorectal Cancer in a Crohn’s Colitis Population-Based Cohort, Colorectal Disease. (2021) 23, no. 9, 2399–2406, https://doi.org/10.1111/codi.15756.
30 Gao H., Zheng S., Yuan X., Xie J., and Xu L., Causal Association Between Inflammatory Bowel Disease and 32 Site-Specific Extracolonic Cancers: A Mendelian Randomization Study, BMC Medicine. (2023) 21, no. 1, https://doi.org/10.1186/s12916-023-03096-y.
31 Lo B., Zhao M., Vind I., and Burisch J., The Risk of Extraintestinal Cancer in Inflammatory Bowel Disease: A Systematic Review and Meta-Analysis of Population-Based Cohort Studies, Clinical Gastroenterology and Hepatology. (2021) 19, no. 6, 1117–1138.e19, https://doi.org/10.1016/j.cgh.2020.08.015.
32 Pedersen N., Duricova D., Elkjaer M., Gamborg M., Munkholm P., and Jess T., Risk of Extra-Intestinal Cancer in Inflammatory Bowel Disease: Meta-Analysis of Population-Based Cohort Studies, American Journal of Gastroenterology. (2010) 105, no. 7, 1480–1487, https://doi.org/10.1038/ajg.2009.760, 2-s2.0-77954426269.
33 Voss J., Schneider C. V., Kleinjans M., Bruns T., Trautwein C., and Strnad P., Hepatobiliary Phenotype of Individuals With Chronic Intestinal Disorders, Scientific Reports. (2021) 11, no. 1, https://doi.org/10.1038/s41598-021-98843-7.
34 Saiki Y. and Horii A., Multiple Functions of S100A10, an Important Cancer Promoter, Pathology International. (2019) 69, no. 11, 629–636, https://doi.org/10.1111/pin.12861.
35 Donato R., Sorci G., and Giambanco I., S100A6 Protein: Functional Roles, Cellular and Molecular Life Sciences. (2017) 74, no. 15, 2749–2760, https://doi.org/10.1007/s00018-017-2526-9, 2-s2.0-85017513431.
36 Woo T., Okudela K., and Mitsui H., et al., Up-Regulation of S100A11 in Lung Adenocarcinoma - Its Potential Relationship With Cancer Progression, PLOS ONE. (2015) 10, no. 11, https://doi.org/10.1371/journal.pone.0142642, 2-s2.0-84953231218, e0142642.
37 Hu M., Ye L., Ruge F., Zhi X., Zhang L., and Jiang W. G., The Clinical Significance of Psoriasin for Non-Small Cell Lung Cancer Patients and Its Biological Impact on Lung Cancer Cell Functions, BMC Cancer. (2012) 12, no. 1, https://doi.org/10.1186/1471-2407-12-588, 2-s2.0-84872170154.
38 Chen N., Sato D., Saiki Y., Sunamura M., Fukushige S., and Horii A., S100A4 Is Frequently Overexpressed in Lung Cancer Cells and Promotes Cell Growth and Cell Motility, Biochemical and Biophysical Research Communications. (2014) 447, no. 3, 459–464, https://doi.org/10.1016/j.bbrc.2014.04.025, 2-s2.0-84900342109.
39 Diederichs S., Bulk E., and Steffen B., et al., S100 Family Members and Trypsinogens Are Predictors of Distant Metastasis and Survival in Early-Stage Non-Small Cell Lung Cancer, Cancer Research. (2004) 64, no. 16, 5564–5569, https://doi.org/10.1158/0008-5472.CAN-04-2004, 2-s2.0-4143116938.
40 Kim B., Lee H. J., and Choi H. Y., et al., Clinical Validity of the Lung Cancer Biomarkers Identified by Bioinformatics Analysis of Public Expression Data, Cancer Research. (2007) 67, no. 15, 7431–7438, https://doi.org/10.1158/0008-5472.CAN-07-0003, 2-s2.0-34547631980.
41 Bartling B., Rehbein G., Schmitt W. D., Hofmann H. S., Silber R. E., and Simm A., S100A2-S100P Expression Profile and Diagnosis of Non-Small Cell Lung Carcinoma: Impairment by Advanced Tumour Stages and Neoadjuvant Chemotherapy, European Journal of Cancer. (2007) 43, no. 13, 1935–1943, https://doi.org/10.1016/j.ejca.2007.06.010, 2-s2.0-34548027490.
42 Yang M., Cui W., and Lv X., et al., S100P Is a Ferroptosis Suppressor to Facilitate Hepatocellular Carcinoma Development by Rewiring Lipid Metabolism, Nature Communications. (2025) 16, no. 1, https://doi.org/10.1038/s41467-024-55785-8.
43 Gardner A. and Ruffell B., Dendritic Cells and Cancer Immunity, Trends in Immunology. (2016) 37, no. 12, 855–865, https://doi.org/10.1016/j.it.2016.09.006, 2-s2.0-84994074775.