Volume 2024, Issue 1 1218700
Research Article
Open Access

Deciphering the Immune Subtypes and Signature Genes: A Novel Approach Towards Diagnosing and Prognosticating Severe Asthma Through Interpretable Machine Learning

Yue Hu

Yue Hu

Department of Pulmonary and Critical Care Medicine , Ruijin Hospital , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Institute of Respiratory Diseases , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Diagnosis and Treatment of Respiratory Infectious Diseases , Shanghai Key Laboratory of Emergency Prevention , Shanghai , China

Search for more papers by this author
Yating Lin

Yating Lin

Shanghai Children’s Hospital , School of Medicine , Shanghai Jiao Tong University , Shanghai , China , sjtu.edu.cn

Search for more papers by this author
Bo Peng

Bo Peng

Department of Pulmonary and Critical Care Medicine , Ruijin Hospital , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Institute of Respiratory Diseases , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Diagnosis and Treatment of Respiratory Infectious Diseases , Shanghai Key Laboratory of Emergency Prevention , Shanghai , China

Search for more papers by this author
Chunyan Xiang

Chunyan Xiang

Shanghai Institute of Rheumatology , Renji Hospital , School of Medicine , Shanghai JiaoTong University (SJTUSM) , Shanghai , China , renji.com

Search for more papers by this author
Wei Tang

Corresponding Author

Wei Tang

Department of Pulmonary and Critical Care Medicine , Ruijin Hospital , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Institute of Respiratory Diseases , Shanghai Jiao Tong University School of Medicine , Shanghai , China , shsmu.edu.cn

Diagnosis and Treatment of Respiratory Infectious Diseases , Shanghai Key Laboratory of Emergency Prevention , Shanghai , China

Search for more papers by this author
First published: 25 November 2024
Academic Editor: Roberta Antonia Diotti

Abstract

Asthma, a pervasive pulmonary disorder, affects countless individuals globally. Characterized by chronic inflammation of the bronchial passages, its symptoms include cough, wheezing, dyspnea, and chest tightness. While many manage their symptoms through pharmaceutical interventions and self-care, a significant subset grapples with severe asthma, posing therapeutic challenges. This study delves into the intricate etiology of asthma, emphasizing the pivotal roles of immune cells such as T cells, eosinophils, and mast cells in its pathogenesis. The recent emergence of monoclonal antibodies, including mepolizumab, reslizumab, and benralizumab, offers therapeutic promise, yet their efficacy varies due to the heterogeneous nature of asthma. Recognizing the potential of personalized medicine, this research underscores the need for a comprehensive understanding of asthma’s immunological diversity. We employ single-sample gene set enrichment analysis (ssGSEA) and Least Absolute Shrinkage and Selection Operator (LASSO) algorithms to identify differentially expressed immune cells and utilize machine learning techniques, including Extreme Gradient Boosting (XGBoost) and random forest, to predict severe asthma outcomes and identify key genes associated with immune cells. Using a murine asthma model and an online database, we aim to elucidate distinct immune-centric asthma subtypes. This study seeks to provide novel insights into the diagnosis and classification of severe asthma through a transcriptomic lens.

1. Introduction

Asthma, a ubiquitous pulmonary malaise, affects millions of people worldwide [1]. As an enduring inflammatory perturbation of the bronchial passageways, it exhibits symptoms encompassing chronic cough, wheezing, dyspnea, and thoracic tightness [2]. These clinical presentations originate from the inflammatory cascade within the airways, precipitating phenomena such as mucosal overproduction, architectural transformation of the airway walls, and amplified bronchial susceptibility [3]. While a predominant fraction of individuals with asthma can adeptly regulate their clinical manifestations through judicious pharmaceutical strategies and astute self-management, there exists a significant cohort persistently besieged by the severe manifestation of this condition, presenting a vexing therapeutic enigma [46]. Severe asthma is delineated as a malady demanding augmented doses of inhaled corticosteroids concomitant with ancillary controller medicaments or systemic corticosteroids to thwart unbridled exacerbations, or one that remains obdurate despite such pharmacological expeditions [7].

The complex therapeutic landscape of severe asthma, marked by patient diversity, places a significant burden on those affected and presents a challenging puzzle for healthcare providers to solve [8]. Identifying key diagnostic markers and understanding the subtle pathophysiological aspects of severe asthma is crucial for developing personalized treatments.

Asthma’s etiology is complex, comprising a network of immune cells and signal pathways. T cells are crucial, anchoring the immune system’s coordination [9]. In asthma, these cells become active, producing a slew of cytokines that alter immune communication [10]. Specifically, T helper 2 cells leave a lasting impact by releasing key cytokines—interleukin (IL)-4, -5, and -13—which drive inflammation and energize the immune front lines [11, 12]. Eosinophils, crucial in fighting parasites, swell in number, exacerbating inflammation through cytokines and other agents, attacking the delicate airway lining and summoning more immune cells to the site [1315]. Mast cells, central to allergic reactions, contribute to the development of asthma by releasing inflammatory agents like histamine, heightening airway disruption and sensitivity [16, 17]. In essence, asthma manifests as a complex immunological puzzle with diverse elements like T cells, eosinophils, and mast cells. Each plays a distinct role in the cascade of inflammation and sensitivity. Unraveling the complex molecular patterns and signaling pathways in asthma is critical for developing new treatments to address this persistent condition.

Recent advancements in asthma treatment have been marked by the introduction of novel monoclonal antibodies, including mepolizumab, reslizumab, and benralizumab. These agents specifically target immune pathways associated with IL-5 and have shown promising outcomes for select patient groups. For example, while early mepolizumab trials were not fully satisfactory, later research in patients selected based on biomarkers revealed significant improvements in lung function and reduced reliance on corticosteroids [18]. Nonetheless, this optimism is not universally applicable due to the significant heterogeneity observed among asthma patients, especially in type 2 (T2)-high asthma, evidenced by the mixed responses to anti-immunoglobulin E (IgE) treatments such as omalizumab, with about 70% of recipients showing improvement [19].

In the thrust towards personalized medicine, recognizing the diverse immunological foundations of asthma is critical. The genomic revolution underscores the urgency to establish a comprehensive omics framework to identify reliable biomarkers for severe asthma. A detailed examination of immune-related signature genes and precise categorization of immune molecular subtypes are crucial steps towards providing individualized treatment strategies for asthma. The classification of asthma includes varied subtypes, like allergic, nonallergic, occupational, aspirin-exacerbated respiratory disease, potentially fatal, exercise-induced, and cough variant asthma, each requiring specific clinical approaches. These subtypes display a wide array of clinical presentations and underlying mechanisms, posing challenges to patient management and ongoing care. The European Academy of Allergy and Clinical Immunology (EAACI) guidelines offer clear definitions for allergic and eosinophilic asthma, classifying them as distinct subtypes of T2-high diseases [20]. However, the sharp distinction between allergic and eosinophilic asthma may be inadequate due to significant overlap in patients, highlighting the need for more precise and scientific classification [21, 22].

The single-sample gene set enrichment analysis (ssGSEA) method, commonly used in immune cell infiltration analysis, estimates the relative enrichment of specific gene sets (immune cell gene sets) in samples based on gene expression data [23]. Least Absolute Shrinkage and Selection Operator (LASSO) is a regression analysis method that performs variable selection and regularization to enhance prediction accuracy and model interpretability. Gene studies involving numerous variables often face reproducibility issues. LASSO helps in selecting genes consistently associated with outcome variables to build replicable models [24]. We aim to dissect the immune complexities in patients with severe asthma, utilizing ssGSEA and LASSO to identify differentially active immune cells. By focusing on immune-associated differential genes and applying machine learning techniques, like Extreme Gradient Boosting (XGBoost), random forest, and LASSO, we strive to predict severe asthma outcomes and identify key genes closely related to immune mechanisms. Validating this genetic mosaic, we have summoned the insights of an asthmatic murine model and perused an ancillary online compendium. By exploring the diverse functions, pathways, immune cell infiltrations, and immunological characteristics, we aspire to provide groundbreaking insights that will enhance the diagnosis and detailed categorization of severe asthma via transcriptomics.

2. Materials and Methods

2.1. Raw Data Acquisition and Preprocessing

Gene expression profiles for asthma research were obtained from four distinct datasets: U-BIOPRED [25], Sun et al. [26], Li Q et al.[27], each with a unique Gene Expression Omnibus (GEO) accession number. We analyzed four datasets to identify initial findings and used additional datasets for validation. Specifically, Gene Expression Omnibus Series 76262 (GSE76262) were used for the initial analysis, while GSE74986, GSE74075, and GSE137268 served as validation datasets. This approach ensured that our findings were robust and reproducible across different cohorts. These datasets include RNA from healthy controls, moderate, and severe asthma cases, with the sample sizes and analysis platforms varying across studies. Detailed information on the datasets, including sample counts and microarray platforms used, can be found in Supporting Information 1: Table S1.

2.2. Assessment of Immune Cell Infiltration

To analyze immune cell infiltration, we applied ssGSEA to determine the abundance of 28 immune cell subtypes in individual samples, using enrichment scores to quantify each subtype’s relative proportion. Statistical significance was assessed with the Wilcoxon rank sum test, considering p values below 0.05.

For identifying critical immune cell subsets across patient groups, we utilized LASSO regression, dividing our data into training and validation sets. Optimal parameters were determined via tenfold cross-validation, selecting the penalty parameter (λ) that minimized cross-validation error. Immune cell subsets with nonzero coefficients after LASSO regression were further analyzed.

2.3. Weighted Gene Co-Expression Network Analysis (WGCNA) Network Construction and Module Identification

We used the WGCNA package to create a co-expression network from gene expression data, applying soft-thresholding to achieve a scale-free topology [28]. After determining the ideal soft-thresholding power, we constructed a topological overlap matrix and performed hierarchical clustering to detect gene modules. Correlations between these modules and immune cell subtypes were calculated using ssGSEA scores, focusing on modules with the strongest correlations for further analysis.

2.4. Evaluation of Immune-Associated Differential Genes

In this study, we procured 2483 immune-related genes from the Immunology Database and Analysis Portal (ImmPort) and retrieved 1378 immune genes from the InnateDB database, an integrated knowledge resource on innate immunity in humans and mice. To identify differentially expressed genes (DEGs) between control and severe asthma samples, we employed the “limma” R package with criteria of | log2 (fold change) | >0.5 and a false discovery rate (FDR) of 0.05. We ascertained immune-associated DEGs by intersecting immune-related genes from ImmPort and InnateDB, DEGs, and genes in feature modules.

2.5. Enrichment of Biological Functions and Pathways

We utilized the R package “clusterProfiler” (version 4.4.4) to conduct Gene Ontology (GO) biological functions and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses [29]. GO enrichment analysis identified biological functions enriched among input genes, while KEGG pathway enrichment analysis revealed associated enriched pathways. Significantly enriched biological functions and pathways were chosen based on a threshold of p value less than 0.05 and an FDR less than 0.05. Additionally, we generated a visual representation of enriched biological functions and pathways using the R packages ggplot2 (version 3.3.6) and GOplot (version 1.0.2).

2.6. Machine Learning Models for Sophisticated Feature Selection and Visualization

We divided the combined dataset, comprising 114 samples (21 healthy controls and 93 severe asthma cases), evenly into training and test sets using the caret package in R. We then employed LASSO regression, XGBoost, and random forest algorithms for model building and feature selection. LASSO regression, known for its capacity to reduce the number of features, was used for its ability to enhance model interpretability. XGBoost, a gradient-boosting framework, was chosen for its performance in classification tasks, and random forest was selected for its robustness and effectiveness in handling complex interactions and classifications. The receiver operating characteristic (ROC) curve and area under the curve (AUC) were calculated to evaluate the models’ ability to distinguish between the two groups, demonstrating the practical utility of these algorithms in selecting and visualizing key features indicative of severe asthma.

2.7. Unearthing Distinct Subgroups via Unsupervised Clustering

To identify subgroups within severe asthma patients, we used consensus clustering with the ConsensusClusterPlus package. The optimal number of clusters was determined using the consensus score matrix, cumulative distribution function (CDF) curve, proportion of ambiguous clustering (PAC) score, and Nbclust indicators. For visualization, we applied principal component analysis (PCA) to distinguish between the subtypes, with PCA plots highlighting the differences. Further details of this analysis are available in the Supporting Information.

2.8. Small-Molecule Compound Prognostication

We executed Connectivity Map (CMap) analysis to forecast small-molecule compounds targeting immune microenvironment subtypes 1 and 2, following previously established methods [30, 31]. Succinctly, we procured 1309 drug signatures from the CMap database (https://clue.io/) and chose the top 150 upregulated and 150 downregulated expression profiles as input data. Employing the eXtreme Sum (XSum) algorithm, we computed CMap scores, and subsequently, the top five small-molecule compounds were designated for visualization.

2.9. External Validation of Crucial Genes

We employed external datasets, comprising GSE74986, GSE137268, and GSE74075, to ascertain the capacity of immune related signature genes to differentiate severe asthma patients from healthy controls. Diagnostic effectiveness was visualized through AUC curves generated with the R package “pROC” (version 1.18.0).

2.10. Mouse Model

The house dust mite (HDM)–induced murine asthma paradigm was established as delineated previously [32]. Succinctly, wild-type (WT) mice were sensitized intranasally with 100 mg HDM in 40 mL phosphate-buffered saline (PBS) on Day 0 and confronted intranasally with 10 mg HDM from Days 7 to 11, culminating in harvest on Day 14.

2.11. Real-Time Reverse Transcription-Polymerase Chain Reaction (RT-PCR) Analysis

Pulmonary tissue RNA was extracted employing Accurate Biology RNA exPro RNA (AG21101). Total RNA underwent RT-PCR utilizing HiScript® III RT SuperMix for quantitative PCR (qPCR; Vazyme, Nanjing, China) for complementary DNA (cDNA) synthesis. The primers implemented for RT-PCR analysis were as Supporting Information 2: Table S2.

The qRT-PCR was executed with ChamQ Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China) and an Applied Biosystems QuantStudio 3 (Applied Biosystems, CA, USA). Relative quantification was ascertained against a standard curve, and specific values were normalized against murine glyceraldehyde-3-phosphate dehydrogenase (Gapdh) messenger RNA (mRNA). The outcomes were characterized as a relative augmentation in mRNA expression in comparison to control values.

3. Results

3.1. Differential Immune Cell Landscape in Severe Asthma Patients

Our study, outlined in Figure 1, systematically identified unique immune cell profiles and gene markers in severe asthma, culminating in the discovery of distinct patient subtypes and potential therapeutic targets through advanced multiphase analysis and validation.

Details are in the caption following the image
Flowchart. Step I: Dataset collection and initial analysis. Obtaining sputum sample datasets from healthy controls and severe asthma patients, followed by immune cell abundance analysis using ssGSEA and PCA. This step sets the stage for identifying differences in immune cell profiles between the two groups. Step II: Identification of immune cell subtypes and gene modules. Utilization of LASSO regression analysis to identify key immune cell subtypes associated with asthma progression. Application of WGCNA to determine correlated gene modules, further refining the cellular and molecular landscape. Step III: Construction of signature utilizing multiple machine learning algorithms. Implementation of machine learning algorithms such as LASSO regression, XGBoost, and random forest to identify a robust set of key genes (e.g., HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) that can distinguish between healthy and severe asthma patients. Step IV: Validation in external cohorts and murine model. Validation of the identified key genes using ROC curves in test and validation sets, as well as external datasets. Further validation is conducted using a murine asthma model, confirming the genes’ roles in asthma pathogenesis. Step V: Subtype identification based on signature. Application of consensus clustering to identify distinct subtypes within severe asthma patients (C1 and C2). Exploration of differential gene expression and immune cell abundance between these subtypes to understand their unique characteristics. Step VI: Molecular characterizations of immune-related subtypes. Detailed molecular characterization of the identified subtypes, focusing on differential gene expression and immune cell abundance. Recommendations for potential drug targets based on CMap database analysis, paving the way for tailored therapeutic strategies. CMap, Connectivity Map; PCA, principal component analysis; ROC, receiver operating characteristic; ssGSEA, single-sample gene set enrichment analysis; WGCNA, weighted gene co-expression network analysis.

In our initial exploration of the U-BIOPRED (GSE76262) gene expression matrix, we conducted a PCA across three categories: healthy control, moderate asthma, and severe asthma. Interestingly, the PCA revealed a marked proximity between the healthy control and moderate asthma groups, suggesting a high degree of similarity in their gene expression profiles. In stark contrast, the severe asthma group manifested a discernible separation from the other two, underlining its distinct gene expression landscape (Figure 2A).

Details are in the caption following the image
Comprehensive transcriptomic and immune profiling of severe asthma in comparison to healthy controls. (A) GO analysis of upregulated genes in severe asthma compared to healthy controls. Key biological processes include cytokine-mediated signaling, leukocyte chemotaxis, and myeloid leukocyte activation, emphasizing the critical role of immune pathways in severe asthma pathogenesis. (B) Boxplot representation of immune cell subset enrichment scores in induced sputum samples between healthy controls (blue) and severe asthma patients (red). Significant increases in eosinophils, mast cells, macrophages, MDSCs, and Th17 cells are observed in severe asthma, while γδT cells, NK cells, and immature dendritic cells show reduced scores. Statistical significance is indicated (p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001, ns, not significant). (C) PCA of immune cell infiltration profiles highlights distinct separation between healthy controls and severe asthma groups, revealing profound immune perturbations in severe asthma. (D) Immune cell interaction network constructed using correlation analysis, identifying four unique interaction patterns among 28 immune cell subsets in severe asthma. Blue edges represent positive correlations, while red edges denote negative correlations, illustrating the complex interplay of immune cells in the severe asthma microenvironment. GO, Gene Ontology; MDSCs, myeloid-derived suppressor cells; PCA, Principal Component Analysis; Th17, Helper T cells 17.
Details are in the caption following the image
Comprehensive transcriptomic and immune profiling of severe asthma in comparison to healthy controls. (A) GO analysis of upregulated genes in severe asthma compared to healthy controls. Key biological processes include cytokine-mediated signaling, leukocyte chemotaxis, and myeloid leukocyte activation, emphasizing the critical role of immune pathways in severe asthma pathogenesis. (B) Boxplot representation of immune cell subset enrichment scores in induced sputum samples between healthy controls (blue) and severe asthma patients (red). Significant increases in eosinophils, mast cells, macrophages, MDSCs, and Th17 cells are observed in severe asthma, while γδT cells, NK cells, and immature dendritic cells show reduced scores. Statistical significance is indicated (p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001, ns, not significant). (C) PCA of immune cell infiltration profiles highlights distinct separation between healthy controls and severe asthma groups, revealing profound immune perturbations in severe asthma. (D) Immune cell interaction network constructed using correlation analysis, identifying four unique interaction patterns among 28 immune cell subsets in severe asthma. Blue edges represent positive correlations, while red edges denote negative correlations, illustrating the complex interplay of immune cells in the severe asthma microenvironment. GO, Gene Ontology; MDSCs, myeloid-derived suppressor cells; PCA, Principal Component Analysis; Th17, Helper T cells 17.
Details are in the caption following the image
Comprehensive transcriptomic and immune profiling of severe asthma in comparison to healthy controls. (A) GO analysis of upregulated genes in severe asthma compared to healthy controls. Key biological processes include cytokine-mediated signaling, leukocyte chemotaxis, and myeloid leukocyte activation, emphasizing the critical role of immune pathways in severe asthma pathogenesis. (B) Boxplot representation of immune cell subset enrichment scores in induced sputum samples between healthy controls (blue) and severe asthma patients (red). Significant increases in eosinophils, mast cells, macrophages, MDSCs, and Th17 cells are observed in severe asthma, while γδT cells, NK cells, and immature dendritic cells show reduced scores. Statistical significance is indicated (p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001, ns, not significant). (C) PCA of immune cell infiltration profiles highlights distinct separation between healthy controls and severe asthma groups, revealing profound immune perturbations in severe asthma. (D) Immune cell interaction network constructed using correlation analysis, identifying four unique interaction patterns among 28 immune cell subsets in severe asthma. Blue edges represent positive correlations, while red edges denote negative correlations, illustrating the complex interplay of immune cells in the severe asthma microenvironment. GO, Gene Ontology; MDSCs, myeloid-derived suppressor cells; PCA, Principal Component Analysis; Th17, Helper T cells 17.
Details are in the caption following the image
Comprehensive transcriptomic and immune profiling of severe asthma in comparison to healthy controls. (A) GO analysis of upregulated genes in severe asthma compared to healthy controls. Key biological processes include cytokine-mediated signaling, leukocyte chemotaxis, and myeloid leukocyte activation, emphasizing the critical role of immune pathways in severe asthma pathogenesis. (B) Boxplot representation of immune cell subset enrichment scores in induced sputum samples between healthy controls (blue) and severe asthma patients (red). Significant increases in eosinophils, mast cells, macrophages, MDSCs, and Th17 cells are observed in severe asthma, while γδT cells, NK cells, and immature dendritic cells show reduced scores. Statistical significance is indicated (p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001, ns, not significant). (C) PCA of immune cell infiltration profiles highlights distinct separation between healthy controls and severe asthma groups, revealing profound immune perturbations in severe asthma. (D) Immune cell interaction network constructed using correlation analysis, identifying four unique interaction patterns among 28 immune cell subsets in severe asthma. Blue edges represent positive correlations, while red edges denote negative correlations, illustrating the complex interplay of immune cells in the severe asthma microenvironment. GO, Gene Ontology; MDSCs, myeloid-derived suppressor cells; PCA, Principal Component Analysis; Th17, Helper T cells 17.

Given the clinical challenges of managing severe asthma, we honed our research focus on this group. A comparison between healthy controls and moderate asthma patients revealed only a small number of DEGs: 30 upregulated and 90 downregulated (Figure 2B), indicating a closer transcriptional resemblance between these two groups compared to the severe asthma cohort.

In contrast, between the healthy control and severe asthma groups, our analysis discerned 593 upregulated and 1493 downregulated genes (Figure 2C). This led to a GO biological process enrichment analysis, which highlighted significant immune-related activities, such as cytokine-mediated signaling and leukocyte migration, among the genes upregulated in severe asthma (Figure 2D). These findings underscore the importance of immune cell migration and cytokine signaling in severe asthma, warranting a deeper investigation into this facet.

To further elucidate the immune contrasts between healthy individuals and severe asthma patients, we embarked on an investigation of the variations in enrichment scores across 28 immune cell subsets from the dataset using the ssGSEA method. Leveraging the ssGSEA method, we determined the abundance scores of immune cells present in sputum samples from both cohorts. Our inquiry revealed that severe asthma patients manifested increased enrichment scores for eosinophils, mast cells, macrophages, myeloid-derived suppressor cells (MDSCs), activated B cells, activated Cluster of Differentiation 4 (CD4) T cells (defined as CD4 T cells in an elevated state of responsiveness poststimulation), monocytes, Helper T cells 17 (Th17) cells, memory B cells, and neutrophils when juxtaposed with healthy controls. Conversely, the γδT cells, immature Dendritic Cells (DC) cells, Natural Killer (NK) cells, and effector memory CD4 T cells exhibited reduced enrichment scores in the severe asthma samples (Figure 2B). It is worth highlighting that while variations in some cell types, such as Th17 cells, may seem marginal, even these subtle distinctions could have profound implications in the intricate landscape of severe asthma pathogenesis [33].

Another PCA focused on immune cell infiltration showed significant differences in the immune cell profiles of healthy versus severe asthma patients, further buttressing the premise that immune cell perturbations are deeply intertwined with the pathogenesis of severe asthma (Supporting Information 3: Figure S1A). Further analysis of immune cell correlations revealed complex interaction patterns among the 28 immune cell types, identifying four unique interaction patterns (Supporting Information 3: Figure S1B). This comprehensive view of immune cell interactions enriches our understanding of severe asthma’s pathogenesis.

3.2. Identification of Immune-Related DEGs

We employed LASSO regression analysis to pinpoint critical immune cell subtypes associated with severe asthma progression from the 28 discerned via ssGSEA (Figure 3A,B). Our findings spotlighted seven immune cell subtypes—MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell, all of which had nonzero coefficients, underscoring their significance in asthma progression (Figure 3C). These cells play an integral role in asthma’s pathogenesis.

Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.
Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.
Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.
Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.
Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.
Details are in the caption following the image
Identification of key immune cell subtypes and gene modules associated with asthma progression. (A and B) LASSO regression analysis to identify the most relevant immune cell subtypes for asthma progression based on the ssGSEA results. (C) The seven immune cell subtypes (MDSC, mast cell, activated CD4 T cell, eosinophil, macrophage, NK cell, and γδT cell) with nonzero coefficients, indicating their significant association with asthma progression. (D) Determination of the optimal soft-thresholding parameter for WGCNA using the PickSoftThreshold function. (E) Hierarchical clustering algorithm results, depicting the clustering tree divided into seven gene modules of different colors, each containing a set of highly correlated genes. (F) Heatmap showing the correlation between gene modules and the seven selected immune cell subtypes, with the blue and turquoise modules demonstrating associations with different sets of immune cell subtypes. ME, module eigengene; ssGSEA, single-sample gene set enrichment analysis; T2, type 2; WGCNA, weighted gene co-expression network analysis.

In a WGCNA of sputum samples from both healthy controls and severe asthma patients, we identified core modules corresponding to these seven unique immune cell subtypes. The soft threshold β was determined at 14, establishing an optimal power value for coexpression network construction (Figure 3D). This resulted in seven gene modules, each marked by a distinct color and related genes, correlating with the immune cell subtypes (Figure 3E,F). Notably, the blue module showed a positive correlation with several immune cells, whereas the turquoise module exhibited a mix of negative and positive correlations with different cells, indicating their varied roles in severe asthma. Moreover, these genes were confirmed in authoritative immune databases as actively participating in immune regulation and response processes. This approach identified 28 immune-related DEGs from the turquoise module and DEGs between healthy controls and severe asthma patients, both found in the ImmPort and InnateDB databases (Figure 4A). The blue module yielded 17 such DEGs (Figure 4B). Key genes like insulin-like growth factor 1 receptor (IGF1R), cytotoxic T-lymphocyte associated protein 4 (CTLA4), heat shock protein family A member 1A (HSPA1A), endothelin 1 (EDN1), and interleukin 18 receptor 1 (IL18R1) were upregulated in severe asthma, while heparin-binding EGF-like growth factor (HBEGF) and CD4 were downregulated (Figures 4C,D), highlighting their critical role in asthma progression.

Details are in the caption following the image
Analysis of immune-related DEGs from the blue and turquoise modules and their associations with asthma progression. (A) Venn diagram illustrating the identification of 28 immune-related DEGs shared by the turquoise module–related genes, DEGs between healthy controls and severe asthma, and immune-related genes from the ImmPort and InnateDB databases. (B) Venn diagram depicting the identification of 17 immune-related DEGs from the blue module. (C and D) Boxplots displaying the expression levels of the 45 immune-related DEGs in healthy control and severe asthma groups, highlighting the distinct immune features between the two groups. (E) GO enrichment analysis of the 45 immune-related DEGs from the blue and turquoise modules, demonstrating significant enrichment in immune and inflammation-related processes, implicating their potential roles in severe asthma pathogenesis. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. DEGs, differentially expressed genes; GO, Gene Ontology; ImmPort, Immunology Database and Analysis Portal.
Details are in the caption following the image
Analysis of immune-related DEGs from the blue and turquoise modules and their associations with asthma progression. (A) Venn diagram illustrating the identification of 28 immune-related DEGs shared by the turquoise module–related genes, DEGs between healthy controls and severe asthma, and immune-related genes from the ImmPort and InnateDB databases. (B) Venn diagram depicting the identification of 17 immune-related DEGs from the blue module. (C and D) Boxplots displaying the expression levels of the 45 immune-related DEGs in healthy control and severe asthma groups, highlighting the distinct immune features between the two groups. (E) GO enrichment analysis of the 45 immune-related DEGs from the blue and turquoise modules, demonstrating significant enrichment in immune and inflammation-related processes, implicating their potential roles in severe asthma pathogenesis. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. DEGs, differentially expressed genes; GO, Gene Ontology; ImmPort, Immunology Database and Analysis Portal.
Details are in the caption following the image
Analysis of immune-related DEGs from the blue and turquoise modules and their associations with asthma progression. (A) Venn diagram illustrating the identification of 28 immune-related DEGs shared by the turquoise module–related genes, DEGs between healthy controls and severe asthma, and immune-related genes from the ImmPort and InnateDB databases. (B) Venn diagram depicting the identification of 17 immune-related DEGs from the blue module. (C and D) Boxplots displaying the expression levels of the 45 immune-related DEGs in healthy control and severe asthma groups, highlighting the distinct immune features between the two groups. (E) GO enrichment analysis of the 45 immune-related DEGs from the blue and turquoise modules, demonstrating significant enrichment in immune and inflammation-related processes, implicating their potential roles in severe asthma pathogenesis. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. DEGs, differentially expressed genes; GO, Gene Ontology; ImmPort, Immunology Database and Analysis Portal.
Details are in the caption following the image
Analysis of immune-related DEGs from the blue and turquoise modules and their associations with asthma progression. (A) Venn diagram illustrating the identification of 28 immune-related DEGs shared by the turquoise module–related genes, DEGs between healthy controls and severe asthma, and immune-related genes from the ImmPort and InnateDB databases. (B) Venn diagram depicting the identification of 17 immune-related DEGs from the blue module. (C and D) Boxplots displaying the expression levels of the 45 immune-related DEGs in healthy control and severe asthma groups, highlighting the distinct immune features between the two groups. (E) GO enrichment analysis of the 45 immune-related DEGs from the blue and turquoise modules, demonstrating significant enrichment in immune and inflammation-related processes, implicating their potential roles in severe asthma pathogenesis. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. DEGs, differentially expressed genes; GO, Gene Ontology; ImmPort, Immunology Database and Analysis Portal.
Details are in the caption following the image
Analysis of immune-related DEGs from the blue and turquoise modules and their associations with asthma progression. (A) Venn diagram illustrating the identification of 28 immune-related DEGs shared by the turquoise module–related genes, DEGs between healthy controls and severe asthma, and immune-related genes from the ImmPort and InnateDB databases. (B) Venn diagram depicting the identification of 17 immune-related DEGs from the blue module. (C and D) Boxplots displaying the expression levels of the 45 immune-related DEGs in healthy control and severe asthma groups, highlighting the distinct immune features between the two groups. (E) GO enrichment analysis of the 45 immune-related DEGs from the blue and turquoise modules, demonstrating significant enrichment in immune and inflammation-related processes, implicating their potential roles in severe asthma pathogenesis. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. DEGs, differentially expressed genes; GO, Gene Ontology; ImmPort, Immunology Database and Analysis Portal.

Our GO enrichment analysis connected these DEGs to immune and inflammatory responses, underscoring their relevance in severe asthma (Figure 4E). The enriched pathways suggest these genes are crucial for immune cell functions, impacting their interactions, regulations, and activations. Our study emphasizes the significant impact of these immune-related DEGs on asthma.

3.3. Development and Estimation of Machine Learning Models and Identification and Validation of Key Genes for Severe Asthma Discrimination

To identify the most informative gene features for discriminating between healthy and severe asthma patients, we employed three machine learning algorithms: LASSO regression, XGBoost, and random forest models, using 45 immune-related genes from a previous analysis as input features. We randomly divided our sample data into training and testing sets at a 1:1 ratio using caret package and built the three machine learning models on the training set.

We first created a LASSO regression model with the glmnet package, selecting features through the lambda.min value (Figure 5A), which simplifies the model by eliminating less critical features. The LASSO model’s efficacy was confirmed on the test set, showcasing an impressive AUC of 0.94 (Figure 5B). Next, we built an XGBoost model, a choice motivated by its ability to handle sparse data and its robustness against overfitting, using the xgboost package and measured its accuracy with log loss, achieving a 0.93 AUC on the test set (Figure 5C,D). Last, the random forest model, selected for its versatility and ease of interpretation, highlighted the 30 most important genes, reaching a 0.94 AUC on the test set (Figure 5E,F). Higher AUC values indicate superior model performance in distinguishing between the two patient groups.

Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.
Details are in the caption following the image
Identification of informative gene features using machine learning algorithms. (A) LASSO regression model construction and feature selection using the glmnet package with lambda.min value. (B) Validation of the LASSO model on the testing set, achieving an AUC of 0.94. (C) XGBoost model construction using the xgboost package with specified parameters and log loss as the evaluation metric. (D) Validation of the XGBoost model on the testing set, achieving an AUC of 0.93. (E) Random forest model construction using the random forest package and selection of top 30 genes based on importance scores. (F) Validation of the random forest model on the testing set, achieving an AUC of 0.94. Higher AUC values indicate better performance in distinguishing between healthy and severe asthma patients. (G) Identification of key genes for distinguishing between healthy and severe asthma patients. The intersection of three machine learning algorithms—LASSO regression, XGBoost, and random forest—reveals seven pivotal genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) as robust and discriminative features. AUC, area under the curve.

Integrating the findings from the LASSO regression, XGBoost, and random forest algorithms, we identified seven key genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, IGF1R) that consistently distinguished between healthy and severe asthma cases (Figure 5G). These genes are promising biomarkers for severe asthma, providing insights into the disease’s molecular framework and suggesting potential targets for new treatments. Their relevance to immune responses and inflammation in severe asthma merits further study.

Based on the identified genes, we determined their positions in the human genome (Figure 6A). Subsequently, we utilized ROC curves to validate the performance of these seven genes in the test set, validation set, and their combination. The AUC values demonstrated satisfactory performance across all sets (Figure 6B–H).

Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.
Details are in the caption following the image
Gene positions and validation performance. (A) Determination of the identified genes’ positions in the human genome. (B–H) Validation of the seven genes’ performance using ROC curves in the test set, validation set, and their combination, with AUC values demonstrating satisfactory performance across all sets. AUC, area under the curve; ROC, receiver operating characteristic.

3.4. Validation of Key Genes in External Dataset and Animal Model

To corroborate the expression of the seven pivotal genes identified, we delved into transcriptomic samples from healthy controls and severe asthma patients across three external datasets: GSE74986, GSE137268, and GSE74075. However, it is worth noting that due to microarray probe limitations inherent to the GSE74986 dataset, we could not ascertain the expression levels of all seven genes therein. Regardless, our results from the other datasets and U-BIOPRED consistently demonstrated similar expression trends. Furthermore, the area under the ROC curve validated their effectiveness (Figure 7A–C).

Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.
Details are in the caption following the image
Validation of the seven key genes identified using external datasets and a murine asthma model. (A–C) Expression trends and ROC curve analysis of the seven genes in external datasets GSE74986, GSE137268, and GSE74075, demonstrating consistent results with the GSE76262 dataset. (D) Schematic representation of the murine asthma model protocol using HDM administration. (E–K) qPCR analysis of the gene expression levels of HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R in murine lung tissues, showing expression trends consistent with the transcriptomic data in patients and further validating the role of these genes in asthma pathogenesis. Statistical significance is indicated as p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. HDM, house dust mite; ns, not significant; qPCR, quantitative polymerase chain reaction; ROC, receiver operating characteristic.

To verify the expression patterns of the seven genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) identified by machine learning in asthma pathogenesis, we established a murine model using the following protocol: on day 0, mice were administered 100 µg of HDM in 40 µL of PBS; on days 7–11, they received 10 µg of HDM in 40 µL of PBS; and on day 14, the mice were sacrificed (Figure 7D).

Subsequently, we isolated RNA from murine lung tissues and performed qPCR analysis to assess gene expression levels. The results demonstrated that the expression trends of HBEGF, HSPA1A, IL18R1, EDN1, CD4, CTLA4, and IGF1R were consistent with those observed at the transcriptomic level in patients (Figures 4 and 7E–K).

The consistency between the animal model findings and transcriptomic data supports the reliability of these gene expression patterns as potential biomarkers and therapeutic targets for asthma.

3.5. Dentification of Immune Subtypes of Patients with Severe Asthma

We performed a consensus cluster analysis, in which all severe asthma samples were initially divided into k (k = 2–9) clusters based on the expression matrix of the featured genes above. The CDF curves of the consensus score matrix and PAC statistic indicated that the optimal number was obtained when k = 2 (Figure 8A–C). The same result was achieved from Nbclust testing (Figure 8D).

Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.
Details are in the caption following the image
Identification of distinct subtypes within severe asthma patients and their differential gene expression. (A–D) Consensus clustering analysis was performed using the ConsensusClusterPlus package to determine the optimal number of clusters, based on the consensus score matrix, CDF curve, PAC score, and Nbclust. (E) PCA plot illustrating the differences between the two derived subtypes, C1 and C2. (F) Volcano plot showcasing the DEGs between subtypes C1 and C2. (G) Boxplot depicting the differential expression of the seven critical genes (HBEGF, HSPA1A, CD4, IL18R1, EDN1, CTLA4, and IGF1R) between the C1 and C2 subtypes. (H) Corplot demonstrating the linear relationship between the expression levels of the seven key genes in the subtypes. These results highlight the distinct molecular characteristics present within severe asthma patients, which may have implications for personalized therapeutic approaches. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. CDF, cumulative distribution function; DEGs, differentially expressed genes; PAC, proportion of ambiguous clustering; PCA, principal component analysis.

To visualize the distinctions between the two derived subtypes, we conducted PCA using the R function PCA (Figure 8E). The resulting PCA plots effectively displayed the differences between the two subtypes. Subsequently, we executed a differential analysis between subtypes C1 and C2. Although both subtypes consisted of severe asthma patients, a considerable number of DEGs were observed between them (Figure 8F). This observation was also illustrated in the volcano plot. The seven critical genes exhibited differential expression between the C1 and C2 subtypes, as demonstrated by the boxplot in Figure 8G. Additionally, a corplot revealed a certain degree of linear relationship between the expression levels of these genes (Figure 8H). These findings emphasize the presence of distinct molecular characteristics within severe asthma patients, warranting further exploration to refine personalized therapeutic approaches.

3.6. Differentiation of Immune Characteristics Between Immune-Related Subtypes

To better clarify and understand the biological and immunological differences and relationships between these immune microenvironment subtypes, following the ssGSEA immune infiltration analysis of subtypes C1 and C2, we observed a significant difference in immune cell abundance between the two subgroups, as illustrated in the differential immune cell abundance heatmap (Figure 9A) and the PCA plot (Figure 9B).

Details are in the caption following the image
Differential immune cell abundance and correlation with gene expression in subtypes C1 and C2. (A) Heatmap illustrating significant differences in immune cell abundance between subtypes C1 and C2, based on ssGSEA immune infiltration analysis. (B) PCA plot visualizing the distinctions between the two subgroups. (C) Comparison of 28 immune cell subtypes between C1 and C2, with C1 displaying higher infiltration of various T cells, NK cells, and immature DC cells, while C2 is enriched with innate immune cells. (D) Correlation between immune cell abundance from ssGSEA and gene expression, revealing a strong positive correlation between γδT cell abundance and the expression of HBEGF and CD4 in the C1 subtype, and a strong negative correlation with CTLA4, EDN1, HSPA1A, and IL18R1 in the C2 subtype, suggesting the potential role of γδT cells in distinguishing the subtypes and their involvement in differential immune responses. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. ns, no significance; ssGSEA, single-sample gene set enrichment analysis; PCA, principal component analysis.
Details are in the caption following the image
Differential immune cell abundance and correlation with gene expression in subtypes C1 and C2. (A) Heatmap illustrating significant differences in immune cell abundance between subtypes C1 and C2, based on ssGSEA immune infiltration analysis. (B) PCA plot visualizing the distinctions between the two subgroups. (C) Comparison of 28 immune cell subtypes between C1 and C2, with C1 displaying higher infiltration of various T cells, NK cells, and immature DC cells, while C2 is enriched with innate immune cells. (D) Correlation between immune cell abundance from ssGSEA and gene expression, revealing a strong positive correlation between γδT cell abundance and the expression of HBEGF and CD4 in the C1 subtype, and a strong negative correlation with CTLA4, EDN1, HSPA1A, and IL18R1 in the C2 subtype, suggesting the potential role of γδT cells in distinguishing the subtypes and their involvement in differential immune responses. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. ns, no significance; ssGSEA, single-sample gene set enrichment analysis; PCA, principal component analysis.
Details are in the caption following the image
Differential immune cell abundance and correlation with gene expression in subtypes C1 and C2. (A) Heatmap illustrating significant differences in immune cell abundance between subtypes C1 and C2, based on ssGSEA immune infiltration analysis. (B) PCA plot visualizing the distinctions between the two subgroups. (C) Comparison of 28 immune cell subtypes between C1 and C2, with C1 displaying higher infiltration of various T cells, NK cells, and immature DC cells, while C2 is enriched with innate immune cells. (D) Correlation between immune cell abundance from ssGSEA and gene expression, revealing a strong positive correlation between γδT cell abundance and the expression of HBEGF and CD4 in the C1 subtype, and a strong negative correlation with CTLA4, EDN1, HSPA1A, and IL18R1 in the C2 subtype, suggesting the potential role of γδT cells in distinguishing the subtypes and their involvement in differential immune responses. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. ns, no significance; ssGSEA, single-sample gene set enrichment analysis; PCA, principal component analysis.
Details are in the caption following the image
Differential immune cell abundance and correlation with gene expression in subtypes C1 and C2. (A) Heatmap illustrating significant differences in immune cell abundance between subtypes C1 and C2, based on ssGSEA immune infiltration analysis. (B) PCA plot visualizing the distinctions between the two subgroups. (C) Comparison of 28 immune cell subtypes between C1 and C2, with C1 displaying higher infiltration of various T cells, NK cells, and immature DC cells, while C2 is enriched with innate immune cells. (D) Correlation between immune cell abundance from ssGSEA and gene expression, revealing a strong positive correlation between γδT cell abundance and the expression of HBEGF and CD4 in the C1 subtype, and a strong negative correlation with CTLA4, EDN1, HSPA1A, and IL18R1 in the C2 subtype, suggesting the potential role of γδT cells in distinguishing the subtypes and their involvement in differential immune responses. p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, and ∗∗∗∗p < 0.0001. ns, no significance; ssGSEA, single-sample gene set enrichment analysis; PCA, principal component analysis.

We further compared the differences in 28 immune cell subtypes between the two subtypes. The C1 subtype exhibited a higher infiltration rate of T cells, including γδT cells, T follicular helper (Tfh) cells, Th2 cells, central memory CD8 T cells, effector memory CD4 T cells, and activated CD8 T cells, as well as NK cells and immature DC cells, compared to the C2 subtype. In contrast, the C2 subtype was significantly enriched with innate immune cells such as macrophages, eosinophils, monocytes, activated B cells, mast cells, neutrophils, memory B cells, and activated DC cells (Figure 9C).

These findings imply that the immune response in the C1 subtype is more active, with a stronger adaptive immune response, particularly involving T cells and NK cells. Conversely, the C2 subtype is associated with the activation of innate immune cells. These observations may be linked to distinct asthma pathogenesis mechanisms in the two subtypes. The increased T cell infiltration in the C1 subtype might be associated with Th2 cell activation, a well-established driver of asthma pathogenesis. On the other hand, the activation of innate immune cells in the C2 subtype could be related to environmental factors such as air pollution and allergens [34]. Further investigation is necessary to elucidate these mechanisms in greater detail.

Subsequently, we explored the correlation between immune cell abundance obtained from ssGSEA and gene expression (Figure 9D). These findings underscore the strong association between the activity of these seven genes and various immune cells, shedding light on the distinct immune landscapes of the two subtypes. This key aspect of our results section highlights the potential of these immune cell signatures in differentiating the C1 and C2 subtypes, as well as the importance of further investigating the role of these genes and immune cells in asthma pathogenesis.

3.7. Function and Pathway Enrichment in Immune-Related Subtypes and Therapeutic Target Prediction

To assess disparities in gene expression patterns between the two immune-related subtypes of severe asthma (C1 and C2), we performed a differential expression analysis on gene expression matrices of C1 and C2 compared to healthy controls. We identified 189 upregulated and 306 downregulated DEGs in the C1 subtype and 1017 upregulated and 2428 downregulated DEGs in the C2 subtype compared to healthy controls. These findings suggest distinct gene expression profiles in each subtype of severe asthma compared to healthy individuals.

Further analysis of these genes showed that the C1 subtype’s genes are mainly involved in pathways linked to immune responses, such as cytokine interactions, chemokine signaling, and other immune-related activities (Figure 10A,C). In contrast, the genes associated with the C2 subtype are involved in processes related to metabolism, specifically in cellular respiration, mitochondrial function, and protein synthesis (Figure 10B,D). From these findings, we categorized the C1 and C2 subtypes as immune-active subtype (IAS) and metabolic-associated subtype (MAS), respectively. This categorization helps underline the distinct molecular and immune characteristics of each subtype, which could be pivotal in developing customized treatments for severe asthma.

Details are in the caption following the image
Differential gene expression and enrichment analysis of severe asthma subtypes C1 (IAS) and C2 (MAS). (A and C) Enrichment analysis of C1 subtype DEGs, revealing significant involvement in cytokine–cytokine receptor interaction, chemokine signaling pathway, and GO terms related to cytokine-mediated signaling pathways, leukocyte cell–cell adhesion, and immune receptor activity. (B and D) Enrichment analysis of C2 subtype DEGs, demonstrating a strong association with cellular and aerobic respiration in BP, mitochondrial inner membrane, and protein-containing complex in CC, and structural constituents of ribosome and electron transfer activity in MF. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; IAS, immune-active subtype; KEGG, Kyoto Encyclopedia of Genes and Genomes; MAS, metabolic-associated subtype; MF: molecular function; PPAR, peroxisome proliferator-activated receptor.
Details are in the caption following the image
Differential gene expression and enrichment analysis of severe asthma subtypes C1 (IAS) and C2 (MAS). (A and C) Enrichment analysis of C1 subtype DEGs, revealing significant involvement in cytokine–cytokine receptor interaction, chemokine signaling pathway, and GO terms related to cytokine-mediated signaling pathways, leukocyte cell–cell adhesion, and immune receptor activity. (B and D) Enrichment analysis of C2 subtype DEGs, demonstrating a strong association with cellular and aerobic respiration in BP, mitochondrial inner membrane, and protein-containing complex in CC, and structural constituents of ribosome and electron transfer activity in MF. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; IAS, immune-active subtype; KEGG, Kyoto Encyclopedia of Genes and Genomes; MAS, metabolic-associated subtype; MF: molecular function; PPAR, peroxisome proliferator-activated receptor.
Details are in the caption following the image
Differential gene expression and enrichment analysis of severe asthma subtypes C1 (IAS) and C2 (MAS). (A and C) Enrichment analysis of C1 subtype DEGs, revealing significant involvement in cytokine–cytokine receptor interaction, chemokine signaling pathway, and GO terms related to cytokine-mediated signaling pathways, leukocyte cell–cell adhesion, and immune receptor activity. (B and D) Enrichment analysis of C2 subtype DEGs, demonstrating a strong association with cellular and aerobic respiration in BP, mitochondrial inner membrane, and protein-containing complex in CC, and structural constituents of ribosome and electron transfer activity in MF. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; IAS, immune-active subtype; KEGG, Kyoto Encyclopedia of Genes and Genomes; MAS, metabolic-associated subtype; MF: molecular function; PPAR, peroxisome proliferator-activated receptor.
Details are in the caption following the image
Differential gene expression and enrichment analysis of severe asthma subtypes C1 (IAS) and C2 (MAS). (A and C) Enrichment analysis of C1 subtype DEGs, revealing significant involvement in cytokine–cytokine receptor interaction, chemokine signaling pathway, and GO terms related to cytokine-mediated signaling pathways, leukocyte cell–cell adhesion, and immune receptor activity. (B and D) Enrichment analysis of C2 subtype DEGs, demonstrating a strong association with cellular and aerobic respiration in BP, mitochondrial inner membrane, and protein-containing complex in CC, and structural constituents of ribosome and electron transfer activity in MF. BP, biological process; CC, cellular component; DEGs, differentially expressed genes; GO, Gene Ontology; IAS, immune-active subtype; KEGG, Kyoto Encyclopedia of Genes and Genomes; MAS, metabolic-associated subtype; MF: molecular function; PPAR, peroxisome proliferator-activated receptor.

We conducted potential drug target predictions for the top 150 upregulated and downregulated genes in the C1 and C2 subtypes compared to healthy controls using the CMap database. For the IAS subtype, the top potential targets include fatty acid desaturase 3 (FADS3), leucine aminopeptidase 3 (LAP3), TEA domain transcription factor 4 (TEAD4), ribosomal protein S6 kinase B2 (RPS6KB2), and sterol-C5-desaturase-like (SC5DL) (Supporting Information 3: Figure S2A, Supporting Information 4: Table S3). Meanwhile, the top targets for the MAS subtype are FADS3, intercellular adhesion molecule 3 (ICAM3), tumor necrosis factor receptor superfamily member 1A (TNFRSL1A), and transformer-2 beta (TRA2B) (Supporting Information 3: Figure S2B, Supporting Information 5: Table S4). These suggested targets could guide the development of tailored therapies, offering a step forward in personalized medicine for severe asthma.

4. Discussion

Asthma is recognized as a complex disorder with a spectrum of phenotypes beyond the basic allergic and nonallergic categories, requiring detailed study of its various subtypes and the distinct triggers, treatment responses, and pathophysiology associated with each. This nuanced view acknowledges that asthma is not a single condition but rather a constellation of related disorders, each potentially requiring a specialized approach for effective management and therapeutic intervention. Particularly challenging is severe asthma, which involves persistent airway obstruction due to airway-wall remodeling and mucus buildup, highlighting the urgent need for more effective treatments. As we tread further into an era of medical advancement, a myriad of novel treatment methodologies have emerged [19, 35]. However, a one-size-fits-all approach is futile. Alarmingly, several treatments, though promising on paper, fall short in efficacy when applied to the real-world heterogeneous patient population. Such observations underline the pressing need for personalized treatment regimens, tailored to the unique genetic and phenotypic profiles of individual patients. Utilizing machine learning and advanced techniques is crucial for developing targeted treatments and gaining a deeper understanding of asthma, especially through examining gene functions and their impact on the disease’s subtypes.

In the evolving field of machine learning for disease research, significant progress has been made. Liu et al. [36] showcased how various analytical methods can create effective predictive models. Xiao et al. [37] made strides with their use of convolutional neural networks (CNNs) in the Huatuo framework, a breakthrough in understanding gene regulation at the cellular level. Machine learning has notably transformed asthma research. An analysis using the National Health and Nutrition Examination Survey (NHANES) database found key links between certain factors and pediatric asthma, employing logistic and LASSO regression. This research identified a complex relationship between ethnicity, vitamin B12, sodium, vitamin K, and asthma, providing insights into how nutrition and demographics influence the disease [38]. Furthermore, the discovery of pivotal genes like matrix metalloproteinase 12 (MMP12), associated with eosinophilic markers, highlights how proteomic analysis can aid asthma diagnosis [39]. However, research has not extensively moved beyond pediatric asthma to include broad datasets and animal model validation. There is a noticeable absence of studies that merge various machine learning methods to pinpoint asthma-related features. This gap presents an opportunity for groundbreaking research, particularly in personalized medicine, where these methods could significantly tailor interventions to individual needs.

Our study primarily utilized the advanced algorithms of WGCNA alongside ssGSEA to uncover genes related to immune response in severe asthma. Initial findings showed significant differences in immune cell levels between patients with severe asthma and healthy individuals. By integrating WGCNA with machine learning techniques, we utilized three established algorithms—XGBoost, LASSO, and random forest—to efficiently select features and predict severe asthma outcomes. This approach successfully identified seven key genes involved in asthma, validated through various datasets and experimental models. These genes play pivotal roles in the development of asthma by influencing inflammation, airway remodeling, and immune response. HBEGF, part of the epidermal growth factor family, is notable for its increased expression in the epithelium of asthmatic airways, promoting inflammation and sensitivity [4042]. IGF1R, a gene for a receptor protein in membranes, is involved in asthma development through its elevated expression in asthmatic airway smooth muscle, leading to muscle growth and heightened sensitivity [43, 44]. CTLA4, which encodes the cytotoxic T-lymphocyte antigen-4, influences the immune response; its gene variants show diverse links to asthma risk, severity, and treatment outcomes [45]. HSPA1A, a member of the heat shock protein 70 family, contributes to asthma by enhancing airway inflammation via increased production of pro-inflammatory cytokines and chemokines [46, 47]. CD4, which encodes a receptor on helper T cells, plays a role in asthma through airway inflammation and sensitivity, with CD4+ T cells releasing pro-inflammatory cytokines when exposed to allergens [48, 49]. EDN1, responsible for encoding endothelin-1, a potent vasoconstrictor, is involved in asthma-associated airway inflammation, tightening, and structural changes. Inhibiting its signaling has been shown to reduce sensitivity and inflammation in animal studies [50, 51]. IL18R1, coding for the IL-18 cytokine receptor, is associated with inflammation and changes in asthmatic airways [5255]. Its role is further evidenced by the detection of increased IL18R1 protein levels in lung tissue and the activation of downstream nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) and activator protein 1 (AP-1) pathways, emphasizing the significance of IL-18 signaling in severe asthma [53]. Building on this, our signature, refined by three machine learning methods, demonstrated improved prediction capabilities. Additional validation, based on qRT-PCR results from the HDM asthma animal model, confirmed the unique expression of these genes in asthma’s pathogenesis. This not only reinforced our initial findings but also highlighted the significance of these genes in understanding asthma more broadly.

Leveraging our genomic insights, we identified two distinct immune-related severe asthma subtypes, offering new perspectives on the disease’s complexity. We named these subtypes IAS and MAS for C1 and C2, respectively. IAS features a strong presence of adaptive immune cells, such as various T cell types, highlighting an active inflammatory response typical of asthma [56]. In contrast, MAS is characterized by a significant number of innate immune cells, suggesting a reaction to environmental factors like allergens or pollutants [57]. Our detailed analysis of the immune-related DEGs further distinguishes these subtypes. IAS is linked with immune cell communication and activation, indicating vigorous immune recruitment. MAS, however, is associated with metabolic functions, hinting at an increased energy demand possibly due to immune responses or changes in airway structure [5860]. This metabolic emphasis could influence the body’s cellular energy balance and mitochondrial health, both vital in asthma’s development. In summary, IAS is associated with immune response activities, while MAS focuses on metabolic processes. Understanding these differences can guide us in creating more personalized asthma treatments, aiming at the specific pathways of each subtype. However, this is pioneering work, and further research is essential to uncover the detailed roles these pathways play in asthma.

Our groundbreaking findings align with the broader research trends in asthma pathophysiology and classification. The understanding of asthma has significantly evolved, transitioning from traditional categories to more detailed classifications like phenotypes or endotypes [61, 62]. This shift highlights the advancement towards precision medicine, where treatment is customized based on detailed evaluations of each patient’s unique characteristics, aiming to enhance care. In asthma classification, a key distinction exists between allergic and eosinophilic asthma [63]. Both are linked to T2 inflammation and high eosinophil levels but were traditionally seen as separate [64]. However, treatments targeting T2 inflammation have transformed the management of these asthma types, offering better symptom control for patients with severe cases [14, 65, 66]. Despite this, current guidelines still recommend different treatments for each type [2, 20], likely due to long-standing views influenced by clinical trial categories. Recognizing the overlap between these classifications is crucial for reevaluating our approach [21, 67, 68].

Our study introduces a detailed examination of two immune subtypes in asthma, IAS and MAS, offering fresh insights. We explore the complex nature of asthma by combining immune and metabolic data, contributing new knowledge, and suggesting directions for further research. Identifying these subtypes helps refine treatment strategies for severe asthma, enabling personalized care that aligns with each patient’s specific immune patterns. This could also help predict how well patients will respond to certain treatments, with IAS patients potentially benefiting from therapies targeting Th2-driven inflammation, while MAS patients might better respond to treatments focusing on innate immune responses.

The signature of seven immune-related genes offers a promising tool for asthma study, potentially revolutionizing diagnosis and treatment. Easily replicated using a simple PCR-based assay, this signature stands out for its clinical utility. While retrospective analyses have initiated this discovery, future research should focus on prospective, multicenter studies to confirm and expand upon these findings. Despite limitations in the existing public datasets, the identified genes show potential links to critical outcomes. Further investigation into these genes is essential to fully understand their roles and impacts on asthma management and patient care, potentially leading to significant advancements in the field.

In summary, our research provides a refined perspective for understanding and treating asthma, having identified distinct immune subtypes that pave the way for personalized medicine. This breakthrough sets a promising direction for future asthma research, promising to significantly impact ongoing studies and therapeutic approaches. By illuminating the complex nature of asthma, our work underscores the potential for developing interventions that are precisely tailored to individual patient needs, potentially transforming the landscape of asthma care.

Ethics Statement

The animal study was reviewed and approved by Ruijin Hospital Animal Ethics Committee.

Disclosure

The authors acknowledge that an earlier version of this manuscript has been presented as a preprint on BioRxiv (available at: https://www.biorxiv.org/content/10.1101/2024.04.15.589644).

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Yue Hu, Yating Lin, and Bo Peng performed the majority of the analyses, executed the experiments, compiled the data, and contributed equally to this work. Wei Tang and Yue Hu conceived and orchestrated the study. Yue Hu composed the initial manuscript. Bo Peng and Yue Hu carried out the external validation. Yue Hu and Chunyan Xiang devised the flowchart. Wei Tang scrutinized and refined the manuscript. All authors contributed to the article and endorsed the submitted version.

Funding

This work was supported by NSFC, 8217010218 subaward.

Acknowledgments

The authors would like to thank Desheng Chen and Shuang Tian for their valuable help on the statistical advice, Dr. Guo for his support and assistance, and Huacheng Yan for his help.

    Supporting Information

    Additional supporting information can be found online in the Supporting Information section.

    Data Availability Statement

    The datasets analyzed during the current study are available in the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). Further inquiries can be directed to the corresponding authors.

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.