Volume 2025, Issue 1 1996661
Research Article
Open Access

Identifying Risk Factors for Poor Efficacy of Adalimumab Treatment in Patients With Crohn’s Disease: Insights From Machine Learning Models

Xiaojun Li

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Maomao Tang

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Department of Gastroenterology, Changsha County People’s Hospital (Hunan Provincial People’s Hospital Xingsha Campus), Changsha, Hunan, China

Jie Zhang

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Yongjun Wang

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Chunlian Wang

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Chunhui Ouyang

Corresponding Author

Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Changsha, Hunan, China

Research Center of Digestive Disease, Central South University, Changsha, Hunan, China

Department of Gastroenterology, Guilin Hospital of the Second Xiangya Hospital Central South University, Guilin, Guangxi, China

First published: 12 June 2025
Academic Editor: Yung-Tsu Cho

Abstract

Aim: Adalimumab (ADA) is an effective treatment for Crohn’s disease (CD); however, some patients still experience adverse reactions and nonresponse. This study aimed to explore the risk factors associated with poor efficacy of ADA using machine learning algorithms, with the goal of guiding the management of ADA in clinical practice.

Methods: This single-center investigation included 114 CD patients treated with ADA in the Department of Gastroenterology from January 2020 to January 2023. Risk factors associated with each poor efficacy event were explored using logistic regression and machine learning algorithms. Shapley additive explanations (SHAP) and partial dependence plot methods were used to analyze the risk factors of each event.

Results: Eight patients experienced primary non-response (PNR), 35 patients developed secondary loss of response (LOR), and 27.2% (31/114) of patients experienced at least one adverse event (AE). After comparing the fit of the models built with 10 algorithms, the risk factors associated with PNR, LOR, and AEs were analyzed using the logistic regression, K-nearest neighbor (KNN), and Extra Tree algorithms, respectively. The most important variables related to PNR, LOR, and AEs were the history of corticosteroid use, baseline Crohn’s disease activity index (CDAI), and uric acid, respectively.

Conclusions: This study confirmed the efficacy of ADA in clinical practice in the Chinese CD population, and found that a history of corticosteroid use, high disease activity, and a high inflammatory state before ADA treatment were associated with increased risks of poor efficacy.

1. Introduction

Crohn’s disease (CD) is a lifelong, chronic, relapsing inflammatory disease that can involve the entire gastrointestinal tract [1]. The clinical presentation and severity of CD are diverse, and tailoring treatment to disease severity is a well-recognized strategy [2]. Since the introduction of biological agents into CD treatment in the 1990s, patients have increasingly achieved long-term and endoscopic remission. These drugs improve on the effectiveness of traditional treatments such as corticosteroids and immunosuppressive agents and thereby help to reach the therapeutic goals of preventing intestinal complications and halting disease progression. Biological agents are now recommended as a first-line treatment option for patients with moderate to severe CD [3, 4]. Against this background, treatment strategies for tumor necrosis factor (TNF) inhibitors, among the most prominent biological agents, have become more detailed and targeted. Adalimumab (ADA), a fully human anti-TNF monoclonal antibody, has shown the ability to induce and maintain remission in global clinical trials of patients with CD [5, 6]. However, although the drug has been in use for more than 15 years, the medication strategy for ADA still leaves much room for improvement.

ADA is one of the classic biological agents for CD. It is recommended for the treatment of moderate-to-severe active CD, or as conversion therapy for patients with active CD who have a secondary loss of response (LOR) to infliximab (IFX) [3, 4, 7]. However, the effect of ADA varies among populations with different inflammatory statuses and medical histories. As with other biological agents, the common poor efficacy events related to TNF inhibitors are primary non-response (PNR), secondary LOR, and adverse events (AEs). Notably, poor efficacy of anti-TNFα therapy is of great clinical concern and may result in dose escalation, switching to another anti-TNFα agent (or to a drug with another mode of action), or intestinal surgery [8, 9]. Which factors influence drug efficacy has not yet been settled. Many studies have examined risk factors for poor efficacy in patients treated with IFX, including age, extraintestinal manifestations, female sex, and increased antibodies to IFX [9–11]. Nevertheless, much remains to be explored regarding the risk factors associated with poor efficacy of ADA, especially in Asian populations. Several studies of ADA have been conducted in China [12, 13], Japan [14, 15], and Korea [16], but they have mainly focused on the efficacy of ADA or compared it with that of IFX. There is thus a research gap in analyzing the factors related to the effects of ADA itself, and more attention needs to be paid to the factors associated with its poor efficacy. Since the approval of ADA for the treatment of CD patients in China in 2020, more clinical experience in Asian patients has become available to summarize. This experience will facilitate the establishment of more precise treatment strategies for individual CD patients.

Therefore, this study aims to determine the factors associated with PNR, LOR, and AEs of ADA. To do so, the clinical data of CD patients who were treated with ADA in our department between January 2020 and January 2023 were retrospectively analyzed. The relationship between factors and adverse outcomes was investigated using machine learning (ML), Shapley additive explanations (SHAP), and partial dependence plot (PDP) methods, which are expected to further guide the use of ADA in clinical practice.

2. Methods

2.1. Study Population

This cohort study included patients from the Department of Gastroenterology, Second Xiangya Hospital, Central South University. Cases were collected from January 2020 to January 2023. The observation deadline for each patient was January 2023 or 10 weeks after the termination of ADA treatment. The inclusion criteria were age ≥ 16 years, a confirmed diagnosis of CD, and treatment with ADA. The diagnosis of CD was based on the diagnostic criteria of the Consensus on Diagnosis and Treatment of Inflammatory Bowel Disease (Beijing, 2018) [17]. Included patients met the guideline-recommended indications for ADA use in China [18] and were evaluated by experienced gastroenterologists. Patients were treated with ADA 160 mg at week 0 and 80 mg at week 2, and the dose was reduced to 40 mg at week 4 to reach the maintenance dose when the patients achieved remission. Patients with malignancy or with chronic or severe underlying diseases were excluded.

2.2. Data Collection

All data were obtained from the institution’s medical records. Variables were selected according to expert advice and a literature review and included demographic, clinical, laboratory, and imaging data. Demographic data collected at the initiation of ADA treatment included sex, age, body mass index (BMI), and smoking history. Clinical signs and symptoms included the duration of disease, location of the lesion, disease behavior, perianal lesions, extraintestinal manifestations, fistulae, medication history, abdominal surgery history, and Crohn’s disease activity index (CDAI). Laboratory data included blood routine examination, C-reactive protein (CRP), and albumin. Fecal calprotectin was excluded because more than 70% of its values were missing, owing to a shortage of test reagents caused by the COVID-19 pandemic. Imaging data included gastrointestinal endoscopy, computed tomography enterography (CTE), and magnetic resonance enterography (MRE).

2.3. Evaluation of Disease

The initial location of CD lesions was determined by colonoscopy and CTE. The Montreal classification for CD was used to categorize disease phenotypes. The location of the lesion was designated as L1 (ileal), L2 (colonic), or L3 (ileocolonic), with L4 as a modifier designating concomitant upper tract disease. Disease behavior was categorized as non-stricturing, nonpenetrating (B1), stricturing (B2), or penetrating (B3), with a P modifier to describe concomitant perianal disease [19]. The clinical disease activity of CD patients was evaluated using the CDAI, with clinical remission defined as CDAI < 150, mild disease activity as 150 ≤ CDAI ≤ 220, and moderate to severe disease activity as CDAI > 220 [4, 20]. Mucosal inflammation was evaluated by the Crohn’s disease endoscopic index of severity (CDEIS), with endoscopic response defined as a decrease in the CDEIS score of more than 5 from baseline and complete endoscopic remission defined as CDEIS < 3 [21]. For patients with small bowel CD, CTE was performed, and radiologic indicators of intestinal inflammation were used as measures of drug efficacy. For patients with anal fistula, MRE and the perianal disease activity index (PDAI) were used to evaluate the patient’s condition.

Clinical response was defined as a reduction of the CDAI score by 70 points at week 12 [20, 22]. PNR was defined as having no clinical benefit during the first 12 weeks after the initiation of ADA therapy [22, 23]. LOR was defined as an initial response to therapy followed by subsequent worsening of symptoms. LOR was assessed by a multidisciplinary team of experienced experts as a relapse or deterioration after an initial clinical response, based on the following criteria: (1) an increase in CDAI ≥ 50 from the minimum observed value together with an elevated inflammatory marker (CRP > 8.0 mg/L) [23, 24]; (2) an indicated need for therapy modification, including dose escalation, alternative biological agents, corticosteroids, combinations of immunosuppressive therapy, or CD-related surgery [22, 23].
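The numeric thresholds in this section can be written down as small helper functions. The following is only an illustrative sketch: the function names are hypothetical, "by 70 points" is read as "by at least 70 points", and only criterion (1) of the LOR definition is encoded, because criterion (2) is a clinical judgment.

```python
def cdai_category(cdai: float) -> str:
    """Map a CDAI score to the activity category described in Section 2.3."""
    if cdai < 150:
        return "remission"
    if cdai <= 220:
        return "mild"
    return "moderate to severe"


def has_clinical_response(baseline_cdai: float, week12_cdai: float) -> bool:
    """Clinical response: CDAI reduced by 70 points at week 12.

    Assumption: the reduction is interpreted as "at least 70 points".
    """
    return (baseline_cdai - week12_cdai) >= 70


def meets_lor_marker_criterion(min_cdai: float, current_cdai: float,
                               crp_mg_l: float) -> bool:
    """Criterion (1) for LOR: CDAI rise >= 50 from the minimum observed
    value together with CRP > 8.0 mg/L. The need for therapy modification
    (criterion 2) is not modeled here."""
    return (current_cdai - min_cdai) >= 50 and crp_mg_l > 8.0
```

For example, `cdai_category(194.5)` returns `"mild"` for the cohort's mean baseline CDAI.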

AEs including anaphylaxis, injection-site reactions (rule out physical injury from injection), serious infections requiring hospitalization, psoriasiform skin lesions, neuropathy, and malignancies were assessed. Other drug reactions were excluded. For patients who discontinued ADA because of AEs, details of the AE leading to discontinuation were recorded. Except for those patients who continued commercially available ADA after the end of the study, patients were contacted 10 weeks after the last dose of the drug to assess any new or ongoing AEs.

2.4. Statistical Analysis

All analyses were implemented in SPSS software (version 18.0; SPSS, Inc.) and Python 3.5.2 (Python Software Foundation, Wilmington, DE, USA).

2.4.1. Data Analysis

Figure 1 shows the proposed framework, which summarizes the methods applied in the statistical analysis, model building, and interpretation. Continuous variables are expressed as mean with standard deviation or median and interquartile range [IQR] as needed. We utilized Student’s t-test to analyze data with normal distributions, nonparametric tests for data without normal distributions, and the Chi-square test to compare enumeration data. The statistical significance of the results was evaluated using a p value with a significance level of 0.05. Then, we trained classification models to assess the patient’s risk of LOR and AEs.

Figure 1. Study flowchart. This figure displays the participant flowchart and the procedure for data processing and analysis. Abbreviations: ADA: adalimumab; AEs: adverse events; PNR: primary nonresponse; LOR: secondary loss of response.
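As an illustration of the Chi-square comparison of enumeration data described in Section 2.4.1, the sketch below computes a Pearson chi-square test for a 2 × 2 table without continuity correction. It is not the SPSS procedure itself; the function name is hypothetical, and the choice of the uncorrected statistic is an assumption. The example counts are the prior corticosteroid use figures reported in Table 2 (19 of 103 responders, 4 of 8 patients with PNR).

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Counts from Table 2: prior corticosteroid use, responders vs. PNR.
stat, p = pearson_chi2_2x2(19, 103 - 19, 4, 8 - 4)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Under these assumptions the result appears consistent with the p value of 0.034 listed in Table 2. With a group as small as n = 8, some expected cell counts fall below 5, a situation in which Fisher's exact test is often preferred.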

2.4.2. Data Preparation and Processing

We built the dataset from the baseline data and part of the post-induction treatment data. Data preparation and preprocessing were performed before the ML models were built.

During data preparation, we used stratified random sampling to split the dataset into 75% for training and 25% for testing, which provides an unbiased sense of model effectiveness. Across the whole dataset, variables with more than 30% missing data were excluded. For other variables with missing data, we applied the Multiple Imputation by Chained Equations (MICE) algorithm to fill in the missing data in the training set and the testing set separately, because excluding all cases with missing values could introduce bias into the results. MICE is a practical approach to generating imputations based on a set of imputation models, one for each variable with missing values [25]. By analyzing multiple imputed datasets, MICE takes the uncertainty in the imputations into account and yields accurate standard errors. Consequently, this approach minimizes the impact of imputation on the overall results. A total of 9 values (10.46%) were imputed in the training set and 3 (10.34%) in the test set.
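A minimal sketch of the split-and-impute step is shown below. It assumes scikit-learn's `IterativeImputer`, a MICE-inspired imputer that returns a single completed dataset rather than multiple imputations, so it only approximates the procedure described above. The data are synthetic and merely sized like the LOR analysis cohort (62 ongoing response, 35 LOR); all variable names are illustrative.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Synthetic stand-in data: 97 patients, 5 numeric variables (not study data).
X = rng.normal(size=(97, 5))
y = np.array([1] * 35 + [0] * 62)

# Remove a few values at random to mimic missing laboratory results.
missing_mask = rng.rand(*X.shape) < 0.03
X[missing_mask] = np.nan

# A stratified 75/25 split keeps the outcome rate similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Impute each set separately, as the text describes.
X_train_imp = IterativeImputer(random_state=0).fit_transform(X_train)
X_test_imp = IterativeImputer(random_state=0).fit_transform(X_test)
```

A common alternative, not described in the text, is to fit the imputer on the training set only and apply it to the test set with `transform`.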

As the number of patients with PNR and LOR was lower compared to the control group, there is an obvious imbalance in the data between groups. In order to avoid overfitting and address the issue of data imbalance, we employed the synthetic minority oversampling technique (SMOTE) during the model training process, which has also been used in our previous studies [26]. To achieve a desired ratio between the majority and minority classes, SMOTE creates synthetic instances by interpolating m instances of the minority class that are sufficiently close to each other (where m is a given integer value) [27]. This approach addresses the effects of model bias due to data imbalance while having no impact on the real test set data.

To prevent the SMOTE process from interfering excessively with the relationships between variables in the original data, and to avoid overfitting the model, we compared different upsampling ratios. In the LOR analysis, we upsampled the LOR group to 80% of the control group, adding 10 items to the LOR data. In the AE analysis, we upsampled the AE group to 50% of the control group, adding 8 items to the AE data. All SMOTE steps were performed on the training set only.
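The interpolation idea behind SMOTE and the arithmetic of a target ratio can be sketched in plain Python. This is a simplified illustration, not the implementation used in the study: the function names, the neighbour count `k`, and the toy rows are all hypothetical.

```python
import math
import random


def n_synthetic_needed(n_minority, n_majority, target_ratio):
    """Rows to add so that minority / majority reaches target_ratio."""
    return max(0, math.ceil(target_ratio * n_majority) - n_minority)


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training-set minority rows (two made-up variables per row).
minority = [[200.0, 12.0], [230.0, 18.0], [250.0, 30.0], [210.0, 9.0]]
extra = n_synthetic_needed(n_minority=len(minority), n_majority=10,
                           target_ratio=0.8)
new_rows = smote_like_oversample(minority, extra, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, its values stay within the range observed in the minority class. Applying the function to training rows only leaves the test set untouched, as the text requires.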

2.4.3. Building Models for Evaluating the Factors

We used 10 algorithms to build classification models for identifying the relevant factors: logistic regression (LR), Ridge Regression (RR), Random Forests (RF), Classification and Regression Tree (CART), Extremely Randomized Trees (Extra Tree), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), and K-Nearest Neighbor (KNN). These 10 algorithms are widely used in clinical data analysis and are suited to different data types. Among them, RF, CART, Extra Tree, GBDT, and XGBoost are decision tree-based models. SVM is a discriminative algorithm that learns the boundary between classes directly from the data. MLP is a class of feedforward artificial neural network. KNN is a type of instance-based learning, in which all computations are deferred until prediction time. LR and RR are well-established conventional statistical methods that served as a baseline for comparison with the other ML algorithms.

Similar to our previous study, we employed the recursive feature elimination (RFE) algorithm to rank the variables by importance on the training set [26, 28]. All algorithms used their default parameters at this stage. Following variable reduction based on importance, the remaining variables were introduced into the corresponding ML algorithm to select the best set of variables for modeling. To accomplish this, we used k-fold cross-validation with k = 10 to train and cross-validate each model, and took the variable combination with the highest area under the receiver-operating characteristic curve (AUC) as the final result. Next, hyperparameters were tuned automatically using the scikit-learn GridSearchCV method.
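The three steps in this paragraph (RFE ranking, choosing the variable subset by 10-fold cross-validated AUC, then grid search) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: it uses logistic regression as the example estimator, synthetic data from `make_classification`, and a small made-up grid for the regularization parameter `C`.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the training set (not the study data).
X_train, y_train = make_classification(
    n_samples=120, n_features=12, n_informative=4, random_state=0
)

base_model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Step 1: rank variables with RFE (rank 1 = eliminated last).
ranking = RFE(base_model, n_features_to_select=1).fit(X_train, y_train).ranking_
order = ranking.argsort()

# Step 2: score the top-k variables for each k by 10-fold cross-validated AUC.
best_k, best_auc = 1, 0.0
for k in range(1, X_train.shape[1] + 1):
    cols = order[:k]
    auc = cross_val_score(
        base_model, X_train[:, cols], y_train, cv=cv, scoring="roc_auc"
    ).mean()
    if auc > best_auc:
        best_k, best_auc = k, auc
selected = order[:best_k]

# Step 3: tune hyperparameters on the selected variables.
search = GridSearchCV(base_model, {"C": [0.01, 0.1, 1, 10]}, cv=cv,
                      scoring="roc_auc")
search.fit(X_train[:, selected], y_train)
print(best_k, round(best_auc, 3), search.best_params_)
```

Reusing the same `cv` object in steps 2 and 3 keeps the folds identical, so the tuned score is directly comparable with the score obtained under default parameters.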

The models were evaluated based on multiple metrics, including accuracy, precision (positive predictive value (PPV)), recall (sensitivity), F1-score, Brier score, and AUC. The AUC was used to measure discrimination, while calibration was evaluated using the Brier score. By comparing the performance of the models on the test dataset, we identified the best-performing model in terms of prediction.
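For reference, the evaluation metrics named above can be computed from true labels and predicted probabilities as in the sketch below. The function name, the 0.5 decision threshold, and the toy values are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision (PPV), recall, F1, Brier score, and AUC
    for binary labels (1 = event) and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Brier score: mean squared gap between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

    # AUC: chance that a random positive is scored above a random negative
    # (ties count as one half).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "brier": brier, "auc": auc}


# Toy example with made-up predictions.
metrics = classification_metrics([1, 0, 1, 0, 0, 1],
                                 [0.9, 0.2, 0.4, 0.6, 0.1, 0.7])
```

A lower Brier score indicates better calibration, whereas a higher AUC indicates better discrimination, which is why the two are reported side by side.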

The scikit-learn package library (version 0.22.2) was used for all automatic tuning of hyperparameters, RFE, and the models except for the XGBoost model, which was created using the xgboost package library (version 1.1.1).

2.4.4. Model Interpretation

The best-performing model in the test set, as determined by the highest AUC, was selected for further analysis. Additionally, the top-performing models in the training set were included in the analysis for evaluation purposes. While it is possible to observe the variables that have a significant impact on the model, determining the relationship between the variables and the results can be challenging. Hence, we employed the SHAP method for further interpretation. SHAP is a widely adopted method for the interpretation of nonlinear black box models [29, 30]. The SHAP algorithm incorporates all the original data in the model into the analysis, and the SHAP values were calculated and presented using the SHAP Python package (version 0.29.1).
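To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a single sample by enumerating all feature coalitions. It is a didactic illustration, not the SHAP package's algorithm, which relies on faster model-specific or approximate estimators. Features outside a coalition are replaced by a baseline value, which is one simplifying choice of value function, and the toy risk score with its coefficients is invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    A feature outside the coalition takes its baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score; coefficients are invented for illustration."""
    cdai, esr, prior_ifx = features
    return 0.002 * cdai + 0.004 * esr + 0.15 * prior_ifx + 0.001 * esr * prior_ifx


x = [222.0, 18.0, 1.0]         # one hypothetical patient
baseline = [172.0, 11.5, 0.0]  # hypothetical reference values
phi = shapley_values(toy_risk, x, baseline)
```

The contributions in `phi` sum to `toy_risk(x) - toy_risk(baseline)`, which is the additivity property that lets SHAP attribute a prediction to individual variables.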

PDPs were constructed for the variables with the most important SHAP values to show how an individual predictor variable affects the probability of a given outcome while controlling for all of the other predictor variables. The predictive value of each variable was assessed by evaluating accuracy, AUC, PPV, and negative predictive value (NPV).
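A partial dependence curve can be computed by forcing one variable to each value on a grid for every patient and averaging the predicted probabilities. The sketch below shows this calculation in plain Python; the function names, the toy logistic model, and its coefficients are hypothetical and are not taken from the study.

```python
import math


def partial_dependence_curve(predict_proba, rows, feature_index, grid):
    """Average predicted probability when one feature is forced to each
    grid value while the other features keep their observed values."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the original data is untouched
            modified[feature_index] = value
            total += predict_proba(modified)
        curve.append(total / len(rows))
    return curve


def toy_lor_probability(row):
    """Hypothetical logistic model; coefficients are invented."""
    cdai, esr = row
    return 1.0 / (1.0 + math.exp(-(-3.0 + 0.01 * cdai + 0.02 * esr)))


rows = [[120.0, 8.0], [180.0, 15.0], [240.0, 30.0], [300.0, 45.0]]
cdai_grid = [100.0, 200.0, 300.0]
curve = partial_dependence_curve(toy_lor_probability, rows, 0, cdai_grid)
```

Plotting `curve` against `cdai_grid` gives the PDP for the first variable; in this toy model it rises with CDAI because the invented coefficient is positive.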

In addition, the PNR group was so small that modeling with ML algorithms would have resulted in overfitting and might have failed to uncover important variables associated with PNR occurrence. Therefore, the LR algorithm was used to analyze variables that were statistically different (p < 0.05) within the PNR data, combined with PDPs and predictive values, to briefly explain the correlation between these variables and PNR occurrence.

2.5. Ethical Issues

We assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. The research was approved by the Ethics Committee of the Second Xiangya Hospital of Central South University (NO. LYF2021168). This is a retrospective study based on medical record review, and the data are anonymous; therefore, the Ethics Committee of the Second Xiangya Hospital of Central South University waived the requirement for informed consent.

3. Results

3.1. Characteristics of CD Patients Included in This Study

A total of 114 patients were included in this study, of whom 3 discontinued induction therapy due to AEs and 111 completed primary induction therapy. Of the total study population, 55 patients (48.2%) had a CDAI score below 220. Among these 55 patients, 15 had anal fistula, 19 switched from IFX treatment, 9 switched from corticosteroids after having undergone anal fistula surgery prior to their CD diagnosis, 5 switched from corticosteroids with small bowel lesions only, 4 had a CDEIS score greater than 10 and were assessed by gastroenterologists as having moderate CD, 2 had B3 behavior, and 1 had a history of treatment with etanercept for ankylosing spondylitis. Six patients discontinued maintenance therapy for personal reasons. Sixty-two patients remained on maintenance therapy at the end of follow-up (Figure 1). As shown in Table 1, the median age [IQR] of the included patients was 25.5 years [20, 33], and 75.4% (86/114) of them were male. The median follow-up of the cohort was 22 months [14, 26], and the median illness duration was 2.0 years [0.5, 5]. Sixty-eight patients (59.6%) had CD-related complications such as abdominal fistula, abdominal abscess, intestinal obstruction, and perianal fistula. Thirty-one patients (27.2%) had been treated with IFX for their CD prior to ADA therapy. According to the Montreal classification, the most common types in the study population were A2 (83/114), L3 (76/114), and B1 (45/114); 7 patients with small bowel lesions and 28 patients with perianal fistulizing behavior were also included. The mean baseline CDAI of the study population was 194.5 ± 92.2, and the median CDEIS was 5.5 [1.6, 10.1].

Table 1. Characteristics of Crohn’s disease patients included in the study.
Baseline characteristics
Total, n 114
Male gender 86 (75.4%)
Age (years) 25.5 [20, 33]
Age of onset (years) 23 [17, 29]
Illness duration (years) 2.0 [0.5, 5.0]
Height (cm) 170.0 [162.0, 175.0]
Weight (kg) 55.3 ± 11.5
BMI (kg/m²) 19.0 [17.3, 21.5]
Smoking history, yes 25 (21.9%)
Previous surgery, yes 51 (44.7%)
Extraintestinal manifestations 28 (24.6%)
Complications, yes 68 (59.6%)
Previous treatments
 5-Aminosalicylates no. 2 (1.8%)
 Corticosteroids no. 24 (21.1%)
 Immunomodulators no. 41 (36%)
 Infliximab no. 31 (27.2%)
 Vedolizumab no. 1 (0.9%)
 Other biological 1 (0.9%)
Baseline laboratory test
 Leukocytes (10⁹/L) 6.5 ± 2.1
 Hemoglobin (g/L) 125.1 ± 23.4
 Platelets (10⁹/L) 306 [234.5, 391.5]
 Hematocrit (%) 39.2 [35.6, 44.1]
 C-reactive protein (mg/L) 13.5 [4.2, 27.4]
 ESR (mm/h) 17.0 [7.5, 39.5]
 Albumin (g/L) 37.9 ± 6.9
 Creatinine (μmol/L) 65.0 [53.8, 75.3]
 Uric acid (μmol/L) 372.8 ± 93.7
Montreal (age of onset), no.
 A1 24 (21.1%)
 A2 83 (72.8%)
 A3 7 (6.1%)
Montreal (location), no.
 L1 19 (16.7%)
 L2 7 (6.1%)
 L3 76 (66.7%)
 L4 0 (0.0%)
 L1 + L2 9 (7.9%)
 L3 + L4 3 (2.6%)
Montreal (behavior), no.
 B1 45 (39.5%)
 B2 30 (26.3%)
 B3 6 (5.3%)
 B2 + B3 5 (4.4%)
 B1 + P 21 (18.4%)
 B2 + P 6 (5.3%)
 B3 + P 1 (0.9%)
Baseline CDAI 194.5 ± 92.2
Baseline CDEIS 5.5 [1.6, 10.1]
Time of follow-up (months) 22 [14, 26]

3.2. Response to Induction Therapy

A total of 111 patients completed primary induction therapy. Eight of these patients (7.2%) showed PNR. Comparing patients in the PNR and primary response groups, statistical differences were found between the two groups in the history of corticosteroid use, baseline platelets, CRP level, and perianal fistulizing behavior (Table 2). The two groups did not differ significantly in illness duration, previous use of IFX, whether ADA was their first-line treatment, age at onset, or disease location. Some patients in the cohort were treated with a combination of immunosuppressive therapy by experienced physicians, depending on their individual conditions. Notably, all patients with colon lesions only and all those treated with a combination of immunosuppressive therapy completed induction therapy without PNR; however, given the size of the enrolled population, no significant differences in these variables were observed between the PNR and primary response groups.

Table 2. Univariate analyses for variables in response to adalimumab induction therapy (clinical response vs. primary nonresponse).
Variable Response (n = 103) PNR (n = 8) p value
Male gender 79 (76.7%) 5 (62.5%) 0.367
Age (year) 27.1 ± 9.6 29.5 ± 10.2 0.498
Age of onset (year) 23 [17, 28] 23.5 [18.5, 29.8] 0.732
Illness duration (years) 2.0 [0.5, 5.0] 3.5 [1.3, 9.8] 0.215
Height (cm) 168.7 ± 8.3 165.0 ± 12.2 0.420
Weight (kg) 55.1 ± 10.9 55.7 ± 19.5 0.896
BMI (kg/m²) 19.0 [17.2, 21.5] 19.0 [17.1, 21.7] 0.950
Smoking history 23 (22.3%) 2 (25%) 0.862
Previous surgery 47 (45.6%) 3 (37.5%) 0.656
Extraintestinal manifestations 26 (25.2%) 1 (12.5%) 0.418
Complications 62 (60.2%) 5 (62.5%) 0.898
Previous treatments
 Corticosteroids no. 19 (18.4%) 4 (50.0%) 0.034
 Immunomodulators no. 36 (35.0%) 5 (62.5%) 0.120
 Infliximab no. 28 (27.2%) 3 (37.5%) 0.531
 Vedolizumab no. 0 (0%) 1 (12.5%) 0.001
Baseline laboratory test
 Leukocytes (10⁹/L) 6.2 ± 2.1 7.7 ± 3.1 0.061
 Hemoglobin (g/L) 126.2 ± 23.4 117.3 ± 23.6 0.305
 Platelets (10⁹/L) 305.0 ± 120.9 507.5 ± 252.2 0.001
 Hematocrit (%) 42.7 ± 38.2 42.0 ± 21.9 0.807
 C-reactive protein (mg/L) 13.5 [4.1, 24.3] 46.5 [11.4, 77.7] 0.047
 ESR (mm/h) 23.5 ± 22.0 38.0 ± 26.0 0.104
 Albumin (g/L) 38.0 ± 6.7 35.3 ± 10.2 0.335
 Creatinine (μmol/L) 63.9 ± 19.1 59.5 ± 14.9 0.616
 Uric acid (μmol/L) 329.1 ± 93.2 292.7 ± 110.9 0.404
Montreal (age of onset), no. 0.626
 A1 23 (22.3%) 1 (12.5%)
 A2 75 (72.8%) 7 (87.5%)
 A3 5 (4.9%) 0 (0%)
Montreal (location), no. 0.894
 L1 18 (17.5%) 1 (12.5%)
 L2 6 (5.8%) 0 (0%)
 L3 68 (66.0%) 6 (75%)
 L4 0 (0%) 0 (0%)
 L1 + L2 8 (7.8%) 1 (12.5%)
 L3 + L4 3 (2.9%) 0 (0%)
Montreal (behavior), no. 0.140
 B1 41 (39.8%) 2 (25%)
 B2 28 (27.2%) 1 (12.5%)
 B3 6 (5.8%) 0 (0%)
 B2 + B3 5 (4.9%) 0 (0%)
 B1 + P 18 (17.5%) 3 (37.5%)
 B2 + P 4 (3.9%) 2 (25%)
 B3 + P 1 (1.0%) 0 (0%)
Perianal fistulizing 23 (22.3%) 5 (62.5%) 0.024
ADA as first treatment 75 (72.8%) 5 (62.5%) 0.531
Baseline CDAI 191.1 ± 93.7 226.4 ± 85.9 0.305
Baseline CDEIS 5.9 [1.6, 10.1] 7.2 [2.2, 17.6] 0.299
Concomitant medication no. 17 (16.5%) 0 (0%) 0.212
Adverse events no. 26 (25.2%) 2 (25.0%) 0.988
Note: p < 0.05 was considered statistically significant.

3.3. Maintenance Therapy and Secondary Loss of Response

After induction therapy, a total of 103 patients entered maintenance treatment. Six patients stopped treatment for personal reasons, mainly because they perceived a significant reduction in symptoms. As of January 2023, 62 patients remained on ADA maintenance therapy, and 35 (36.1%) patients had discontinued treatment due to LOR. As shown in Table 3, the median time on ADA for patients with LOR was 7.0 months [6.0, 12.0], and the median time of sustained response for the patients still in follow-up was 25.0 months [22.25, 30.75]; the difference was statistically significant.

Table 3. Univariate analyses for variables in response to adalimumab maintenance therapy (ongoing response vs. secondary loss of response).
Variable Ongoing response (n = 62) LOR (n = 35) p value
Male gender 47 (75.8%) 26 (74.3%) 0.868
Age (year) 27.4 ± 10.2 27.3 ± 9.1 0.967
Age of onset (year) 22 [17, 28.8] 24 [17, 29] 0.877
Illness duration (years) 2.0 [0.5, 6.0] 2.0 [0.6, 5.0] 0.571
Height (cm) 168.6 ± 8.8 168.3 ± 8.0 0.890
Weight (kg) 54.9 ± 11.4 55.2 ± 10.6 0.891
BMI (kg/m²) 19.2 [17.1, 21.2] 19.1 [17.6, 21.7] 0.804
Smoking history 16 (25.8%) 4 (11.4%) 0.093
Previous surgery 35 (56.5%) 9 (25.7%) 0.003
Extraintestinal manifestations 17 (27.4%) 9 (25.7%) 0.856
Complications 38 (61.3%) 21 (60.0%) 0.901
Previous treatments
 Corticosteroids no. 10 (16.1%) 9 (25.7%) 0.292
 Immunomodulators no. 19 (30.6%) 16 (45.7%) 0.187
 Infliximab no. 12 (19.4%) 15 (42.9%) 0.018
 Vedolizumab no. 0 (0%) 0 (0%)
Baseline laboratory test
 Leukocytes (10⁹/L) 6.1 ± 2.0 6.5 ± 1.8 0.327
 Hemoglobin (g/L) 127.4 ± 22.7 124.2 ± 24.6 0.547
 Platelets (10⁹/L) 310.4 ± 120.0 289.5 ± 117.2 0.432
 Hematocrit (%) 39.8 [34.7, 43.5] 39.2 [35.7, 45.2] 0.705
 C-reactive protein (mg/L) 9.4 [3.2, 26.1] 13.6 [7.2, 29.4] 0.213
 ESR (mm/h) 11.5 [7.0, 28.8] 18.0 [10.0, 45.0] 0.017
 Albumin (g/L) 38.7 ± 6.1 37.3 ± 7.2 0.331
 Creatinine (μmol/L) 63.2 ± 21.8 64.8 ± 15.3 0.734
 Uric acid (μmol/L) 343.5 ± 103.9 303.3 ± 74.5 0.070
Montreal (age of onset), no. 0.618
 A1 16 (25.8%) 6 (17.1%)
 A2 43 (69.4%) 27 (77.1%)
 A3 3 (4.8%) 2 (5.7%)
Montreal (location), no. 0.508
 L1 12 (19.4%) 5 (14.3%)
 L2 3 (4.8%) 3 (8.6%)
 L3 38 (61.3%) 25 (71.4%)
 L4 0 (0%) 0 (0%)
 L1 + L2 6 (9.7%) 2 (5.7%)
 L3 + L4 3 (4.8%) 0 (0%)
Montreal (behavior), no. 0.783
 B1 25 (40.3%) 14 (40.0%)
 B2 17 (27.4%) 10 (28.6%)
 B3 4 (6.5%) 2 (5.7%)
 B2 + B3 2 (3.2%) 3 (8.6%)
 B1 + P 12 (19.4%) 4 (11.4%)
 B2 + P 2 (3.2%) 2 (5.7%)
 B3 + P 0 (0%) 0 (0%)
Perianal fistulizing 14 (22.6%) 6 (17.1%) 0.608
ADA as first treatment 50 (80.6%) 20 (57.1%) 0.018
Duration of ADA 25.0 [22.25, 30.75] 7.0 [6.0, 12.0] 0.001
Baseline CDAI 172.0 ± 98.0 222.3 ± 75.6 0.011
CDAI at 3 months 17.6 [6.4, 41.8] 73 [22.3, 160.9] 0.001
Baseline CDEIS 5.9 [1.6, 10.1] 7.2 [2.2, 17.6] 0.073
Concomitant medication no. 10 (16.1%) 7 (20.0%) 0.630
Adverse events no. 15 (24.2%) 9 (25.7%) 0.868
Note: p < 0.05 was considered statistically significant.

Comparing patients in the LOR group with those who maintained ADA therapy revealed statistical differences in the history of abdominal surgery, history of IFX use, baseline ESR, whether ADA was the patient’s first-line medication, baseline CDAI, and CDAI after 3 months of ADA therapy. Patients who developed LOR had a lower rate of previous abdominal surgery (25.7% vs. 56.5%), a higher rate of previous IFX use, and more often received ADA as a non-first-line agent; they also had higher disease activity, as manifested by higher ESR, baseline CDAI, and CDAI after 3 months of ADA therapy. Unlike in the PNR comparison, there were no statistically significant differences in the history of corticosteroid use, baseline platelets, CRP level, or perianal fistulizing behavior. In addition, the use of immunosuppressive combination therapy did not differ significantly between the two groups, nor did the proportion of patients with AEs.

3.4. Patients With AEs

During treatment, 27.2% (31/114) of patients experienced at least one AE. The most common AE was skin allergy, with an incidence of 12.3% (14/114), followed by upper respiratory tract infection (7.0%, 8/114) and leukopenia (5.3%, 6/114) (Table 4).

Table 4. Prevalence of immunological adverse events related to adalimumab.
AEs N (% whole cohort) Rate of the total IAEs (%)
Total 31 (27.2%) 100
Respiratory infection 8 (7.0%) 25.8
Skin allergy 14 (12.3%) 45.2
Leukopenia 6 (5.3%) 19.4
Abnormal liver function 6 (5.3%) 19.4
Headache and dizziness 3 (2.6%) 9.7
Arthralgia 1 (0.9%) 3.2
Psoriasis 1 (0.9%) 3.2
Tuberculosis infection 1 (0.9%) 3.2
Neurologic adverse effects 1 (0.9%) 3.2
Treatment withdrawal due to IAEs 3 (2.6%) 9.7

Three patients failed to complete the induction therapy due to serious AEs, including one case of secondary tuberculosis, one case of optic neuritis diagnosed by visual field loss during drug administration, and another case exhibiting persistent systemic allergic reactions that were ineffective with antiallergic treatment and improved after discontinuation.

Supporting Table 1 compares the baseline characteristics of patients with and without AEs. No baseline variable differed significantly between the two groups. The occurrence of AEs also did not differ significantly among patients with PNR, LOR, and response.

3.5. Predictive Models

Based on the patient data, we built prediction models for LOR and AEs and searched for the best-fitting (i.e., most strongly predictive) model for each outcome. Because PNR occurred in only eight patients (7.2%), the sample was too small and any model would be prone to overfitting, even with an up-sampling method such as SMOTE; therefore, no prediction model for PNR was established in this study. Instead, only the variables that differed statistically were analyzed for their relationship with the occurrence of PNR using the LR algorithm. After dividing the data by stratified random sampling, we compared the distribution plots of the training and test sets. The results showed no significant differences in demographic features, LOR rate, or AE rate between the two sets (Supporting Information Figures 1 and 2).
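A stratified random split can be illustrated in a few lines. The sketch below is not the study's code: the 20% test fraction, the seed, and the function name `stratified_split` are assumptions made for illustration, and the labels are synthetic (35 events among 114 patients, matching the reported 30.7% LOR rate). It shows only why stratification keeps the event rate similar in the training and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def event_rate(labels, indices):
    """Share of positive labels among the selected indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Synthetic labels (assumption): 1 = LOR, 0 = no LOR.
labels = [1] * 35 + [0] * 79
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

print("train event rate:", round(event_rate(labels, train_idx), 3))
print("test event rate:", round(event_rate(labels, test_idx), 3))
```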

3.5.1. Prediction Model for LOR Occurred

A total of 58 variables, comprising baseline data and disease activity score data at 3 months, were initially included in model building. The RFE algorithm was used to rank and select variables, yielding the final optimal combination of variables for each of the 10 models. All variable rankings and hyperparameter results are presented in the Supporting Information (Tables 2 and 4). The performance of all LOR models is shown in Table 5. In the training set, the extra tree model had the best AUC (0.8967 ± 0.097), accuracy (0.8167 ± 0.115), precision (0.7967 ± 0.314), F1-score (0.7241 ± 0.266), and Brier score (0.183 ± 0.019). However, its performance in the testing set was mediocre (AUC = 0.6319), which suggests overfitting. Because this study aims to identify the relationship between variables and LOR within the ML model, it is necessary to choose an approach that balances model fit and complexity. In the test set, the highest AUC, which was taken as a global index of discrimination capacity, was achieved by the KNN model (AUC = 0.7396), accompanied by a comparatively high recall (0.6667). Therefore, the KNN model consisting of 9 variables was selected for the next step of model analysis.

Table 5. Performance of the LOR prediction models for each algorithm.
Model | Variable number | Accuracy (training) | Accuracy (testing) | Precision (training) | Precision (testing) | Recall (training) | Recall (testing) | F1-score (training) | F1-score (testing) | Brier score (training) | Brier score (testing) | AUC (training) | AUC (testing)
LR | 50 | 0.7431 ± 0.129 | 0.44 | 0.695 ± 0.207 | 0.2727 | 0.6917 ± 0.261 | 0.3333 | 0.6785 ± 0.207 | 0.3 | 0.2441 ± 0.113 | 0.4814 | 0.7654 ± 0.124 | 0.4444
RR | 33 | 0.7306 ± 0.124 | 0.52 | 0.6449 ± 0.149 | 0.3333 | 0.725 ± 0.277 | 0.3333 | 0.6735 ± 0.205 | 0.3333 | 0.2891 ± 0.145 | 0.3398 | 0.8188 ± 0.081 | 0.5764
RF | 23 | 0.7306 ± 0.121 | 0.64 | 0.695 ± 0.299 | 0.5 | 0.65 ± 0.329 | 0.3333 | 0.6202 ± 0.264 | 0.4 | 0.1993 ± 0.021 | 0.218 | 0.8154 ± 0.117 | 0.6597
CART | 3 | 0.7042 ± 0.153 | 0.84 | 0.7667 ± 0.307 | 0.8571 | 0.4917 ± 0.254 | 0.6667 | 0.5724 ± 0.252 | 0.75 | 0.2092 ± 0.1 | 0.1833 | 0.7783 ± 0.171 | 0.736
ET | 21 | 0.8167 ± 0.115 | 0.64 | 0.7967 ± 0.314 | 0.5 | 0.7 ± 0.284 | 0.3333 | 0.7241 ± 0.266 | 0.375 | 0.183 ± 0.019 | 0.225 | 0.8967 ± 0.097 | 0.6319
GBDT | 9 | 0.7333 ± 0.099 | 0.6 | 0.7467 ± 0.185 | 0.4545 | 0.675 ± 0.248 | 0.5556 | 0.6684 ± 0.145 | 0.5 | 0.1988 ± 0.077 | 0.3106 | 0.8150 ± 0.136 | 0.6806
XGBoost | 19 | 0.7556 ± 0.094 | 0.64 | 0.7317 ± 0.1547 | 0.5 | 0.7333 ± 0.258 | 0.1111 | 0.7008 ± 0.159 | 0.1818 | 0.1933 ± 0.089 | 0.2421 | 0.8054 ± 0.139 | 0.7222
SVM | 9 | 0.5944 ± 0.143 | 0.6 | 0.5754 ± 0.202 | 0.4667 | 0.6833 ± 0.307 | 0.7778 | 0.5703 ± 0.172 | 0.5833 | 0.2405 ± 0.021 | 0.2285 | 0.6641 ± 0.135 | 0.6493
MLP | 22 | 0.6458 ± 0.173 | 0.6 | 0.6379 ± 0.28 | 0.4444 | 0.6167 ± 0.245 | 0.4444 | 0.5944 ± 0.205 | 0.4444 | 0.2697 ± 0.104 | 0.2928 | 0.7046 ± 0.134 | 0.6528
KNN | 9 | 0.6333 ± 0.095 | 0.64 | 0.61 ± 0.156 | 0.5 | 0.675 ± 0.285 | 0.6667 | 0.591 ± 0.138 | 0.5714 | 0.2363 ± 0.049 | 0.2123 | 0.7081 ± 0.132 | 0.7396
  • Note: The best performance is determined by the highest AUC in the testing set; by this criterion, the best-performing model is KNN. ET: Extremely Randomized Trees.
  • Abbreviations: CART, Classification and Regression Tree; GBDT, Gradient Boosting Decision Tree; KNN, K-Nearest Neighbor; LR, Logistic Regression; MLP, Multi-Layer Perceptron; RF, Random Forests; RR, Ridge Regression; SVM, Support Vector Machines; XGBoost, Extreme Gradient Boosting.
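The threshold-based metrics in Table 5 (accuracy, precision, recall, and F1-score) all derive from the four cells of a confusion matrix. The sketch below uses the standard definitions. The confusion matrix itself is hypothetical: the counts (6 true positives, 6 false positives, 3 false negatives, 10 true negatives in a 25-patient test set) were chosen because they reproduce the KNN testing row, and they are not reported in the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics computed from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (assumption), selected to match the KNN testing row of Table 5.
knn_test = classification_metrics(tp=6, fp=6, fn=3, tn=10)

for name, value in knn_test.items():
    print(f"{name}: {value:.4f}")
```

With so few positive cases in the test set, a single reclassified patient shifts recall by about 0.11, which is one reason the testing metrics vary so widely between models.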

3.5.2. Prediction Model for AEs Occurred

A total of 47 baseline variables were initially included in model building; the variable rankings obtained by RFE and the model hyperparameters are presented in Supporting Information Tables 3 and 4. The best performer in the training set was XGBoost, with an AUC of 0.7559 ± 0.178, accuracy of 0.7333 ± 0.105, precision of 0.6333 ± 0.296, recall of 0.4583 ± 0.272, F1-score of 0.5005 ± 0.245, and Brier score of 0.1825 ± 0.066. However, as with the LOR models, overfitting was a problem, and the best-performing model in the testing set was the extra tree model, with AUC = 0.7113, accuracy = 0.7586, precision = 0.6, recall = 0.375, F1-score = 0.4615, and Brier score = 0.1899 (Table 6). Thus, the extra tree model based on nine variables was selected for the next analysis.

Table 6. Performance of the AE prediction models for each algorithm.
Model | Variable number | Accuracy (training) | Accuracy (testing) | Precision (training) | Precision (testing) | Recall (training) | Recall (testing) | F1-score (training) | F1-score (testing) | Brier score (training) | Brier score (testing) | AUC (training) | AUC (testing)
LR | 9 | 0.6356 ± 0.105 | 0.7586 | 0.2333 ± 0.327 | 0.6 | 0.2 ± 0.267 | 0.375 | 0.2105 ± 0.282 | 0.4615 | 0.2503 ± 0.054 | 0.2442 | 0.5456 ± 0.19 | 0.5774
RR | 17 | 0.6278 ± 0.143 | 0.5517 | 0.4417 ± 0.398 | 0.1429 | 0.2917 ± 0.233 | 0.125 | 0.3276 ± 0.258 | 0.1333 | 0.2726 ± 0.094 | 0.2611 | 0.5198 ± 0.216 | 0.4167
RF | 12 | 0.72 ± 0.088 | 0.7241 | 0.5167 ± 0.391 | 0.5 | 0.3167 ± 0.241 | 0.125 | 0.3833 ± 0.281 | 0.2 | 0.1908 ± 0.031 | 0.1983 | 0.7494 ± 0.179 | 0.631
CART | 21 | 0.7222 ± 0.106 | 0.6207 | 0.585 ± 0.354 | 0.3333 | 0.475 ± 0.346 | 0.375 | 0.4790 ± 0.281 | 0.3529 | 0.2480 ± 0.098 | 0.3793 | 0.6941 ± 0.22 | 0.5446
ET | 9 | 0.6422 ± 0.117 | 0.7586 | 0.45 ± 0.279 | 0.6 | 0.4583 ± 0.344 | 0.375 | 0.4105 ± 0.238 | 0.4615 | 0.2215 ± 0.052 | 0.1899 | 0.7472 ± 0.188 | 0.7113
GBDT | 7 | 0.6878 ± 0.06 | 0.6897 | 0.55 ± 0.35 | 0.4 | 0.325 ± 0.259 | 0.25 | 0.3667 ± 0.2 | 0.3077 | 0.2274 ± 0.055 | 0.2133 | 0.7450 ± 0.176 | 0.6607
XGBoost | 15 | 0.7333 ± 0.105 | 0.6207 | 0.6333 ± 0.296 | 0.2 | 0.4583 ± 0.272 | 0.125 | 0.5005 ± 0.245 | 0.1538 | 0.1825 ± 0.066 | 0.2693 | 0.7559 ± 0.178 | 0.4821
SVM | 2 | 0.6678 ± 0.065 | 0.6897 | 0.35 ± 0.391 | 0.3333 | 0.1583 ± 0.16 | 0.125 | 0.21 ± 0.212 | 0.1818 | 0.2082 ± 0.013 | 0.2079 | 0.6643 ± 0.19 | 0.5595
MLP | 17 | 0.6033 ± 0.135 | 0.7586 | 0.35 ± 0.391 | 0.6 | 0.1583 ± 0.16 | 0.375 | 0.2133 ± 0.218 | 0.4615 | 0.2795 ± 0.096 | 0.2321 | 0.4966 ± 0.194 | 0.5774
KNN | 9 | 0.6778 ± 0.082 | 0.7241 | 0.4 ± 0.49 | 0.5 | 0.1249 ± 0.155 | 0.125 | 0.19 ± 0.234 | 0.2 | 0.2032 ± 0.049 | 0.2336 | 0.7264 ± 0.206 | 0.3571
  • Note: The best performance is determined by the highest AUC in the testing set; by this criterion, the best-performing model is ET. ET: extremely randomized trees.
  • Abbreviations: CART, classification and regression tree; GBDT, gradient boosting decision tree; KNN, K-nearest neighbor; LR, logistic regression; MLP, multilayer perceptron; RF, random forests; RR, ridge regression; SVM, support vector machines; XGBoost, extreme gradient boosting.
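Unlike accuracy or recall, the AUC and the Brier score are computed from predicted probabilities rather than from thresholded labels. A minimal sketch of both is given below, using the pairwise (rank-based) definition of the AUC and the mean squared error definition of the Brier score. The four example predictions are invented for illustration and are not study data.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correctly_ranked = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correctly_ranked += 1.0
            elif pos_score == neg_score:
                correctly_ranked += 0.5
    return correctly_ranked / (len(positives) * len(negatives))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Invented example (assumption): two patients without the event, two with it.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

print("AUC:", roc_auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
```

A model that always predicts 0.5 has a Brier score of 0.25, which gives a rough reference point for the values in Tables 5 and 6.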

3.6. Model and Risk Factors Analysis

The SHAP approach and PDP variable correlation analysis were used in this study to analyze, in several dimensions, the relationship between the variables and the occurrence of PNR, LOR, and AEs. SHAP is a model-agnostic explanation technique derived from cooperative game theory and was used to interpret the ML models. It allows variables to be analyzed from both global and local perspectives. PDP analysis, on the other hand, reflects the relationship between an individual variable and the occurrence of the related event; it was combined with the accuracy, AUC, PPV, and NPV of each variable to represent that variable's predictive value.
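The game-theoretic idea behind SHAP can be shown with an exact Shapley value computation on a toy model. The sketch below is purely illustrative: `toy_model`, its coefficients, and the patient and reference values are invented, and this is not the explainer used in the study (practical SHAP implementations approximate these values, because enumerating all feature orderings is feasible only for a handful of features). Each feature's value is its average marginal contribution when features are switched from a reference value to the patient's value in every possible order.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    n_features = len(x)
    contributions = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for feature in order:
            current[feature] = x[feature]  # switch this feature to the patient's value
            output = predict(current)
            contributions[feature] += output - previous_output
            previous_output = output
    return [total / len(orderings) for total in contributions]


def toy_model(features):
    """Invented risk score (assumption); not the study's KNN model."""
    baseline_cdai, uric_acid, cdai_3m = features
    score = 0.001 * baseline_cdai - 0.0005 * uric_acid + 0.002 * cdai_3m
    if baseline_cdai > 200 and cdai_3m > 50:
        score += 0.1  # interaction term, shared between the two CDAI features
    return score


patient = [250.0, 300.0, 80.0]     # hypothetical patient values
reference = [150.0, 350.0, 20.0]   # hypothetical reference (background) values

phi = shapley_values(toy_model, patient, reference)
for name, value in zip(["baseline CDAI", "uric acid", "3M CDAI"], phi):
    print(f"{name}: {value:+.3f}")
```

The contributions always add up to the difference between the prediction for the patient and the prediction for the reference, which is the property that makes the decision plots in SHAP additive.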

3.6.1. Factors Associated With the Occurrence of LOR

First, the LOR prediction model was analyzed using SHAP; the importance of the variables is shown in Figure 2(a). The three most important variables were baseline CDAI, uric acid, and CDAI at 3 months of ADA treatment (3M CDAI). Higher baseline CDAI (CDAI score > 190.1), lower uric acid (uric acid < 329.0 μmol/L), and higher 3M CDAI (3M CDAI score > 55.4) indicated a higher probability of LOR. Note also that only some of the data points with high baseline CDAI and low uric acid fall in the high-incidence range of LOR, whereas all data points with high 3M CDAI are in the high-incidence range. This suggests that a high 3M CDAI is more effective in predicting the occurrence of LOR, which is consistent with the finding that the 3M CDAI was much lower in the response group than in the LOR group (17.6 [6.4, 41.8] vs. 73 [22.3, 160.9], p < 0.01). Other variables in the model were also associated with the occurrence of LOR; however, Figure 2(b) shows that most of the discrimination relies on the top five variables, and the top three of them play the major role.

Figure 2. Secondary loss of response-related risk factors in adalimumab treatment. (a) Variable importance ranking based on SHapley Additive exPlanations (SHAP) values in the KNN-based secondary loss of response model. Each point in the figure represents the contribution of a variable to the overall model for one sample. The variables are ranked according to the sum of the SHAP values for all patients. Red indicates that the value of a variable is higher than average, and blue indicates that the value of a variable is lower than average. The x-axis indicates the effect of SHAP values on the model output. The larger the value of the x-axis, the greater the probability of this level. (b) SHAP decision chart of the KNN model. The folded line from the bottom to the top reflects the process of accumulation of variables in decision-making. It illustrates the cumulative process for each sample and each feature of the model, and the x-axis indicates the final predicted value. Baseline CDAI, uric acid, and 3M CDAI had the highest proportion of predictive importance for outcome. (c) SHAP local explanation for true negative, false positive, false negative, and true positive data. Each variable contributes differently to the final predicted value (model output); variables with higher prediction possibility are shown in red, and variables with lower prediction possibility are shown in blue. According to the true-negative, false-negative, and true-positive cases, low uric acid, high CDAI, and high 3M CDAI are strong predictors of LOR. (d) Partial dependence plots of baseline CDAI, uric acid, and CDAI after 3 months of adalimumab treatment in the occurrence of secondary loss of response. The x-axis and y-axis represent the levels of the variables and predicted probabilities of occurring events, respectively. Abbreviations: 3M CDAI: CDAI score of patients after 3 months of adalimumab treatment; CRP: C-reactive protein; ESR: erythrocyte sedimentation rate; ALB: albumin.

The contribution of variables in individual cases was further analyzed using SHAP (Figure 2(c)) for four cases: a true-negative, a true-positive, a false-positive, and a false-negative prediction. For predicting the occurrence of LOR, the level of uric acid, although it carries a significant proportion of the decision weight, does not provide absolute discriminatory power, because uric acid also varies in the false-positive and false-negative cases. Similarly, a high baseline CDAI does not definitively indicate the occurrence of LOR. A high 3M CDAI, however, is more informative for the occurrence of LOR.

Further PDP analysis of the top five variables of the model revealed that the probability of LOR increased as baseline CDAI increased; once baseline CDAI exceeded 300, the probability no longer increased significantly. The probability of LOR decreased in patients with uric acid higher than 275 μmol/L and increased in patients with 3M CDAI higher than 50 (Figure 2(d)). Compared with these three variables, CRP and ESR had a smaller effect on the probability of LOR: although the overall trend increased with increasing CRP and ESR, the confidence interval (blue area) still contained the red baseline (Supporting Information Figure 3). When the predictive value of individual variables was compared in the test set, 3M CDAI had the highest AUC (0.677). However, this value was still lower than that of the KNN model. Additionally, CRP and ESR demonstrated a lower predictive value than the other three variables (Supporting Information Table 5).
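A partial dependence curve is obtained by fixing one variable at each value of a grid for every patient, leaving the other variables unchanged, and averaging the model's predictions. The sketch below illustrates that procedure with an invented model and three invented patients. The function `toy_predict`, its coefficients, and the cap at a baseline CDAI of 300 are assumptions made only to mimic a curve that rises and then flattens, as described for baseline CDAI above; they are not fitted values from the study.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override the feature of interest only
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Invented probability model (assumption): risk rises with CDAI up to 300, falls with uric acid."""
    baseline_cdai, uric_acid = row
    risk = 0.2 + 0.002 * min(baseline_cdai, 300) - 0.0005 * uric_acid
    return max(0.0, min(1.0, risk))


# Invented patients: [baseline CDAI, uric acid in μmol/L].
patients = [[180.0, 300.0], [250.0, 280.0], [320.0, 350.0]]
cdai_grid = [100.0, 200.0, 300.0, 400.0]

pdp_curve = partial_dependence(toy_predict, patients, feature_index=0, grid=cdai_grid)
for value, probability in zip(cdai_grid, pdp_curve):
    print(f"baseline CDAI = {value:.0f}: mean predicted probability = {probability:.3f}")
```

Because the curve averages over the observed values of the other variables, it describes the model's average behavior and can hide interactions that the local SHAP explanations would reveal.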

3.6.2. Factors Associated With the Occurrence of AEs

The AE prediction model constructed from nine variables was then analyzed. SHAP analysis found that the variables included in the model had similar predictive values for the occurrence of AEs; the top three variables were uric acid, PLT, and CRP (Figure 3(a)). The majority of the low uric acid data points (uric acid < 372.8 μmol/L) predicted a lower probability of AEs, whereas the data points of the other variables, at their different levels, fell in the middle or on either side of the distribution. Figure 3(b) further shows that all nine variables contribute to the prediction of outcome, while the top five variables are more strongly correlated with AEs. Local analysis of single cases also demonstrated the predictive value of relatively low uric acid levels for the nonoccurrence of AEs (Figure 3(c)). PDP analysis showed that the probability of AEs increased with increasing levels of uric acid. Moreover, at PLT below 250 × 10⁹/L, the probability of AEs decreased as PLT increased; above 250 × 10⁹/L, the probability of AEs increased. CRP showed a similar pattern, with the probability of AEs decreasing as CRP increased and then rising at CRP above 20 mg/dL (Figure 3(d)). This trend is corroborated by similar changes in WBC (Supporting Information Figure 3C), an inflammation-related index, suggesting that AEs may be less likely to occur in low-grade or noninflammatory states than in other states. The individual predictive values of the above variables in the test set were low, indicating poor predictive power. Among them, uric acid demonstrated the highest predictive value with an AUC of 0.612, while the other variables all exhibited AUCs below 0.6 (Supporting Information Table 5).

Figure 3. Adverse event-related risk factors in adalimumab treatment, similar to Figure 2. (a) Variable importance ranking based on SHapley Additive exPlanations (SHAP) values in the extra tree-based adverse events model. (b) SHAP decision chart of the extra tree model. It illustrates the cumulative process for each sample and each feature of the model, and the x-axis indicates the final predicted value. Uric acid, platelet (PLT), C-reactive protein (CRP), and white blood cell (WBC) had the highest proportion of predictive importance for outcome. (c) SHAP local explanation for true negative, false positive, false negative, and true positive data. (d) Partial dependence plots of uric acid, PLT, and CRP in the occurrence of adverse events. The x-axis and y-axis represent the levels of the variables and predicted probabilities of occurring events, respectively. Abbreviations: WBC: white blood cell; ESR: erythrocyte sedimentation rate.

3.6.3. Factors Associated With the Occurrence of PNR

In the analysis of factors associated with the occurrence of PNR, the variables that differed between the PNR group and the non-PNR group were history of corticosteroid use, perianal fistulizing behavior, PLT, and CRP. The following regression equation was obtained through logistic regression:

$$P = \frac{1}{1 + e^{-(-11.070 + 4.1662X_1 + 1.7844X_2 + 0.0135X_3 + 0.0086X_4)}}$$

($X_1$, corticosteroid history; $X_2$, perianal fistulizing behavior; $X_3$, PLT; $X_4$, CRP)

As shown in the risk factor analysis figure and PDP plots (Figure 4), patients with a history of corticosteroid use, perianal fistulizing behavior, elevated PLT, and elevated CRP had an increased risk of PNR, with the odds of PNR increasing by 6347.1%, 495.6%, 1.3%, and 0.8%, respectively. All individual variables exhibited strong predictive performance across the entire dataset, with PLT achieving an accuracy of 92.86% and an AUC of up to 0.767 (Supporting Information Table 5). However, given the small sample size, these findings need to be validated in external data before a comprehensive assessment can be made.
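The percentage increases quoted above follow from the regression coefficients: in a logistic model, a coefficient b multiplies the odds by exp(b), which is an increase of (exp(b) − 1) × 100%. The sketch below evaluates the fitted equation and these odds increases. It assumes the standard logistic form in which positive coefficients raise the predicted probability, binary predictors coded as 0/1, and PLT and CRP entered in the units used by the study; the function names and the example patient values are illustrative only.

```python
import math

INTERCEPT = -11.070
COEFFICIENTS = {
    "corticosteroid_history": 4.1662,  # X1, assumed coded 0/1
    "perianal_fistulizing": 1.7844,    # X2, assumed coded 0/1
    "plt": 0.0135,                     # X3, per unit of PLT
    "crp": 0.0086,                     # X4, per unit of CRP
}


def pnr_probability(corticosteroid_history, perianal_fistulizing, plt, crp):
    """Predicted probability of PNR from the reported coefficients (standard logistic form)."""
    linear_predictor = (
        INTERCEPT
        + COEFFICIENTS["corticosteroid_history"] * corticosteroid_history
        + COEFFICIENTS["perianal_fistulizing"] * perianal_fistulizing
        + COEFFICIENTS["plt"] * plt
        + COEFFICIENTS["crp"] * crp
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def percent_increase_in_odds(coefficient):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(coefficient) - 1.0) * 100.0


for name, coefficient in COEFFICIENTS.items():
    print(f"{name}: odds increase of {percent_increase_in_odds(coefficient):.1f}%")

# Hypothetical patient values (assumption), with and without corticosteroid history.
print(pnr_probability(0, 0, plt=300, crp=20))
print(pnr_probability(1, 0, plt=300, crp=20))
```

Note that the percentages describe changes in odds, not in probability, and that for PLT and CRP they apply to each one-unit increase.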

Figure 4. Primary nonresponse-related risk factors in adalimumab treatment. (a) Odds ratio (OR) for the association between each risk factor and risk for primary nonresponse. The right side of the figure displays the relative risk and corresponding 95% confidence interval. (b) Partial dependence plots of history of corticosteroid use, perianal fistulizing behavior, platelet (PLT), and C-reactive protein (CRP) in the occurrence of primary nonresponse. The x-axis and y-axis represent the levels of the variables and predicted probabilities of occurring events, respectively.

4. Discussion

ADA has been widely used in Western countries as a first-line treatment for CD. Nevertheless, this biologic has only been approved for CD treatment in China since 2020. Compared to IFX, studies on the response to ADA treatment and related risk factors in the Chinese population are still relatively lacking. To date, there have been several observational studies on ADA in Asian populations [12, 14–16]. However, the majority of these studies have focused on the evaluation of ADA efficacy or on comparison with the effects of IFX. Therapy failure and AEs during ADA treatment have rarely been analyzed. This study explored poor-efficacy events during ADA treatment in a cohort of 114 CD patients. In this cohort, 54.4% of the patients remained on ADA maintenance therapy, 7% experienced PNR, 30.7% experienced LOR, and 27.2% experienced AEs. Analysis revealed that previous use of corticosteroids, higher baseline platelets, higher CRP level, and perianal fistulizing behavior were associated with the occurrence of PNR. After the KNN-based model was developed, the associations between the variables and LOR were analyzed by the SHAP method, which identified baseline CDAI, uric acid, and CDAI at 3 months as the factors most associated with the occurrence of LOR. Based on the extra tree model, uric acid, PLT, and CRP were the three most important factors associated with the occurrence of AEs.

Patient demographics in this study were broadly similar to those of patient populations in other Asian CD studies [12–16]. Compared with those in Western studies, the patients in this study had a lower BMI and lower smoking rates, which is related to the racial differences between the Asian and Western patient groups [31, 32]. However, the incidence of therapy failure events was roughly similar among different regions. Research in this area has shown that up to 40% of CD patients do not respond or lose response after the initial benefit from anti-TNF agents [33]. Of these, around 25%–30% do not respond to treatment during the induction phase [34]. So far, the incidence of PNR in studies of ADA has been approximately 20%–44%, with the reported occurrence of PNR in Asian populations ranging from approximately 24.3% to 30% [16, 34–36]. The percentage of patients who experienced PNR after ADA use in our study was 7%. The difference from previous research was considered to be related to the sample size and the retrospective design of this study. Most randomized controlled clinical studies have strict inclusion criteria, such as the severity of disease and history of previous biologic use. In clinical practice, some patients with mild to moderate disease may also need biologic therapy, for example, those with concomitant anal fistulas or corticosteroid resistance. Moreover, some patients who faced LOR or intolerance to first-line biologics preferred to switch to other biologics early in the course of LOR. In the present study, 48.2% (55/114) of patients treated with ADA had mild disease activity or were in remission (CDAI ≤ 220), which may have led to a lower incidence of PNR in our study. In addition, statistical differences were found between the PNR and non-PNR groups in terms of the history of corticosteroid use, baseline platelets, CRP level, and perianal fistulizing behavior.
This indicates a high correlation of ADA efficacy with previous medication and disease behavior, which are also the main features considered when assessing medication strategies for patients with CD. At the same time, no PNR occurred in the patients treated with immunomodulator combination therapy in this study. Currently, there is controversy as to whether immunomodulator combination therapy can reduce the incidence of PNR and LOR with ADA. Combination therapy of IFX with azathioprine has been shown to be superior to either treatment alone in patients with CD [37]. However, a meta-analysis of combination therapy with ADA and immunomodulators found only marginally greater benefits than with ADA monotherapy [38]. In the CALM study, a strategy of starting ADA as monotherapy and escalating to combined immunosuppression was successfully explored [39]. Reinisch et al. suggested considering stopping any concomitant immunomodulator during the first 6–12 months of treatment in patients achieving stable remission [34]. As of now, it appears that adding immunosuppressive therapy to ADA treatment may offer some benefit, but the exact strategy needs further investigation. In our study, none of the patients who received immunosuppressive therapy experienced PNR, but the incidence of LOR was not significantly different. Although the sample size was small, these findings suggest that combining immunosuppression may offer some short-term benefits.

LOR, another cause of treatment failure, is a common problem with many biologics used to treat CD. According to reported data, LOR is estimated to occur in approximately 13%–40% of CD patients per year [40]. In our 32-month study, LOR occurred in 30.7% of the cohort, which is in line with a previous Asian study [36]. Moreover, the median duration of treatment in the LOR group was 7 months, significantly shorter than the 25 months in the maintenance treatment group. These results are in keeping with previous observational studies, in which approximately half of the patients who developed LOR did so in the first year [16]. However, the median time to LOR in our study was shorter than in other studies, possibly because 27% of the ADA patients included in our study had a history of IFX use [41]. Interestingly, ML combined with SHAP analysis revealed that baseline CDAI, uric acid, and CDAI at 3 months (3M CDAI) were associated with the occurrence of LOR in this study. The higher the baseline CDAI (CDAI score > 190.1) and the 3M CDAI (3M CDAI score > 55.4), the higher the probability of LOR, and LOR occurred in all patients with high 3M CDAI. A CDAI score of ≥ 150 is typically regarded as indicative of active disease, whereas a decrease of more than 100 points or a score below 150 following therapy signifies clinical remission of CD. Our study found that, even in the absence of PNR following ADA administration and with a reduction in CDAI score to below 150 points, patients with a 3M CDAI score above 55.4 remained at an elevated risk of developing LOR. This also suggests that CD patients with high levels of disease activity before ADA treatment have a higher probability of LOR, and that the initial 3-month response to ADA is suggestive of overall treatment efficacy.
More active clinical follow-up of this group is needed, with early switching of treatment if necessary. Notably, higher uric acid was associated with a lower probability of LOR in this study. Patients with baseline uric acid greater than 275 μmol/L, a high-normal value, had a relatively lower probability of LOR. The intestine is one of the organs that excrete uric acid, but so far there have been few clinical studies on uric acid changes in CD patients. Previous studies all suggest that uric acid levels are altered in CD patients, but there is no consistent conclusion on the direction of the change [42–44]. Chiaro et al. reported that mice with DSS-induced experimental colitis have altered intestinal uric acid metabolism, and it has been suggested that dysfunction of the intestinal epithelium disrupts uric acid excretion [42]. Therefore, uric acid may be related to the inflammatory state of the intestine in CD patients; however, uric acid is also related to renal function, gender, diet, and nutritional status, so the relationship between uric acid, disease activity, and LOR in patients with CD needs further study. Previous studies have identified a high BMI as a risk factor for LOR; however, the limited sample size and the lower BMI of Chinese patients compared with those in Western countries may have contributed to the lack of a significant difference in this study [45].
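
The cutoffs for LOR reported above (baseline CDAI > 190.1, 3M CDAI > 55.4, and baseline uric acid > 275 μmol/L) can be summarized in a short Python sketch. This is only an illustration of how the reported thresholds might be checked against a patient profile. It is not the fitted KNN model, the way the three findings are listed together is an assumption, and the class name, function name, and flag wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Cutoffs quoted in the text; combining them in one checklist is an assumption.
BASELINE_CDAI_CUTOFF = 190.1  # baseline CDAI above this: higher LOR probability
MONTH3_CDAI_CUTOFF = 55.4     # 3M CDAI above this: higher LOR probability
URIC_ACID_CUTOFF = 275.0      # umol/L; above this: relatively lower LOR probability


@dataclass
class PatientProfile:
    """Hypothetical container for the three variables discussed in the text."""
    baseline_cdai: float
    month3_cdai: float
    uric_acid_umol_l: float


def lor_follow_up_flags(p: PatientProfile) -> List[str]:
    """List which of the reported LOR-associated findings are present."""
    flags: List[str] = []
    if p.baseline_cdai > BASELINE_CDAI_CUTOFF:
        flags.append("baseline CDAI > 190.1")
    if p.month3_cdai > MONTH3_CDAI_CUTOFF:
        flags.append("3M CDAI > 55.4")
    # Uric acid above the cutoff was associated with a lower LOR probability,
    # so only values at or below the cutoff are flagged here.
    if p.uric_acid_umol_l <= URIC_ACID_CUTOFF:
        flags.append("baseline uric acid <= 275 umol/L")
    return flags


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    example = PatientProfile(baseline_cdai=240.0, month3_cdai=120.0,
                             uric_acid_umol_l=250.0)
    print(lor_follow_up_flags(example))
```

A checklist of this kind treats each variable independently, whereas the SHAP analysis describes contributions within a multivariable model, so the flags should be read as prompts for closer follow-up rather than as a prediction.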

AEs, one of the major factors influencing the maintenance of ADA therapy in CD patients, are efficacy feedback events valued by clinicians. The incidence of AEs was 37% in the phase III clinical study of ADA in China [12] and 27.2% in this study, which is generally consistent with other studies. In contrast to IFX, which requires intravenous administration, ADA requires only subcutaneous injection and allows patients to complete maintenance therapy outside the hospital. At the same time, however, monitoring for AEs may not be as timely as with IFX. Thus, knowledge of the factors that increase the relative risk of AEs can help guide patients’ maintenance therapy. In the present study, high baseline uric acid, PLT, CRP, and WBC were associated with the occurrence of AEs. Notably, PLT, CRP, and WBC, which reflect the degree of inflammation, all showed a lower probability of AEs at low levels and a relatively higher probability at high levels, suggesting that CD patients with no or low underlying inflammation have a lower probability of AEs. Alleviating the inflammatory status of CD patients before ADA use may benefit subsequent maintenance therapy.

A highlight of this study was the use of an ML approach to analyze the study data; the KNN and extra-tree-based models were selected by comparing the modeling performance of 10 models. The SHAP approach was used to analyze the risk factors associated with the occurrence of LOR and AEs in patients using ADA, and the relationships between the variables were visualized. SHAP has been used in many studies and has proven to be effective [46–49]. In addition, in most clinical research, the amount of data is relatively small compared with the large number of variables involved. ML methods can help address this challenge and thus maximize overall data retention. Numerous studies on big data have consistently shown the advantages of ML algorithms, and ML algorithms are increasingly being applied in studies with small sample sizes [50–53]. Theoretical research has indicated that ML algorithms outperform traditional statistical algorithms when applied to high-quality datasets comprising more than 110 samples, thereby revealing deeper insights into potential internal relationships [54–57]. During the initial stages of the COVID-19 epidemic, ML algorithms effectively summarized the characteristics of small-sample cases from various regions, thereby contributing to subsequent disease studies [58]. Consequently, incorporating ML algorithms into this research opens new avenues for exploring and extracting insights from our small-sample data. This study used the MICE and SMOTE methods to expand the amount of data and improve the validity and consistency of model building while retaining the intrinsic relationships of the original data. Although there may be differences between the ML-simulated data and the real data, the conclusions obtained can still provide guidance for ADA treatment in CD.
Another advantage of ML is that most models can be updated in real time and can capture nonlinear and complex relationships between variables. As the underlying database expands, the models can be continually optimized through further learning to better reveal the intrinsic connections.
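
SMOTE, mentioned above as one of the methods used to expand the data, can be illustrated with a minimal Python sketch. It assumes the standard SMOTE interpolation rule, in which a synthetic minority-class sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The sketch uses only the standard library; the function names, parameters, and toy values are hypothetical, and the implementation and settings actually used in the study are not described in this section.

```python
import math
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_sketch(minority: List[Sequence[float]], n_new: int,
                 k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic samples from minority-class samples.

    Each synthetic sample is base + lam * (neighbour - base), where the
    neighbour is drawn from the k nearest minority samples of the base
    and lam is uniform in [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy two-feature minority class (illustrative values, not study data).
    toy_minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
    for row in smote_sketch(toy_minority, n_new=3, k=2, seed=1):
        print(row)
```

Because every synthetic point is a convex combination of two observed minority samples, it stays within the range of the observed data for each feature. This is the sense in which oversampling can retain relationships present in the original data, although it cannot add information that the original samples do not contain.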

Several limitations of the present study need to be noted. First, because this was a single-center study, the sample size was small and the characteristics of different populations were not well represented. Second, because this was a retrospective study, there was some recall bias, meaningful test results such as fecal calprotectin and antidrug antibodies could not be obtained, and the changes in variables over the course of treatment could not be captured. In addition, owing to the limited amount of data and the retrospective nature of this study, there were insufficient prospective data for external validation. Thus, a larger prospective study would be valuable to further examine the results and inform clinical practice. Despite these limitations, the conclusions obtained through ML methods are instructive and can direct subsequent research.

In conclusion, our study confirms the effectiveness of ADA in the treatment of Chinese patients with CD and highlights the factors associated with the occurrence of PNR, LOR, and AEs. These findings have clinical implications for the selection of treatment options for CD patients. It is important to consider the increased risk of poor outcomes with ADA in patients with a history of corticosteroid use, high levels of disease activity, and high inflammatory state, which can help guide the clinical application of ADA.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Xiaojun Li and Maomao Tang contributed equally to this work and should be considered co-first authors.

Funding

This work was supported by the National Natural Science Foundation of China (grant number: 81470802).

Acknowledgments

We thank and acknowledge Dr. Wenchen Dong (University College London) and Dr. Jun Chao (Hunan Aicortech Intelligent Research Institute Co.) for software assistance and Dr. Mengyuan Qi for linguistic assistance.

Supporting Information

Supporting file 1.pdf: Table 1: Univariate analyses for variables in the no AE group and AE group. Table 2: Ranking of variable importance in LOR based on recursive feature elimination (RFE) feature reduction strategies. Table 3: Ranking of variable importance in AEs based on recursive feature elimination (RFE) feature reduction strategies. Table 4: Optimization results of hyperparameters for each model. Table 5: Predictive value of each variable. Figure 1: Distribution of ongoing response-secondary loss of response, gender, and age of patients in the training set and test set in LOR model. Figure 2: Distribution of adverse events, gender, and age of patients in the training set and test set in AE model. Figure 3: Partial dependence plots in KNN-based LOR model and extra-tree-based AE model.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
