Identifying Risk Factors for Poor Efficacy of Adalimumab Treatment in Patients With Crohn’s Disease: Insights From Machine Learning Models
Abstract
Aim: Adalimumab (ADA) is an effective treatment for Crohn’s disease (CD); however, some patients still experience adverse reactions and nonresponse. This study aimed to explore the risk factors associated with poor ADA efficacy using machine learning algorithms, in order to guide the management of ADA in clinical practice.
Methods: This single-center investigation included 114 CD patients treated with ADA in the Department of Gastroenterology from January 2020 to January 2023. Risk factors associated with each poor efficacy event were explored using logistic regression and machine learning algorithms. Shapley additive explanations (SHAP) and partial dependence plot methods were used to analyze the risk factors of each event.
Results: Eight patients experienced primary non-response (PNR), 35 patients developed secondary loss of response (LOR), and 27.2% (31/114) of patients experienced at least one adverse event (AE). After comparing the fit of the models established by 10 algorithms, the risk factors associated with PNR, LOR, and AEs were analyzed using the logistic regression algorithm, KNN algorithm, and Extra Tree algorithm, respectively. The most important variables related to PNR, LOR, and AEs were the history of corticosteroid use, baseline CDAI, and uric acid, respectively.
Conclusions: This study confirmed the efficacy of ADA in clinical practice in the Chinese CD population and showed that a history of corticosteroid use, a high level of disease activity, and a high inflammatory state before ADA treatment were associated with increased risks of poor efficacy.
1. Introduction
Crohn’s disease (CD) is a life-long, relapsing, and chronic inflammatory disease that can involve the entire gastrointestinal tract [1]. The clinical presentation and severity of CD are diverse, and tailoring treatment to disease severity is currently a well-recognized strategy [2]. Since the introduction of biological agents into CD treatment in the 1990s, patients have increasingly achieved long-term and endoscopic remission. This class of drugs improves on the effectiveness of traditional treatments such as corticosteroids and immunosuppressive agents and thus helps reach the therapeutic goals of preventing intestinal complications and halting disease progression. Nowadays, biological agents are recommended as a first-line treatment option for patients with moderate to severe CD [3, 4]. In this context, treatment strategies for tumor necrosis factor (TNF) inhibitors, among the most prominent biological agents, have become more detailed and targeted. Adalimumab (ADA), a fully human anti-TNF monoclonal antibody, has also shown the ability to induce and maintain remission in global clinical trials of patients with CD [5, 6]. However, although the drug has been available for more than 15 years, the medication strategy for ADA still has much room for improvement.
ADA is one of the classic biological agents for the treatment of CD and is recommended for moderate-to-severe active CD, or as conversion therapy for patients with active CD who have a secondary loss of response (LOR) to infliximab (IFX) [3, 4, 7]. However, the effect of ADA varies among populations with different inflammatory statuses and medical histories. As with other biological agents, the common poor efficacy events related to TNF inhibitors are primary non-response (PNR), secondary LOR, and adverse events (AEs). Notably, the poor efficacy of anti-TNFα therapy is of great clinical concern and may result in dose escalation, switching to another anti-TNFα agent (or a drug with another mode of action), or intestinal surgery [8, 9]. Which factors influence drug efficacy has not yet been settled. Many studies have examined risk factors for poor efficacy in patients treated with IFX, including age, extraintestinal manifestations, female sex, and increased antibodies to IFX [9–11]. Nevertheless, much remains to be explored regarding the risk factors associated with poor ADA efficacy, especially in Asian populations. Several studies of ADA have been conducted in China [12, 13], Japan [14, 15], and Korea [16], but they have mainly focused on the efficacy of ADA or compared it with that of IFX. There is a research gap in analyzing the factors related to the effects of ADA itself, and more attention should be paid to the factors associated with poor ADA efficacy. Since the approval of ADA for the treatment of CD patients in China in 2020, more clinical experience in Asian patients can be summarized. This experience will facilitate the establishment of more precise treatment strategies for individual CD patients.
Therefore, this study aims to determine the factors associated with PNR, LOR, and AEs of ADA. To do so, the clinical data of CD patients who were treated with ADA in our department between January 2020 and January 2023 were retrospectively analyzed. The relationship between factors and adverse outcomes was investigated using machine learning (ML), Shapley additive explanations (SHAP), and partial dependence plot (PDP) methods, which are expected to further guide the use of ADA in clinical practice.
2. Methods
2.1. Study Population
This cohort study included patients from the Department of Gastroenterology, Second Xiangya Hospital, Central South University. The case collection was conducted from January 2020 to January 2023. The observation deadline for each patient was January 2023 or 10 weeks after the termination of ADA treatment. The inclusion criteria for this study were patients (age ≥ 16 years) with a confirmed diagnosis of CD and treated with ADA. The diagnosis of CD was based on the diagnostic criteria of the Consensus on Diagnosis and Treatment of Inflammatory Bowel Disease (Beijing, 2018) [17]. Included patients met the guideline-recommended indications for ADA use in China [18] and were evaluated by experienced gastroenterologists. Patients were treated with ADA 160 mg at week 0 and 80 mg at week 2, and the dose was reduced to 40 mg at week 4 to reach the maintenance dose when the patients achieved remission. Patients with malignancy or with chronic or severe underlying diseases were excluded.
2.2. Data Collection
All data were obtained from the institution’s medical records. Variables and data were selected according to expert advice and a literature review and included demographic, clinical, laboratory, and imaging data. Demographic data collected at the initiation of ADA treatment included sex, age, body mass index (BMI), and smoking history. Clinical signs and symptoms included the duration of disease, location of the lesion, disease behavior, perianal lesions, extraintestinal manifestations, fistulae, medication history, abdominal surgery history, and Crohn’s disease activity index (CDAI). Laboratory data included blood routine examination, C-reactive protein (CRP), and albumin. Fecal calprotectin was excluded because more than 70% of the data were missing, owing to the shortage of test reagents caused by the COVID-19 pandemic. Imaging data included gastrointestinal endoscopy, computed tomography enterography (CTE), and magnetic resonance enterography (MRE).
2.3. Evaluation of Disease
The initial location of CD lesions was determined by colonoscopy and CTE. The Montreal classification for CD was used to categorize disease phenotypes. The location of the lesion was designated as L1 (ileal), L2 (colonic), or L3 (ileocolonic), with L4 as a modifier designating concomitant upper tract disease. Disease behavior was categorized as non-stricturing, nonpenetrating (B1), stricturing (B2), or penetrating (B3), with a P modifier to describe concomitant perianal disease [19]. The clinical disease activity of CD patients was evaluated using the CDAI, with clinical remission defined as CDAI < 150, mild disease activity defined as 150 ≤ CDAI ≤ 220, and moderate to severe disease activity defined as CDAI > 220 [4, 20]. Mucosal inflammation was evaluated by the Crohn’s disease endoscopic index of severity (CDEIS), with endoscopic response defined as a decrease in the CDEIS score of more than 5 from baseline and complete endoscopic remission defined as CDEIS < 3 [21]. For patients with small bowel CD, CTE was performed, and radiologic indicators of intestinal inflammation were used as measures of drug efficacy. For patients with anal fistula, MRE and the perianal disease activity index (PDAI) were used to evaluate the patient’s condition.
Clinical response was defined as a reduction of the CDAI score by 70 points at week 12 [20, 22]. PNR was defined as having no clinical benefit during the first 12 weeks after the initiation of ADA therapy [22, 23]. LOR was defined as subsequent worsening of symptoms in patients who initially responded to therapy. LOR was assessed by a multidisciplinary team of experienced experts as a relapse or deterioration after an initial clinical response, based on the following criteria: (1) an increase in CDAI ≥ 50 from the minimum observed value together with an elevated inflammatory marker (CRP > 8.0 mg/L) [23, 24]; (2) a need for therapy modification, including dose escalation, alternative biological agents, corticosteroids, combination immunosuppressive therapy, or CD-related surgery [22, 23].
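The CDAI cut-offs given above can be written as a small lookup. This is only an illustrative sketch; the function name `cdai_category` is an assumption and is not part of the study's code.

```python
def cdai_category(cdai: float) -> str:
    """Map a CDAI score to the activity category used in this study."""
    if cdai < 150:
        return "remission"
    if cdai <= 220:
        return "mild"
    return "moderate to severe"


# Example call with an arbitrary score.
print(cdai_category(194.5))
```

The boundaries follow the text: 150 and 220 both fall in the mild category.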
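The numeric parts of these definitions can be sketched as two checks. The function names are hypothetical, the 70-point reduction is read here as "at least 70 points" (an assumption), and criterion (2) is left out because it depends on the treating team's judgment rather than on a threshold.

```python
def is_clinical_response(cdai_baseline: float, cdai_week12: float) -> bool:
    """Clinical response: CDAI reduced by at least 70 points at week 12 (assumed reading)."""
    return cdai_baseline - cdai_week12 >= 70


def meets_lor_criterion_1(cdai_now: float, cdai_minimum: float, crp_mg_l: float) -> bool:
    """Criterion (1): CDAI rise of >= 50 from the minimum observed value and CRP > 8.0 mg/L."""
    return (cdai_now - cdai_minimum >= 50) and (crp_mg_l > 8.0)


# Hypothetical patient values, for illustration only.
print(is_clinical_response(cdai_baseline=230, cdai_week12=150))
print(meets_lor_criterion_1(cdai_now=120, cdai_minimum=60, crp_mg_l=12.0))
```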
AEs including anaphylaxis, injection-site reactions (rule out physical injury from injection), serious infections requiring hospitalization, psoriasiform skin lesions, neuropathy, and malignancies were assessed. Other drug reactions were excluded. For patients who discontinued ADA because of AEs, details of the AE leading to discontinuation were recorded. Except for those patients who continued commercially available ADA after the end of the study, patients were contacted 10 weeks after the last dose of the drug to assess any new or ongoing AEs.
2.4. Statistical Analysis
All analyses were implemented in SPSS software (version 18.0; SPSS, Inc.) and Python 3.5.2 (Python Software Foundation, Wilmington, DE, USA).
2.4.1. Data Analysis
Figure 1 shows the proposed framework, which summarizes the methods applied in the statistical analysis, model building, and interpretation. Continuous variables are expressed as mean with standard deviation or median and interquartile range [IQR] as needed. We utilized Student’s t-test to analyze data with normal distributions, nonparametric tests for data without normal distributions, and the Chi-square test to compare enumeration data. The statistical significance of the results was evaluated using a p value with a significance level of 0.05. Then, we trained classification models to assess the patient’s risk of LOR and AEs.
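As a worked example of the Chi-square comparison of enumeration data, the sketch below computes an uncorrected Pearson statistic for a 2 × 2 table using only the standard library. The counts are the prior corticosteroid use figures reported later in Table 2. That the study used the uncorrected form is an assumption, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Prior corticosteroid use: 19 of 103 responders vs. 4 of 8 primary non-responders.
stat, p = chi_square_2x2(19, 103 - 19, 4, 8 - 4)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Under this assumption the p value comes out at about 0.034, in line with the value listed for corticosteroids in Table 2.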

2.4.2. Data Preparation and Processing
We preprocessed the dataset using the baseline data and a part of the post-induction treatment data. Data preparation and preprocessing were completed before any ML model was built.
During data preparation, we used stratified random sampling to split the dataset into 75% for training and 25% for testing, which provides an unbiased sense of model effectiveness. Across the whole dataset, variables with more than 30% missing data were excluded. For other variables with missing data, we applied the Multiple Imputation by Chained Equations (MICE) algorithm to fill in the missing data in the training set and testing set separately, because excluding all cases with missing values could introduce bias into the results. MICE is a practical approach to generating imputations based on a set of imputation models, one for each variable with missing values [25]. Because MICE analyzes multiple imputed datasets, it takes into account the uncertainty in the imputations and yields accurate standard errors. Consequently, this approach minimizes the impact of imputation on the overall results. A total of 9 values (10.46%) were imputed in the training set and 3 (10.34%) were imputed in the test set.
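A stratified 75/25 split can be sketched in plain Python as follows. This is not the study's code: the function name, the seed, and the rounding rule are assumptions, and the labels simply reuse the LOR group sizes reported later in Table 3.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 35 LOR (1) and 62 ongoing-response (0) patients, as in Table 3.
labels = [1] * 35 + [0] * 62
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
print(sum(labels[i] for i in test_idx), "LOR cases in the test part")
```

Sampling within each class keeps the LOR rate nearly identical in the two parts, which is the property the comparison of training and test distributions relies on.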
As the number of patients with PNR and LOR was lower compared to the control group, there is an obvious imbalance in the data between groups. In order to avoid overfitting and address the issue of data imbalance, we employed the synthetic minority oversampling technique (SMOTE) during the model training process, which has also been used in our previous studies [26]. To achieve a desired ratio between the majority and minority classes, SMOTE creates synthetic instances by interpolating m instances of the minority class that are sufficiently close to each other (where m is a given integer value) [27]. This approach addresses the effects of model bias due to data imbalance while having no impact on the real test set data.
To prevent excessive interference with the original data’s variable relationships during the SMOTE process, we compared different upsampling ratios and decided to upsample the LOR group data to 80% of the control group in the LOR analysis, adding 10 items to the LOR data, and upsample the AE group data to 50% of the control group in the AE analysis, adding 8 items to the AEs data, to avoid overfitting the model. All SMOTE steps were performed on the training set only.
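The interpolation idea behind SMOTE and the effect of a target ratio can be sketched without external libraries. This is a simplified illustration, not the implementation used in the study: the function names, the class sizes (46 and 26), and the feature values are all hypothetical.

```python
import random


def n_synthetic_needed(n_majority, n_minority, target_ratio):
    """Synthetic minority samples needed so that minority / majority reaches target_ratio."""
    return max(0, int(target_ratio * n_majority) - n_minority)


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new points by interpolating between a minority sample and one of
    its k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical training-set class sizes and (CDAI, ESR) values, for illustration only.
n_new = n_synthetic_needed(n_majority=46, n_minority=26, target_ratio=0.8)
minority = [(222.0, 18.0), (240.0, 25.0), (198.0, 12.0), (260.0, 40.0)]
new_points = smote_like(minority, n_new)
print(n_new, new_points[0])
```

Because every synthetic point lies on a segment between two real minority samples, the new data stay inside the range of the observed minority values.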
2.4.3. Building Models for Evaluating the Factors
We used 10 algorithms to build classification models for identifying the relevant factors: logistic regression (LR), Ridge Regression (RR), Random Forests (RF), Classification and Regression Tree (CART), Extremely Randomized Trees (Extra Tree), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), and K-Nearest Neighbor (KNN). These 10 algorithms are widely used in clinical data analysis and are applicable to different data types. Among them, RF, CART, Extra Tree, GBDT, and XGBoost are decision tree-based models. The SVM is a discriminative algorithm that separates classes by learning a decision boundary directly from the data. MLP is a class of feedforward artificial neural network. KNN is a type of instance-based learning, where all computations are deferred until prediction time. LR and RR are well-established conventional statistical methods that serve as a baseline for comparison with the other ML algorithms.
Similar to our previous study, we employed the recursive feature elimination (RFE) algorithm to rank the variables by importance on the training set data [26, 28]. All algorithms used the default parameters for this ranking step. Following variable reduction based on importance, the remaining variables were introduced into the corresponding ML algorithm to select the best set of variables for modeling. To accomplish this, we used k-fold cross-validation with k = 10 to train and cross-validate each model and took the variable combination with the highest area under the receiver-operating characteristic curve (AUC) as the final result. Next, hyperparameters were tuned automatically using the scikit-learn GridSearchCV method.
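The combination of RFE, 10-fold cross-validation, AUC scoring, and grid search might look as follows in scikit-learn. This is a sketch under stated assumptions rather than the study's pipeline: the data are synthetic, the parameter grid is hypothetical, and variable selection and tuning are searched jointly here, which simplifies the two-stage procedure described above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in data; the study's clinical dataset is not reproduced here.
X, y = make_classification(
    n_samples=120, n_features=12, n_informative=4, random_state=0
)

pipeline = Pipeline(
    steps=[
        ("rfe", RFE(estimator=LogisticRegression(max_iter=1000))),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Hypothetical grid: number of retained variables and regularisation strength.
param_grid = {
    "rfe__n_features_to_select": [3, 6, 9],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Placing RFE inside the pipeline means the ranking is recomputed within each training fold, so the held-out fold never influences which variables are kept.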
The models were evaluated based on multiple metrics, including accuracy, precision (positive predictive value (PPV)), recall (sensitivity), F1-score, Brier score, and AUC. The AUC was used to measure discrimination, while calibration was evaluated using the Brier score. By comparing the performance of the models on the test dataset, we identified the best-performing model in terms of prediction.
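For reference, the threshold-based metrics and the Brier score can be computed from first principles as sketched below. The function names are assumptions. The confusion-matrix counts are not reported in the study; they were chosen so that the resulting values agree with the KNN testing row of Table 5.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision (PPV), recall (sensitivity) and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Illustrative counts: 25 test patients, 9 of them positive.
metrics = classification_metrics(tp=6, fp=6, fn=3, tn=10)
print({name: round(value, 4) for name, value in metrics.items()})

# Made-up probabilities and outcomes to exercise the Brier score.
print(round(brier_score([0.9, 0.2, 0.6], [1, 0, 0]), 4))
```

A lower Brier score indicates better calibration, whereas the other metrics are better when higher.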
The scikit-learn package library (version 0.22.2) was used for all automatic tuning of hyperparameters, RFE, and the models except for the XGBoost model, which was created using the xgboost package library (version 1.1.1).
2.4.4. Model Interpretation
The best-performing model in the test set, as determined by the highest AUC, was selected for further analysis. Additionally, the top-performing models in the training set were included in the analysis for evaluation purposes. While it is possible to observe the variables that have a significant impact on the model, determining the relationship between the variables and the results can be challenging. Hence, we employed the SHAP method for further interpretation. SHAP is a widely adopted method for the interpretation of nonlinear black box models [29, 30]. The SHAP algorithm incorporates all the original data in the model into the analysis, and the SHAP values were calculated and presented using the SHAP Python package (version 0.29.1).
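To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging marginal contributions over all feature orderings. The SHAP package estimates such values efficiently for real models; this brute-force version is only practical for a handful of features. The risk function, its coefficients, and the input values are hypothetical and do not come from the fitted models in this study.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition keep their baseline value; each feature's
    contribution is averaged over all feature orderings.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for feature in order:
            current[feature] = x[feature]
            value = model(current)
            totals[feature] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_risk(features):
    """Hypothetical risk score with one interaction term."""
    cdai, esr, prior_ifx = features
    return 0.002 * cdai + 0.004 * esr + 0.15 * prior_ifx + 0.001 * esr * prior_ifx


patient = [222.3, 18.0, 1]      # illustrative inputs
reference = [172.0, 11.5, 0]    # illustrative baseline
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 4) for v in phi])
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that makes SHAP values readable as a per-variable breakdown of a prediction.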
PDPs were constructed for the variables with the highest SHAP importance to show how individual predictor variables can affect the probability of certain results while controlling for all of the other predictor variables. The predictive value of each variable was assessed by evaluating accuracy, AUC, PPV, and negative predictive value (NPV).
In addition, the PNR group was so small that modeling with ML algorithms would have resulted in overfitting and might have failed to uncover important variables associated with PNR occurrence. Therefore, the LR algorithm was used to analyze variables that differed significantly (p < 0.05) within the PNR data, combined with PDPs and predictive values, to briefly explain the correlation between these variables and PNR occurrence.
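A partial dependence curve can be computed by fixing one variable at each grid value and averaging the model output over the dataset. The sketch below uses a hypothetical probability function and made-up rows; it illustrates the calculation only and is not the model analyzed in this study.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average model output over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(model(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_probability(features):
    """Hypothetical model: risk rises with baseline CDAI and ESR, clipped to [0, 1]."""
    cdai, esr = features
    return min(1.0, max(0.0, 0.002 * cdai + 0.004 * esr))


# Made-up rows of (baseline CDAI, ESR), used only to exercise the function.
rows = [(150.0, 10.0), (200.0, 20.0), (250.0, 30.0)]
grid = [100.0, 200.0, 300.0]
curve = partial_dependence(toy_probability, rows, feature_index=0, grid=grid)
print([round(v, 3) for v in curve])
```

Plotting `curve` against `grid` gives the PDP for the first variable, with the other variable averaged over its observed values.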
2.5. Ethical Issues
We assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. The research was approved by the Ethics Committee of the Second Xiangya Hospital of Central South University (NO. LYF2021168). This is a retrospective study based on medical record review, and the data are anonymous; therefore, the Ethics Committee of the Second Xiangya Hospital of Central South University waived the requirement for informed consent.
3. Results
3.1. Characteristics of CD Patients Included in This Study
A total of 114 patients were included in this study, of which 3 patients discontinued the induction therapy due to AEs and 111 patients completed primary induction therapy. Out of the total study population, 55 patients (48.2%) had a CDAI score below 220. Among these 55 patients, 15 had anal fistula, 19 switched from IFX treatment, 9 switched from corticosteroids after having undergone anal fistula surgery prior to their CD diagnosis, 5 switched from corticosteroids with small bowel lesions only, 4 had a CDEIS score greater than 10 and were assessed by gastroenterologists as having moderate CD, 2 had B3 behavior, and 1 had a history of treatment with Etanercept for ankylosing spondylitis. Six patients discontinued maintenance therapy for personal reasons. Sixty-two patients remained on maintenance therapy at the end of the follow-up (Figure 1). As shown in Table 1, the median (interquartile) age of the included patients was 25.5 [20, 33], and 75.4% (86/114) of them were male. The median follow-up of the cohort was 22 months [14, 26], and the median illness duration of patients was 2.0 years [0.5, 5]. Sixty-eight patients (59.6%) had CD-related complications such as abdominal fistula, abdominal abscess, intestinal obstruction, perianal fistula, and so on. Thirty-one patients (27.2%) were treated with IFX for their CD, prior to ADA therapy. According to the Montreal classification, the study population had the largest number of patients with types A2 (83/114), L3 (76/114), and B1 (45/114); meanwhile, 7 patients with small bowel lesions and 28 patients with perianal fistulizing behavior were included in this study. The overall mean CDAI at the baseline of the study population was 194.5 ± 92.2, and the median CDEIS was 5.5 [1.6, 10.1].
Baseline characteristic | Value |
---|---|
Total, n | 114 |
Male gender | 86 (75.4%) |
Age (years) | 25.5 [20, 33] |
Age of onset (years) | 23 [17, 29] |
Illness duration (years) | 2.0 [0.5, 5.0] |
Height (cm) | 170.0 [162.0, 175.0] |
Weight (kg) | 55.3 ± 11.5 |
BMI (kg/m²) | 19.0 [17.3, 21.5] |
Smoking history, yes | 25 (21.9%) |
Previous surgery, yes | 51 (44.7%) |
Extraintestinal manifestations | 28 (24.6%) |
Complications, yes | 68 (59.6%) |
Previous treatments | |
5-Aminosalicylates no. | 2 (1.8%) |
Corticosteroids no. | 24 (21.1%) |
Immunomodulators no. | 41 (36%) |
Infliximab no. | 31 (27.2%) |
Vedolizumab no. | 1 (0.9%) |
Other biological | 1 (0.9%) |
Baseline laboratory test | |
Leukocytes (×10⁹/L) | 6.5 ± 2.1 |
Hemoglobin (g/L) | 125.1 ± 23.4 |
Platelets (×10⁹/L) | 306 [234.5, 391.5] |
Hematocrit (%) | 39.2 [35.6, 44.1] |
C-reactive protein (mg/L) | 13.5 [4.2, 27.4] |
ESR (mm/h) | 17.0 [7.5, 39.5] |
Albumin (g/L) | 37.9 ± 6.9 |
Creatinine (μmol/L) | 65.0 [53.8, 75.3] |
Uric acid (μmol/L) | 372.8 ± 93.7 |
Montreal (age of onset), no. | |
A1 | 24 (21.1%) |
A2 | 83 (72.8%) |
A3 | 7 (6.1%) |
Montreal (location), no. | |
L1 | 19 (16.7%) |
L2 | 7 (6.1%) |
L3 | 76 (66.7%) |
L4 | 0 (0.0%) |
L1 + L2 | 9 (7.9%) |
L3 + L4 | 3 (2.6%) |
Montreal (behavior), no. | |
B1 | 45 (39.5%) |
B2 | 30 (26.3%) |
B3 | 6 (5.3%) |
B2 + B3 | 5 (4.4%) |
B1 + P | 21 (18.4%) |
B2 + P | 6 (5.3%) |
B3 + P | 1 (0.9%) |
Baseline CDAI | 194.5 ± 92.2 |
Baseline CDEIS | 5.5 [1.6, 10.1] |
Time of follow-up (months) | 22 [14, 26] |
3.2. Response to Induction Therapy
A total of 111 patients completed primary induction therapy. Eight of these patients (7.2%) showed PNR. Comparing the variables of patients in the PNR and primary response groups, statistical differences were found between the two groups in terms of the history of corticosteroid use, baseline platelets, CRP level, and perianal fistulizing behavior (Table 2). The two groups did not differ significantly in terms of illness duration, previous use of IFX, whether ADA was their first-line treatment, age at onset, or location. Some of the patients in the cohort were treated with combination immunosuppressive therapy by experienced physicians, depending on their individual conditions. Notably, all patients with colon lesions only and all those treated with combination immunosuppressive therapy completed the induction therapy without PNR; however, given the size of the enrolled population, no significant differences in these variables were observed between the PNR and primary response groups.
Variable | Response (n = 103) | PNR (n = 8) | p value |
---|---|---|---|
Male gender | 79 (76.7%) | 5 (62.5%) | 0.367 |
Age (year) | 27.1 ± 9.6 | 29.5 ± 10.2 | 0.498 |
Age of onset (year) | 23 [17, 28] | 23.5 [18.5, 29.8] | 0.732 |
Illness duration (years) | 2.0 [0.5, 5.0] | 3.5 [1.3, 9.8] | 0.215 |
Height (cm) | 168.7 ± 8.3 | 165.0 ± 12.2 | 0.420 |
Weight (kg) | 55.1 ± 10.9 | 55.7 ± 19.5 | 0.896 |
BMI (kg/m²) | 19.0 [17.2, 21.5] | 19.0 [17.1, 21.7] | 0.950 |
Smoking history | 23 (22.3%) | 2 (25%) | 0.862 |
Previous surgery | 47 (45.6%) | 3 (37.5%) | 0.656 |
Extraintestinal manifestations | 26 (25.2%) | 1 (12.5%) | 0.418 |
Complications | 62 (60.2%) | 5 (62.5%) | 0.898 |
Previous treatments | |||
Corticosteroids no. | 19 (18.4%) | 4 (50.0%) | 0.034∗ |
Immunomodulators no. | 36 (35.0%) | 5 (62.5%) | 0.120 |
Infliximab no. | 28 (27.2%) | 3 (37.5%) | 0.531 |
Vedolizumab no. | 0 (0%) | 1 (12.5%) | 0.001∗ |
Baseline laboratory test | |||
Leukocytes (×10⁹/L) | 6.2 ± 2.1 | 7.7 ± 3.1 | 0.061 |
Hemoglobin (g/L) | 126.2 ± 23.4 | 117.3 ± 23.6 | 0.305 |
Platelets (×10⁹/L) | 305.0 ± 120.9 | 507.5 ± 252.2 | 0.001∗ |
Hematocrit (%) | 42.7 ± 38.2 | 42.0 ± 21.9 | 0.807 |
C-reactive protein (mg/L) | 13.5 [4.1, 24.3] | 46.5 [11.4, 77.7] | 0.047∗ |
ESR (mm/h) | 23.5 ± 22.0 | 38.0 ± 26.0 | 0.104 |
Albumin (g/L) | 38.0 ± 6.7 | 35.3 ± 10.2 | 0.335 |
Creatinine (μmol/L) | 63.9 ± 19.1 | 59.5 ± 14.9 | 0.616 |
Uric acid (μmol/L) | 329.1 ± 93.2 | 292.7 ± 110.9 | 0.404 |
Montreal (age of onset), no. | 0.626 | ||
A1 | 23 (22.3%) | 1 (12.5%) | |
A2 | 75 (72.8%) | 7 (87.5%) | |
A3 | 5 (4.9%) | 0 (0%) | |
Montreal (location), no. | 0.894 | ||
L1 | 18 (17.5%) | 1 (12.5%) | |
L2 | 6 (5.8%) | 0 (0%) | |
L3 | 68 (66.0%) | 6 (75%) | |
L4 | 0 (0%) | 0 (0%) | |
L1 + L2 | 8 (7.8%) | 1 (12.5%) | |
L3 + L4 | 3 (2.9%) | 0 (0%) | |
Montreal (behavior), no. | 0.140 | ||
B1 | 41 (39.8%) | 2 (25%) | |
B2 | 28 (27.2%) | 1 (12.5%) | |
B3 | 6 (5.8%) | 0 (0%) | |
B2 + B3 | 5 (4.9%) | 0 (0%) | |
B1 + P | 18 (17.5%) | 3 (37.5%) | |
B2 + P | 4 (3.9%) | 2 (25%) | |
B3 + P | 1 (1.0%) | 0 (0%) | |
Perianal fistulizing | 23 (22.3%) | 5 (62.5%) | 0.024∗ |
ADA as first treatment | 75 (72.8%) | 5 (62.5%) | 0.531 |
Baseline CDAI | 191.1 ± 93.7 | 226.4 ± 85.9 | 0.305 |
Baseline CDEIS | 5.9 [1.6, 10.1] | 7.2 [2.2, 17.6] | 0.299 |
Concomitant medication no. | 17 (16.5%) | 0 (0%) | 0.212 |
Adverse events no. | 26 (25.2%) | 2 (25.0%) | 0.988 |
- ∗represents p < 0.05.
3.3. Maintenance Therapy and Secondary Loss of Response
After induction therapy, a total of 103 patients entered maintenance treatment. Six patients stopped treatment for personal reasons, mainly because they perceived a significant reduction in symptoms. As of January 2023, 62 patients remained on ADA maintenance therapy, and 35 (36.1%) patients had discontinued treatment due to LOR. As shown in Table 3, the median duration of ADA treatment was 7.0 months [6.0, 12.0] for patients with LOR and 25.0 months [22.25, 30.75] for patients with an ongoing response, a statistically significant difference (p = 0.001).
Variable | Ongoing response (n = 62) | LOR (n = 35) | p value |
---|---|---|---|
Male gender | 47 (75.8%) | 26 (74.3%) | 0.868 |
Age (year) | 27.4 ± 10.2 | 27.3 ± 9.1 | 0.967 |
Age of onset (year) | 22 [17, 28.8] | 24 [17, 29] | 0.877 |
Illness duration (years) | 2.0 [0.5, 6.0] | 2.0 [0.6, 5.0] | 0.571 |
Height (cm) | 168.6 ± 8.8 | 168.3 ± 8.0 | 0.890 |
Weight (kg) | 54.9 ± 11.4 | 55.2 ± 10.6 | 0.891 |
BMI (kg/m²) | 19.2 [17.1, 21.2] | 19.1 [17.6, 21.7] | 0.804 |
Smoking history | 16 (25.8%) | 4 (11.4%) | 0.093 |
Previous surgery | 35 (56.5%) | 9 (25.7%) | 0.003∗ |
Extraintestinal manifestations | 17 (27.4%) | 9 (25.7%) | 0.856 |
Complications | 38 (61.3%) | 21 (60.0%) | 0.901 |
Previous treatments | |||
Corticosteroids no. | 10 (16.1%) | 9 (25.7%) | 0.292 |
Immunomodulators no. | 19 (30.6%) | 16 (45.7%) | 0.187 |
Infliximab no. | 12 (19.4%) | 15 (42.9%) | 0.018∗ |
Vedolizumab no. | 0 (0%) | 0 (0%) | — |
Baseline laboratory test | |||
Leukocytes (×10⁹/L) | 6.1 ± 2.0 | 6.5 ± 1.8 | 0.327 |
Hemoglobin (g/L) | 127.4 ± 22.7 | 124.2 ± 24.6 | 0.547 |
Platelets (×10⁹/L) | 310.4 ± 120.0 | 289.5 ± 117.2 | 0.432 |
Hematocrit (%) | 39.8 [34.7, 43.5] | 39.2 [35.7, 45.2] | 0.705 |
C-reactive protein (mg/L) | 9.4 [3.2, 26.1] | 13.6 [7.2, 29.4] | 0.213 |
ESR (mm/h) | 11.5 [7.0, 28.8] | 18.0 [10.0, 45.0] | 0.017∗ |
Albumin (g/L) | 38.7 ± 6.1 | 37.3 ± 7.2 | 0.331 |
Creatinine (μmol/L) | 63.2 ± 21.8 | 64.8 ± 15.3 | 0.734 |
Uric acid (μmol/L) | 343.5 ± 103.9 | 303.3 ± 74.5 | 0.070 |
Montreal (age of onset), no. | 0.618 | ||
A1 | 16 (25.8%) | 6 (17.1%) | |
A2 | 43 (69.4%) | 27 (77.1%) | |
A3 | 3 (4.8%) | 2 (5.7%) | |
Montreal (location), no. | 0.508 | ||
L1 | 12 (19.4%) | 5 (14.3%) | |
L2 | 3 (4.8%) | 3 (8.6%) | |
L3 | 38 (61.3%) | 25 (71.4%) | |
L4 | 0 (0%) | 0 (0%) | |
L1 + L2 | 6 (9.7%) | 2 (5.7%) | |
L3 + L4 | 3 (4.8%) | 0 (0%) | |
Montreal (behavior), no. | 0.783 | ||
B1 | 25 (40.3%) | 14 (40.0%) | |
B2 | 17 (27.4%) | 10 (28.6%) | |
B3 | 4 (6.5%) | 2 (5.7%) | |
B2 + B3 | 2 (3.2%) | 3 (8.6%) | |
B1 + P | 12 (19.4%) | 4 (11.4%) | |
B2 + P | 2 (3.2%) | 2 (5.7%) | |
B3 + P | 0 (0%) | 0 (0%) | |
Perianal fistulizing | 14 (22.6%) | 6 (17.1%) | 0.608 |
ADA as first treatment | 50 (80.6%) | 20 (57.1%) | 0.018∗ |
Duration of ADA | 25.0 [22.25, 30.75] | 7.0 [6.0, 12.0] | 0.001∗ |
Baseline CDAI | 172.0 ± 98.0 | 222.3 ± 75.6 | 0.011∗ |
CDAI at 3 months | 17.6 [6.4, 41.8] | 73 [22.3, 160.9] | 0.001∗ |
Baseline CDEIS | 5.9 [1.6, 10.1] | 7.2 [2.2, 17.6] | 0.073 |
Concomitant medication no. | 10 (16.1%) | 7 (20.0%) | 0.630 |
Adverse events no. | 15 (24.2%) | 9 (25.7%) | 0.868 |
- ∗represents p < 0.05.
Comparing patients in the LOR group with those who maintained ADA therapy revealed statistical differences in the history of abdominal surgery, history of IFX use, baseline ESR, whether ADA was the patient’s first-line medication, baseline CDAI, and CDAI after 3 months of ADA therapy. Patients who developed LOR had higher rates of prior IFX use and of ADA as a non-first-line agent, and greater disease activity as manifested by higher ESR, baseline CDAI, and CDAI after 3 months of ADA therapy; previous abdominal surgery was recorded in 25.7% of the LOR group versus 56.5% of the ongoing response group (Table 3). Unlike the PNR group, there were no statistically significant differences in the history of corticosteroid use, baseline platelets, CRP level, or perianal fistulizing behavior. In addition, the use of combination immunosuppressive therapy did not differ significantly between the two groups. Furthermore, there was no statistically significant difference in the proportion of AEs.
3.4. Patients With AEs
During treatment, 27.2% (31/114) of patients experienced at least one AE. The most common AE was skin allergy, with an incidence of 12.3% (14/114), followed by upper respiratory tract infection (8/114, 7.0%) and leukopenia (6/114, 5.3%) (Table 4).
AEs | N (% whole cohort) | Rate of the total AEs (%) |
---|---|---|
Total | 31 (27.2%) | 100 |
Respiratory infection | 8 (7.0%) | 25.8 |
Skin allergy | 14 (12.3%) | 45.2 |
Leukopenia | 6 (5.3%) | 19.4 |
Abnormal liver function | 6 (5.3%) | 19.4 |
Headache and dizziness | 3 (2.6%) | 9.7 |
Arthralgia | 1 (0.9%) | 3.2 |
Psoriasis | 1 (0.9%) | 3.2 |
Tuberculosis infection | 1 (0.9%) | 3.2 |
Neurologic adverse effects | 1 (0.9%) | 3.2 |
Treatment withdrawal due to AEs | 3 (2.6%) | 9.7 |
Three patients failed to complete the induction therapy due to serious AEs, including one case of secondary tuberculosis, one case of optic neuritis diagnosed by visual field loss during drug administration, and another case exhibiting persistent systemic allergic reactions that were ineffective with antiallergic treatment and improved after discontinuation.
Supporting Table 1 compares the baseline information of patients with and without AEs. Statistical results revealed no significant differences in each baseline variable between the two groups. There was also no statistically significant difference in the occurrence of AEs in patients with PNR, LOR, and response.
3.5. Predictive Models
Based on the data from the patients, we built the corresponding prediction models of LOR and AEs. These models were set up to search for the best-fitting model (i.e., the strongest predictive model). Since the data for PNR occurred in only eight persons (7.2%), the data size is too small and the model is prone to overfitting, even if the up-sampling method such as SMOTE is used, so the prediction model for PNR occurred was not established in this study for the analysis of related factors. Only variables that were statistically different were analyzed for their relationship with the occurrence of PNR using the LR algorithm. After dividing the data by random stratification, we compared the distribution plots of the training and test sets. The results showed no significant differences in the demographic features, LOR rate, and AEs rate within the two sets (Supporting Information Figures 1 and 2).
3.5.1. Prediction Model for LOR Occurred
A total of 58 variables were initially included in the model building by incorporating baseline data and disease activity score data at 3 months. RFE algorithm was performed to rank and select variables, and the final optimal combination of variables for each of the 10 models was obtained. All variable rankings and hyperparameter results are presented in Supporting Information (Tables 2 and 4). The performance of all LOR models is shown in Table 5. In the training set, the extra tree model had significantly better AUC (0.8967 ± 0.097), accuracy (0.8167 ± 0.115), precision (0.7967 ± 0.314), F1-score (0.7241 ± 0.266), and Brier score (0.183 ± 0.019). However, the performance of this model in the testing set is mediocre (AUC = 0.6319). Since this study plans to find the relationship between variables and LOR within the ML model, it is necessary to choose an approach that balances model fit and complexity for the analysis. In the test set, the highest AUC, which was taken as a global index of discrimination capacity, was for the KNN model (AUC = 0.7396) and was accompanied by the highest recall (0.6667). Therefore, the KNN model consisting of 9 variables was selected for the next step of model analysis.
Model | Variable number | Accuracy (training) | Accuracy (testing) | Precision (training) | Precision (testing) | Recall (training) | Recall (testing) | F1-score (training) | F1-score (testing) | Brier score (training) | Brier score (testing) | AUC (training) | AUC (testing)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
LR | 50 | 0.7431 ± 0.129 | 0.44 | 0.695 ± 0.207 | 0.2727 | 0.6917 ± 0.261 | 0.3333 | 0.6785 ± 0.207 | 0.3 | 0.2441 ± 0.113 | 0.4814 | 0.7654 ± 0.124 | 0.4444 |
RR | 33 | 0.7306 ± 0.124 | 0.52 | 0.6449 ± 0.149 | 0.3333 | 0.725 ± 0.277 | 0.3333 | 0.6735 ± 0.205 | 0.3333 | 0.2891 ± 0.145 | 0.3398 | 0.8188 ± 0.081 | 0.5764 |
RF | 23 | 0.7306 ± 0.121 | 0.64 | 0.695 ± 0.299 | 0.5 | 0.65 ± 0.329 | 0.3333 | 0.6202 ± 0.264 | 0.4 | 0.1993 ± 0.021 | 0.218 | 0.8154 ± 0.117 | 0.6597 |
CART | 3 | 0.7042 ± 0.153 | 0.84 | 0.7667 ± 0.307 | 0.8571 | 0.4917 ± 0.254 | 0.6667 | 0.5724 ± 0.252 | 0.75 | 0.2092 ± 0.1 | 0.1833 | 0.7783 ± 0.171 | 0.736 |
ET | 21 | 0.8167 ± 0.115 | 0.64 | 0.7967 ± 0.314 | 0.5 | 0.7 ± 0.284 | 0.3333 | 0.7241 ± 0.266 | 0.375 | 0.183 ± 0.019 | 0.225 | 0.8967 ± 0.097∗ | 0.6319 |
GBDT | 9 | 0.7333 ± 0.099 | 0.6 | 0.7467 ± 0.185 | 0.4545 | 0.675 ± 0.248 | 0.5556 | 0.6684 ± 0.145 | 0.5 | 0.1988 ± 0.077 | 0.3106 | 0.8150 ± 0.136 | 0.6806 |
XGBoost | 19 | 0.7556 ± 0.094 | 0.64 | 0.7317 ± 0.1547 | 0.5 | 0.7333 ± 0.258 | 0.1111 | 0.7008 ± 0.159 | 0.1818 | 0.1933 ± 0.089 | 0.2421 | 0.8054 ± 0.139 | 0.7222 |
SVM | 9 | 0.5944 ± 0.143 | 0.6 | 0.5754 ± 0.202 | 0.4667 | 0.6833 ± 0.307 | 0.7778 | 0.5703 ± 0.172 | 0.5833 | 0.2405 ± 0.021 | 0.2285 | 0.6641 ± 0.135 | 0.6493 |
MLP | 22 | 0.6458 ± 0.173 | 0.6 | 0.6379 ± 0.28 | 0.4444 | 0.6167 ± 0.245 | 0.4444 | 0.5944 ± 0.205 | 0.4444 | 0.2697 ± 0.104 | 0.2928 | 0.7046 ± 0.134 | 0.6528 |
KNN | 9 | 0.6333 ± 0.095 | 0.64 | 0.61 ± 0.156 | 0.5 | 0.675 ± 0.285 | 0.6667 | 0.591 ± 0.138 | 0.5714 | 0.2363 ± 0.049 | 0.2123 | 0.7081 ± 0.132 | 0.7396∗ |
- Note: The best performance is determined by the highest AUC in the testing set. The best performance model is represented in boldface and ∗. ET: extremely randomized trees.
- Abbreviations: CART, classification and regression tree; GBDT, gradient boosting decision tree; KNN, K-nearest neighbor; LR, logistic regression; MLP, multilayer perceptron; RF, random forests; RR, ridge regression; SVM, support vector machines; XGBoost, extreme gradient boosting.
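For readers less familiar with the metrics in the table, the following sketch computes them from predicted probabilities using the standard definitions. The toy labels and probabilities are hypothetical, the 0.5 decision threshold is an assumption, and the function names are illustrative only; the study's own evaluation code is not shown in the paper.

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive is ranked above a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, Brier score, and AUC for binary outcomes."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    # Brier score: mean squared difference between probability and outcome (lower is better).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "brier": brier,
        "auc": roc_auc(y_true, y_prob),
    }


# Hypothetical outcomes (1 = LOR) and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
metrics = classification_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

As a consistency check on the table, the CART testing row (precision 0.8571, recall 0.6667) gives an F1-score of 0.75 under this formula, which matches the reported value.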
3.5.2. Prediction Model for the Occurrence of AEs
A total of 47 baseline variables were initially included in model building; the variable rankings from RFE and the model hyperparameters are presented in Supporting Information Tables 3 and 4. The best performer in the training set was XGBoost, with the highest AUC (0.7559 ± 0.178), accuracy (0.7333 ± 0.105), precision (0.6333 ± 0.296), and F1-score (0.5005 ± 0.245), the lowest Brier score (0.1825 ± 0.066), and a recall of 0.4583 ± 0.272. However, as with the LOR models, there was a problem of overfitting, and the best-performing model in the testing set was the extra tree model, with AUC = 0.7113, accuracy = 0.7586, precision = 0.6, recall = 0.375, F1-score = 0.4615, and Brier score = 0.1899 (Table 6). Thus, the extra tree model based on nine variables was selected for the next analysis.
Model | Variable number | Accuracy (training) | Accuracy (testing) | Precision (training) | Precision (testing) | Recall (training) | Recall (testing) | F1-score (training) | F1-score (testing) | Brier score (training) | Brier score (testing) | AUC (training) | AUC (testing)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
LR | 9 | 0.6356 ± 0.105 | 0.7586 | 0.2333 ± 0.327 | 0.6 | 0.2 ± 0.267 | 0.375 | 0.2105 ± 0.282 | 0.4615 | 0.2503 ± 0.054 | 0.2442 | 0.5456 ± 0.19 | 0.5774 |
RR | 17 | 0.6278 ± 0.143 | 0.5517 | 0.4417 ± 0.398 | 0.1429 | 0.2917 ± 0.233 | 0.125 | 0.3276 ± 0.258 | 0.1333 | 0.2726 ± 0.094 | 0.2611 | 0.5198 ± 0.216 | 0.4167 |
RF | 12 | 0.72 ± 0.088 | 0.7241 | 0.5167 ± 0.391 | 0.5 | 0.3167 ± 0.241 | 0.125 | 0.3833 ± 0.281 | 0.2 | 0.1908 ± 0.031 | 0.1983 | 0.7494 ± 0.179 | 0.631 |
CART | 21 | 0.7222 ± 0.106 | 0.6207 | 0.585 ± 0.354 | 0.3333 | 0.475 ± 0.346 | 0.375 | 0.4790 ± 0.281 | 0.3529 | 0.2480 ± 0.098 | 0.3793 | 0.6941 ± 0.22 | 0.5446 |
ET | 9 | 0.6422 ± 0.117 | 0.7586 | 0.45 ± 0.279 | 0.6 | 0.4583 ± 0.344 | 0.375 | 0.4105 ± 0.238 | 0.4615 | 0.2215 ± 0.052 | 0.1899 | 0.7472 ± 0.188 | 0.7113∗ |
GBDT | 7 | 0.6878 ± 0.06 | 0.6897 | 0.55 ± 0.35 | 0.4 | 0.325 ± 0.259 | 0.25 | 0.3667 ± 0.2 | 0.3077 | 0.2274 ± 0.055 | 0.2133 | 0.7450 ± 0.176 | 0.6607 |
XGBoost | 15 | 0.7333 ± 0.105 | 0.6207 | 0.6333 ± 0.296 | 0.2 | 0.4583 ± 0.272 | 0.125 | 0.5005 ± 0.245 | 0.1538 | 0.1825 ± 0.066 | 0.2693 | 0.7559 ± 0.178∗ | 0.4821 |
SVM | 2 | 0.6678 ± 0.065 | 0.6897 | 0.35 ± 0.391 | 0.3333 | 0.1583 ± 0.16 | 0.125 | 0.21 ± 0.212 | 0.1818 | 0.2082 ± 0.013 | 0.2079 | 0.6643 ± 0.19 | 0.5595 |
MLP | 17 | 0.6033 ± 0.135 | 0.7586 | 0.35 ± 0.391 | 0.6 | 0.1583 ± 0.16 | 0.375 | 0.2133 ± 0.218 | 0.4615 | 0.2795 ± 0.096 | 0.2321 | 0.4966 ± 0.194 | 0.5774 |
KNN | 9 | 0.6778 ± 0.082 | 0.7241 | 0.4 ± 0.49 | 0.5 | 0.1249 ± 0.155 | 0.125 | 0.19 ± 0.234 | 0.2 | 0.2032 ± 0.049 | 0.2336 | 0.7264 ± 0.206 | 0.3571 |
- Note: The best performance is determined by the highest AUC in the testing set. The best performance model is represented in boldface and ∗. ET: extremely randomized trees.
- Abbreviations: CART, classification and regression tree; GBDT, gradient boosting decision tree; KNN, K-nearest neighbor; LR, logistic regression; MLP, multilayer perceptron; RF, random forests; RR, ridge regression; SVM, support vector machines; XGBoost, extreme gradient boosting.
3.6. Model and Risk Factors Analysis
The SHAP approach and PDP analysis were used in this study to examine the relationship between the variables and the occurrence of PNR, LOR, and AEs from several perspectives. SHAP is a model-agnostic explanation technique derived from cooperative game theory and was used to interpret the ML models. It allows variables to be analyzed from both global and local perspectives. PDP analysis, on the other hand, reflects the relationship between an individual variable and the occurrence of the related event; it was combined with the accuracy, AUC, PPV, and NPV of each variable to represent that variable’s predictive value.
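The game-theoretic idea behind SHAP can be illustrated with an exact Shapley value computation on a toy model. This is a sketch under stated assumptions rather than the analysis pipeline of this study: the scoring function `toy_risk`, its coefficients, and the patient and reference values are all hypothetical, and practical SHAP implementations approximate these values against a background dataset instead of enumerating every coalition against a single reference point.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of each feature for one prediction.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2^n, so this is only feasible for a few features.
    """
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_risk(x):
    """Hypothetical risk score; the coefficients are illustrative, not fitted."""
    baseline_cdai, uric_acid, cdai_3m = x
    return (0.001 * baseline_cdai - 0.0005 * uric_acid + 0.003 * cdai_3m
            + 0.00001 * baseline_cdai * cdai_3m)


instance = [250.0, 300.0, 80.0]   # hypothetical patient
baseline = [150.0, 350.0, 20.0]   # hypothetical reference values
phi = shapley_values(toy_risk, instance, baseline)

for name, contribution in zip(["baseline CDAI", "uric acid", "3M CDAI"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the prediction for the patient and the prediction at the reference values. Averaging the absolute contributions over many patients gives the global importance ranking, and the per-patient values give the local explanation.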
3.6.1. Factors Associated With the Occurrence of LOR
First, the LOR prediction model was analyzed using SHAP; the importance of the variables is shown in Figure 2(a). The three most important variables were baseline CDAI, uric acid, and CDAI at 3 months of ADA treatment (3M CDAI). Higher baseline CDAI (CDAI score > 190.1), lower uric acid (uric acid < 329.0 μmol/L), and higher 3M CDAI (3M CDAI score > 55.4) indicated a higher probability of LOR. Notably, only some of the data points with high baseline CDAI or low uric acid fell in the high-incidence range of LOR, whereas all data points with high 3M CDAI did, suggesting that a high 3M CDAI is the more effective predictor of LOR. This is consistent with the finding that the 3M CDAI was much higher in the LOR group than in the response group (73 [22.3, 160.9] vs. 17.6 [6.4, 41.8], p < 0.01). Other variables in the model were also associated with the occurrence of LOR; however, Figure 2(b) shows that most of the discrimination relies on the top five variables, with the top three playing the major role.




The contribution of variables to individual predictions was further analyzed using SHAP (Figure 2(c)) in four cases: a true-negative, a true-positive, a false-positive, and a false-negative prediction. For predicting the occurrence of LOR, the uric acid level, although it carries a significant proportion of the decision weight, does not provide absolute discrimination, because uric acid also varies in the false-positive and false-negative cases. Similarly, a high baseline CDAI does not definitively indicate the occurrence of LOR. A high 3M CDAI, however, is more indicative of the occurrence of LOR.
Further PDP analysis of the top five variables of the model revealed that the probability of LOR increased with baseline CDAI but no longer increased appreciably once baseline CDAI exceeded 300. The probability of LOR decreased in patients with uric acid higher than 275 μmol/L and increased in patients with 3M CDAI higher than 50 (Figure 2(d)). Compared with these three variables, CRP and ESR had a smaller effect on the probability of LOR: although the overall trend increased with increasing CRP and ESR, the confidence interval (blue area) still contained the red baseline (Supporting Information Figure 3). When the predictive value of individual variables was compared in the test set, 3M CDAI had the highest AUC (0.677), although this was still lower than that of the KNN model. CRP and ESR demonstrated lower predictive value than the other three variables (Supporting Information Table 5).
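A partial dependence curve of the kind discussed above can be sketched as follows: one variable is fixed at each grid value for every patient in turn, and the model predictions are averaged. The model `toy_lor_probability`, its coefficients, and the patient rows are hypothetical and serve only to show the mechanics.

```python
import math


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            predictions.append(predict(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_lor_probability(row):
    """Hypothetical model: probability rises with baseline CDAI, falls with uric acid."""
    baseline_cdai, uric_acid = row
    logit = -2.0 + 0.01 * baseline_cdai - 0.004 * uric_acid
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patients: (baseline CDAI, uric acid in μmol/L).
rows = [(120.0, 310.0), (210.0, 280.0), (260.0, 350.0), (330.0, 240.0)]
grid = [100.0, 200.0, 300.0, 400.0]
curve = partial_dependence(toy_lor_probability, rows, feature_index=0, grid=grid)

for cdai, probability in zip(grid, curve):
    print(f"baseline CDAI = {cdai:.0f}: mean predicted probability = {probability:.3f}")
```

A plateau such as the one reported for baseline CDAI above 300 would appear in such a curve as consecutive grid values with nearly identical averaged predictions.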
3.6.2. Factors Associated With the Occurrence of AEs
The AE prediction model constructed from nine variables was then analyzed. SHAP analysis showed that the variables included in the model had similar predictive values for the occurrence of AEs; the top three variables were uric acid, PLT, and CRP (Figure 3(a)). The majority of the low uric acid data points (uric acid < 372.8 μmol/L) predicted a lower probability of AEs, whereas the data points for different levels of the other variables fell in the middle or on either side of the AE prediction. Together with Figure 3(b), this indicates that all nine variables contribute to the prediction of the outcome, while the first five variables are more strongly correlated with AEs. Local analysis of single cases further demonstrated the predictive value of relatively low uric acid levels for the nonoccurrence of AEs (Figure 3(c)). PDP analysis showed that the probability of AEs increased with increasing uric acid. Moreover, at PLT below 250 × 10⁹/L, the probability of AEs decreased as PLT increased, whereas above 250 × 10⁹/L, the probability of AEs increased. CRP showed a similar pattern: the probability of AEs decreased as CRP increased and then rose at CRP above 20 mg/dL (Figure 3(d)). This trend is corroborated by similar changes in WBC (Supporting Information Figure 3C), an inflammation-related index, suggesting that AEs may be less likely to occur in low-grade or noninflammatory states than in other states. The individual predictive values of these variables in the test set were low, indicating poor predictive power. Among them, uric acid had the highest predictive value, with an AUC of 0.612, while the other variables all had AUCs below 0.6 (Supporting Information Table 5).




3.6.3. Factors Associated With the Occurrence of PNR
In the analysis of factors associated with the occurrence of PNR, the variables that differed between the PNR group and the non-PNR group were the history of corticosteroid use, perianal fistulizing behavior, PLT, and CRP. The following regression equation was obtained through logistic regression:
$$P = \frac{1}{1 + e^{-(-11.070 + 4.1662X_1 + 1.7844X_2 + 0.0135X_3 + 0.0086X_4)}}$$
($X_1$, corticosteroid history; $X_2$, perianal fistulizing behavior; $X_3$, PLT; $X_4$, CRP)
As shown in the risk factor analysis figure and PDP plots (Figure 4), patients with a history of corticosteroid use, perianal fistulizing behavior, elevated PLT, and elevated CRP had an increased risk of PNR, with the odds of PNR increasing by 6347.1%, 495.6%, 1.3%, and 0.8%, respectively (per unit increase for PLT and CRP). All individual variables exhibited strong predictive performance across the entire dataset, with PLT achieving an accuracy of 92.86% and an AUC of up to 0.767 (Supporting Information Table 5). However, given the sample size, validation in external data is necessary to form a comprehensive assessment.
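The percentages quoted above appear to correspond to (e^β − 1) × 100 for each coefficient of the regression equation, that is, to the relative change in the odds of PNR. The sketch below reproduces that arithmetic and evaluates the equation for one patient. The coefficients come from the equation; the patient values, the variable names, and the units assumed for PLT and CRP are hypothetical.

```python
import math

INTERCEPT = -11.070
COEFFICIENTS = {
    "corticosteroid_history": 4.1662,  # X1: 1 = yes, 0 = no
    "perianal_fistulizing": 1.7844,    # X2: 1 = yes, 0 = no
    "plt": 0.0135,                     # X3: platelet count (units assumed as in the study data)
    "crp": 0.0086,                     # X4: C-reactive protein (units assumed as in the study data)
}


def odds_increase_percent(beta):
    """Relative change in the odds for a one-unit increase in the variable."""
    return (math.exp(beta) - 1.0) * 100.0


def pnr_probability(values):
    """Predicted probability of PNR from the logistic regression equation."""
    logit = INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))


increases = {name: odds_increase_percent(beta) for name, beta in COEFFICIENTS.items()}
for name, percent in increases.items():
    print(f"{name}: odds change of {percent:+.1f}% per unit")

# Hypothetical patient: prior corticosteroid use, no perianal fistulizing behavior.
example_patient = {"corticosteroid_history": 1, "perianal_fistulizing": 0, "plt": 300, "crp": 10}
p = pnr_probability(example_patient)
print(f"predicted PNR probability: {p:.3f}")
```

The computed values agree with the quoted percentages up to rounding. Because these are changes in odds rather than in probability, a large percentage for a binary variable does not translate into a proportionally large change in absolute risk.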


4. Discussion
ADA has been widely used in Western countries as a first-line treatment for CD. However, this biologic has only been approved for CD treatment in China since 2020. Compared with IFX, studies on the response to ADA treatment and related risk factors in the Chinese population are still relatively scarce. To date, there have been several observational studies of ADA in Asian populations [12, 14–16]. However, most of these studies focused on evaluating ADA efficacy or comparing it with IFX, and few have analyzed therapy failure or AEs during ADA treatment. This study explored poor efficacy events during ADA treatment in a cohort of 114 CD patients. Within the cohort, 54.4% of the patients remained on ADA maintenance therapy, 7% experienced PNR, 30.7% experienced LOR, and 27.2% experienced AEs. Analysis revealed that previous use of corticosteroids, higher baseline platelets, higher CRP level, and perianal fistulizing behavior were associated with the occurrence of PNR. After a KNN-based model was developed, the associations between variables and LOR were analyzed by the SHAP method, which identified baseline CDAI, uric acid, and CDAI at 3 months as the factors most associated with the occurrence of LOR. Based on the extra tree model, uric acid, PLT, and CRP were the three most important factors associated with the occurrence of AEs.
In reviewing the literature, patient demographics in this study were broadly similar to those of patient populations in Asian CD studies [12–16]. Compared with Western studies, the patients in this study had a lower BMI and lower smoking rates, which is related to the racial differences between the Asian and Western patient groups [31, 32]. However, the incidence of therapy failure events was roughly similar across regions. Research in this area has shown that up to 40% of CD patients do not respond or lose response after the initial benefit from anti-TNF agents [33]. Of these, around 25%–30% of the population does not respond to treatment during the induction phase [34]. So far, the incidence of PNR in studies of ADA has been approximately 20%–44%, with the reported occurrence of PNR in Asian populations ranging from approximately 24.3% to 30% [16, 34–36]. The percentage of patients who experienced a PNR event after ADA use in our study was 7%. The difference from previous research was considered to be related to the sample size and the retrospective design of this study. Most randomized controlled clinical studies have strict inclusion criteria, such as disease severity and history of previous biologic use. In clinical practice, however, some patients with mild to moderate disease may also need biologic therapy, for example, those with concomitant anal fistulas or corticosteroid resistance. Moreover, some patients who experienced LOR or intolerance to first-line biologics preferred to switch to other biologics early in the course of LOR. In the present study, 48.2% (55/114) of patients treated with ADA had mild disease activity or were in remission (CDAI ≤ 220), which may have led to a lower incidence of PNR in our study.
In addition, statistical differences were found between the PNR and non-PNR groups in the history of corticosteroid use, baseline platelets, CRP level, and perianal fistulizing behavior. This indicates that ADA efficacy is highly correlated with previous medication and disease behavior, which are also the main features considered when assessing medication strategies for patients with CD. Notably, no PNR occurred in patients treated with combination immunomodulator therapy in this study. Whether immunomodulator combination therapy can reduce the incidence of PNR and LOR with ADA is currently controversial. Combination therapy of IFX with azathioprine has been shown to be superior to either treatment alone in patients with CD [37]. However, a meta-analysis of combination therapy with ADA and immunomodulators found only marginally greater benefits than ADA monotherapy [38]. In the CALM study, a strategy of starting ADA as monotherapy and escalating to combined immunosuppression was successfully explored [39]. Reinisch et al. suggested considering stopping any concomitant immunomodulator during the first 6–12 months of treatment in patients achieving stable remission [34]. At present, adding immunosuppressive therapy to ADA treatment may offer some benefit, but the exact strategy needs further investigation. In our study, none of the patients who received immunosuppressive therapy experienced PNR, but the incidence of LOR was not significantly different. Although the sample size was small, these findings suggest that combining immunosuppression may offer some short-term benefits.
LOR, another cause of treatment failure, is a common issue with many biologics used to treat CD. According to reported data, LOR is estimated to occur in approximately 13%–40% of CD patients per year [40]. In our 32-month study, LOR occurred in 30.7% of the cohort, in line with a previous Asian study [36]. Moreover, the median duration of treatment in the LOR group was 7 months, significantly shorter than the 25 months in the maintenance treatment group. These results are in keeping with previous observational studies, in which approximately half of the patients who developed LOR did so in the first year [16]. However, the median time to LOR in our study was shorter than in other studies, possibly because 27% of the ADA patients included in our study had a history of IFX use [41]. Interestingly, ML combined with SHAP analysis revealed that baseline CDAI, uric acid, and CDAI at 3 months were associated with the occurrence of LOR in this study. The higher the baseline CDAI (CDAI score > 190.1) and 3M CDAI (3M CDAI score > 55.4), the higher the probability of LOR, and LOR occurred in all patients with high 3M CDAI. A CDAI score of ≥ 150 is typically regarded as indicative of active disease, whereas a decrease of more than 100 points or a score below 150 following therapy would signify clinical remission of CD. Our study found that, even in the absence of PNR following ADA administration and with a reduction in CDAI score to below 150 points, patients with a 3M CDAI score above 55.4 remained at an elevated risk of developing LOR. This also suggests that CD patients with high levels of disease activity before ADA treatment have a higher probability of LOR, and that the initial 3-month response to ADA is suggestive of overall treatment efficacy.
More active clinical follow-up monitoring of this group is needed, with early switching of treatment if necessary. Notably, the higher the uric acid in this study, the lower the probability of LOR. Patients with baseline uric acid greater than 275 μmol/L, which is a high-normal value, had a relatively lower probability of LOR. The intestine is one of the organs that excrete uric acid, but so far there have been few clinical studies on uric acid changes in CD patients. Previous studies all suggest that uric acid levels are altered in CD patients, but there is no consistent conclusion on the direction of the change [42–44]. Chiaro et al. reported that mice with DSS-induced experimental colitis have altered intestinal uric acid metabolism, and it has been suggested that dysfunction of the intestinal epithelium disrupts uric acid excretion [42]. Therefore, uric acid may be related to the inflammatory state of the intestine in CD patients; however, because uric acid is also related to renal function, gender, diet, and nutritional status, the relationship between uric acid, disease activity, and LOR in patients with CD needs further study. Previous studies have identified a high BMI as a risk factor for LOR; however, the limited sample size and the lower BMI of Chinese patients compared with those in Western countries may have contributed to the lack of a significant difference in this study [45].
AEs are one of the major factors influencing the maintenance of ADA therapy in CD patients and are an outcome valued by clinicians. The incidence of AEs in the phase III clinical study of ADA in China was 37%, compared with 27.2% in this study [12]. The incidence of adverse reactions was generally consistent with other studies. In contrast to IFX, which requires intravenous administration, ADA only needs subcutaneous injection and allows patients to complete maintenance therapy in a non-hospital setting. At the same time, however, monitoring for AEs may not be as timely as with IFX. Thus, knowledge of the factors that increase the relative risk of AEs can help guide patients’ maintenance therapy. In the present study, high baseline uric acid, PLT, CRP, and WBC were associated with the occurrence of AEs. Notably, PLT, CRP, and WBC, which reflect the degree of inflammation in patients, all showed a decrease in the probability of AEs at low levels and a relative increase at high levels, suggesting that CD patients with an underlying non- or low inflammatory state have a lower probability of AEs. Alleviating the inflammatory status of CD patients before ADA use may benefit subsequent maintenance therapy.
A highlight of this study was the use of an ML approach to analyze the study data; the KNN and extra tree-based models were selected by comparing the modeling performance of 10 algorithms. The SHAP approach was used to analyze the risk factors associated with the occurrence of LOR and AEs in patients using ADA, and the relationships between the variables were visualized. SHAP has been used in many studies and has proven to be effective [46–49]. In addition, in most clinical research, the amount of data is relatively small compared with the large number of variables involved. The introduction of ML methods can help address this challenge and thus maximize overall data retention. Numerous studies on big data have consistently demonstrated the advantages of ML algorithms, and ML algorithms are increasingly being applied in studies with small sample sizes [50–53]. Theoretical research has indicated that ML algorithms outperform traditional statistical algorithms when employed on high-quality datasets comprising more than 110 samples, thereby revealing deeper insights into potential internal relationships [54–57]. During the initial stages of the COVID-19 epidemic, ML algorithms effectively summarized the characteristics of small-sample cases from various regions, thereby contributing to subsequent disease studies [58]. Consequently, incorporating ML algorithms into this research opens new avenues for exploring and extracting insights from our small-sample data. This study used the MICE and SMOTE methods to expand the amount of data and improve the validity and consistency of model building while retaining the intrinsic relationships of the original data. Although there may be differences between the ML simulation data and the real data, the conclusions obtained can still provide guidance for ADA treatment in CD.
Another advantage of ML is that most models can be updated in real time and can reflect nonlinear and complex variable relationships. As the underlying database expands, the models can be continually optimized for simulation learning to better reveal the intrinsic connections.
Several limitations of the present study need to be noted. First, because this was a single-center study, the sample size was small and the characteristics of different populations were not well represented. Second, because this was a retrospective study, there was some recall bias; meaningful test results such as fecal calprotectin and antidrug antibodies could not be obtained, and the dynamics of variables over the course of treatment could not be captured. Owing to the limited amount of data and the retrospective nature of this study, there were insufficient prospective data for external validation. Thus, a larger prospective study would be more meaningful for further examining the results and informing clinical practice. Despite these limitations, the conclusions obtained through ML methods are instructive and can direct subsequent research.
In conclusion, our study confirms the effectiveness of ADA in the treatment of Chinese patients with CD and highlights the factors associated with the occurrence of PNR, LOR, and AEs. These findings have clinical implications for the selection of treatment options for CD patients. It is important to consider the increased risk of poor outcomes with ADA in patients with a history of corticosteroid use, high levels of disease activity, and high inflammatory state, which can help guide the clinical application of ADA.
Conflicts of Interest
The authors declare no conflicts of interest.
Author Contributions
Xiaojun Li and Maomao Tang contributed equally to this work and should be considered co-first authors.
Funding
This work was supported by the National Natural Science Foundation of China (grant number: 81470802).
Acknowledgments
We thank and acknowledge Dr. Wenchen Dong (University College London) and Dr. Jun Chao (Hunan Aicortech Intelligent Research Institute Co.) for software assistance and Dr. Mengyuan Qi for linguistic assistance.
Supporting Information
Supporting file 1.pdf: Table 1: Univariate analyses for variables in the no AE group and AE group. Table 2: Ranking of variable importance in LOR based on recursive feature elimination (RFE) feature reduction strategies. Table 3: Ranking of variable importance in AEs based on recursive feature elimination (RFE) feature reduction strategies. Table 4: Optimization results of hyperparameters for each model. Table 5: Predictive value of each variable. Figure 1: Distribution of ongoing response-secondary loss of response, gender, and age of patients in the training set and test set in LOR model. Figure 2: Distribution of adverse events, gender, and age of patients in the training set and test set in AE model. Figure 3: Partial dependence plots in KNN-based LOR model and extra-tree-based AE model.
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.