Volume 2025, Issue 1 8930012
Review Article
Open Access

Artificial Intelligence–Based Psychotherapeutic Intervention on Psychological Outcomes: A Meta-Analysis and Meta-Regression

Ying Lau

Corresponding Author

The Nethersole School of Nursing, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China, cuhk.edu.hk

Wei How Darryl Ang

Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore, nus.edu.sg

Global Nursing Research Centre, Graduate School of Medicine, University of Tokyo, Tokyo, Japan, u-tokyo.ac.jp

Wen Wei Ang

Alice Lee Centre for Nursing Studies, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore, nus.edu.sg

Patrick Cheong-Iao Pang

Faculty of Applied Sciences, Macao Polytechnic University, Macau, China

Sai Ho Wong

Department of Nursing, Alexandra Hospital, Singapore, Singapore, ah.com.sg

Kin Sun Chan

Department of Government and Public Administration, University of Macau, Macau, China, umac.mo

First published: 27 January 2025
Academic Editor: Diogo Lamela

Abstract

Background: Artificial intelligence (AI)–based psychotherapeutic interventions may bring a new and viable approach to expanding psychiatric care. However, evidence of their effectiveness remains scarce. We evaluated the efficacy of AI-based psychotherapeutic interventions on depressive, anxiety, and stress symptoms at postintervention and follow-up assessments.

Methods: A three-step comprehensive search via nine electronic databases (PubMed, Embase, CINAHL, Cochrane Library, Scopus, IEEE Xplore, Web of Science, PsycINFO, and ProQuest Dissertations and Theses) was performed.

Results: Thirty randomized controlled trials (RCTs) in 31 publications involving 6100 participants from nine countries were included. The majority (79.1%) of trials with intention-to-treat analysis but less than half (48.6%) of trials with per-protocol analysis were graded as low risk. Meta-analyses showed that interventions significantly reduced depressive symptoms at the postintervention assessment (t = −4.40, p = 0.001) with medium effect size (g = −0.54, 95% CI: −0.79 to −0.29) and at 6–12 months of assessment (t = −3.14, p < 0.016) with small effect size (g = −0.23, 95% CI: −0.40 to −0.06) in comparison with comparators. Our subgroup analyses revealed that the depressed participants had a significantly larger effect size in reducing depressive symptoms than participants with stress and other conditions. At postintervention and follow-up assessments, AI-based psychotherapeutic interventions did not significantly alter anxiety, stress, or the total scores of depressive, anxiety, and stress symptoms in comparison with comparators. The random-effects univariate meta-regression did not identify any significant covariates for depressive and anxiety symptoms at postintervention. The certainty of evidence ranged between moderate and very low.

Conclusions: AI-based psychotherapeutic interventions can be used in addition to usual treatments for reducing depressive symptoms. Well-designed RCTs with long-term follow-up data are warranted.

Trial Registration: CRD42022330228

1. Introduction

The World Health Organization (WHO) found that approximately 1 billion people worldwide struggle with some form of psychological problem, and such problems are the leading cause of years lived with disability [1]. Between 1990 and 2019, the global number of disability-adjusted life years due to mental disorders increased from 80.8 to 125.3 million [2]. Many of these people with mental disorders are not receiving treatment due to a shortage of therapists, stigmatization, transport costs, and expensive consultation fees [3, 4]. During the coronavirus disease of 2019 (COVID-19) pandemic, an additional 53.2 million cases of major depressive disorders and 76.2 million cases of anxiety disorders were found globally [5]. A systematic review found that the global prevalence of depression, anxiety, and stress among the general population during the COVID-19 pandemic ranged from 25.18% to 29.57% [6]. This pandemic has created an increased urgency to consider the accessibility of psychotherapy due to restrictions and lockdowns [7]. The WHO found that mental health interventions are insufficient and inadequate globally [1], prompting the utilization of new technology to meet the needs.

In parallel with the advancements in artificial intelligence (AI) technology, psychotherapy has begun to incorporate AI techniques for creating psychotherapeutic interventions [8, 9]. This human–computer interaction technology is believed to be intelligent enough to comprehend the conversation between a patient and a chatbot therapist based on machine learning (ML) algorithms [4, 10]. Applications could help prevent, treat, and prevent relapses in behavioral and psychiatric issues [11]. According to Bendig et al. [11], AI chatbots, also known as conversational or relational agents, are machine conversation systems that interact with human users using various AI technologies. Responses can be generated using a rule-based model (predefined rules or decision tree), natural language processing (NLP), or ML through text-based or speech-enabled conversations [4, 7]. AI chatbots try to talk like humans, including the emotional, social, and relational parts of natural conversation [11]. They do this to imitate a therapeutic conversational style that can help users transfer therapeutic content and mirror therapeutic processes [7, 12].

According to Boucher et al. [7] and Vaidyam et al. [4], AI chatbots are thought to possess sufficient intelligence to comprehend conversations with human users using written, spoken, and visual language through an interactive interface. Some scientists have developed an avatar, a computer-generated character, as an embodied conversational agent in an intervention aimed at improving usability and intention to use [13]. An embodied agent can emulate some human interactions, including gaze, speech, hand gestures, and other nonverbal modalities [13]. Different platforms, including websites, mobile applications, short message services, virtual reality, and smart technology, can integrate AI chatbots to perform various functions such as therapy, counseling, monitoring, engagement, adherence, or psychoeducation [7].

Psychotherapeutic interventions can use AI chatbots and tailor them to specific populations [4]. This innovative approach may improve the shortage of therapists and engagement in therapy [4, 13]. AI-based psychotherapeutic interventions offer several advantages when used, including lowering the stigma associated with therapy, fostering a comfortable environment for self-disclosure, being cost-effective, reducing travel time, eliminating geographical restrictions, freeing up human resources, and broadening overall accessibility [4, 10]. Hence, AI-based psychotherapeutic interventions may offer a potential solution to overcome barriers and expand psychiatric care.

Given that AI-based psychotherapeutic intervention is an emerging field, different types of review have been found, including three integrative reviews [4, 7, 10], four scoping reviews [8, 11, 14, 15], one mixed-method review [16], and three systematic reviews [9, 17, 18].

Boucher et al. [7] highlighted the potential integration of AI-based chatbots into digital mental health interventions. Pham, Nabizadeh, and Selek [10] described different AI-based interventions and their clinical practices. Vaidyam et al. [4] explored the roles of conversational agents, or chatbots, in the screening, diagnosis, and treatment of mental illness. This integrative review proposed that an AI-supported intervention could enhance engagement, despite the underreporting of its therapeutic effect [4]. According to the other integrative reviews [7, 10], future research should focus on utilizing randomized controlled trials (RCTs) to investigate the efficacy of AI psychotherapeutic interventions.

A mixed-method review [16] aimed to evaluate the use of conversational agent interventions in the treatment of mental health problems. The scoping reviews examined how chatbots have been developed and used in public health [15], how they are used for mental health [8, 14], and how useful, acceptable, and practicable they are in clinical psychology and psychotherapy [11]. Results regarding the practicability, feasibility, and acceptability of AI-supported intervention for mental problems were promising [14, 16], but there was a lack of consensus on reporting and evaluation for chatbots [8] and no direct transferability to the psychotherapeutic context [11]. Hence, more reviews are required to demonstrate its efficacy [14].

We found three systematic reviews [9, 17, 18] relating to the efficacy of AI-based psychotherapeutic intervention in existing literature. Gual-Montolio et al. [9] aimed to use AI-based methods to enhance outcomes in psychological interventions in real-time or close to real-time. Li et al. [17] and Lim et al. [18] examined the feasibility and/or effectiveness of AI-based psychotherapeutic interventions. However, these reviews had certain limitations. These included relying on a limited number of databases [9], combining AI-based and non-AI-based interventions [18], only providing a narrative synthesis [9, 17], focusing only on depressive symptoms as an outcome [18], and incorporating different research designs [9].

Emerging evidence has shown that AI-based psychotherapeutic intervention may improve psychological outcomes. However, relatively few reviews have investigated the long-term effects of interventions. To fill this gap, the current review aims to evaluate the efficacy of AI-based psychotherapeutic interventions on depressive, anxiety, and stress symptoms at postintervention and follow-up assessments.

2. Material and Method

This systematic review was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Table S1) [19].

2.1. Eligibility Criteria

Given that RCTs are considered the gold standard for evaluating the effectiveness of interventions [20], only RCTs were included in this review. The population targeted adults aged ≥18 years old with or without medical, psychological, and behavioral problems. The intervention used a conversational (chatbot) interface to deliver any form of psychotherapy with self-guided or therapist support incorporating AI technology. Response generation contained rule-based or other AI technologies. Input and output modalities involved written, spoken, visual, or emoji. The presentation could use either an embodied or nonembodied chatbot. The comparator included treatment as usual, waitlist, placebo control, or any type of intervention. The psychological outcomes included depressive, anxiety, and stress symptoms at postintervention and follow-up assessments. No restrictions were imposed on the population and publication date. This review included published and unpublished trials in the English language [21]. The details of the eligibility criteria can be found in Table S2.

2.2. Search Strategy

A scoping search for existing systematic reviews with similar aims was conducted in the Cochrane Database of Systematic Reviews, Joanna Briggs Institute, and the PROSPERO database to prevent any duplication. An iterative process was used to develop the search terms. An initial keyword search comprising “artificial intelligence” AND “psychological outcomes” was used to conduct a simple search. After the inclusion of potential articles, the search terms and keywords were revised in consultation with a university librarian. The eventual search terms comprising both keywords and index terms for the respective databases can be found in Table S3.

Following the development of the search terms, a three-step search [22] was conducted from inception to February 9, 2023. First, a search was conducted in nine English electronic databases (PubMed, Embase, CINAHL, Cochrane Library, Scopus, IEEE Xplore, Web of Science, PsycINFO, and ProQuest Dissertations and Theses) to locate relevant articles. Second, a search for unpublished trials was conducted in three clinical trial registries (ANZCTR, ISRCTN, and CenterWatch), and an email was sent to all corresponding authors to obtain information about their trials. Finally, a hand search of the reference lists of the selected studies and gray literature was conducted to maximize the search. We contacted the authors via email for additional data when the information included in their publication was insufficient.

2.3. Study Selection

EndNote X20 was used to manage the retrieved citations from the search. Duplicates were removed using automated and manual functions. Two authors (W.W. and S.H.W.) screened all the articles by title and abstract, with reference to the eligibility criteria. When disagreements occurred, a third author (L.Y.) was consulted. Inter-rater reliability was measured using Cohen’s kappa (κ), with −1 suggesting an absence of agreement and 1 indicating perfect agreement [23]. Values greater than 0.75 were considered excellent agreement, whereas values between 0.40 and 0.75 were considered good agreement [24].
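As an illustration of the agreement statistic described above, the following minimal sketch computes Cohen's kappa for two screeners and applies the thresholds quoted in the text (greater than 0.75 excellent, 0.40 to 0.75 good). The function names and the screening decisions are hypothetical and are not taken from the review; the review does not state which software was used for this calculation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Label kappa using the thresholds quoted in the text above."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "good"
    return "below the 'good' threshold"


# Hypothetical title/abstract screening decisions for eight records.
screener_1 = ["include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude"]
screener_2 = ["include", "exclude", "exclude", "exclude",
              "exclude", "exclude", "include", "exclude"]

kappa = cohens_kappa(screener_1, screener_2)
label = interpret_kappa(kappa)
print(f"kappa = {kappa:.3f} ({label})")  # kappa = 0.714 (good)
```

In this made-up example the screeners agree on seven of eight records, but because most records are excluded by both, chance agreement is already high and kappa falls in the "good" rather than "excellent" band.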

2.4. Data Management and Extraction

The data extraction form was designed with reference to the Cochrane Handbook [22]. Two reviewers (W.W. and S.H.W.) extracted all the data independently. The data elements extracted included trial characteristics (number of studies, author, publication year, country, recruitment setting, design, nature of participants, mean age, gender distribution, AI-based psychotherapeutic intervention, name, comparator, sample size, psychological outcomes, measures, attrition rate, intention-to-treat analysis [ITT], missing data management [MDM], protocol, trial registration, and grant support), description of the intervention (intervention content, type of AI chatbot, psychological principle, duration of the intervention, follow-up assessment, frequency of use, and mean amount of time engagement in minutes), and psychological outcomes (depressive, anxiety, and stress symptoms) at postintervention and follow-up assessments (mean, standard deviation, and total numbers).

2.5. Risk of Bias Version 2 (RoB 2.0)

The Cochrane Risk of Bias tool version 2 (RoB 2.0) [25] was used to appraise the methodological quality of all included studies. The risk-of-bias assessment was performed by two independent reviewers (W.W. and D.A.) using an Excel tool that implements RoB 2.0. The risk of bias was evaluated against the following five domains of bias: (1) randomization process, (2) deviations from intended intervention, (3) missing outcome data, (4) measurement of the outcome, and (5) selection of the reported result [25]. Two reviewers responded to signaling questions in each domain to select the options of “yes,” “probably yes,” “probably no,” “no,” or “no information.” The RoB 2.0 algorithmic tool rates the risk of bias as “low,” “high,” or “some concerns” [25].

2.6. Certainty of Evidence

The grading of recommendations, assessment, development, and evaluation (GRADE) criteria was used to assess the overall certainty of evidence [26]. To determine the certainty of evidence, two reviewers (D.A. and L.Y.) independently evaluated the studies based on the following domains: risk of bias, inconsistency, indirectness, imprecision, and effect. The ratings were classified as very low, low, moderate, or high, and the decision was determined based on justifications [26]. Publication bias was determined using the Egger regression test [27] and funnel plot of precision using standardized mean difference [28]. Publication bias was ascertained using a p-value of less than 0.05 from the Egger test and asymmetrical funnel plot [29].
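The Egger regression test mentioned above can be sketched as follows. This is a minimal illustration of the classic form of the test (ordinary least squares of the standardized effect on precision, with the intercept as the asymmetry estimate); whether the review's software used exactly this variant is an assumption, and the function name and data are hypothetical. The sketch returns the t-statistic and degrees of freedom; the p-value compared against 0.05 would come from a t-distribution with k − 2 degrees of freedom, which is left to a statistics library.

```python
import math


def egger_test(effects, std_errors):
    """Egger regression: regress effect/SE on 1/SE and examine the intercept.

    Returns (intercept, se_intercept, t_statistic, degrees_of_freedom).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effect
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and the usual OLS standard error of the intercept.
    df = k - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt((rss / df) * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat, df


# Hypothetical Hedges' g values and standard errors for five trials.
example_effects = [-0.62, -0.45, -0.30, -0.71, -0.20]
example_ses = [0.30, 0.22, 0.12, 0.35, 0.10]
intercept, se_int, t_stat, df = egger_test(example_effects, example_ses)
print(f"Egger intercept = {intercept:.3f}, t = {t_stat:.2f}, df = {df}")
```

An intercept far from zero suggests that smaller (less precise) trials report systematically different effects, which is the pattern a funnel plot shows as asymmetry.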

2.7. Data Synthesis

We used the meta [30] and metafor [31] packages of R software to conduct the meta-analysis, subgroup analysis, and meta-regression analysis. A prediction interval (PI) based on the t-distribution (t) was used to predict a range of true effects for future trials with similar settings [32]. A 95% PI was used to estimate the range within which the effect of the next trial is expected to fall with 95% probability. A statistically significant effect is expected for a future trial if all values of the 95% PI are on the same side of the null of 0, whereas an insignificant effect is expected if the 95% PI spans both sides of the null of 0 [32]. Hedges’ g was used to communicate the effect size because of its precision for studies with small sample sizes [33, 34]. A random-effects model was used, which assumes that the observed estimates of treatment effect can vary across studies [35]. The restricted maximum likelihood method was used as the estimator for the random-effects meta-analysis to provide unbiased estimates [36]. The Hartung–Knapp adjustment for random-effects models was selected to prevent counterintuitive effects [37]. A 95% confidence interval was used to communicate the precision of the summary estimate and derive the p-value [22].
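To make the synthesis steps above concrete, the sketch below computes Hedges' g from group summary statistics, pools effects under a random-effects model with a Hartung–Knapp standard error, and forms a prediction interval. It is an illustration only, not the review's analysis: the review used R with restricted maximum likelihood, whereas this sketch swaps in the simpler closed-form DerSimonian–Laird estimator of the between-study variance. All function names and the example trial data are hypothetical, and the t critical value (k − 2 degrees of freedom for the PI) is passed in by the caller rather than computed.

```python
import math


def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (g, variance of g) for intervention versus comparator."""
    df = n_i + n_c - 2
    pooled_sd = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_i - mean_c) / pooled_sd
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample correction factor
    var_d = (n_i + n_c) / (n_i * n_c) + d ** 2 / (2.0 * (n_i + n_c))
    return correction * d, correction ** 2 * var_d


def pool_random_effects(effects, variances):
    """Random-effects pooling; returns (pooled g, tau^2, Hartung-Knapp SE).

    tau^2 uses DerSimonian-Laird here (an assumption for simplicity),
    not the REML estimator named in the text.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching inputs")
    w = [1.0 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    # Hartung-Knapp variance: weighted residual spread around the pooled effect.
    hk_var = sum(wi * (g - pooled) ** 2 for wi, g in zip(w_re, effects)) / (
        (k - 1) * sum(w_re)
    )
    return pooled, tau2, math.sqrt(hk_var)


def prediction_interval(pooled, se, tau2, t_crit):
    """PI = pooled +/- t_crit * sqrt(tau^2 + SE^2); t_crit has k - 2 df."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width


# Hypothetical postintervention depression scores (mean, SD, n) per arm.
trials = [
    (9.1, 4.0, 50, 11.5, 4.2, 48),
    (12.0, 5.1, 30, 13.1, 5.0, 31),
    (7.4, 3.2, 80, 9.9, 3.5, 79),
]
gs, vs = zip(*(hedges_g(*t) for t in trials))
pooled_g, tau2, se_hk = pool_random_effects(list(gs), list(vs))
print(f"pooled g = {pooled_g:.2f}, tau^2 = {tau2:.3f}, HK SE = {se_hk:.3f}")
```

A negative pooled g in this setup means lower symptom scores in the intervention arm; if the resulting prediction interval lies entirely below zero, a future similar trial would also be expected to favor the intervention.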

Heterogeneity was assessed using the Cochran Q test and its extent was quantified using I² values [22]. A Cochran Q test p-value of <0.01 and I² > 50% indicated heterogeneity [22]. Additional subgroup and meta-regression analyses were conducted to explore the reasons for heterogeneity [22].
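The heterogeneity rule above can be sketched as follows. The code computes Cochran's Q and I² from study effects and variances, obtains the Q-test p-value from a closed-form chi-square survival function (valid for integer degrees of freedom), and applies the stated decision rule (p < 0.01 and I² > 50%). Function names and inputs are hypothetical, and this is not the review's own code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: normal tail term plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def heterogeneity(effects, variances):
    """Return (Q, df, p_value, I2_percent, flagged) for a set of studies."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching inputs")
    w = [1.0 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    p_value = chi2_sf(q, df)
    # Decision rule quoted in the text: p < 0.01 and I^2 > 50%.
    flagged = p_value < 0.01 and i2 > 50.0
    return q, df, p_value, i2, flagged


# Hypothetical Hedges' g values and variances for four trials.
q, df, p_value, i2, flagged = heterogeneity(
    [-0.90, -0.10, -0.55, -0.05], [0.02, 0.03, 0.02, 0.04]
)
print(f"Q = {q:.2f} (df = {df}), p = {p_value:.4f}, I2 = {i2:.1f}%")
```

Because the rule requires both conditions, a small set of studies can show a high I² yet fail the p < 0.01 criterion, as Q has little power when k is small.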

Subgroup analyses were conducted for predetermined groups defined by the nature of participants (depression ± others, stress/distress ± others, other condition, or healthy), age groups (18–30, 31–40, 41–50, or >50), type of AI-based chatbot (Deprexis vs. others), comparator (passive vs. active), type of psychotherapy (cognitive behavioral therapy [CBT] vs. others), type of platform (Internet vs. others), response generation (rule-based vs. NLP), embodiment (yes vs. no), use of ITT/MDM (yes vs. no), and protocol publication/trial registration (yes vs. no). Significant subgroup differences were determined based on the Q statistic with a subgroup effect of p < 0.1 [38]. Meta-regression analyses were conducted to examine the effects of potential covariates (publication year, duration of intervention, sample size, attrition rate, and proportion of males) on the psychological symptoms. The relationships were expressed using coefficient β, which represents the change in the effect size for the outcome per unit change in the covariate [39]. A p-value of <0.05 was used to conclude the association between the covariate and outcomes based on effect size [22].
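The meaning of the meta-regression coefficient β can be illustrated with a minimal univariate sketch: a weighted least-squares regression of study effect sizes on a single covariate, with random-effects weights 1/(v + τ²). The function name, the covariate values, and the supplied τ² are hypothetical; the standard error shown is the plain model-based one, without any Hartung–Knapp-type adjustment, and the review's actual models were fitted in R.

```python
import math


def meta_regression(effects, variances, covariate, tau2=0.0):
    """Univariate weighted regression of effect size on one covariate.

    Returns (intercept, beta, se_beta). tau2 is the between-study variance,
    supplied by the caller (an assumption of this sketch).
    """
    k = len(effects)
    if k < 3 or len(variances) != k or len(covariate) != k:
        raise ValueError("need at least three studies with matching inputs")
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    if sxx == 0:
        raise ValueError("covariate has no variation across studies")
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, covariate, effects)
    )
    beta = sxy / sxx                 # change in g per unit of the covariate
    intercept = y_bar - beta * x_bar
    se_beta = math.sqrt(1.0 / sxx)   # model-based SE with known weights
    return intercept, beta, se_beta


# Hypothetical data: Hedges' g against intervention duration in weeks.
duration_weeks = [4, 8, 12, 16]
g_values = [-0.20, -0.35, -0.42, -0.60]
g_variances = [0.04, 0.05, 0.03, 0.06]
b0, beta, se_beta = meta_regression(g_values, g_variances, duration_weeks, tau2=0.02)
print(f"beta = {beta:.3f} per week (SE {se_beta:.3f})")
```

With these invented numbers β is negative, which would mean that each additional week of intervention is associated with a larger reduction in symptoms; a covariate is only treated as significant when the test of β gives p < 0.05.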

3. Results

The outcomes of the three-step search are shown in Figure 1. A total of 13,521 articles were retrieved from 12 electronic databases and three clinical trial registries. Ten records were found from trial registries and excluded, with the reasons for each exclusion provided in Table S4. Following the removal of 2389 duplicates, a total of 11,132 articles were screened based on their title and abstract. Twenty-five records were identified from websites, organizations, and citation searching. Fifty-seven articles from both sources were assessed in full text for eligibility. Twenty-six articles were excluded, and the reasons were documented in Table S5. A total of 30 RCTs in 31 publications with a study number ranging from 1 to 30 [12, 40–68] were included in this systematic review and meta-analysis.

Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram of the article selection process.

3.1. Trial Characteristics

The characteristics of the 30 trials evaluating the effect of AI-based psychotherapeutic interventions involving 6100 participants can be found in Table 1. The trials were published from 2009 [12] to 2022 [70]. They were conducted in Argentina (n = 1) [71], China (n = 2) [70, 72], Germany (n = 10) [11, 12, 26, 39, 40, 42, 48, 50–52], Italy (n = 2) [7, 44], Korea (n = 2) [16, 47], Romania (n = 1) [73], the United Kingdom (n = 3) [8, 41, 74], the United States (n = 5) [2, 9, 23, 45, 46], and three countries (n = 1) [43]. The participants were recruited from the community (n = 21) (1–3, 5, 6, 8, 9, 11–16, 19, 20–23, 25, 27, 28), clinical setting (n = 6) [11, 16, 26, 39, 43, 47], and a mixture of both (n = 3) [48, 50, 52]. Twenty-five trials adopted a two-arm RCT, three trials [40, 49, 72] adopted a three-arm RCT, one trial [7] adopted a four-arm RCT, and one trial [23] adopted a crossover design. The sample sizes of the trials ranged from 21 [44] to 1013 [48]. Half of them reported follow-up outcomes after postassessment, which ranged from 2 weeks [74] to 12 months [48].

Table 1. Characteristics of selected 30 randomized controlled trials in 31 publications.
Number | Author, year | Country/recruitment | Design | Nature of participants (criteria) | Mean age (gender portion) | AI-based psychotherapeutic intervention (name) | Type of comparator (name) | Sample size | Psychological outcomes (measures) | Follow-up | Attri rate (%) | ITT/MDM | Protocol/registry/grant
1 | Beevers et al. [40] | United States/community | Two-arm RCT | Adults with depression (QIDS-SR ≥10) | 31.91 (M: 24.7%, F: 74.4%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 376; I: 285; C: 91 | Depressive symptoms (QIDS-SR and HRSD-17) | No | 20.4 | Y/Y | N/Y/Ya
2 | Bennion et al. [41] | United Kingdom/community | Two-arm RCT | Older adults (>50) with emotional distress (NR) | 69.21 (M: 26.8%, F: 73.2%) | Internet-based conversational agent (Chatbot, MYLO) | Active control: Internet-based conversational agent (Chatbot, ELIZA) | T: 112; I: 59; C: 53 | Depressive, anxiety, and stress symptoms (DASS-21) | 2 weeks | 12.5 | N/N | N/N/Ya
3 | Berger et al. [42] | Sweden and Germany/community | Three-arm RCT | Adults with depression (BDI-II >13, suicide item <2) | 38.8 (M: 30.3%, F: 69.7%) | I1: Internet-based intervention (Deprexis, guided); I2: Internet-based intervention (Deprexis, unguided) | Passive control (waitlist) | T: 76; I1: 25; I2: 25; C: 26 | Depressive symptoms (BDI-II) | 6 months | 0 | Y/Y | N/N/Ya
4 | Berger et al. [43] | Germany/clinical | Two-arm RCT | Adults with depression (BDI-II >13)/unipolar affective disorder | 43.1 (M: 33.7%, F: 66.3%) | Internet-based intervention (Deprexis) + psychotherapy | Active control (psychotherapy) | T: 98; I: 51; C: 47 | Depressive (BDI-II), anxiety (GAD-7) symptoms | 6 months | 29.6 | Y/Y | Y/Y/Ya
5 | Bird et al. [44] | United Kingdom/community | Two-arm RCT | Adults (students and staff in university) with distress | 21.3 (M: 18.4%, F: 81.6%) | Internet-based conversational agent (Chatbot, MYLO) | Active control: Internet-based conversational agent (Chatbot, ELIZA) | T: 171; I: 85; C: 86 | Depressive, anxiety, and stress symptoms (DASS-21) | 2 weeks | 0 | Y/Y | N/N/Ya
6 | Bücker et al. [45] | Germany/community | Two-arm RCT | Adults with gambling and mood problems (NR) | 35.71 (M: 76.4%, F: 23.6%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 145; I: 74; C: 71 | Depressive (PHQ-9), anxiety (GAD-7) symptoms | No | 47.9 | Y/Y | Y/Y/Yb
7 | Burton et al. [46] | Romania, Spain, and United Kingdom/clinical | Two-arm RCT | Adults with major depressive disorder (NR) | 38.65 (M: 33.3%, F: 66.7%) | Embodied virtual agent-based system (Help4Mood) | Passive control (treatment as usual) | T: 28; I: 14; C: 14 | Depressive symptoms (BDI-II, QIDS-SR) | No | 25.0 | Y/N | N/Y/Ya
8 | Danieli et al. [47] | Italy/community | Two-arm RCT | Adults with distress, anxiety, and depression (NR) | 47.76 (M: 19.0%, F: 81.0%) | Mobile-based (TEO) intervention (Chatbot, m-PHA) + SMT-CBT | Active control (SMT-CBT) | T: 21; I: 11; C: 10 | Depressive and anxiety (SCL-90-R), stress (PSS) symptoms | 3 months | 0 | N/N | N/Y/Ya
9 | Danieli et al. [48] | Italy/community | Four-arm RCT | Older adults stress and anxiety symptoms (NR) | 55.58 (M: 22.0%, F: 78.0%) | I1: Mobile-based (TEO) intervention (chatbot, m-PHA) + SMT-CBT; I2: Mobile-based (TEO) intervention (Chatbot, m-PHA) | C1: Active control (SMT-CBT); C2: Passive control (waitlist) | T: 60; I1: 16; I2: 14; C1: 16; C2: 14 | Depressive (SCL-90-R and PHQ-8), anxiety (SCL-90-R and GAD-7), and stress (PSS) symptoms | 3 months | 5 | N/N | N/Y/Ya
10 | Fischer et al. [49] | Germany/clinical | Two-arm RCT | Adults with multiple sclerosis and depressive symptoms (NR) | 45.28 (M: 22.2%, F: 77.8%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 90; I: 45; C: 45 | Depressive symptoms (BDI) | 3 months | 21.1 | Y/Y | N/Y/Ya
11 | Fitzpatrick, Darcy, and Vierhile [12] | United States/community | Two-arm RCT | Young adults with anxiety and depressive symptoms (NR) | 22.2 (M: 19.0%, F: 81.0%) | Computer-based/mobile-based intervention (Chatbot, Woebot) | Active control (eBook on depression) | T: 70; I: 34; C: 36 | Depressive (PHQ-9), anxiety (GAD-7) symptoms | No | 20.0 | Y/Y | N/N/Yc
12 | Fitzsimmons-Craft et al. [50] | United States/community | Two-arm RCT | Female young adults with risk of eating disorders (NR) | 21.08 (M: 0.0%, F: 100.0%) | Internet-based (StudentBodies) intervention (Chatbot, Tessa) | Passive control (waitlist) | T: 700; I: 352; C: 348 | Depressive (PHQ-8), anxiety (GAD-7) symptoms | 6 months | 37.3 | Y/Y | N/Y/Ya
13 | Gaffney et al. [51] | United Kingdom/community | Two-arm RCT | Young adults (students in university) with distress | 21.4 (M: 21.4%, F: 78.6%) | Internet-based conversational agent (Chatbot, MYLO) | Active control: Internet-based conversational agent (Chatbot, ELIZA) | T: 48; I: 26; C: 22 | Depressive, anxiety, and stress symptoms (DASS-21) | 2 weeks | 10.4 | N/N | N/N/Ya
14 | Guțu et al. [52] | Romania/community | Two-arm RCT | Young adults from social media | 21.82 (M: 1.1%, F: 98.9%) | Computer-based/mobile-based intervention (Chatbot, Woebot) | Active control (psychoeducational daily email) | T: 212; I: 106; C: 106 | Depression, and anxiety symptoms (DASS-21) | No | 55.2 | Y/Y | N/N/NR
15 | He et al. [53] | China/community | Three-arm RCT | Young adults with depressive symptoms (CSMHSS: 2–3) | 18.78 (M: 62.8%, F: 37.2%) | Internet-based intervention (Chatbot, XiaoE) | C1: Active control (e-book on depression); C2: Active control (Chatbot, Xiaoai) | T: 148; I: 49; C1: 49; C2: 50 | Depressive symptoms (PHQ-9) | 1 month | 15.5 | Y/Y | N/Y/Ya
16 | Hunt et al. [54] | United States/community | Crossover trial | Adults with IBS (by physician or Rome IV criteria) | 32.00 (M: 24.8%, F: 75.2%) | Mobile-based intervention (Chatbot, Zemedy) | Passive control (waitlist) | T: 121; I: 62; C: 59 | Depressive (PHQ-9), and anxiety symptoms (DASS-21) | 3 months | 28.1 | Y/Y | N/Y/Yb
17 | Jang et al. [55] | Korea/clinical | Two-arm RCT | Adults with attention-deficit (ADHD score: 4/6 items) | 25.1 (M: 43%, F: 57%) | Mobile-based intervention (Chatbot, Todaki) | Active control (self-help information of ADHD) | T: 46; I: 23; C: 23 | Depressive (QIDS-SR), anxiety (SAS), and stress (PSS) symptoms | No | 19.6 | Y/N | N/N/Yc
18 | Klein et al. [56] | Germany/clinical and community | Two-arm RCT | Adults with depressive symptoms (PHQ-9: 5–14) | 42.9 (M: 31.4%, F: 68.6%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 1013; I: 509; C: 504 | Depressive symptoms (PHQ-9, HDRS-24) | 12 months | 21.6 | Y/N | Y/Y/Ya
19 | Klos et al. [57] | Argentina/community | Two-arm RCT | Young adults (students from university) | 18–33 (M: 12.7%, F: 87.3%) | Internet-based intervention (Chatbot, Tess) | Active control (psychoeducation eBook on affective symptoms) | T: 181; I: 99; C: 82 | Depressive (PHQ-9), anxiety (GAD-7) symptoms | No | 59.7 | N/N | N/N/NR
20 | Liu et al. [58] | China/community | Two-arm RCT | Young adults with depressive symptoms (PHQ-9 ≥ 9) | 23.08 (M: 44.6%, F: 55.4%) | Pipeline-based intervention (Chatbot, XiaoNan) | Active control (self-help bibliotherapy intervention) | T: 83; I: 41; C: 42 | Depressive (PHQ-9), anxiety (GAD-7) symptoms | No | 24.1 | Y/Y | N/N/No
21 | Ly, Ly, and Andersson [59] | Sweden/community | Two-arm RCT | Young adults (students from universities, website, and social media) | 26.2 (M: 46.4%, F: 53.6%) | Mobile-based intervention (Chatbot, Shim) | Passive control (waitlist) | T: 28; I: 14; C: 14 | Stress symptoms (PSS) | No | 0 | Y/N | N/N/Ya
22 | Maeda et al. [60] | Japan/community | Three-arm RCT | Female young adults who want a baby | 28.77 (M: 0%, F: 100%) | Internet-based intervention (online chatbot for fertility education) | C1: Active control (online information related fertility); C2: Passive control (online generic information) | T: 927; I: 309; C1: 309; C2: 309 | Anxiety symptoms (STAI) | No | 0 | Y/N | N/Y/Ya,c
23 | Meyer et al. [61] | Germany/community | Two-arm RCT | Adults with depressive symptoms (NR) | 34.76 (M: 24%, F: 76%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 396; I: 320; C: 76 | Depressive symptoms (BDI) | 6 months | 45.5 | Y/Y | N/N/NR
24 | Meyer et al. [62] | Germany/clinical and community | Two-arm RCT | Adults with depressive symptoms (PHQ-9: 15–27) | 42.0 (M: 25.2%, F: 74.8%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 163; I: 78; C: 85 | Depressive (PHQ-9), anxiety (GAD-7) symptoms | 6 months | 17.8 | Y/Y | N/Y/Yc
25 | Moritz et al. [63] | Germany/community | Two-arm RCT | Adults with depressive symptoms (NR) | 38.57 (M: 11.5%, F: 78.6%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 210; I: 105; C: 105 | Depressive symptoms (BDI) | NR | 19.0 | Y/N | N/Y/Yc
26 | Oh et al. [64] | Korea/clinical | Two-arm RCT | Adults with panic symptoms (MINI) | 40.97 (M: 48.8%, F: 51.2%) | Mobile-based intervention (Chatbot, Todaki) | Active control (book for panic disorder) | T: 45; I: 23; C: 22 | Depressive and anxiety symptoms (HADS) | No | 8.89 | N/N | N/N/Yc
27 | Prochaska et al. [65] | United States/community | Two-arm RCT | Adult with substance misuse (CAGE-AID > 1) | 40 (M: 35%, F: 65%) | Computer-based/mobile-based intervention (Chatbot, Woebot) | Passive control (waitlist) | T: 180; I: 88; C: 92 | Depressive (PHQ-8), anxiety (GAD-7) symptoms | 8 weeks | 15.6 | Y/N | N/Y/NR
28 | Sandoval et al. [66] | United States/community | Two-arm RCT | Adults with MDD or dysthymic disorder (DSM-IV-TR, PHQ-9 > 9) | 28.78 (M: 37.8%, F: 62.2%) | Interactive media based, computer-delivered depression treatment program (imbPST) | Passive control (waitlist) | T: 45; I: 25; C: 20 | Depressive symptoms (BDI-II, HSCL-20-d) | No | 0 | N/N | N/N/Ya
29 | Schroder et al. [67] | Germany/clinical and community | Two-arm RCT | Adults with epilepsy (PESOS) and depressive symptoms (NR) | 37.59 (M: 24.4%, F: 75.6%) | Internet-based intervention (Deprexis) | Passive control (waitlist) | T: 78; I: 38; C: 40 | Depressive symptoms (BDI) | No | 26.9 | Y/N | N/Y/NR
30 | Zwerenz et al. [68, 69] | Germany/clinical | Two-arm RCT | Adults with depressive symptoms (BDI-II >13, ICD-10) | 47.98 (M: 39.3%, F: 60.7%) | Internet-based intervention (Deprexis) | Passive control (treatment as usual) | T: 229; I: 115; C: 114 | Depression (BDI-II) | 6 months | 13.5 | Y/Y | Y/Y/Ya
  • Abbreviations: ADHD, attention-deficit/hyperactivity disorder; ADHD score, Attention-Deficit/Hyperactivity Disorder Self-Rating Scale Version 1.1 regardless psychiatric diagnosis; AI, artificial intelligence; Attri, attrition rate; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory-II; C, comparator; CAGE-AID, cut down, annoyed, guilty, eye opener-adapted to included drugs; CSMHSS, College Students Mental Health Screening Scale; DASS-21, Depression, Anxiety, and Stress Scale short form; Deprexis, an Internet-based software platform that provides personalized cognitive behavioral therapy-based support to help improve depression symptoms; DSM-IV-TR, Diagnostic and Statistical Manual of Mental Disorders Text Revision Fourth Edition; ELIZA, a chatbot that mimics a therapist using a humanistic principle; F, female; GAD-7, General Anxiety Disorder 7-item scale; HADS, Hospital Anxiety and Depression Scale; HDRS-17, Hamilton Depression Rating Scale; HDRS-24, Hamilton Depression Rating Scale; Help4Mood, an interactive system with an embodied virtual agent (avatar) to assist in self-monitoring of patients receiving treatment for depression; HSCL-20-d, Hopkins Symptom Checklist 20-Item Depression Scale; I, intervention; IBS, irritable bowel syndrome; ICD-10, International Classification of Diseases 10th Revision; imbPST, interactive media-based, computer-delivered depression treatment program; ITT, intention-to-treat analysis; M, male; MDD, major depressive disorder; MDM, missing data management; MINI, Mini-International Neuropsychiatric Interview; m-PHA, mobile personal health care agent; MYLO, Manage Your Life Online; N, no; NR, not reported; PESOS, an epilepsy-specific inventory, the performance, sociodemographic aspects, subjective estimation; PHQ-8, Patient Health Questionnaire-8-item scale; PHQ-9, Patient Health Questionnaire 9-item scale; PSS, Perceived Stress Scale; QIDS-SR, Quick Inventory of Depressive Symptoms-Self-Report; RCT, randomized controlled trial; ROME IV, ROME IV diagnostic criteria for irritable bowel syndrome; SAS, Self-Rating Anxiety Scale; SCL-90-R, Symptom Checklist-90-Revised; SMT-CBT, stress management training and cognitive behavioral therapy; STAI, State-Trait Anxiety Inventory; T, total; TEO, therapy empowerment opportunity; Xiaoai, a chatbot for small talk with unrestricted content; Y, yes.
  • aGrants were not industry sponsored.
  • bGrants were industry sponsored but declared that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
  • cGrants were industry sponsored.

3.2. Description of AI-Based Psychotherapeutic Interventions

Fourteen types of AI-based chatbots were found, including Deprexis (n = 11) [11, 12, 14, 26, 39, 40, 42, 48, 50–52], Manage Your Life Online (n = 3) [8, 41, 74], Woebot (n = 3) [2, 39, 73], Therapy Empowerment Opportunity (n = 2) [7, 44], and others. The content of the interventions is described in Table 2, and Table S6 summarizes the 14 types of AI-based chatbots. The interventions were largely grounded in CBT (n = 25) and were mainly used for the treatment of various conditions (n = 26). The main functions included counseling (n = 18), therapy (n = 18), and monitoring (n = 20) via the Internet (n = 18). Most interventions (n = 27) were self-guided, and four [7, 40, 44, 46] were supported by a therapist. Two trials [40, 46] used therapists for counseling and therapy, while others used a self-help version of an AI-based chatbot. Berger et al. [40, 42] adopted one intervention group with a low-intensity therapist-guided self-help version of Deprexis, and Fitzsimmons-Craft et al. [46, 50] relied on human authoring of conversations via a chatbot (Tessa). Two trials [7, 44] used the mobile personal health care agent (m-PHA) to communicate with patients, with a therapist supervising the m-PHA interactions. The therapists provided support regarding the events mentioned during the therapy sessions and reviewed the notes and recollections [7, 44]. The duration of intervention ranged from a single session [41] to 16 weeks [58]. Most trials did not report the frequency and time of usage. Response generation involved rule-based (n = 16), NLP (n = 14), and other AI technologies. Input and output modalities were written (n = 30), spoken (n = 3) [43, 70, 72], visual (n = 1) [72], and emojis (n = 1) [71]. Seven embodied chatbots [9, 16, 23, 43, 47, 49, 70] were observed.

Table 2. Description of artificial intelligence-based psychotherapeutic intervention of selected 30 randomized controlled trials in 31 publications.
Number; Author, year; Name of chatbot (Deprexis/other); Principle (CBT/other); Area of use (treatment/prevention); Function; Platform (Internet/other); Guide; Frequency/duration/time of use; Response generation (rule-based/other); Input; Output; Embodied (yes/no)
1. Beevers et al. [40] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self NR/8 weeks/261.6 min Rule-based Written Written No
  
2. Bennion et al. [41] MYLO
  • Method of levels
  • therapy
Treatment of distress -Counselling Internet Self One time/24.17 min Rule-based Written Written No
  
3. Berger et al. [42] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet
  • I1: Self
  • I2: Therapist
  • NR/
  • 10 weeks
  • I1 : 417,
  • I2 : 255
Rule-based Written Written No
  
4. Berger et al. [43] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/12 weeks/
  • 599 min
Rule-based Written Written No
  
5. Bird et al. [44] MYLO
  • Method of levels
  • therapy
Treatment of distress -Counselling Internet Self
  • One time/
  • 13 min
Rule-based Written Written No
  
6. Bücker et al. [45] Deprexis CBT Treatment of gambling and mood
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/
  • 8 weeks/82.6 min
Rule-based Written Written No
  
7. Burton et al. [46] Help4Mood CBT Treatment of depression
  • -Engagement
  • -Adherence
  • -Monitoring
  • Computer
  • app
Self
  • 10.5 time (median) /
  • 4 weeks /
  • 134 min
  • (median)
Natural language processing Written
  • Written
  • Spoken
  • Visual
Yes
  
8. Danieli et al. [47] m-PHA (TEO) CBT Treatment of distress, anxiety, depression -Monitoring
  • Mobile
  • app
Therapist
  • NR/
  • 8 weeks/NR
Natural language processing Written Written No
  
9. Danieli et al. [48] m-PHA (TEO) CBT Treatment of distress, anxiety symptoms -Monitoring
  • Mobile
  • app
Therapist NR/8 weeks /NR Natural language processing Written Written No
  
10. Fischer et al. [49] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self NR/9 weeks/332 min Rule-based Written Written No
  
11. Fitzpatrick, Darcy, and Vierhile [12] Woebot CBT Treatment of depression and anxiety
  • -Counselling
  • -Therapy
  • -Monitoring
  • -Engagement
  • -Motivation
  • -Reflection
  • Computer
  • app or
  • mobile
  • app
Self
  • 12.14/
  • 2 weeks/
  • NR
Natural language processing Written Written No
  
12. Fitzsimmons-Craft et al. [50] Tessa CBT Eating disorders
  • -Counselling
  • -Therapy
  • -Healthy eating
  • Internet
  • (X2AI) via SMS or
  • Facebook
  • Messenger
Therapist
  • NR/
  • 12 weeks/
  • NR
Rule-based/algorithm-based Written Written No
  
13. Gaffney et al. [51] MYLO
  • Method of levels
  • therapy
Treatment of distress -Counselling Internet Self One time/19.23 min Rule-based Written Written No
  
14. Guțu et al. [52] Woebot CBT Prevention
  • -Counselling
  • -Therapy
  • -Monitoring
  • -Engagement
  • -Motivation
  • -Reflection
  • Computer
  • app or
  • Mobile
  • app
Self
  • NR/
  • 2 weeks/
  • NR
Natural language processing Written Written No
  
15. He et al. [53] XiaoE CBT Treatment of depression
  • -Counselling
  • -Monitoring
  • -Engagement
Internet (WeChat) Self
  • 25.45/day/
  • 1 week/
  • NR
Natural language processing and deep learning Written
  • Written
  • Spoken
  • Image
No
  
16. Hunt et al. [54] Zemedy CBT Irritable bowel syndrome
  • -Counselling
  • -Education
  • -Therapy
  • -Healthy diet
  • Mobile
  • app
Self
  • 1 per week/
  • 8 weeks
Natural language processing Written Written Yes
  
17. Jang et al. [55] Todaki CBT Attention-deficit/hyperactivity disorder
  • -Self-diagnosis
  • -Education
  • -Therapy
  • -Compliance
  • Mobile
  • app
Self
  • 20.32/
  • 4 weeks/
  • 75 min
Natural language processing Written Written Yes
  
18. Klein et al. [56] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR /
  • 12 weeks/
  • 497 min
Rule-based Written Written No
  
19. Klos et al. [57] Tess CBT, EFT, SFT, MI Prevention
  • -Reminders
  • -Psychoeducational
  • -Emotional support
  • Internet/app
  • via
  • Facebook
  • Messenger
Self
  • NR /
  • 8 weeks/
  • NR
Natural language processing
  • Written
  • Emojis
  • Written
  • Emojis
No
  
20. Liu et al. [58] XiaoNan CBT Treatment of depression
  • -Emotion assessment
  • -Counselling
  • -Therapy
  • -Monitoring
  • Internet/app (WeChat /
  • IFLYTEK open platform)
Self
  • NR/16 weeks/
  • NR
Natural language processing, intention classification and emotion recognition.
  • Written
  • Spoken
Written Yes
  
21. Ly, Ly, and Andersson [59] Shim CBT, positive psychology Prevention
  • -Reflection
  • -Awareness
  • -Value-based life
Smartphone app Self
  • 17.1/
  • 2 weeks/
  • NR
Rule-based Written Written No
  
22. Maeda et al. [60] Chatbot for fertility education Transtheoretical model Prevention
  • -Education
  • -Counselling
Internet (chat via Google Cloud’s Dialogflow) Self
  • NR/
  • 12 weeks/
  • NR
Natural language processing Written Written Yes
  
23. Meyer et al. [61] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/
  • 9 weeks/
  • NR
Rule-based Written Written No
  
24. Meyer et al. [62] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/
  • 12 weeks/
  • 457 min
Rule-based Written Written No
  
25. Moritz et al. [63] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR /
  • 12 weeks/
  • 210 min
Rule-based Written Written No
  
26. Oh et al. [64] Todaki CBT Panic disorder
  • -Checking
  • -Education
  • -Therapy
  • -Exposure training
  • Mobile
  • app
Self
  • NR/
  • 4 weeks/
  • 50 min
Natural language processing Written Written Yes
  
27. Prochaska et al. [65] Woebot CBT Substance use
  • -Counselling
  • -Therapy
  • -Monitoring
  • -Engagement
  • -Motivation
  • -Reflection
  • Mobile
  • app
Self
  • NR/
  • 8 weeks/
  • NR
Natural language processing Written Written No
  
28. Sandoval et al. [66] imbPST Problem-solving therapy Treatment of depression
  • -Problem-solving
  • -Psychoeducation
  • -Monitoring
  • -Engagement
Computer software Self
  • NR/
  • 6 weeks/
  • 4.9 h
Rule-based Written Written Yes
  
29. Schroder et al. [67] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/
  • 9 weeks/
  • NR
Rule-based Written Written No
  
30. Zwerenz et al. [68, 69] Deprexis CBT Treatment of depression
  • -Counselling
  • -Therapy
  • -Monitoring
Internet Self
  • NR/
  • 12 weeks/
  • NR
Rule-based Written Written No
  • Abbreviations: App, application; CBT, cognitive behavioral therapy; EFT, emotion-focused therapy; MI, motivational interviewing; NR, not reported; SFT, solution-focused brief therapy.

3.3. Individual Quality Assessment

A total of 30 RCTs were evaluated against the RoB 2.0 criteria (Figure S1). Twenty-three trials used ITT analysis, and seven trials used per-protocol analysis. A low risk of bias across the five domains was found in the majority (79.1%) of trials with ITT analysis but in less than half (48.6%) of trials with per-protocol analysis. Nine trials (30%) did not provide information on allocation. Seventeen trials (56.7%) were rated as having some concerns for deviations from the intended intervention because the participants and personnel were aware of the assigned intervention. As shown in Table 1, 13 trials (43.3%) did not publish a protocol or register in a clinical trial registry, so there were some concerns about the selection of the reported results. The attrition rate ranged from 0% [9] to 59.7% [71]. The majority (80%) of trials received grants from various sources: 16 received nonindustry-sponsored grants, seven received industry-sponsored grants, and one [45] received both. The remaining six trials did not report or had no grant support. Of the seven trials that mentioned industry-sponsored grants, two [23, 42] declared that the funders were not involved in data analysis.

3.4. Depressive Symptoms

The effects of AI-based psychotherapeutic interventions on depressive symptoms were evaluated in 29 arms of 26 RCTs [2, 7, 9, 11, 12, 14, 16, 23, 26, 39–48, 50–52, 70, 72–74] among 4349 individuals at the postintervention assessment, eight arms of six RCTs [7, 23, 41, 44, 72, 74] among 418 individuals at 2 weeks to 3 months of follow-up, and eight arms of seven RCTs [11, 12, 26, 40, 46, 48, 50] among 2268 participants at 6–12 months of follow-up (Figure 2). The meta-analyses showed that, compared with the comparators, AI-based psychotherapeutic interventions significantly reduced depressive symptoms at the postintervention assessment (t = −4.40, p = 0.001) with a medium effect size (g = −0.54, 95% CI: −0.79 to −0.29) and at the 6–12-month follow-up assessment (t = −3.14, p = 0.016) with a small effect size (g = −0.23, 95% CI: −0.40 to −0.06). No difference was observed between the intervention and comparator at the 2-week to 3-month follow-up assessment (t = −0.08, p = 0.936).
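
As a rough consistency check on the pooled postintervention estimate, the reported t statistic, effect size, and confidence interval can be related as follows. This sketch assumes, which this section does not state, that the interval is based on a t distribution with k − 1 = 28 degrees of freedom for the 29 arms; the standard error is back-calculated and is not a value reported in the source.

```latex
\mathrm{SE}(g) \approx \frac{g}{t} = \frac{-0.54}{-4.40} \approx 0.123,
\qquad
g \pm t_{0.975,\,28}\,\mathrm{SE}(g) \approx -0.54 \pm 2.048 \times 0.123 \approx (-0.79,\ -0.29)
```

Under that assumption, the back-calculated interval agrees with the reported 95% CI of −0.79 to −0.29.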

Figure 2. Forest plot of depressive symptoms at postintervention, follow-up at 2 weeks–3 months and 6–12 months for artificial intelligence-based psychotherapeutic interventions and comparators.

The 95% PIs were −1.77 to 0.69, −1.27 to 1.24, and −0.64 to 0.19 for the three time points. Because each 95% PI contained values on both sides of the null value of 0, the intervention may not significantly reduce depressive symptoms in future similar studies. Heterogeneity was substantial (I2 = 70%–85%) at the postintervention assessment and the 2-week to 3-month follow-up assessment and moderate (I2 = 42%) at the 6–12-month follow-up assessment. Given the substantial heterogeneity at the postintervention assessment, subgroup and meta-regression analyses were conducted to explore its sources.
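
The relationship between the pooled estimate, I2, and the prediction interval can be illustrated with a minimal sketch. It assumes a DerSimonian-Laird estimator of the between-trial variance, which may differ from the estimator the review actually used; the function names are hypothetical, and the critical t value (commonly taken with k − 2 degrees of freedom) is passed in by the caller because it depends on the number of trials.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool effect sizes under a DerSimonian-Laird random-effects model (assumed estimator)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-trial variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # I2 as a percentage
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2, "q": q}

def prediction_interval(pooled, se, tau2, t_crit):
    """Interval expected to contain the true effect of a future similar trial."""
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return pooled - half_width, pooled + half_width
```

Because the prediction interval adds the between-trial variance to the squared standard error, it is at least as wide as the confidence interval, which is why a pooled effect can be significant while its PI still crosses 0.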

We conducted subgroup analyses as shown in Table 3 and Figures S2–S26. Significant differences (p < 0.1) in the reduction of depressive symptoms across the three time points were found between subgroups based on participants’ nature, age group, embodiment, ITT/MDM, and protocol/registration. Trials conducted among participants with depression, alone or combined with other health issues, had a larger effect on reducing depressive symptoms at postintervention (g = −0.81, 95% CI: −1.16 to −0.45) and at the 2-week to 3-month follow-up (g = −0.64, 95% CI: −2.22 to 0.95) than their counterparts. At the 2-week to 3-month follow-up, interventions that used an embodied chatbot (g = −0.57, 95% CI: −1.17 to 0.03), enrolled participants aged 31–40 years (g = −0.57, 95% CI: −1.17 to 0.03), or used ITT or MDM (g = −0.35, 95% CI: −1.11 to 0.40) had a greater effect on reducing depressive symptoms than their counterparts. Trials with a protocol or registration (g = −0.32, 95% CI: −0.65 to 0.02) had a greater effect on the reduction of depressive symptoms at the 6–12-month follow-up than those without a protocol or registration. Hence, between-trial heterogeneity could be partially explained by participant characteristics.
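
The subgroup difference statistic reported in Table 3 can be sketched as a between-subgroup Q test. This is an illustrative version under stated assumptions: subgroup estimates are weighted by the inverse of their squared standard errors, the function names are hypothetical, and the p-value helper covers only comparisons of two subgroups (1 degree of freedom).

```python
import math

def q_between(estimates, standard_errors):
    """Between-subgroup Q statistic and its degrees of freedom."""
    w = [1.0 / se ** 2 for se in standard_errors]          # inverse-variance weights
    overall = sum(wi * gi for wi, gi in zip(w, estimates)) / sum(w)
    q = sum(wi * (gi - overall) ** 2 for wi, gi in zip(w, estimates))
    return q, len(estimates) - 1

def chi2_p_value_df1(q):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))
```

For example, `chi2_p_value_df1(1.09)` gives a value close to the p = 0.296 reported for the type of AI chatbot at postintervention.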

Table 3. Subgroup analyses of AI chatbot on depressive and anxiety symptoms.
Category Subgroups Number of arms Sample size Effect size (g) 95% CI I2 Subgroup difference
Depressive symptoms (postintervention)
Nature of participants Depression ± others 17 2789 −0.81 −1.16, −0.45 86%
  • χ2 = 3.41,
  • p < 0.001∗∗∗
Stress/distress ± others 6 308 0.09 −0.18, 0.35 0%
Other condition 5 1040 −0.34 −0.87, 0.19 76%
Healthy 1 212 −0.06 −0.33, 0.21 NA
Age groups 18–30 years 10 1482 −0.48 −0.94, −0.02 84%
  • χ2 = 3.13,
  • p = 0.371
31–40 years 11 1251 −0.54 −0.77, −0.30 64%
41–50 years 6 1571 −0.69 −1.95, 0.57 94%
>50 years 2 45 −0.24 −2.00, 1.52 0%
Type of AI chatbot Deprexis 13 2719 −0.68 −1.12, −0.24 88%
  • χ2 = 1.09,
  • p = 0.296
Others 16 1630 −0.42 −0.74, −0.09 79%
Different comparators Active control 6 410 −0.64 −2.05, 0.76 95%
  • χ2 = 0.08,
  • p = 0.772
Passive control 23 3939 −0.48 −0.65, −0.32 72%
Type of psychotherapy CBT 26 4103 −0.54 −0.79, −0.29 84%
  • χ2 = 0.00,
  • p = 0.990
Others 3 246 −0.53 −3.46, 2.39 93%
Type of platforms Internet 19 3686 −0.61 −0.93, −0.30 87%
  • χ2 = 0.85,
  • p = 0.356
Others 10 663 −0.37 −0.85, 0.11 78%
Response generation Rule-based 16 3453 −0.65 −1.06, −0.24 90%
  • χ2 = 1.07,
  • p = 0.300
Others 13 896 −0.41 −0.70, −0.12 69%
Embodiment Yes 5 271 −0.59 −1.08, −0.10 50%
  • χ2 = 0.05,
  • p = 0.815
No 24 4078 −0.53 −0.84, −0.23 87%
ITT or MDM Yes 22 3455 −0.60 −0.88, −0.33 85%
  • χ2 = 0.97,
  • p = 0.326
No 7 894 −0.29 −1.00, 0.42 78%
Protocol or registration Yes 15 2534 −0.55 −0.97, −0.12 87%
  • χ2 = 0.02,
  • p = 0.877
No 14 1815 −0.51 −0.84, −0.19 81%
Depressive symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants Depression ± others 2 106 −0.64 −2.22, 0.95 0%
  • χ2 = 38.76,
  • p < 0.001∗∗∗
Stress/distress ± others 5 267 0.32 0.04, 0.59 0%
Other condition 1 45 −0.57 −1.17, 0.03 NA
Age groups 18–30 years 4 307 −0.17 −1.01, 0.68 78%
  • χ2 = 10.78,
  • p = 0.013∗∗
31–40 years 1 45 −0.57 −1.17, 0.03 NA
41–50 years 1 21 0.99 0.07, 1.90 NA
>50 years 2 45 0.29 −1.82, 2.40 0%
Different comparators Active control 5 301 0.17 −0.43, 0.78 56%
  • χ2 = 1.65,
  • p = 0.199
Passive control 3 117 −0.37 −1.94, 1.19 61%
Type of psychotherapy CBT 6 217 −0.12 −0.82, 0.59 68%
  • χ2 = 1.85,
  • p = 0.174
Others 2 201 0.26 0.15, 0.37 0%
Type of platforms Internet 4 307 −0.17 −1.01, 0.68 78%
  • χ2 = 0.75,
  • p = 0.388
Others 4 111 0.20 −0.86, 1.27 67%
Response generation Rule-based 2 201 0.26 0.15, 0.37 0%
  • χ2 = 1.85,
  • p = 0.174
Others 6 217 −0.12 −0.82, 0.59 68%
Embodiment Yes 1 45 −0.57 −1.17, 0.03 NA
  • χ2 = 2.92,
  • p = 0.088
No 7 373 0.07 −0.47, 0.60 68%
ITT or MDM Yes 4 310 −0.35 −1.11, 0.40 79%
  • χ2 = 6.96,
  • p = 0.008∗∗
No 4 108 0.41 −0.12, 0.94 0%
Protocol or registration Yes 4 111 0.20 −0.86, 1.27 67%
  • χ2 = 0.75,
  • p = 0.388
No 4 307 −0.17 −1.01, 0.68 78%
Depressive symptoms (follow-up assessments at 6 months to 12 months)
Nature of participants Depression ± others 7 1568 −0.27 −0.48, −0.07 45%
  • χ2 = 2.40,
  • p = 0.121
Other condition 1 700 −0.10 −0.24, 0.05 NA
Age groups 18–30 years 1 700 −0.10 −0.24, 0.05 NA
  • χ2 = 2.96,
  • p = 0.228
31–40 years 3 175 −0.13 −0.79, 0.54 0%
41–50 years 4 1393 −0.32 −0.65, 0.02 67%
Type of AI chatbot Deprexis 7 1568 −0.27 −0.48, −0.07 45%
  • χ2 = 2.40,
  • p = 0.121
Others 1 700 −0.10 −0.24, 0.05 NA
Different comparators Active control 1 44 −0.27 −0.88, 0.33 NA
  • χ2 = 0.02,
  • p = 0.886
Passive control 7 2224 −0.23 −0.43, −0.03 51%
ITT or MDM Yes 7 1568 −0.27 −0.48, −0.07 45%
  • χ2 = 2.40,
  • p = 0.121
No 1 700 −0.10 −0.24, 0.05 NA
Protocol or registration Yes 4 1393 −0.32 −0.65, 0.02 67%
  • χ2 = 3.48,
  • p = 0.062
No 4 875 −0.10 −0.25, 0.05 0%
Anxiety symptoms (postintervention)
Nature of participants Depression ± others 4 385 −0.60 −1.97, 0.77 92%
  • χ2 = 5.89,
  • p = 0.117
Stress/distress ± others 6 308 0.12 −0.37, 0.61 45%
Other condition 5 1040 −0.15 −0.28, −0.02 0%
Healthy 4 1189 −0.31 −0.61, −0.02 48%
Age groups 18–30 years 10 2289 −0.15 −0.33, 0.02 58%
  • χ2 = 1.43,
  • p = 0.698
31–40 years 4 335 −0.25 −0.45, −0.06 0%
41–50 years 3 253 −0.40 −4.14, 3.35 95%
>50 years 2 45 −0.36 −3.79, 3.07 0%
Type of AI chatbot Deprexis 3 294 −0.86 −3.00, 1.28 93%
  • χ2 = 1.97,
  • p = 0.160
Other 16 2628 −0.15 −0.31, 0.00 53%
Different comparators Active control 12 1313 −0.20 −0.64, 0.23 85%
  • χ2 = 0.03,
  • p = 0.860
Passive control 7 1609 −0.24 −0.36, −0.12 0%
Type of psychotherapy CBT 15 1794 −0.27 −0.59, 0.04 78%
  • χ2 = 0.33,
  • p = 0.563
Others 4 1128 −0.14 −0.70, 0.42 79%
Type of platforms Internet 10 2255 −0.38 −0.78, 0.03 86%
  • χ2 = 1.90,
  • p = 0.168
Others 9 667 −0.10 −0.31, 0.12 32%
Response generation Rule-based 6 1195 −0.37 −1.18, 0.44 91%
  • χ2 = 0.19,
  • p = 0.663
Others 13 1727 −0.23 −0.40, −0.06 43%
Embodiment Yes 6 1177 −0.36 −0.48, −0.24 0%
  • χ2 = 0.68,
  • p = 0.409
No 13 1745 −0.20 −0.59, 0.19 83%
ITT or MDM Yes 13 2723 −0.31 −0.60, −0.02 82%
  • χ2 = 1.19,
  • p = 0.276
No 6 199 −0.00 −0.64, 0.63 58%
Protocol or registration Yes 11 2219 −0.36 −0.80, 0.07 84%
  • χ2 = 2.62,
  • p = 0.105
No 8 703 −0.03 −0.21, 0.15 1%
Anxiety symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants Stress/distress ± others 5 175 0.38 0.05, 0.70 0%
  • χ2 = 2.37,
  • p = 0.124
Other condition 1 45 −0.12 −0.71, 0.47 NA
Age groups 18–30 years 2 109 0.31 −2.13, 2.74 0%
  • χ2 = 3.41,
  • p = 0.332
31–40 years 1 45 −0.12 −0.71, 0.47 NA
41–50 years 1 21 0.83 −0.07, 1.73 NA
>50 years 2 45 0.35 −1.36, 2.06 0%
Different comparators Active control 4 157 0.36 −0.08, 0.80 0%
  • χ2 = 0.71,
  • p = 0.400
Passive control 2 63 0.08 −3.68, 3.84 20%
Type of psychotherapy CBT 4 111 0.27 −0.39, 0.93 12%
  • χ2 = 0.02,
  • p = 0.892
Others 2 109 0.31 −2.13, 2.74 0%
Type of platforms Internet 2 109 0.31 −2.13, 2.74 0%
  • χ2 = 0.02,
  • p = 0.892
Others 4 111 0.27 −0.39, 0.93 12%
Response generation Rule-based 2 109 0.31 −2.13, 2.74 0%
  • χ2 = 0.02,
  • p = 0.892
Others 4 111 0.27 −0.39, 0.93 12%
Embodiment Yes 1 45 −0.12 −0.71, 0.47 NA
  • χ2 = 2.37,
  • p = 0.124
No 5 175 0.38 0.05, 0.70 0%
ITT or MDM Yes 2 112 0.05 −1.67, 1.76 0%
  • χ2 = 7.15,
  • p = 0.008∗∗
No 4 108 0.52 0.16, 0.88 0%
Protocol or registration Yes 4 111 0.27 −0.39, 0.93 12%
  • χ2 = 0.02,
  • p = 0.890
No 2 109 0.31 −2.13, 2.74 0%
Anxiety symptoms (follow-up assessments at 6 months)
Nature of participants Depression ± others 3 380 −0.37 −0.70, −0.05 0%
  • χ2 = 6.71,
  • p < 0.001∗∗∗
Other condition 1 700 −0.10 −0.24, 0.05 NA
Age groups 18–30 years 1 700 −0.10 −0.24, 0.05 NA
  • χ2 = 6.71,
  • p < 0.001∗∗∗
41–50 years 3 380 −0.37 −0.70, −0.05 0%
Type of AI chatbot Deprexis 3 380 −0.37 −0.70, −0.05 0%
  • χ2 = 6.71,
  • p < 0.001∗∗∗
Other 1 700 −0.10 −0.24, 0.05 NA
Different comparators Active control 1 44 −0.22 −0.82, 0.39 NA
  • χ2 = 0.01,
  • p = 0.910
Passive control 3 1036 −0.25 −0.74, 0.23 65%
Stress symptoms (postintervention)
Nature of participants Stress/distress ± others 5 267 0.07 −0.23, 0.37 0%
  • χ2 = 5.85,
  • p = 0.054
Other condition 2 126 −0.32 −4.67, 4.02 70%
Healthy 1 28 −0.85 −1.63, −0.07 NA
Age groups 18–30 years 4 275 −0.03 −0.66, 0.59 46%
  • χ2 = 5.93,
  • p = 0.115
31–40 years 1 80 −0.64 −1.09, −0.19 NA
41–50 years 1 21 0.31 −0.55, 1.17 NA
>50 years 2 45 −0.35 −3.34, 2.63 0%
Different comparators Active control 4 249 0.08 −0.32, 0.49 0%
  • χ2 = 3.80,
  • p = 0.051
Passive control 4 172 −0.40 −1.07, 0.27 41%
Type of psychotherapy CBT 6 220 −0.33 −0.79, 0.12 36%
  • χ2 = 6.91,
  • p = 0.009∗∗
Others 2 201 0.14 −0.15, 0.43 0%
Type of platforms Internet 2 201 0.14 −0.15, 0.43 0%
  • χ2 = 6.91,
  • p = 0.009∗∗
Others 6 220 −0.33 −0.79, 0.12 36%
Response generation Rule-based 3 229 −0.12 −1.41, 1.17 64%
  • χ2 = 0.13,
  • p = 0.717
Others 5 192 −0.25 −0.75, 0.26 34%
Embodiment Yes 2 126 −0.32 −4.67, 4.02 70%
  • χ2 = 0.37,
  • p = 0.544
No 6 295 −0.09 −0.54, 0.35 37%
ITT or MDM Yes 4 313 −0.27 −1.05, 0.50 74%
  • χ2 = 0.56,
  • p = 0.453
No 4 108 −0.05 −0.61, 0.51 0%
Protocol or registration Yes 4 146 −0.34 −1.03, 0.34 30%
  • χ2 = 1.14,
  • p = 0.286
No 4 275 −0.03 −0.66, 0.59 46%
Stress symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants Stress/distress ± others 5 267 0.34 0.02, 0.66 0%
  • χ2 = 2.78,
  • p = 0.096
Other condition 1 45 −0.20 −0.78, 0.39 NA
Age groups 18–30 years 2 201 0.31 0.27, 0.35 0%
  • χ2 = 5.59,
  • p = 0.133
31–40 years 1 45 −0.20 −0.78, 0.39 NA
41–50 years 1 21 1.06 0.13, 1.98 NA
>50 years 2 45 0.17 −3.45, 3.79 0%
Different comparators Active control 4 249 0.33 −0.11, 0.76 11%
  • χ2 = 0.48,
  • p = 0.490
Passive control 2 63 0.07 −4.37, 4.51 39%
Type of psychotherapy CBT 4 111 0.25 −0.65, 1.16 49%
  • χ2 = 0.04,
  • p = 0.839
Others 2 201 0.31 0.27, 0.35 0%
Type of platforms Internet 2 201 0.31 0.27, 0.35 0%
  • χ2 = 0.04,
  • p = 0.839
Others 4 111 0.25 −0.65, 1.16 49%
Response generation Rule-based 2 201 0.31 0.27, 0.35 0%
  • χ2 = 0.04,
  • p = 0.839
Others 4 111 0.25 −0.65, 1.16 49%
Embodiment Yes 1 45 −0.20 −0.78, 0.39 NA
  • χ2 = 2.78,
  • p = 0.096
No 5 267 0.34 0.02, 0.66 0%
ITT or MDM Yes 2 204 0.12 −3.01, 3.25 56%
  • χ2 = 0.64,
  • p = 0.424
No 4 108 0.38 −0.29, 1.06 13%
Protocol or registration Yes 4 111 0.25 −0.65, 1.16 49%
  • χ2 = 0.04,
  • p = 0.839
No 2 201 0.31 0.27, 0.35 0%
  • Note: I2 means heterogeneity.
  • p < 0.05∗, p < 0.01∗∗, p < 0.001∗∗∗.

A series of random-effects meta-regression analyses was conducted to evaluate the effect of various covariates on the effect size for depressive symptoms (Table 4). The univariate meta-regression analyses showed that publication year (β = 0.017, p = 0.617), duration of intervention in days (β = −0.004, p = 0.270), sample size (β = 0.001, p = 0.383), attrition rate (β = −0.003, p = 0.774), and the proportion of males (β = −0.011, p = 0.134) had no significant effect on the effect size for depressive symptoms. Thus, the between-trial heterogeneity could not be explained by these covariates.

Table 4. Random-effects univariate meta-regression analyses of covariates on depression and anxiety at postintervention.
Covariates Depressive symptoms Anxiety symptoms
β SE 95% lower 95% upper p-Value β SE 95% lower 95% upper p-Value
Year of publication 0.017 0.033 −0.052 0.085 0.617 −0.001 0.056 −0.118 0.117 0.993
Duration of intervention (days) −0.004 0.004 −0.01 0.003 0.270 −0.007 0.003 −0.013 <0.001 0.056
Sample size 0.001 0.001 −0.001 0.002 0.383 <−0.001 <0.001 −0.001 0.001 0.671
Attrition rate −0.003 0.009 −0.020 0.015 0.774 −0.007 0.007 −0.020 0.007 0.339
Proportion of males −0.011 0.007 −0.025 0.004 0.134 −0.006 0.007 −0.021 0.009 0.407
  • Note: β means regression coefficients.
  • Abbreviation: SE, standard error.
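
A univariate meta-regression of the kind summarized in Table 4 can be sketched as a weighted least-squares fit of trial effect sizes on one covariate. The sketch below is an assumption-laden illustration, not the review's procedure: the function name is hypothetical, the between-trial variance `tau2` is supplied by the caller rather than estimated, and the standard error uses the known-variance formula without any small-sample adjustment.

```python
import math

def weighted_meta_regression(x, y, variances, tau2=0.0):
    """Fit y = intercept + slope * x with weights 1 / (variance + tau2)."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted covariate mean
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw      # weighted effect-size mean
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx                                      # regression coefficient (beta)
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)                        # SE of beta under known variances
    return slope, intercept, se_slope
```

A negative β for a covariate such as intervention duration would mean that longer interventions are associated with larger symptom reductions, since more negative g values favor the intervention.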

3.5. Anxiety Symptoms

Anxiety symptoms were evaluated in 19 arms of 17 trials [2, 7, 16, 23, 40–42, 44–47, 49, 50, 70, 71, 73, 74] involving 2922 participants at the postintervention assessment, six arms of five trials [7, 23, 41, 44, 74] including 220 participants at the 2-week to 3-month follow-up assessment, and four trials [11, 26, 46, 50] with 1080 participants at the 6–12-month follow-up assessment. Meta-analyses showed no significant differences between the intervention and comparator at the postintervention assessment (t = −1.95, p = 0.067) or at the 2-week to 3-month (t = 2.08, p = 0.093) and 6–12-month (t = −2.82, p = 0.067) follow-up assessments, as shown in Figure 3.

Figure 3. Forest plot of anxiety symptoms at postintervention, follow-up at 2 weeks–3 months and 6 months for artificial intelligence-based psychotherapeutic interventions and comparators.

The 95% PIs were −1.19 to 0.71, −0.11 to 0.65, and −0.99 to 0.51 for the three time points. Because each PI included the null value of 0, the intervention may not significantly reduce anxiety symptoms compared with comparators in future similar studies. Heterogeneity was substantial (I2 = 78%) at the postintervention assessment, absent (I2 = 0%) at the 2-week to 3-month follow-up assessment, and moderate (I2 = 45%) at the 6–12-month follow-up assessment. To explore the sources of heterogeneity, subgroup analyses and meta-regression analyses were performed.

We conducted a series of subgroup analyses for the three time points (Table 3 and Figures S27–S49). Significant differences (p < 0.1) in anxiety symptoms were found between subgroups based on the nature of participants, age group, type of AI chatbot, and use of ITT/MDM at the 2-week to 3-month and 6-month follow-ups. Subgroup analyses showed that AI-based psychotherapeutic interventions using Deprexis among people aged 41–50 years in Europe who had depression, alone or with other health problems, had a larger effect (g = −0.37, 95% CI: −0.70 to −0.05) on reducing anxiety symptoms at the 6-month follow-up than the other groups. At the 2-week to 3-month follow-up, trials using ITT or MDM showed a smaller effect size for anxiety symptoms (g = 0.05, 95% CI: −1.67 to 1.76) than their counterparts.

The univariate meta-regression analyses suggested that the publication year (β = −0.001, p = 0.993), duration of intervention in days (β = −0.007, p = 0.056), sample size (β < −0.001, p = 0.671), attrition rate (β = −0.007, p = 0.339), and proportion of males (β = −0.006, p = 0.407) had no significant effect on the effect size for anxiety symptoms (Table 4). Therefore, the high heterogeneity could not be explained by these covariates.

3.6. Stress Symptoms

A total of eight arms of seven RCTs [7, 23, 27, 41, 44, 47, 74] among 421 participants at the postintervention assessment and six arms of five RCTs [7, 23, 41, 44, 74] involving 312 participants at 2 weeks to 3 months of follow-up assessment were pooled to evaluate the effect of intervention on stress symptoms (Figure 4). Meta-analyses did not yield any significant differences (t = −1.18 to 2.04, p = 0.098–0.277) between the intervention and comparator at the postintervention assessment and 2 weeks to 3 months of follow-up assessment.

Figure 4. Forest plot of stress symptoms at postintervention and follow-up at 2 weeks–3 months for artificial intelligence-based psychotherapeutic interventions and comparators.

A series of subgroup analyses was performed for the two time points (Table 3 and Figures S50–S67). Significant differences (p < 0.1) in stress symptoms at the postintervention and follow-up assessments were found between subgroups based on the nature of participants, comparator, type of psychotherapy, type of platform, and embodiment. Trials conducted among participants with conditions other than stress or distress had a larger effect on reducing stress symptoms at postintervention (g = −0.32, 95% CI: −4.67 to 4.02) and follow-up (g = −0.20, 95% CI: −0.78 to 0.39) than their counterparts. Interventions that adopted CBT (g = −0.33, 95% CI: −0.79 to 0.12), used non-Internet platforms (g = −0.33, 95% CI: −0.79 to 0.12), or were compared with a passive control (g = −0.40, 95% CI: −1.07 to 0.27) had a greater effect on decreasing stress symptoms at postintervention than their counterparts. Trials conducted outside Europe (g = −0.20, 95% CI: −0.78 to 0.39) and interventions using an embodied chatbot (g = −0.20, 95% CI: −0.78 to 0.39) had a greater effect on improving stress symptoms at the follow-up assessment than their counterparts.

3.7. Depressive, Anxiety, and Stress Symptoms

Three RCTs [8, 41, 74] examined the effect of the intervention on the total score of depressive, anxiety, and stress symptoms using the 21-item Depression, Anxiety, and Stress Scale [75] in 295 participants at the postintervention assessment and the 2-week to 3-month follow-up assessment (Figure 5). The meta-analyses did not reveal any differences between the two groups (t = 1.34–1.46, p = 0.281–0.311).

Figure 5. Forest plot of depression, anxiety, and stress symptoms at postintervention and follow-up at 2 weeks for artificial intelligence-based psychotherapeutic interventions and comparators.

3.8. Overall Evidence

The GRADE criteria were used to evaluate 10 outcomes of this review (Table S7), and the certainty of evidence ranged from very low to moderate. The certainty was downgraded for inconsistency, indirectness, and imprecision because of high heterogeneity, varied populations and interventions, small samples, and wide confidence intervals. Because more than 10 trials assessed depressive and anxiety symptoms at postintervention, funnel plots and Egger’s test were performed for these outcomes. No evidence of publication bias was found: the funnel plots were symmetrical, and Egger’s tests were nonsignificant (p = 0.091–0.983; Figures S68 and S69).
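
Egger’s test for funnel plot asymmetry can be sketched as an ordinary regression of each trial’s standardized effect on its precision, where an intercept far from 0 suggests small-study effects. The version below is a simplified illustration with a hypothetical function name; it returns the intercept, its standard error, and the t statistic with its degrees of freedom, and leaves the p-value lookup to a statistics package.

```python
import math

def egger_regression(effects, standard_errors):
    """Regress standardized effects (g / SE) on precision (1 / SE); needs at least 3 trials."""
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar                        # asymmetry indicator
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (n - 2) * (1.0 / n + x_bar ** 2 / sxx))
    # Guard against a perfect fit, where the standard error collapses to 0.
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "se": se_intercept, "t": t_stat, "df": n - 2}
```

When every trial shares the same effect regardless of its standard error, the intercept is 0; when effects grow in proportion to the standard error, the intercept moves away from 0.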

4. Discussion

4.1. Summary of Findings

From 13,546 records identified through the 12 databases, three clinical trial registries, and other methods in a three-step comprehensive search, we included 30 RCTs involving 6100 participants across nine countries. Our review showed that AI-based psychotherapeutic interventions significantly reduced depressive symptoms at the postintervention assessment with a medium effect size and at the 6–12-month follow-up assessment with a small effect size compared with comparators. No significant effect of AI-based psychotherapeutic interventions was found on anxiety, stress, or the total scores of depressive, anxiety, and stress symptoms at postintervention or at any follow-up assessment. A series of subgroup analyses revealed significant differences in the reduction of psychological symptoms at various time points based on participants’ nature, age group, type of AI chatbot, type of psychotherapy, type of platform, embodiment, comparator, ITT/MDM, and protocol/registration. The random-effects univariate meta-regression did not detect a significant covariate for depressive or anxiety symptoms at postintervention. The majority (79.1%) of trials with ITT analysis and less than half (48.6%) of trials with per-protocol analysis were rated as having a low risk of bias across the five domains of the RoB 2.0 criteria. No publication bias was detected for depressive and anxiety symptoms at postintervention. The certainty of evidence ranged from very low to moderate for 10 psychological outcomes according to the GRADE criteria.

4.2. Depressive Symptoms

In line with previous meta-analytic evidence [18], we found that depressive symptoms were significantly reduced following AI-based psychotherapeutic interventions at postintervention. Our result also indicated a significant effect at 6–12 months of follow-up assessment. Thus, AI-based psychotherapeutic interventions may reduce depressive symptoms both immediately and in the long term. AI chatbots can be designed to deliver various psychotherapies according to different psychological principles, such as CBT [40], method of levels therapy [44], or problem-solving therapy [66]. Users may engage with the intervention through text-based or voice-activated conversations [70], and such interactions can offer psychological, relational, and emotional support [76]. Chatbots can also provide initial counseling, guide users to a self-help library, and direct users to appropriate services [74]. Chatbots use AI algorithms to interpret user dialogues and conduct useful interactions. They may have a low attrition rate due to increased engagement and motivation [17]. These features may explain why AI-based psychotherapeutic interventions can ameliorate depressive symptoms. However, given that only seven trials included 6–12 months of follow-up assessment, a firm conclusion about the long-term effect of the intervention cannot be made.

In our review, the majority of the interventions used rule-based response generation and less than half used NLP. Rule-based response generation consists of simple dialogue components based on rules, following a predefined decision tree and communicating in a scripted manner [74]. Conversely, generative-based response generation is more complex and relies on ML to construct its dialogues; AI uses this method to generate possible answers and enhance conversational proficiency [11]. With the increasing integration of AI technology into psychotherapy [77], future interventions can consider using advanced generative deep learning techniques that may allow AI chatbots to interact with users in an empathetic, coherent, and personalized manner [7, 74].
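To make the notion of rule-based response generation concrete, the following is a minimal sketch of a scripted decision tree. It is purely illustrative and does not reproduce any chatbot from the included trials: the node names, prompts, and keywords are all hypothetical, and keyword matching stands in for whatever rules a real system would use.

```python
import re

# Hypothetical script: each node has a fixed prompt and keyword -> next-node rules.
# Nodes with no branches end the conversation.
SCRIPT = {
    "start": {
        "prompt": "How are you feeling today: good, low, or anxious?",
        "branches": {"good": "reinforce", "low": "mood_check", "anxious": "breathing"},
    },
    "mood_check": {
        "prompt": "Would you like to plan one small pleasant activity? (yes/no)",
        "branches": {"yes": "activity", "no": "reinforce"},
    },
    "breathing": {
        "prompt": "Let us try a slow breathing exercise together.",
        "branches": {},
    },
    "activity": {
        "prompt": "Great. Pick one activity you can do in the next hour.",
        "branches": {},
    },
    "reinforce": {
        "prompt": "Thank you for checking in. See you tomorrow.",
        "branches": {},
    },
}


def next_node(current, user_text):
    """Follow the first rule whose keyword appears in the reply; otherwise stay put."""
    words = re.findall(r"[a-z']+", user_text.lower())
    for keyword, target in SCRIPT[current]["branches"].items():
        if keyword in words:
            return target
    return current


def run_dialogue(user_replies, start="start"):
    """Walk the script for a list of replies; return the final node and the prompts shown."""
    node = start
    transcript = [SCRIPT[node]["prompt"]]
    for reply in user_replies:
        node = next_node(node, reply)
        transcript.append(SCRIPT[node]["prompt"])
        if not SCRIPT[node]["branches"]:
            break
    return node, transcript
```

Every response in this sketch is predetermined by the script, which is the property that distinguishes rule-based generation from the generative approach described above.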

Seven interventions used embodied conversational agents in our review. Our subgroup analysis showed a greater effect size for embodied agents than for nonembodied agents. An embodied conversational agent is a computer-based dialogue system with a virtual embodiment (full body or face only) that typically interacts with users using multimodal communication cues of speech, text, animated facial expressions, or gestures [73]. Evidence showed that embodied conversational agents can build trust and rapport and can create a sense of warmth, leading to companionship and long-term usage [13, 78]. Future interventions can consider adopting embodied agents. Only one intervention [57] used emojis (images depicting facial expressions) to share and track the participants’ moods over time. Given that emojis can be used to express, imitate, and appraise varying degrees of emotion [79], more research is needed to evaluate their effectiveness.

Notably, the intervention failed to demonstrate superior effects at 2 weeks to 3 months of follow-up assessment in eight trials. Most comparators (62.5%) were active control groups, such as another conversational chatbot [44, 51, 53], stress management training and CBT [47, 48], and e-books on depression [53]. This finding aligns with the results of a previous mixed-method review [16] demonstrating similar patterns. Our review revealed comparable effects between AI-based psychotherapeutic interventions and active comparators. Furthermore, only 25% of the participants had depressive problems at 2 weeks to 3 months of follow-up assessment. A plausible interpretation is that AI-based psychotherapeutic interventions may not alleviate depressive symptoms in persons who are not depressed. However, we could not draw a definitive conclusion about treatment efficacy for depressive symptoms at 2 weeks to 3 months of follow-up assessment.

Our subgroup analyses revealed that the intervention had a greater effect size among participants with depression, or depression combined with other health issues, and among those aged 31–40 compared with other age groups. One reason could be that younger adults had greater knowledge of AI [72] and more engagement in activities [80] than older adults. Therefore, younger adults may have been more likely to adhere to interventions than older adults. Consistent with a previous review [18], the intervention significantly improved depressive symptoms in participants with depression or depression combined with other health issues. This finding suggests that interventions were more effective in participants with depression than in those with other health conditions. Hence, the intervention appeared more beneficial for younger participants with depression. Our subgroup results showed significant subgroup differences based on the nature of participants, age groups, embodiment, ITT/MDM, and protocol/registration at follow-up from 2 weeks to 12 months, but each subgroup included only 1–4 trials. Hence, the results should be interpreted with caution, and future trials are needed to confirm the findings.

4.3. Other Psychological Outcomes

Contrary to our expectations, the meta-analyses revealed that AI-based psychotherapeutic interventions did not improve anxiety symptoms, stress symptoms, or the combination of depressive, anxiety, and stress symptoms at postintervention or follow-up assessments compared with comparators. These findings are inconsistent with a previous review [17]. One possible reason is that most comparators for these outcomes were active controls. Another possibility is the differences between depressive, anxiety, and stress symptoms [81]. Stress symptoms are a sense of feeling overwhelmed, measured as chronic nonspecific arousal, tension, agitation, and irritability; anxiety symptoms are a sense of fear or dread that focuses on autonomic arousal, physical symptoms of anxiety, and the subjective experience of anxious affect; and depressive symptoms are a sense of unhappiness or sadness, such as dysphoria, hopelessness, low self-esteem, anhedonia, and loss of interest [82, 83]. These differences in feeling, together with specific cognitive processes and coping strategies, may explain the different results [83]. At this stage, we can only speculate about the reason for this occurrence. Hence, conclusions cannot be drawn, and further studies are required.

Our subgroup analyses showed significant differences in anxiety and stress symptoms at postintervention and follow-up assessments between subgroups based on participants’ nature, age groups, type of AI chatbot, psychotherapy, platform, comparators used, and use of ITT/MDM. However, these subgroup comparisons included only 1–6 trials in each group, and the number of trials was uneven across subgroups. The data should therefore be interpreted cautiously [38], and further research is needed to validate the results.

4.4. Strengths and Limitations

The current systematic review has several strengths. This review was the first to examine the short- and long-term effects of AI-based psychotherapeutic interventions on psychological outcomes. A comprehensive search strategy, including 12 databases and three clinical trial registries, was used to identify 30 RCTs and to reduce publication bias. The random-effects meta-analysis applied the restricted maximum likelihood method [36] with Hartung–Knapp adjustment [37]. The 95% PI for the meta-analyses was reported to predict true effects in future settings [32], and the certainty of evidence on each outcome was assessed.
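The statistical approach named above can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code of this review: it estimates the between-trial variance with a common fixed-point iteration for restricted maximum likelihood, applies the Hartung–Knapp variance to the pooled effect, and forms a prediction interval from the Hartung–Knapp standard error and the between-trial variance. The function names are hypothetical, and the t critical values (k − 1 degrees of freedom for the confidence interval, k − 2 for the prediction interval) must be supplied by the caller, for example from a t table.

```python
import math


def reml_tau2(effects, variances, tol=1e-10, max_iter=200):
    """Between-trial variance by fixed-point REML iteration, truncated at zero."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in variances]
        sum_w = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                  for wi, yi, vi in zip(w, effects, variances))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum_w)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    return tau2


def random_effects_hk(effects, variances, t_crit_ci, t_crit_pi):
    """Pooled effect with Hartung-Knapp CI and a prediction interval.

    t_crit_ci: t quantile with k - 1 degrees of freedom (caller-supplied).
    t_crit_pi: t quantile with k - 2 degrees of freedom (caller-supplied).
    """
    k = len(effects)
    if k != len(variances) or k < 3:
        raise ValueError("need at least three trials with matching lengths")

    tau2 = reml_tau2(effects, variances)
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Hartung-Knapp variance: weighted residual sum of squares / ((k - 1) * sum of weights).
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, effects))
    se_hk = math.sqrt(q / ((k - 1) * sum_w))

    ci = (mu - t_crit_ci * se_hk, mu + t_crit_ci * se_hk)
    half_pi = t_crit_pi * math.sqrt(tau2 + se_hk ** 2)
    return {"mu": mu, "tau2": tau2, "se": se_hk, "ci": ci, "pi": (mu - half_pi, mu + half_pi)}
```

For five trials and a 95% level, the caller would pass approximately 2.776 (4 degrees of freedom) and 3.182 (3 degrees of freedom). The prediction interval is wider than the confidence interval whenever the estimated between-trial variance is positive, which is why it is useful for anticipating effects in future settings.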

Notwithstanding these strengths, this review had several limitations. First, the psychological outcomes were self-reported, which may cause social desirability bias. Second, the number of trials included in some meta-analyses was limited, especially for follow-up assessments; thus, statistical power was reduced. Third, the uneven number of trials in the subgroups may have prevented valid estimates [38]. Fourth, the included interventions were designed from a wide variety of psychological principles, and six meta-analyses revealed substantial heterogeneity that restricted the accuracy of the pooled estimates. Fifth, the certainty of the evidence for six outcomes was either very low or low, which may reduce confidence in implementing AI-based psychotherapeutic interventions. Sixth, some trials did not report the intervention regimen, which limited the comparison of features. Lastly, the majority (n = 18) of the trials were from European countries, which might restrict the generalization of the findings.

4.5. Clinical Implications and Future Research

In this review, we found that the interventions had small to medium effects on depressive symptoms at postintervention and at 6–12 months of follow-up assessment. Given that the interventions were tested under variable comparator conditions, the active control groups may have masked their effectiveness [84]. Hence, small to medium effects can be considered either clinically important differences or minimum clinically important differences [71, 85]. The participants could have experienced meaningful treatment benefits from AI-based psychotherapeutic interventions. However, the certainty of evidence for six outcomes was very low or low; thus, AI-based psychotherapeutic interventions can be supported as a supplementary intervention. Given the shortage of mental health workers globally, such interventions can be considered adjunctive to the usual treatments during the therapeutic process. Interventions can be incorporated into comprehensive web applications to facilitate access to psychotherapy amid physical distancing requirements, particularly during the ongoing COVID-19 pandemic. However, designing an interface adaptable to diverse user profiles presents certain challenges. Technical challenges are encountered in interpreting emotions in dialogues and in making chatbot features more human-like. Privacy and security are other important issues that require attention during the development of the intervention.

Although the COVID-19 pandemic has driven the use of AI technology, AI chatbots carry the risk of being used inappropriately. Healthcare research teams should collaborate closely and regularly with computing scientists to modify and upgrade human–computer interactions. The subgroup results suggest that interventions can target populations with depression aged 31–40 years. Substantial heterogeneity exists in some meta-analyses, suggesting that the interventions varied across regimens, settings, and populations. Future interventions should consider using standardized regimens among specific populations in the same setting to draw a conclusive result. In addition, future research should report the intervention content and regimen in more detail according to the Template for Intervention Description and Replication (TIDieR) guide [86]. Given the low or very low certainty for six outcomes, well-designed RCTs are necessary to minimize selection, performance, and reporting biases by reporting allocation concealment, blinding participants, using ITT or MDM, and registering or publishing trial protocols. Future RCTs should recruit large samples in non-European countries to improve the generalizability of the findings.

5. Conclusion

This review revealed significant effects of AI-based psychotherapeutic interventions in reducing depressive symptoms at postintervention assessment and at 6–12 months of follow-up assessment. We found comparable effects on anxiety, stress, and combined symptoms between AI-based psychotherapeutic interventions and active comparators. AI-based psychotherapeutic interventions can supplement existing psychiatric care targeting groups with depression aged 31–40. Future studies should improve the transparency of the intervention’s content and regimen. Further investigations should also use methodologically robust approaches with large samples and long-term follow-up assessments to evaluate the sustainability of the intervention.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Ying Lau, Kin Sun Chan, Patrick Cheong-Iao Pang, and Sai Ho Wong conceptualized and designed the study. Wei How Darryl Ang and Wen Wei Ang conducted a systematic literature search with the help of a senior librarian. Sai Ho Wong, Wen Wei Ang, and Ying Lau performed the title and abstract screening and data extraction and assessed the quality of the selected studies. Ying Lau, Sai Ho Wong, Wei How Darryl Ang, and Wen Wei Ang conducted data management, data analysis, and data synthesis. Ying Lau supervised the systematic review and wrote the article. All authors have read and approved the final version of the article.

Funding

No funding was used in the study.

Acknowledgments

We acknowledge the senior librarian, Suei Nee Wong, for her support in developing the search strategy. We also appreciate the supplementary data provided by the trial authors.

    Supporting Information

    Additional supporting information can be found online in the Supporting Information section.

    Data Availability Statement

    The data that support the findings of this study are available from the corresponding author upon reasonable request.
