Background: Artificial intelligence (AI)–based psychotherapeutic interventions may bring a new and viable approach to expanding psychiatric care. However, evidence of their effectiveness remains scarce. We evaluated the efficacy of AI-based psychotherapeutic interventions on depressive, anxiety, and stress symptoms at postintervention and follow-up assessments.

Methods: A three-step comprehensive search via nine electronic databases (PubMed, Embase, CINAHL, Cochrane Library, Scopus, IEEE Xplore, Web of Science, PsycINFO, and ProQuest Dissertations and Theses) was performed.

Results: Thirty randomized controlled trials (RCTs) in 31 publications involving 6100 participants from nine countries were included. Most trials with intention-to-treat analysis (79.1%), but fewer than half of trials with per-protocol analysis (48.6%), were graded as low risk of bias. Meta-analyses showed that, compared with comparators, the interventions significantly reduced depressive symptoms at the postintervention assessment (t = −4.40, p = 0.001), with a medium effect size (g = −0.54, 95% CI: −0.79 to −0.29). They also reduced depressive symptoms at the 6–12-month follow-up assessment (t = −3.14, p < 0.016), with a small effect size (g = −0.23, 95% CI: −0.40 to −0.06). Subgroup analyses showed that participants with depression had a significantly larger reduction in depressive symptoms than participants with stress and other conditions. At postintervention and follow-up assessments, AI-based psychotherapeutic interventions did not significantly alter anxiety symptoms, stress symptoms, or the total scores of depressive, anxiety, and stress symptoms compared with comparators. The random-effects univariate meta-regression did not identify any significant covariates for depressive and anxiety symptoms at postintervention. The certainty of evidence ranged from moderate to very low.

Conclusions: AI-based psychotherapeutic interventions can be used in addition to usual treatments for reducing depressive symptoms. Well-designed RCTs with long-term follow-up data are warranted.

Trial Registration: CRD42022330228

1. Introduction

The World Health Organization (WHO) reported that approximately 1 billion people worldwide struggle with some form of psychological problem, and these problems are the leading cause of years lived with disability [1]. Between 1990 and 2019, the global number of disability-adjusted life years due to mental disorders increased from 80.8 to 125.3 million [2]. Many people with mental disorders do not receive treatment because of a shortage of therapists, stigmatization, transport costs, and expensive consultation fees [3, 4]. During the coronavirus disease 2019 (COVID-19) pandemic, an additional 53.2 million cases of major depressive disorder and 76.2 million cases of anxiety disorders were estimated globally [5]. A systematic review found that the global prevalence of depression, anxiety, and stress among the general population during the COVID-19 pandemic ranged from 25.18% to 29.57% [6]. Because of restrictions and lockdowns, the pandemic made the accessibility of psychotherapy more urgent [7]. The WHO found that mental health interventions are insufficient and inadequate globally [1], prompting the use of new technologies to meet this need.

In parallel with advancements in artificial intelligence (AI) technology, psychotherapy has begun to incorporate AI techniques into psychotherapeutic interventions [8, 9]. This human–computer interaction technology is believed to be intelligent enough to comprehend the conversation between a patient and a chatbot therapist using machine learning (ML) algorithms [4, 10]. Such applications could help prevent and treat behavioral and psychiatric problems and prevent relapse [11]. According to Bendig et al. [11], AI chatbots, also known as conversational or relational agents, are machine conversation systems that interact with human users through various AI technologies. Responses can be generated by a rule-based model (predefined rules or a decision tree), natural language processing (NLP), or ML, in text-based or speech-enabled conversations [4, 7]. AI chatbots attempt to converse like humans, including the emotional, social, and relational aspects of natural conversation [11]. In doing so, they imitate a therapeutic conversational style that can help transfer therapeutic content to users and mirror therapeutic processes [7, 12].

According to Boucher et al. [7] and Vaidyam et al. [4], AI chatbots are thought to possess sufficient intelligence to comprehend conversations with human users in written, spoken, and visual language through an interactive interface. Some researchers have developed avatars, computer-generated characters that serve as embodied conversational agents, in interventions aimed at improving usability and intention to use [13]. An embodied agent can emulate some human interactions, including speech as well as gaze, hand gestures, and other nonverbal modalities [13]. Different platforms, including websites, mobile applications, short message services, virtual reality, and smart technology, can integrate AI chatbots to perform various functions such as therapy, counseling, monitoring, engagement, adherence, or psychoeducation [7].

Psychotherapeutic interventions can use AI chatbots and tailor them to specific populations [4]. This innovative approach may help alleviate the shortage of therapists and improve engagement in therapy [4, 13]. AI-based psychotherapeutic interventions offer several advantages, including lowering the stigma associated with therapy, fostering a comfortable environment for self-disclosure, being cost-effective, reducing travel time, eliminating geographical restrictions, freeing up human resources, and broadening overall accessibility [4, 10]. Hence, AI-based psychotherapeutic interventions may offer a potential solution to overcome barriers and expand psychiatric care.

Given that AI-based psychotherapeutic intervention is an emerging field, several types of reviews have been published, including three integrative reviews [4, 7, 10], four scoping reviews [8, 11, 14, 15], one mixed-method review [16], and three systematic reviews [9, 17, 18].

Boucher et al. [7] highlighted the potential integration of AI-based chatbots into digital mental health interventions. Pham, Nabizadeh, and Selek [10] described different AI-based interventions and their clinical practices. Vaidyam et al. [4] explored the roles of conversational agents, or chatbots, in the screening, diagnosis, and treatment of mental illness. This integrative review suggested that an AI-supported intervention could increase engagement, although its therapeutic effect was underreported [4]. According to these integrative reviews [7, 10], future research should use randomized controlled trials (RCTs) to investigate the efficacy of AI psychotherapeutic interventions.

A mixed-method review [16] aimed to evaluate the use of conversational agent interventions in the treatment of mental health problems. The scoping reviews aimed to examine how chatbots have been developed and used in public health [15], how they are used for mental health [8, 14], and how useful, acceptable, and practicable they are in clinical psychology and psychotherapy [11]. Results regarding the practicability, feasibility, and acceptability of AI-supported interventions for mental health problems were promising [14, 16], but there was a lack of consensus on how chatbots should be reported and evaluated [8], and the findings were not directly transferable to psychotherapeutic contexts [11]. Hence, more reviews are required to demonstrate their efficacy [14].

In the existing literature, we found three systematic reviews [9, 17, 18] related to the efficacy of AI-based psychotherapeutic interventions. Gual-Montolio et al. [9] reviewed how AI-based methods could enhance outcomes of psychological interventions in real time or close to real time. Li et al. [17] and Lim et al. [18] examined the feasibility and/or effectiveness of AI-based psychotherapeutic interventions. However, these reviews had certain limitations, including reliance on a limited number of databases [9], combining AI-based and non-AI-based interventions [18], providing only a narrative synthesis [9, 17], focusing only on depressive symptoms as an outcome [18], and incorporating different research designs [9].

Emerging evidence has shown that AI-based psychotherapeutic intervention may improve psychological outcomes. However, relatively few reviews have investigated the long-term effects of interventions. To fill this gap, the current review aims to evaluate the efficacy of AI-based psychotherapeutic interventions on depressive, anxiety, and stress symptoms at postintervention and follow-up assessments.

2. Material and Method

This systematic review was reported following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) (Table S1) [19].

2.1. Eligibility Criteria

Given that RCTs are considered the gold standard for evaluating the effectiveness of interventions [20], only RCTs were included in this review. The target population was adults aged ≥18 years, with or without medical, psychological, or behavioral problems. Eligible interventions used a conversational (chatbot) interface incorporating AI technology to deliver any form of psychotherapy, either self-guided or with therapist support. Responses could be generated by rule-based or other AI technologies. Input and output modalities could be written, spoken, visual, or emoji-based. Either an embodied or a nonembodied chatbot could be used. The comparator could be treatment as usual, a waitlist, a placebo control, or any type of intervention. The psychological outcomes were depressive, anxiety, and stress symptoms at postintervention and follow-up assessments. No restrictions were imposed on the population and publication date. This review included published and unpublished trials in the English language [21]. The details of the eligibility criteria can be found in Table S2.

2.2. Search Strategy

A scoping search for existing systematic reviews with similar aims was conducted in the Cochrane Database of Systematic Reviews, Joanna Briggs Institute, and the PROSPERO database to prevent any duplication. An iterative process was used to develop the search terms. An initial keyword search comprising “artificial intelligence” AND “psychological outcomes” was used to conduct a simple search. After the inclusion of potential articles, the search terms and keywords were revised in consultation with a university librarian. The eventual search terms comprising both keywords and index terms for the respective databases can be found in Table S3.

Following the development of the search terms, a three-step search [22] was conducted from inception to February 9, 2023. First, nine English-language electronic databases (PubMed, Embase, CINAHL, Cochrane Library, Scopus, IEEE Xplore, Web of Science, PsycINFO, and ProQuest Dissertations and Theses) were searched to locate relevant articles. Second, three clinical trial registries (ANZCTR, ISRCTN, and CenterWatch) were searched for unpublished trials, and all corresponding authors were emailed to obtain information about their trials. Finally, the reference lists of the selected studies and gray literature were hand searched to maximize retrieval. We contacted authors via email for additional data when the information in their publication was insufficient.

2.3. Study Selection

EndNote X20 was used to manage the citations retrieved from the search. Duplicates were removed using both automated and manual deduplication. Two authors (W.W. and S.H.W.) screened all articles by title and abstract against the eligibility criteria. When disagreements occurred, a third author (L.Y.) was consulted. Inter-rater reliability was measured using Cohen's kappa (κ), which ranges from −1 to 1: values of 0 or below indicate agreement no better than chance, and 1 indicates perfect agreement [23]. Values greater than 0.75 were considered excellent agreement, whereas values between 0.40 and 0.75 were considered good agreement [24].
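To make the agreement statistic concrete, the following is a minimal Python sketch of Cohen's kappa for two screeners' include/exclude decisions. It assumes each screener's decisions are stored as parallel lists of labels; the function names are illustrative, and the bands reproduce only the two thresholds stated above.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginals.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if p_e == 1:
        return 1.0  # convention: both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def agreement_band(kappa):
    """Bands stated in the text: >0.75 excellent, 0.40 to 0.75 good."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "good"
    return "below good"
```

For example, if screener A includes 3 of 10 records and screener B includes 2 of the same records (disagreeing on one), kappa is (0.9 − 0.62)/(1 − 0.62) ≈ 0.74, which falls in the "good" band.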

2.4. Data Management and Extraction

The data extraction form was designed with reference to the Cochrane Handbook [22]. Two reviewers (W.W. and S.H.W.) independently extracted all data. The extracted data included trial characteristics (number of studies, author, publication year, country, recruitment setting, design, nature of participants, mean age, gender distribution, AI-based psychotherapeutic intervention, name, comparator, sample size, psychological outcomes, measures, attrition rate, intention-to-treat analysis [ITT], missing data management [MDM], protocol, trial registration, and grant support), description of the intervention (intervention content, type of AI chatbot, psychological principle, duration of the intervention, follow-up assessment, frequency of use, and mean engagement time in minutes), and psychological outcomes (depressive, anxiety, and stress symptoms) at postintervention and follow-up assessments (means, standard deviations, and sample sizes).
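The extracted means, standard deviations, and sample sizes are what an effect size such as Hedges' g is computed from. The sketch below uses common large-sample formulas (Cohen's d with a pooled SD, the approximate correction factor J = 1 − 3/(4·df − 1), and the usual variance of d) as an assumption; the R packages used in this review may differ slightly, for example by using an exact correction factor. With symptom scales, a negative g means lower symptom scores in the intervention arm.

```python
import math


def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (g, variance of g) for intervention vs comparator arms."""
    df = n_i + n_c - 2
    # Pooled standard deviation across both arms.
    sd_pooled = math.sqrt(((n_i - 1) * sd_i**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_i - mean_c) / sd_pooled  # Cohen's d
    j = 1 - 3 / (4 * df - 1)  # approximate small-sample correction factor
    var_d = (n_i + n_c) / (n_i * n_c) + d**2 / (2 * (n_i + n_c))
    return j * d, j**2 * var_d


# Hypothetical arms: intervention mean 10 (SD 5, n 50) vs comparator mean 13 (SD 5, n 50).
g, var_g = hedges_g(10, 5, 50, 13, 5, 50)
```

Here d = −0.6 and the correction shrinks it slightly toward zero, which matters most for the small trials in Table 1.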

2.5. Risk of Bias Version 2 (RoB 2.0)

The Cochrane risk-of-bias tool version 2 (RoB 2.0) [25] was used to appraise the methodological quality of all included studies. Two reviewers (W.W. and D.A.) independently assessed the risk of bias using the Excel tool for implementing RoB 2.0. The risk of bias was evaluated across five domains: (1) the randomization process, (2) deviations from the intended intervention, (3) missing outcome data, (4) measurement of the outcome, and (5) selection of the reported result [25]. For each domain, the two reviewers answered signaling questions with “yes,” “probably yes,” “probably no,” “no,” or “no information.” The RoB 2.0 algorithm then rates the risk of bias as “low,” “some concerns,” or “high” [25].
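The step from five domain ratings to an overall rating can be sketched as follows. This is a simplified illustration, not the Excel tool: it assumes the domain-level ratings have already been derived from the signaling questions, and it follows the general RoB 2 mapping (low only if every domain is low, high if any domain is high, otherwise some concerns). RoB 2 also lets reviewers rate a trial as high when several domains raise some concerns; that judgment is represented here by a hypothetical `concerns_threshold` parameter.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    SOME_CONCERNS = "some concerns"
    HIGH = "high"


DOMAINS = (
    "randomization process",
    "deviations from intended intervention",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_risk(domain_ratings, concerns_threshold=None):
    """Combine the five domain ratings into an overall judgment.

    concerns_threshold is hypothetical: if set, that many or more
    'some concerns' domains are escalated to 'high', standing in for
    the reviewer judgment RoB 2 allows in that situation.
    """
    missing = set(DOMAINS) - set(domain_ratings)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    ratings = [domain_ratings[d] for d in DOMAINS]
    if Risk.HIGH in ratings:
        return Risk.HIGH
    n_concerns = ratings.count(Risk.SOME_CONCERNS)
    if concerns_threshold is not None and n_concerns >= concerns_threshold:
        return Risk.HIGH
    return Risk.SOME_CONCERNS if n_concerns else Risk.LOW
```

Keeping the escalation rule as an explicit parameter makes the reviewer judgment visible instead of hiding it in the algorithm.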

2.6. Certainty of Evidence

The Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) criteria were used to assess the overall certainty of evidence [26]. Two reviewers (D.A. and L.Y.) independently evaluated the evidence across the following domains: risk of bias, inconsistency, indirectness, imprecision, and effect. Certainty was rated as very low, low, moderate, or high, and each rating decision was documented with a justification [26]. Publication bias was assessed using the Egger regression test [27] and a funnel plot of precision against the standardized mean difference [28]. Publication bias was considered present when the Egger test yielded p < 0.05 and the funnel plot was asymmetrical [29].
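The following is a minimal sketch of the Egger regression test in its classic form, which we assume here. The standardized effect (g/SE) is regressed on precision (1/SE) by ordinary least squares, and the intercept is tested with a t statistic on k − 2 degrees of freedom. The function names are illustrative. The critical value is supplied by the caller (for example, 3.182 for 3 df at the two-sided 5% level), and the meta package's implementation may differ in details. Funnel-plot asymmetry tests are generally recommended only when at least 10 studies are available.

```python
import math


def egger_test(effects, std_errors):
    """Classic Egger regression of standardized effect on precision.

    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    An intercept far from 0 suggests funnel plot asymmetry.
    """
    k = len(effects)
    if k != len(std_errors) or k < 3:
        raise ValueError("need at least 3 studies with matching standard errors")
    x = [1 / se for se in std_errors]  # precision
    z = [g / se for g, se in zip(effects, std_errors)]  # standardized effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / s_xx
    intercept = z_bar - slope * x_bar
    # Residual variance of the least-squares fit, on k - 2 df.
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / (k - 2)
    se_intercept = math.sqrt(sigma2 * (1 / k + x_bar**2 / s_xx))
    return intercept, se_intercept, intercept / se_intercept, k - 2


def asymmetry_flagged(t_stat, t_crit):
    """Two-sided test at the 5% level, given t_crit = t(0.975, k - 2)."""
    return abs(t_stat) > t_crit
```

Because the test statistic is compared against a critical value, |t| > t_crit is equivalent to the p < 0.05 criterion used in the text.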

2.7. Data Synthesis

We used the meta [30] and metafor [31] packages of R software to conduct the meta-analyses, subgroup analyses, and meta-regression analyses. Prediction intervals (PIs) based on the t-distribution were used to predict the range of true effects for future trials with similar settings [32]. A 95% PI is the range within which the true effect of a new trial is expected to fall with 95% probability. A statistically significant effect is expected for a future trial if the whole 95% PI lies on one side of the null value of 0, whereas a nonsignificant effect is expected if the PI spans 0 [32]. Hedges' g was used to express the effect size because it corrects the bias of the standardized mean difference in studies with small sample sizes [33, 34]. A random-effects model was used because the true treatment effect was assumed to vary across studies [35]. The restricted maximum likelihood (REML) method was used to estimate the between-study variance because it provides unbiased estimates [36]. The Hartung–Knapp adjustment for random-effects models was applied to prevent counterintuitive results [37]. A 95% confidence interval (CI) was used to express the precision of the summary estimate and to derive the p-value [22].
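The pooling steps above can be sketched in plain Python. This is a minimal sketch, not the meta/metafor implementation. It estimates the between-study variance τ² with a common REML fixed-point iteration and applies the Hartung–Knapp variance to the pooled effect. As an assumption, it forms the prediction interval as μ ± t(k−2)·√(τ² + SE²) using the Hartung–Knapp SE. The t critical values are passed in rather than computed (for example, 2.776 for 4 df and 3.182 for 3 df at the two-sided 5% level). The packages may use other numerical algorithms or refinements of the Hartung–Knapp method.

```python
import math
import statistics


def reml_tau2(y, v, tol=1e-10, max_iter=1000):
    """Between-study variance (tau^2) via a REML fixed-point iteration."""
    k = len(y)
    # Moment-style starting value, truncated at zero.
    tau2 = max(0.0, statistics.variance(y) - sum(v) / k)
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        num = sum(wi**2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi**2 for wi in w) + 1 / sum(w))
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2


def random_effects_hk(y, v, t_crit_ci, t_crit_pi):
    """Pooled effect with a Hartung-Knapp CI and a t-based prediction interval.

    y: study effect sizes (e.g. Hedges' g); v: their sampling variances.
    t_crit_ci: t(0.975) with k - 1 df; t_crit_pi: t(0.975) with k - 2 df.
    """
    k = len(y)
    if k < 3:
        raise ValueError("a prediction interval needs at least 3 studies")
    tau2 = reml_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Hartung-Knapp: scale the variance by the weighted residual scatter.
    scatter = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se = math.sqrt(scatter / sum(w))
    if se == 0:
        raise ValueError("identical effects: Hartung-Knapp SE is zero")
    half_pi = t_crit_pi * math.sqrt(tau2 + se**2)
    return {
        "mu": mu,
        "tau2": tau2,
        "se": se,
        "t": mu / se,
        "ci": (mu - t_crit_ci * se, mu + t_crit_ci * se),
        "pi": (mu - half_pi, mu + half_pi),
    }
```

With five hypothetical studies, `random_effects_hk(y, v, 2.776, 3.182)` returns a prediction interval wider than the confidence interval, because the PI adds τ² to the uncertainty of the mean.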

Heterogeneity was assessed using the Cochran Q test and quantified using I² values [22]. Heterogeneity was considered present when the Cochran Q test yielded p < 0.01 and I² exceeded 50% [22]. Additional subgroup and meta-regression analyses were conducted to explore the sources of heterogeneity [22].
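The heterogeneity statistics can be sketched as follows, assuming inverse-variance (fixed-effect) weights for Cochran's Q and I² = max(0, (Q − df)/Q) × 100%. The chi-square tail probability comes from a plain series for the regularized incomplete gamma function, written out so the block needs only the standard library. It is an illustrative helper, not a library routine, and loses precision for very large Q.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for X ~ chi2(df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series for the lower regularized incomplete gamma function P(a, z).
    term = total = 1 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def heterogeneity(y, v):
    """Cochran's Q, its df and p-value, and I^2 (%) for effects y with variances v."""
    w = [1 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, chi2_sf(q, df), i2


def heterogeneous(p, i2):
    """Decision rule stated in the text: Q-test p < 0.01 and I^2 > 50%."""
    return p < 0.01 and i2 > 50
```

For five hypothetical studies with effects −0.8, −0.6, −0.5, −0.4, and −0.2 and a common variance of 0.04, Q = 5 on 4 df and I² = 20%, so the rule above would not flag heterogeneity.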

Subgroup analyses were conducted for predetermined groups defined by the nature of participants (depression ± others, stress/distress ± others, other condition, or healthy), age group (18–30, 31–40, 41–50, or >50), type of AI-based chatbot (Deprexis vs. others), comparator (passive vs. active), type of psychotherapy (cognitive behavioral therapy [CBT] vs. others), type of platform (Internet vs. others), response generation (rule-based vs. NLP), embodiment (yes vs. no), use of ITT/MDM (yes vs. no), and protocol publication/trial registration (yes vs. no). Subgroup differences were considered significant when the Q test for subgroup differences yielded p < 0.1 [38]. Meta-regression analyses were conducted to examine the effects of potential covariates (publication year, duration of intervention, sample size, attrition rate, and proportion of males) on psychological symptoms. The relationships were expressed as the regression coefficient β, which represents the change in the effect size for depressive symptoms per unit change in the covariate [39]. An association between a covariate and the effect size was considered significant at p < 0.05 [22].
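The test for subgroup differences can be sketched as a Q statistic across subgroup pooled estimates. Each subgroup's pooled effect is weighted by its inverse variance, squared deviations from the weighted mean are summed, and the result is compared with a chi-square distribution on (number of subgroups − 1) df. This minimal sketch assumes the subgroup estimates and their standard errors have already been obtained. The chi-square helper is a plain series approximation rather than a library routine, and covariate meta-regression is not covered.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for X ~ chi2(df)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    # Series for the lower regularized incomplete gamma function P(a, z).
    term = total = 1 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def subgroup_difference(estimates, std_errors):
    """Q test for differences between subgroup pooled estimates."""
    w = [1 / se**2 for se in std_errors]
    overall = sum(wi * mi for wi, mi in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (mi - overall) ** 2 for wi, mi in zip(w, estimates))
    df = len(estimates) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical pooled g for two subgroups, each with SE 0.1.
q_b, df_b, p_b = subgroup_difference([-0.8, -0.3], [0.1, 0.1])
significant = p_b < 0.1  # threshold stated in the text
```

With these hypothetical inputs, Q = 12.5 on 1 df (p ≈ 0.0004), so the subgroups would be judged to differ.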

3. Results

The outcomes of the three-step search are shown in Figure 1. A total of 13,521 records were retrieved from nine electronic databases and three clinical trial registries. Ten records identified from the trial registries were excluded, and the reason for each exclusion is given in Table S4. After 2389 duplicates were removed, 11,132 articles were screened by title and abstract. A further 25 records were identified from websites, organizations, and citation searching. Fifty-seven articles from both sources were assessed in full text for eligibility. Twenty-six articles were excluded, with reasons documented in Table S5. A total of 30 RCTs in 31 publications, numbered 1 to 30 [12, 40–68], were included in this systematic review and meta-analysis.
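The main counts in the flow above can be cross-checked by simple subtraction. In this small sketch, the function name is illustrative, and "included" counts publications (31), which correspond to the 30 RCTs.

```python
def prisma_counts(retrieved, duplicates, full_text_assessed, full_text_excluded):
    """Derive screening and inclusion counts for a PRISMA flow diagram."""
    if duplicates > retrieved or full_text_excluded > full_text_assessed:
        raise ValueError("exclusions cannot exceed the records they come from")
    return {
        "screened": retrieved - duplicates,
        "included": full_text_assessed - full_text_excluded,
    }


counts = prisma_counts(13_521, 2_389, 57, 26)
print(counts)  # {'screened': 11132, 'included': 31}
```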

Figure 1. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flow diagram of the article selection process.

3.1. Trial Characteristics

The characteristics of the 30 trials evaluating the effect of AI-based psychotherapeutic interventions, which involved 6100 participants, are shown in Table 1. The trials were published from 2009 [12] to 2022 [70]. They were conducted in Argentina (n = 1) [71], China (n = 2) [70, 72], Germany (n = 10) [11, 12, 26, 39, 40, 42, 48, 50–52], Italy (n = 2) [7, 44], Korea (n = 2) [16, 47], Romania (n = 1) [73], the United Kingdom (n = 3) [8, 41, 74], the United States (n = 5) [2, 9, 23, 45, 46], and across three countries (n = 1) [43]. Participants were recruited from the community (n = 21) (trials 1–3, 5, 6, 8, 9, 11–16, 19–23, 25, 27, and 28), clinical settings (n = 6) [11, 16, 26, 39, 43, 47], or a mixture of both (n = 3) [48, 50, 52]. Twenty-five trials used a two-arm RCT design, three trials [40, 49, 72] used a three-arm design, one trial [7] used a four-arm design, and one trial [23] used a crossover design. Sample sizes ranged from 21 [44] to 1013 [48]. Half of the trials reported follow-up outcomes after the postintervention assessment, with follow-up periods ranging from 2 weeks [74] to 12 months [48].

Table 1. Characteristics of selected 30 randomized controlled trials in 31 publications.

Number	Author, year	Country/recruitment	Design	Nature of participants (criteria)	Mean age (gender portion)	AI-based psychotherapeutic intervention (name)	Type of comparator (name)	Sample size	Psychological outcomes (measures)	Follow-up	Attri rate (%)	ITT/MDM	Protocol/registry/grant
1.	Beevers et al. [40]	United States/community	Two-arm RCT	Adults with depression (QIDS-SR ≥10)	31.91 M: 24.7% F: 74.4%	Internet-based intervention (Deprexis)	Passive control (Waitlist)	T: 376 I: 285 C: 91	Depressive symptoms (QIDS-SR and HRSD-17)	No	20.4	Y/Y	N/Y/Y^a

2.	Bennion et al. [41]	United Kingdom/community	Two-arm RCT	Older adults (>50) with emotional distress (NR)	69.21 M: 26.8% F: 73.2%	Internet-based conversational agent (Chatbot, MYLO)	Active control Internet-based conversational agent (Chatbot, ELIZA)	T: 112 I: 59 C: 53	Depressive, anxiety, and stress symptoms (DASS-21)	2 weeks	12.5	N/N	N/N/Y^a

3.	Berger et al. [42]	Sweden and Germany/community	Three-arm RCT	Adults with depression (BDI-II >13, suicide item <2)	38.8 M: 30.3% F: 69.7%	I₁: Internet-based intervention (Deprexis, guided) I₂: Internet-based intervention (Deprexis, unguided)	Passive control (waitlist)	T: 76 I₁: 25 I₂: 25 C: 26	Depressive symptoms (BDI-II)	6 months	0	Y/Y	N/N/Y^a

4.	Berger et al. [43]	Germany/clinical	Two-arm RCT	Adults with depression (BDI-II >13)/ unipolar affective disorder	43.1 M: 33.7% F: 66.3%	Internet-based intervention (Deprexis) + psychotherapy	Active control (psychotherapy)	T: 98 I: 51 C: 47	Depressive (BDI-II), anxiety (GAD-7) symptoms	6 months	29.6	Y/Y	Y/Y/Y^a

5.	Bird et al. [44]	United Kingdom/community	Two-arm RCT	Adults (students and staff in university) with distress	21.3 M: 18.4% F: 81.6%	Internet-based conversational agent (Chatbot, MYLO)	Active control Internet-based conversational agent (Chatbot, ELIZA)	T: 171 I: 85 C: 86	Depressive, anxiety, and stress symptoms (DASS-21)	2 weeks	0	Y/Y	N/N/Y^a

6.	Bücker et al. [45]	Germany/community	Two-arm RCT	Adults with gambling and mood problems (NR)	35.71 M: 76.4% F: 23.6%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 145 I: 74 C: 71	Depressive (PHQ-9), anxiety (GAD-7) symptoms	No	47.9	Y/Y	Y/Y/Y^b

7.	Burton et al. [46]	Romania, Spain, and United Kingdom/clinical	Two-arm RCT	Adults with major depressive disorder (NR)	38.65 M: 33.3% F: 66.7%	Embodied virtual agent-based system (Help4Mood)	Passive control (treatment as usual)	T: 28 I: 14 C: 14	Depressive symptoms (BDI-II, QIDS-SR)	No	25.0	Y/N	N/Y/Y^a

8.	Danieli et al. [47]	Italy/community	Two-arm RCT	Adults with distress, anxiety, and depression (NR)	47.76 M: 19.0% F: 81.0%	Mobile-based (TEO) intervention (Chatbot, m-PHA) + SMT-CBT	Active control (SMT-CBT)	T: 21 I: 11 C: 10	Depressive and anxiety (SCL-90-R), stress (PSS) symptoms	3 months	0	N/N	N/Y/Y^a

9.	Danieli et al. [48]	Italy/community	Four-arm RCT	Older adults with stress and anxiety symptoms (NR)	55.58 M: 22.0% F: 78.0%	I₁: Mobile-based (TEO) intervention (Chatbot, m-PHA) + SMT-CBT I₂: Mobile-based (TEO) intervention (Chatbot, m-PHA)	C₁: Active control (SMT-CBT) C₂: Passive control (waitlist)	T: 60 I₁: 16 I₂: 14 C₁: 16 C₂: 14	Depressive (SCL-90-R and PHQ-8), anxiety (SCL-90-R and GAD-7), and stress (PSS) symptoms	3 months	5	N/N	N/Y/Y^a

10.	Fischer et al. [49]	Germany/clinical	Two-arm RCT	Adults with multiple sclerosis and depressive symptoms (NR)	45.28 M: 22.2% F: 77.8%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 90 I: 45 C: 45	Depressive symptoms (BDI)	3 months	21.1	Y/Y	N/Y/Y^a

11.	Fitzpatrick, Darcy, and Vierhile [12]	United States/community	Two-arm RCT	Young adults with anxiety and depressive symptoms (NR)	22.2 M: 19.0% F: 81.0%	Computer-based/mobile-based intervention (Chatbot, Woebot)	Active control (eBook on depression)	T: 70 I: 34 C: 36	Depressive (PHQ-9), anxiety (GAD-7) symptoms	No	20.0	Y/Y	N/N/Y^c

12.	Fitzsimmons-Craft et al. [50]	United States/community	Two-arm RCT	Female young adults with risk of eating disorders (NR)	21.08 M: 0.0% F: 100.0%	Internet-based (StudentBodies) intervention (Chatbot, Tessa)	Passive control (waitlist)	T: 700 I: 352 C: 348	Depressive (PHQ-8), anxiety (GAD-7) symptoms	6 months	37.3	Y/Y	N/Y/Y^a

13.	Gaffney et al. [51]	United Kingdom/community	Two-arm RCT	Young adults (students in university) with distress	21.4 M: 21.4% F: 78.6%	Internet-based conversational agent (Chatbot, MYLO)	Active control Internet-based conversational agent (Chatbot, ELIZA)	T: 48 I: 26 C: 22	Depressive, anxiety, and stress symptoms (DASS-21)	2 weeks	10.4	N/N	N/N/Y^a

14.	Guțu et al. [52]	Romania/community	Two-arm RCT	Young adults from social media	21.82 M: 1.1% F: 98.9%	Computer-based/mobile-based intervention (Chatbot, Woebot)	Active control (psychoeducational daily email)	T: 212 I: 106 C: 106	Depressive and anxiety symptoms (DASS-21)	No	55.2	Y/Y	N/N/NR

15.	He et al. [53]	China/community	Three-arm RCT	Young adults with depressive symptoms (CSMHSS: 2–3)	18.78 M: 62.8% F: 37.2%	Internet-based intervention (Chatbot, XiaoE)	C₁: Active control (e-book on depression) C₂: Active control (Chatbot, Xiaoai)	T: 148 I: 49 C₁: 49 C₂: 50	Depressive symptoms (PHQ-9)	1 month	15.5	Y/Y	N/Y/Y^a

16.	Hunt et al. [54]	United States/community	Crossover trial	Adults with IBS (by physician or Rome IV criteria)	32.00 M: 24.8% F: 75.2%	Mobile-based intervention (Chatbot, Zemedy)	Passive control (waitlist)	T: 121 I: 62 C: 59	Depressive (PHQ-9), anxiety (DASS-21) symptoms	3 months	28.1	Y/Y	N/Y/Y^b

17.	Jang et al. [55]	Korea/clinical	Two-arm RCT	Adults with attention-deficit (ADHD score: 4/6 items)	25.1 M: 43% F: 57%	Mobile-based intervention (Chatbot, Todaki)	Active control (self-help information of ADHD)	T: 46 I: 23 C: 23	Depressive (QIDS-SR), anxiety (SAS), and stress (PSS) symptoms	No	19.6	Y/N	N/N/Y^c

18.	Klein et al. [56]	Germany/clinical and community	Two-arm RCT	Adults with depressive symptoms (PHQ-9 : 5–14)	42.9 M: 31.4% F: 68.6%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 1013 I: 509 C: 504	Depressive symptoms (PHQ-9, HDRS-24)	12 months	21.6	Y/N	Y/Y/Y^a

19.	Klos et al. [57]	Argentina/community	Two-arm RCT	Young adults (students from university)	18–33 M: 12.7% F: 87.3%	Internet-based intervention (Chatbot, Tess)	Active control (psychoeducation eBook on affective symptoms)	T: 181 I: 99 C: 82	Depressive (PHQ-9), anxiety (GAD-7) symptoms	No	59.7	N/N	N/N/NR

20.	Liu et al. [58]	China/community	Two-arm RCT	Young adults with depressive symptoms (PHQ-9 ≥ 9)	23.08 M: 44.6% F: 55.4%	Pipeline-based intervention (Chatbot, XiaoNan)	Active control (self-help bibliotherapy intervention)	T: 83 I: 41 C: 42	Depressive (PHQ-9), anxiety (GAD-7) symptoms	No	24.1	Y/Y	N/N/No

21.	Ly, Ly, and Andersson [59]	Sweden/community	Two-arm RCT	Young adults (students from universities, website, and social media)	26.2 M: 46.4% F: 53.6%	Mobile-based intervention (Chatbot, Shim)	Passive control (waitlist)	T: 28 I: 14 C: 14	Stress symptoms (PSS)	No	0	Y/N	N/N/Y^a

22.	Maeda et al. [60]	Japan/community	Three-arm RCT	Female young adults who want a baby	28.77 M: 0% F: 100%	Internet-based intervention (online chatbot for fertility education)	C₁: Active control (online information related to fertility) C₂: Passive control (online generic information)	T: 927 I: 309 C₁: 309 C₂: 309	Anxiety symptoms (STAI)	No	0	Y/N	N/Y/Y^a,c

23.	Meyer et al. [61]	Germany/community	Two-arm RCT	Adults with depressive symptoms (NR)	34.76 M: 24% F: 76%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 396 I: 320 C: 76	Depressive symptoms (BDI)	6 months	45.5	Y/Y	N/N/NR

24.	Meyer et al. [62]	Germany/clinical and community	Two-arm RCT	Adults with depressive symptoms (PHQ-9 : 15 – 27)	42.0 M: 25.2% F: 74.8%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 163 I: 78 C: 85	Depressive (PHQ-9), anxiety (GAD-7) symptoms	6 months	17.8	Y/Y	N/Y/Y^c

25.	Moritz et al. [63]	Germany/community	Two-arm RCT	Adults with depressive symptoms (NR)	38.57 M: 11.5% F: 78.6%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 210 I: 105 C: 105	Depressive symptoms (BDI)	NR	19.0	Y/N	N/Y/Y^c

26.	Oh et al. [64]	Korea/clinical	Two-arm RCT	Adults with panic symptoms (MINI)	40.97 M: 48.8% F: 51.2%	Mobile-based intervention (Chatbot, Todaki)	Active control (book for panic disorder)	T: 45 I: 23 C: 22	Depressive and anxiety symptoms (HADS)	No	8.89	N/N	N/N/Y^c

27.	Prochaska et al. [65]	United States/community	Two-arm RCT	Adults with substance misuse (CAGE-AID > 1)	40 M: 35% F: 65%	Computer-based/mobile-based intervention (Chatbot, Woebot)	Passive control (waitlist)	T: 180 I: 88 C: 92	Depressive (PHQ-8), anxiety (GAD-7) symptoms	8 weeks	15.6	Y/N	N/Y/NR

28.	Sandoval et al. [66]	United States/community	Two-arm RCT	Adults with MDD or dysthymic disorder (DSM-IV-TR, PHQ-9 > 9)	28.78 M: 37.8% F: 62.2%	Interactive media based, computer-delivered depression treatment program (imbPST)	Passive control (waitlist)	T: 45 I: 25 C: 20	Depressive symptoms (BDI-II, HSCL-20-d)	No	0	N/N	N/N/Y^a

29.	Schroder et al. [67]	Germany/clinical and community	Two-arm RCT	Adults with epilepsy (PESOS) and depressive symptoms (NR)	37.59 M: 24.4% F: 75.6%	Internet-based intervention (Deprexis)	Passive control (waitlist)	T: 78 I: 38 C: 40	Depressive symptoms (BDI)	No	26.9	Y/N	N/Y/NR

30.	Zwerenz et al. [68, 69]	Germany/clinical	Two-arm RCT	Adults with depressive symptoms (BDI-II >13, ICD-10)	47.98 M: 39.3% F: 60.7%	Internet-based intervention (Deprexis)	Passive control (treatment as usual)	T: 229 I: 115 C: 114	Depressive symptoms (BDI-II)	6 months	13.5	Y/Y	Y/Y/Y^a

Abbreviations: ADHD, attention-deficit/hyperactivity disorder; ADHD score, Attention-Deficit/Hyperactivity Disorder Self-Rating Scale Version 1.1 regardless of psychiatric diagnosis; AI, artificial intelligence; Attri, attrition rate; BDI, Beck Depression Inventory; BDI-II, Beck Depression Inventory-II; C, comparator; CAGE-AID, cut down, annoyed, guilty, eye opener, adapted to include drugs; CSMHSS, College Students Mental Health Screening Scale; DASS-21, Depression, Anxiety, and Stress Scale short form; Deprexis, an Internet-based software platform that provides personalized cognitive behavioral therapy-based support to help improve depression symptoms; DSM-IV-TR, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision; ELIZA, a chatbot that mimics a therapist using a humanistic principle; F, female; GAD-7, Generalized Anxiety Disorder 7-item scale; HADS, Hospital Anxiety and Depression Scale; HDRS-17, Hamilton Depression Rating Scale (17-item); HDRS-24, Hamilton Depression Rating Scale (24-item); Help4Mood, an interactive system with an embodied virtual agent (avatar) to assist in self-monitoring of patients receiving treatment for depression; HSCL-20-d, Hopkins Symptom Checklist 20-Item Depression Scale; I, intervention; IBS, irritable bowel syndrome; ICD-10, International Classification of Diseases 10th Revision; imbPST, interactive media-based, computer-delivered depression treatment program; ITT, intention-to-treat analysis; M, male; MDD, major depressive disorder; MDM, missing data management; MINI, Mini-International Neuropsychiatric Interview; m-PHA, mobile personal health care agent; MYLO, Manage Your Life Online; N, no; NR, not reported; PESOS, an epilepsy-specific inventory, the performance, sociodemographic aspects, subjective estimation; PHQ-8, Patient Health Questionnaire 8-item scale; PHQ-9, Patient Health Questionnaire 9-item scale; PSS, Perceived Stress Scale; QIDS-SR, Quick Inventory of Depressive Symptoms-Self-Report; RCT, randomized controlled trial; Rome IV, Rome IV diagnostic criteria for irritable bowel syndrome; SAS, Self-Rating Anxiety Scale; SCL-90-R, Symptom Checklist-90-Revised; SMT-CBT, stress management training and cognitive behavioral therapy; STAI, State-Trait Anxiety Inventory; T, total; TEO, therapy empowerment opportunity; Xiaoai, a chatbot for small talk with unrestricted content; Y, yes.
^aGrants were not industry sponsored.
^bGrants were industry sponsored but declared that the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
^cGrants were industry sponsored.

3.2. Description of AI-Based Psychotherapeutic Interventions

Fourteen types of AI-based chatbots were identified, including Deprexis (n = 11) [11, 12, 14, 26, 39, 40, 42, 48, 50–52], Manage Your Life Online (n = 3) [8, 41, 74], Woebot (n = 3) [2, 39, 73], Therapy Empowerment Opportunity (n = 2) [7, 44], and others. The content of the interventions is described in Table 2, and Table S6 summarizes the 14 types of AI-based chatbots. Most interventions were grounded in CBT (n = 25) and were used mainly to treat various conditions (n = 26). The main functions were counseling (n = 18), therapy (n = 18), and monitoring (n = 20), and the most common platform was the Internet (n = 18). Most interventions (n = 27) were self-guided, and four [7, 40, 44, 46] were supported by a therapist. Two trials [40, 46] used therapists for counseling and therapy, while the others used a self-help version of an AI-based chatbot. Berger et al. [40, 42] included one intervention group that received a low-intensity, therapist-guided self-help version of Deprexis, and Fitzsimmons-Craft et al. [46, 50] relied on human-authored conversations delivered via a chatbot (Tessa). Two trials [7, 44] used the mobile personal health care agent (m-PHA) to communicate with patients, with a therapist supervising the m-PHA interactions. In these trials, the therapists provided support regarding events mentioned during the therapy sessions and reviewed the notes and recollections [7, 44]. The duration of intervention ranged from a single session [41] to 16 weeks [58]. Most trials did not report the frequency and time of usage. Response generation was rule-based (n = 16), based on NLP (n = 14), or based on other AI technologies. Input and output modalities included written text (n = 30), speech (n = 3) [43, 70, 72], visuals (n = 1) [72], and emojis (n = 1) [71]. Seven chatbots [9, 16, 23, 43, 47, 49, 70] were embodied.

Table 2. Description of the artificial intelligence-based psychotherapeutic interventions in the 30 selected randomized controlled trials (31 publications).

Number	Author(s) [reference]	Name of chatbot	Psychological principle	Area of use	Function	Platform	Guidance	Frequency/duration/time of use	Response generation	Input	Output	Embodied (yes/no)
1.	Beevers et al. [40]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/8 weeks/261.6 min	Rule-based	Written	Written	No

2.	Bennion et al. [41]	MYLO	Method of levels therapy	Treatment of distress	-Counselling	Internet	Self	One time/24.17 min	Rule-based	Written	Written	No

3.	Berger et al. [42]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	I₁: Self; I₂: Therapist	NR/10 weeks/I₁: 417, I₂: 255	Rule-based	Written	Written	No

4.	Berger et al. [43]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/12 weeks/ 599 min	Rule-based	Written	Written	No

5.	Bird et al. [44]	MYLO	Method of levels therapy	Treatment of distress	-Counselling	Internet	Self	One time/ 13 min	Rule-based	Written	Written	No

6.	Bücker et al. [45]	Deprexis	CBT	Treatment of gambling and mood	-Counselling -Therapy -Monitoring	Internet	Self	NR/ 8 weeks/82.6 min	Rule-based	Written	Written	No

7.	Burton et al. [46]	Help4Mood	CBT	Treatment of depression	-Engagement -Adherence -Monitoring	Computer app	Self	10.5 times (median)/4 weeks/134 min (median)	Natural language processing	Written	Written Spoken Visual	Yes

8.	Danieli et al. [47]	m-PHA (TEO)	CBT	Treatment of distress, anxiety, depression	-Monitoring	Mobile app	Therapist	NR/ 8 weeks/NR	Natural language processing	Written	Written	No

9.	Danieli et al. [48]	m-PHA (TEO)	CBT	Treatment of distress, anxiety symptoms	-Monitoring	Mobile app	Therapist	NR/8 weeks /NR	Natural language processing	Written	Written	No

10.	Fischer et al. [49]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/9 weeks/332 min	Rule-based	Written	Written	No

11.	Fitzpatrick, Darcy, and Vierhile [12]	Woebot	CBT	Treatment of depression and anxiety	-Counselling -Therapy -Monitoring -Engagement -Motivation -Reflection	Computer app or mobile app	Self	12.14/ 2 weeks/ NR	Natural language processing	Written	Written	No

12.	Fitzsimmons-Craft et al. [50]	Tessa	CBT	Eating disorders	-Counselling -Therapy -Healthy eating	Internet (X2AI) via SMS or Facebook Messenger	Therapist	NR/ 12 weeks/ NR	Rule-based/algorithm-based	Written	Written	No

13.	Gaffney et al. [51]	MYLO	Method of levels therapy	Treatment of distress	-Counselling	Internet	Self	One time/19.23 min	Rule-based	Written	Written	No

14.	Guțu et al. [52]	Woebot	CBT	Prevention	-Counselling -Therapy -Monitoring -Engagement -Motivation -Reflection	Computer app or Mobile app	Self	NR/ 2 weeks/ NR	Natural language processing	Written	Written	No

15.	He et al. [53]	XiaoE	CBT	Treatment of depression	-Counselling -Monitoring -Engagement	Internet (WeChat)	Self	25.45/day/ 1 week/ NR	Natural language processing and deep learning	Written	Written Spoken Image	No

16.	Hunt et al. [54]	Zemedy	CBT	Irritable bowel syndrome	-Counselling -Education -Therapy -Healthy diet	Mobile app	Self	1 per week/ 8 weeks	Natural language processing	Written	Written	Yes

17.	Jang et al. [55]	Todaki	CBT	Attention-deficit/hyperactivity disorder	-Self-diagnosis -Education -Therapy -Compliance	Mobile app	Self	20.32/ 4 weeks/ 75 min	Natural language processing	Written	Written	Yes

18.	Klein et al. [56]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR / 12 weeks/ 497 min	Rule-based	Written	Written	No

19.	Klos et al. [57]	Tess	CBT, EFT, SFT, MI	Prevention	-Reminders -Psychoeducational -Emotional support	Internet/app via Facebook Messenger	Self	NR / 8 weeks/ NR	Natural language processing	Written Emojis	Written Emojis	No

20.	Liu et al. [58]	XiaoNan	CBT	Treatment of depression	-Emotion assessment -Counselling -Therapy -Monitoring	Internet/app (WeChat / IFLYTEK open platform)	Self	NR/16 weeks/ NR	Natural language processing, intention classification and emotion recognition.	Written Spoken	Written	Yes

21.	Ly, Ly, and Andersson [59]	Shim	CBT, positive psychology	Prevention	-Reflection -Awareness -Value-based life	Smartphone app	Self	17.1/ 2 weeks/ NR	Rule-based	Written	Written	No

22.	Maeda et al. [60]	Chatbot for fertility education	Transtheoretical model	Prevention	-Education -Counselling	Internet (chat via Google Cloud’s Dialogflow)	Self	NR/ 12 weeks/ NR	Natural language processing	Written	Written	Yes

23.	Meyer et al. [61]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/ 9 weeks/ NR	Rule-based	Written	Written	No

24.	Meyer et al. [62]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/ 12 weeks/ 457 min	Rule-based	Written	Written	No

25.	Moritz et al. [63]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR / 12 weeks/ 210 min	Rule-based	Written	Written	No

26.	Oh et al. [64]	Todaki	CBT	Panic disorder	-Checking -Education -Therapy -Exposure training	Mobile app	Self	NR/ 4 weeks/ 50 min	Natural language processing	Written	Written	Yes

27.	Prochaska et al. [65]	Woebot	CBT	Substance use	-Counselling -Therapy -Monitoring -Engagement -Motivation -Reflection	Mobile app	Self	NR/ 8 weeks/ NR	Natural language processing	Written	Written	No

28.	Sandoval et al. [66]	imbPST	Problem-solving therapy	Treatment of depression	-Problem-solving -Psychoeducation -Monitoring -Engagement	Computer software	Self	NR/ 6 weeks/ 4.9 h	Rule-based	Written	Written	Yes

29.	Schroder et al. [67]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/ 9 weeks/ NR	Rule-based	Written	Written	No

30.	Zwerenz et al. [68, 69]	Deprexis	CBT	Treatment of depression	-Counselling -Therapy -Monitoring	Internet	Self	NR/ 12 weeks/ NR	Rule-based	Written	Written	No

Abbreviations: App, application; CBT, cognitive behavioral therapy; EFT, emotion-focused therapy; MI, motivational interviewing; NR, not reported; SFT, solution-focused brief therapy.

3.3. Individual Quality Assessment

A total of 30 RCTs were evaluated against the RoB 2.0 criteria (Figure S1). Twenty-three trials used ITT analysis, and seven used per-protocol analysis. Across the five domains, low risk-of-bias ratings were found for most (79.1%) of the trials with ITT analysis but for fewer than half (48.6%) of the trials with per-protocol analysis. Nine trials (30%) did not provide information on allocation. Seventeen trials (56.7%) were rated as having some concerns regarding deviations from the intended intervention because participants and personnel were aware of the assigned intervention. As shown in Table 1, 13 trials (43.3%) did not publish a protocol or register in a clinical trial registry, raising some concerns about the selection of the reported results. The attrition rate ranged from 0% [9] to 59.7% [71]. Most trials (80%) received grants: 16 received nonindustry-sponsored grants, seven received industry-sponsored grants, and one [45] received both. The remaining six trials did not report or had no grant support. Of the seven trials with industry-sponsored grants, two [23, 42] declared that the funders were not involved in data analysis.

3.4. Depressive Symptoms

Depressive symptoms were assessed at the postintervention assessment in 29 arms of 26 RCTs [2, 7, 9, 11, 12, 14, 16, 23, 26, 39–48, 50–52, 70, 72–74] (4349 participants), at the 2-week to 3-month follow-up in eight arms of six RCTs [7, 23, 41, 44, 72, 74] (418 participants), and at the 6- to 12-month follow-up in eight arms of seven RCTs [11, 12, 26, 40, 46, 48, 50] (2268 participants) (Figure 2). Compared with the comparators, AI-based psychotherapeutic interventions significantly reduced depressive symptoms at the postintervention assessment (t = −4.40, p = 0.001), with a medium effect size (g = −0.54, 95% CI: −0.79 to −0.29), and at the 6- to 12-month follow-up (t = −3.14, p < 0.016), with a small effect size (g = −0.23, 95% CI: −0.40 to −0.06). No significant difference was observed between the intervention and comparator at the 2-week to 3-month follow-up (t = −0.08, p = 0.936).
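The pooled effects above are Hedges' g values, with negative values favoring the intervention. As a minimal sketch of how one trial's g and its sampling variance are commonly computed from group means and standard deviations, the function below applies the usual small-sample correction factor J. It assumes a two-arm trial reporting posttreatment scores where lower means fewer symptoms; the function name and inputs are illustrative and are not taken from the reviewed trials.

```python
import math


def hedges_g(mean_i, sd_i, n_i, mean_c, sd_c, n_c):
    """Return (g, variance of g) for intervention vs. comparator.

    Negative g means lower (better) symptom scores in the intervention arm.
    Illustrative helper; not the review's own code.
    """
    df = n_i + n_c - 2
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n_i - 1) * sd_i ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_i - mean_c) / sd_pooled          # standardized mean difference (Cohen's d)
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction factor J
    g = j * d
    var_d = (n_i + n_c) / (n_i * n_c) + d ** 2 / (2 * (n_i + n_c))
    return g, j ** 2 * var_d


# Hypothetical trial: intervention mean 10 (SD 4, n = 20) vs. comparator mean 12 (SD 4, n = 20)
g, var_g = hedges_g(10, 4, 20, 12, 4, 20)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

The pair (g, variance) is what a random-effects model then combines across trials.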

The 95% prediction intervals (PIs) were −1.77 to 0.69, −1.27 to 1.24, and −0.64 to 0.19 at the three time points, respectively. Because each 95% PI included values on both sides of the null value of 0, a similar future study might not observe a reduction in depressive symptoms. Heterogeneity was substantial (I² = 70%–85%) at the postintervention assessment and the 2-week to 3-month follow-up and moderate (I² = 42%) at the 6- to 12-month follow-up. Given the substantial heterogeneity at the postintervention assessment, subgroup and meta-regression analyses were conducted to explore its sources.
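The t statistics, confidence intervals, I² values, and prediction intervals reported above come from a random-effects model. The sketch below illustrates one common way to obtain them, under stated assumptions: a DerSimonian–Laird estimate of τ², a Hartung–Knapp-type t test and CI with k − 1 degrees of freedom, and a PI computed as μ ± t(k − 2)·√(τ² + SE²). The section does not state which estimators the review used, so these are assumptions. To stay within the standard library, the Student t distribution function uses the classical closed-form series for integer degrees of freedom.

```python
import math


def t_cdf(x, df):
    """CDF of Student's t for integer df >= 1 (closed-form series, no SciPy needed)."""
    theta = math.atan(abs(x) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    total, term = 1.0, 1.0
    if df % 2 == 1:
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        a = (2 / math.pi) * (theta + (s * c * total if df > 1 else 0.0))
    else:
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    # a is P(|T| < |x|); convert it to the one-sided CDF
    return 0.5 + 0.5 * a if x >= 0 else 0.5 - 0.5 * a


def t_ppf(p, df):
    """Upper quantile (0.5 < p < 1) of Student's t, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def random_effects(effects, variances):
    """Random-effects pooling; assumes DerSimonian-Laird tau^2 and a Hartung-Knapp t test."""
    k = len(effects)
    if k < 3:
        raise ValueError("at least 3 studies are needed for a prediction interval")
    w = [1 / v for v in variances]
    sw = sum(w)
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))   # Cochran's Q
    tau2 = max(0.0, (q - (k - 1)) / (sw - sum(wi * wi for wi in w) / sw))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    ws = [1 / (v + tau2) for v in variances]
    sws = sum(ws)
    mu = sum(wi * y for wi, y in zip(ws, effects)) / sws
    # Hartung-Knapp standard error: weighted residual variance scaled by 1/(k - 1)
    se_hk = math.sqrt(sum(wi * (y - mu) ** 2 for wi, y in zip(ws, effects)) / ((k - 1) * sws))
    t_stat = mu / se_hk
    half_ci = t_ppf(0.975, k - 1) * se_hk
    # Prediction interval: t with k - 2 df, using tau^2 plus the usual random-effects variance
    half_pi = t_ppf(0.975, k - 2) * math.sqrt(tau2 + 1 / sws)
    return {"mu": mu, "t": t_stat, "p": 2 * (1 - t_cdf(abs(t_stat), k - 1)),
            "ci": (mu - half_ci, mu + half_ci), "pi": (mu - half_pi, mu + half_pi),
            "tau2": tau2, "i2": i2}


# Hypothetical arms: Hedges' g and sampling variance for each
res = random_effects([-0.8, -0.5, -0.2, -0.6, -0.3], [0.04, 0.05, 0.03, 0.06, 0.04])
print(res)
```

Because the PI adds τ² to the uncertainty of the mean, it can span 0 even when the CI does not. This is the pattern described above for depressive symptoms at the postintervention assessment.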

Subgroup analyses are presented in Table 3 and Figures S2–S26. Significant subgroup differences (p < 0.1) in the reduction of depressive symptoms across the three time points were found for the nature of participants, age group, embodiment, ITT/MDM, and protocol/registration. Trials conducted among participants with depression, alone or combined with other health issues, had a larger effect on reducing depressive symptoms at the postintervention assessment (g = −0.81, 95% CI: −1.16 to −0.45) and at the 2-week to 3-month follow-up (g = −0.64, 95% CI: −2.22 to 0.95) than their counterparts. At the 2-week to 3-month follow-up, larger effects were also observed for interventions using an embodied chatbot (g = −0.57, 95% CI: −1.17 to 0.03), for participants aged 31–40 years (g = −0.57, 95% CI: −1.17 to 0.03), and for trials using ITT or MDM (g = −0.35, 95% CI: −1.11 to 0.40) than for their counterparts. At the 6- to 12-month follow-up, trials with a protocol or registration (g = −0.32, 95% CI: −0.65 to 0.02) had a larger effect on depressive symptoms than those without. Hence, between-trial heterogeneity could be partially explained by participant characteristics.
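The subgroup differences in Table 3 are reported as χ² statistics. A common way to obtain such a statistic is the between-subgroup Q test. Each subgroup's pooled estimate is treated as a single study, a fixed-effect weighted sum of squared deviations from their weighted mean is computed, and that sum is compared with a χ² distribution with (number of subgroups − 1) degrees of freedom. The sketch below assumes this standard approach. The inputs are hypothetical subgroup estimates and standard errors, not values from Table 3. The χ² tail probability is computed from the series expansion of the regularized incomplete gamma function to keep the example within the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (series for the lower incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))  # regularized P(a, z)
    return max(0.0, 1.0 - lower)


def subgroup_difference(estimates, ses):
    """Between-subgroup Q test on pooled subgroup estimates and their standard errors."""
    w = [1 / se ** 2 for se in ses]
    overall = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (e - overall) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical subgroups: pooled g and its standard error for each
q, df, p = subgroup_difference([-0.5, 0.1], [0.1, 0.1])
print(f"chi2 = {q:.2f}, df = {df}, p = {p:.5f}")
```

Using the review's p < 0.1 threshold, a subgroup difference would be flagged whenever the returned p falls below 0.1.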

Table 3. Subgroup analyses of AI chatbots on depressive, anxiety, and stress symptoms.

Category	Subgroups	Number of arms	Sample size	Effect size (g)	95% CI	I²	Subgroup difference
Depressive symptoms (postintervention)
Nature of participants	Depression ± others	17	2789	−0.81	−1.16, −0.45	86%	χ² = 3.41, p < 0.001 ^∗∗∗
	Stress/distress ± others	6	308	0.09	−0.18, 0.35	0%
	Other condition	5	1040	−0.34	−0.87, 0.19	76%
	Healthy	1	212	−0.06	−0.33, 0.21	NA
Age groups	18–30 years	10	1482	−0.48	−0.94, −0.02	84%	χ² = 3.13, p = 0.371
	31–40 years	11	1251	−0.54	−0.77, −0.30	64%
	41–50 years	6	1571	−0.69	−1.95, 0.57	94%
	>50 years	2	45	−0.24	−2.00, 1.52	0%
Type of AI chatbot	Deprexis	13	2719	−0.68	−1.12, −0.24	88%	χ² = 1.09, p = 0.296
Type of AI chatbot	Others	16	1630	−0.42	−0.74, −0.09	79%	χ² = 1.09, p = 0.296
Different comparators	Active control	6	410	−0.64	−2.05, 0.76	95%	χ² = 0.08, p = 0.772
Different comparators	Passive control	23	3939	−0.48	−0.65, −0.32	72%	χ² = 0.08, p = 0.772
Type of psychotherapy	CBT	26	4103	−0.54	−0.79, −0.29	84%	χ² = 0.00, p = 0.990
Type of psychotherapy	Others	3	246	−0.53	−3.46, 2.39	93%	χ² = 0.00, p = 0.990
Type of platforms	Internet	19	3686	−0.61	−0.93, −0.30	87%	χ² = 0.85, p = 0.356
Type of platforms	Others	10	663	−0.37	−0.85, 0.11	78%	χ² = 0.85, p = 0.356
Response generation	Rule-based	16	3453	−0.65	−1.06, −0.24	90%	χ² = 1.07, p = 0.300
Response generation	Others	13	896	−0.41	−0.70, −0.12	69%	χ² = 1.07, p = 0.300
Embodiment	Yes	5	271	−0.59	−1.08, −0.10	50%	χ² = 0.05, p = 0.815
Embodiment	No	24	4078	−0.53	−0.84, −0.23	87%	χ² = 0.05, p = 0.815
ITT or MDM	Yes	22	3455	−0.60	−0.88, −0.33	85%	χ² = 0.97, p = 0.326
ITT or MDM	No	7	894	−0.29	−1.00, 0.42	78%	χ² = 0.97, p = 0.326
Protocol or registration	Yes	15	2534	−0.55	−0.97, −0.12	87%	χ² = 0.02, p = 0.877
Protocol or registration	No	14	1815	−0.51	−0.84, −0.19	81%	χ² = 0.02, p = 0.877
Depressive symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants	Depression ± others	2	106	−0.64	−2.22, 0.95	0%	χ² = 38.76, p < 0.001 ^∗∗∗
	Stress/distress ± others	5	267	0.32	0.04, 0.59	0%
	Other condition	1	45	−0.57	−1.17, 0.03	NA
Age groups	18–30 years	4	307	−0.17	−1.01, 0.68	78%	χ² = 10.78, p = 0.013 ^∗∗
	31–40 years	1	45	−0.57	−1.17, 0.03	NA
	41–50 years	1	21	0.99	0.07, 1.90	NA
	>50 years	2	45	0.29	−1.82, 2.40	0%
Different comparators	Active control	5	301	0.17	−0.43, 0.78	56%	χ² = 1.65, p = 0.199
Different comparators	Passive control	3	117	−0.37	−1.94, 1.19	61%	χ² = 1.65, p = 0.199
Type of psychotherapy	CBT	6	217	−0.12	−0.82, 0.59	68%	χ² = 1.85, p = 0.174
Type of psychotherapy	Others	2	201	0.26	0.15, 0.37	0%	χ² = 1.85, p = 0.174
Type of platforms	Internet	4	307	−0.17	−1.01, 0.68	78%	χ² = 0.75, p = 0.388
Type of platforms	Others	4	111	0.20	−0.86, 1.27	67%	χ² = 0.75, p = 0.388
Response generation	Rule-based	2	201	0.26	0.15, 0.37	0%	χ² = 1.85, p = 0.174
Response generation	Others	6	217	−0.12	−0.82, 0.59	68%	χ² = 1.85, p = 0.174
Embodiment	Yes	1	45	−0.57	−1.17, 0.03	NA	χ² = 2.92, p = 0.088 ^∗
Embodiment	No	7	373	0.07	−0.47, 0.60	68%	χ² = 2.92, p = 0.088 ^∗
ITT or MDM	Yes	4	310	−0.35	−1.11, 0.40	79%	χ² = 6.96, p = 0.008 ^∗∗
ITT or MDM	No	4	108	0.41	−0.12, 0.94	0%	χ² = 6.96, p = 0.008 ^∗∗
Protocol or registration	Yes	4	111	0.20	−0.86, 1.27	67%	χ² = 0.75, p = 0.388
Protocol or registration	No	4	307	−0.17	−1.01, 0.68	78%	χ² = 0.75, p = 0.388
Depressive symptoms (follow-up assessments at 6 months to 12 months)
Nature of participants	Depression ± others	7	1568	−0.27	−0.48, −0.07	45%	χ² = 2.40, p = 0.121
Nature of participants	Other condition	1	700	−0.10	−0.24, 0.05	NA	χ² = 2.40, p = 0.121
Age groups	18–30 years	1	700	−0.10	−0.24, 0.05	NA	χ² = 2.96, p = 0.228
	31–40 years	3	175	−0.13	−0.79, 0.54	0%
	41–50 years	4	1393	−0.32	−0.65, 0.02	67%
Type of AI chatbot	Deprexis	7	1568	−0.27	−0.48, −0.07	45%	χ² = 2.40, p = 0.121
Type of AI chatbot	Others	1	700	−0.10	−0.24, 0.05	NA	χ² = 2.40, p = 0.121
Different comparators	Active control	1	44	−0.27	−0.88, 0.33	NA	χ² = 0.02, p = 0.886
Different comparators	Passive control	7	2224	−0.23	−0.43, −0.03	51%	χ² = 0.02, p = 0.886
ITT or MDM	Yes	7	1568	−0.27	−0.48, −0.07	45%	χ² = 2.40, p = 0.121
ITT or MDM	No	1	700	−0.10	−0.24, 0.05	NA	χ² = 2.40, p = 0.121
Protocol or registration	Yes	4	1393	−0.32	−0.65, 0.02	67%	χ² = 3.48, p = 0.062 ^∗
Protocol or registration	No	4	875	−0.10	−0.25, 0.05	0%	χ² = 3.48, p = 0.062 ^∗
Anxiety symptoms (postintervention)
Nature of participants	Depression ± others	4	385	−0.60	−1.97, 0.77	92%	χ² = 5.89, p = 0.117
	Stress/distress ± others	6	308	0.12	−0.37, 0.61	45%
	Other condition	5	1040	−0.15	−0.28, −0.02	0%
	Healthy	4	1189	−0.31	−0.61, −0.02	48%
Age groups	18–30 years	10	2289	−0.15	−0.33, 0.02	58%	χ² = 1.43, p = 0.698
	31–40 years	4	335	−0.25	−0.45, −0.06	0%
	41–50 years	3	253	−0.40	−4.14, 3.35	95%
	>50 years	2	45	−0.36	−3.79, 3.07	0%
Type of AI chatbot	Deprexis	3	294	−0.86	−3.00, 1.28	93%	χ² = 1.97, p = 0.160
Type of AI chatbot	Other	16	2628	−0.15	−0.31, 0.00	53%	χ² = 1.97, p = 0.160
Different comparators	Active control	12	1313	−0.20	−0.64, 0.23	85%	χ² = 0.03, p = 0.860
Different comparators	Passive control	7	1609	−0.24	−0.36, −0.12	0%	χ² = 0.03, p = 0.860
Type of psychotherapy	CBT	15	1794	−0.27	−0.59, 0.04	78%	χ² = 0.33, p = 0.563
Type of psychotherapy	Others	4	1128	−0.14	−0.70, 0.42	79%	χ² = 0.33, p = 0.563
Type of platforms	Internet	10	2255	−0.38	−0.78, 0.03	86%	χ² = 1.90, p = 0.168
Type of platforms	Others	9	667	−0.10	−0.31, 0.12	32%	χ² = 1.90, p = 0.168
Response generation	Rule-based	6	1195	−0.37	−1.18, 0.44	91%	χ² = 0.19, p = 0.663
Response generation	Others	13	1727	−0.23	−0.40, −0.06	43%	χ² = 0.19, p = 0.663
Embodiment	Yes	6	1177	−0.36	−0.48, −0.24	0%	χ² = 0.68, p = 0.409
Embodiment	No	13	1745	−0.20	−0.59, 0.19	83%	χ² = 0.68, p = 0.409
ITT or MDM	Yes	13	2723	−0.31	−0.60, −0.02	82%	χ² = 1.19, p = 0.276
ITT or MDM	No	6	199	−0.00	−0.64, 0.63	58%	χ² = 1.19, p = 0.276
Protocol or registration	Yes	11	2219	−0.36	−0.80, 0.07	84%	χ² = 2.62, p = 0.105
Protocol or registration	No	8	703	−0.03	−0.21, 0.15	1%	χ² = 2.62, p = 0.105
Anxiety symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants	Stress/distress ± others	5	175	0.38	0.05, 0.70	0%	χ² = 2.37, p = 0.124
Nature of participants	Other condition	1	45	−0.12	−0.71, 0.47	NA	χ² = 2.37, p = 0.124
Age groups	18–30 years	2	109	0.31	−2.13, 2.74	0%	χ² = 3.41, p = 0.332
	31–40 years	1	45	−0.12	−0.71, 0.47	NA
	41–50 years	1	21	0.83	−0.07, 1.73	NA
	>50 years	2	45	0.35	−1.36, 2.06	0%
Different comparators	Active control	4	157	0.36	−0.08, 0.80	0%	χ² = 0.71, p = 0.400
Different comparators	Passive control	2	63	0.08	−3.68, 3.84	20%	χ² = 0.71, p = 0.400
Type of psychotherapy	CBT	4	111	0.27	−0.39, 0.93	12%	χ² = 0.02, p = 0.892
Type of psychotherapy	Others	2	109	0.31	−2.13, 2.74	0%	χ² = 0.02, p = 0.892
Type of platforms	Internet	2	109	0.31	−2.13, 2.74	0%	χ² = 0.02, p = 0.892
Type of platforms	Others	4	111	0.27	−0.39, 0.93	12%	χ² = 0.02, p = 0.892
Response generation	Rule-based	2	109	0.31	−2.13, 2.74	0%	χ² = 0.02, p = 0.892
Response generation	Others	4	111	0.27	−0.39, 0.93	12%	χ² = 0.02, p = 0.892
Embodiment	Yes	1	45	−0.12	−0.71, 0.47	NA	χ² = 2.37, p = 0.124
Embodiment	No	5	175	0.38	0.05, 0.70	0%	χ² = 2.37, p = 0.124
ITT or MDM	Yes	2	112	0.05	−1.67, 1.76	0%	χ² = 7.15, p = 0.008 ^∗∗
ITT or MDM	No	4	108	0.52	0.16, 0.88	0%	χ² = 7.15, p = 0.008 ^∗∗
Protocol or registration	Yes	4	111	0.27	−0.39, 0.93	12%	χ² = 0.02, p = 0.890
Protocol or registration	No	2	109	0.31	−2.13, 2.74	0%	χ² = 0.02, p = 0.890
Anxiety symptoms (follow-up assessments at 6 months)
Nature of participants	Depression ± others	3	380	−0.37	−0.70, −0.05	0%	χ² = 6.71, p < 0.001 ^∗∗∗
Nature of participants	Other condition	1	700	−0.10	−0.24, 0.05	NA	χ² = 6.71, p < 0.001 ^∗∗∗
Age groups	18–30 years	1	700	−0.10	−0.24, 0.05	NA	χ² = 6.71, p < 0.001 ^∗∗∗
Age groups	41–50 years	3	380	−0.37	−0.70, −0.05	0%	χ² = 6.71, p < 0.001 ^∗∗∗
Type of AI chatbot	Deprexis	3	380	−0.37	−0.70, −0.05	0%	χ² = 6.71, p < 0.001 ^∗∗∗
Type of AI chatbot	Other	1	700	−0.10	−0.24, 0.05	NA	χ² = 6.71, p < 0.001 ^∗∗∗
Different comparators	Active control	1	44	−0.22	−0.82, 0.39	NA	χ² = 0.01, p = 0.910
Different comparators	Passive control	3	1036	−0.25	−0.74, 0.23	65%	χ² = 0.01, p = 0.910
Stress symptoms (postintervention)
Nature of participants	Stress/distress ± others	5	267	0.07	−0.23, 0.37	0%	χ² = 5.85, p = 0.054 ^∗
	Other condition	2	126	−0.32	−4.67, 4.02	70%
	Healthy	1	28	−0.85	−1.63, −0.07	NA
Age groups	18–30 years	4	275	−0.03	−0.66, 0.59	46%	χ² = 5.93, p = 0.115
	31–40 years	1	80	−0.64	−1.09, −0.19	NA
	41–50 years	1	21	0.31	−0.55, 1.17	NA
	>50 years	2	45	−0.35	−3.34, 2.63	0%
Different comparators	Active control	4	249	0.08	−0.32, 0.49	0%	χ² = 3.80, p = 0.051 ^∗
Different comparators	Passive control	4	172	−0.40	−1.07, 0.27	41%	χ² = 3.80, p = 0.051 ^∗
Type of psychotherapy	CBT	6	220	−0.33	−0.79, 0.12	36%	χ² = 6.91, p = 0.009 ^∗∗
Type of psychotherapy	Others	2	201	0.14	−0.15, 0.43	0%	χ² = 6.91, p = 0.009 ^∗∗
Type of platforms	Internet	2	201	0.14	−0.15, 0.43	0%	χ² = 6.91, p = 0.009 ^∗∗
Type of platforms	Others	6	220	−0.33	−0.79, 0.12	36%	χ² = 6.91, p = 0.009 ^∗∗
Response generation	Rule-based	3	229	−0.12	−1.41, 1.17	64%	χ² = 0.13, p = 0.717
Response generation	Others	5	192	−0.25	−0.75, 0.26	34%	χ² = 0.13, p = 0.717
Embodiment	Yes	2	126	−0.32	−4.67, 4.02	70%	χ² = 0.37, p = 0.544
Embodiment	No	6	295	−0.09	−0.54, 0.35	37%	χ² = 0.37, p = 0.544
ITT or MDM	Yes	4	313	−0.27	−1.05, 0.50	74%	χ² = 0.56, p = 0.453
ITT or MDM	No	4	108	−0.05	−0.61, 0.51	0%	χ² = 0.56, p = 0.453
Protocol or registration	Yes	4	146	−0.34	−1.03, 0.34	30%	χ² = 1.14, p = 0.286
Protocol or registration	No	4	275	−0.03	−0.66, 0.59	46%	χ² = 1.14, p = 0.286
Stress symptoms (follow-up assessments at 2 weeks to 3 months)
Nature of participants	Stress/distress ± others	5	267	0.34	0.02, 0.66	0%	χ² = 2.78, p = 0.096 ^∗
Nature of participants	Other condition	1	45	−0.20	−0.78, 0.39	NA	χ² = 2.78, p = 0.096 ^∗
Age groups	18–30 years	2	201	0.31	0.27, 0.35	0%	χ² = 5.59, p = 0.133
	31–40 years	1	45	−0.20	−0.78, 0.39	NA
	41–50 years	1	21	1.06	0.13, 1.98	NA
	>50 years	2	45	0.17	−3.45, 3.79	0%
Different comparators	Active control	4	249	0.33	−0.11, 0.76	11%	χ² = 0.48, p = 0.490
Different comparators	Passive control	2	63	0.07	−4.37, 4.51	39%	χ² = 0.48, p = 0.490
Type of psychotherapy	CBT	4	111	0.25	−0.65, 1.16	49%	χ² = 0.04, p = 0.839
Type of psychotherapy	Others	2	201	0.31	0.27, 0.35	0%	χ² = 0.04, p = 0.839
Type of platforms	Internet	2	201	0.31	0.27, 0.35	0%	χ² = 0.04, p = 0.839
Type of platforms	Others	4	111	0.25	−0.65, 1.16	49%	χ² = 0.04, p = 0.839
Response generation	Rule-based	2	201	0.31	0.27, 0.35	0%	χ² = 0.04, p = 0.839
Response generation	Others	4	111	0.25	−0.65, 1.16	49%	χ² = 0.04, p = 0.839
Embodiment	Yes	1	45	−0.20	−0.78, 0.39	NA	χ² = 2.78, p = 0.096 ^∗
Embodiment	No	5	267	0.34	0.02, 0.66	0%	χ² = 2.78, p = 0.096 ^∗
ITT or MDM	Yes	2	204	0.12	−3.01, 3.25	56%	χ² = 0.64, p = 0.424
ITT or MDM	No	4	108	0.38	−0.29, 1.06	13%	χ² = 0.64, p = 0.424
Protocol or registration	Yes	4	111	0.25	−0.65, 1.16	49%	χ² = 0.04, p = 0.839
Protocol or registration	No	2	201	0.31	0.27, 0.35	0%	χ² = 0.04, p = 0.839

Note: I² denotes the percentage of variability attributable to between-trial heterogeneity; NA, not applicable.
^∗p < 0.1, ^∗∗p < 0.05, ^∗∗∗p < 0.001.

A series of random-effects meta-regression analyses was conducted to evaluate the effect of various covariates on the effect size for depressive symptoms (Table 4). The univariate meta-regression analyses showed that publication year (β = 0.017, p = 0.617), duration of intervention in days (β = −0.004, p = 0.270), sample size (β = 0.001, p = 0.383), attrition rate (β = −0.003, p = 0.774), and the proportion of males (β = −0.011, p = 0.134) were not significantly associated with the effect on depressive symptoms. Thus, the between-trial heterogeneity could not be explained by these covariates.
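A univariate random-effects meta-regression of the kind summarized in Table 4 regresses trial-level effect sizes on a single covariate, weighting each trial by 1/(vᵢ + τ²). The sketch below assumes a method-of-moments (DerSimonian–Laird-type) estimate of residual τ² and a normal-theory Wald test for the slope. Statistical packages often offer other τ² estimators and a Knapp–Hartung t-based test, so its output will not necessarily match Table 4. The data in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def _wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x; returns coefficients and X'WX terms."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    b1 = (s0 * sxy - s1 * sy) / det
    b0 = (sy - b1 * s1) / s0
    return b0, b1, s0, s1, s2, det


def meta_regression(effects, variances, covariate):
    """Univariate random-effects meta-regression (method-of-moments tau^2, Wald z test)."""
    k = len(effects)
    if k < 3:
        raise ValueError("need at least 3 studies")
    w = [1 / v for v in variances]
    b0, b1, s0, s1, s2, det = _wls(covariate, effects, w)
    # Residual heterogeneity statistic from the fixed-effect (inverse-variance) fit
    q_e = sum(wi * (y - b0 - b1 * x) ** 2 for wi, y, x in zip(w, effects, covariate))
    t0 = sum(wi * wi for wi in w)
    t1 = sum(wi * wi * x for wi, x in zip(w, covariate))
    t2 = sum(wi * wi * x * x for wi, x in zip(w, covariate))
    trace_p = s0 - (s2 * t0 - 2 * s1 * t1 + s0 * t2) / det
    tau2 = max(0.0, (q_e - (k - 2)) / trace_p)
    # Refit with random-effects weights 1 / (v_i + tau^2)
    ws = [1 / (v + tau2) for v in variances]
    _, beta, s0_re, _, _, det_re = _wls(covariate, effects, ws)
    se = math.sqrt(s0_re / det_re)   # slope element of (X'W*X)^-1
    z_crit = NormalDist().inv_cdf(0.975)
    p_value = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return {"beta": beta, "se": se, "ci": (beta - z_crit * se, beta + z_crit * se),
            "p": p_value, "tau2": tau2}


# Hypothetical trials: effect sizes, sampling variances, and intervention duration (days)
result = meta_regression([-0.9, -0.6, -0.5, -0.3, -0.2], [0.04, 0.05, 0.04, 0.06, 0.05],
                         [28, 56, 56, 84, 112])
print(result)
```

A slope whose CI includes 0 means the covariate does not explain between-trial heterogeneity. This matches the interpretation given for every covariate in Table 4.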

Table 4. Random-effects univariate meta-regression analyses of covariates on depression and anxiety at postintervention.

Covariate	Depressive symptoms: β	SE	95% CI lower	95% CI upper	p-Value	Anxiety symptoms: β	SE	95% CI lower	95% CI upper	p-Value
Year of publication	0.017	0.033	−0.052	0.085	0.617	−0.001	0.056	−0.118	0.117	0.993
Duration of intervention (days)	−0.004	0.004	−0.01	0.003	0.270	−0.007	0.003	−0.013	<0.001	0.056
Sample size	0.001	0.001	−0.001	0.002	0.383	<−0.001	<0.001	−0.001	0.001	0.671
Attrition rate	−0.003	0.009	−0.020	0.015	0.774	−0.007	0.007	−0.020	0.007	0.339
Proportion of males	−0.011	0.007	−0.025	0.004	0.134	−0.006	0.007	−0.021	0.009	0.407

Note: β, regression coefficient.
Abbreviation: SE, standard error.

3.5. Anxiety Symptoms

For anxiety symptoms, 19 arms of 17 trials [2, 7, 16, 23, 40–42, 44–47, 49, 50, 70, 71, 73, 74] (2922 participants) were pooled at the postintervention assessment, six arms of five trials [7, 23, 41, 44, 74] (220 participants) at the 2-week to 3-month follow-up, and four trials [11, 26, 46, 50] (1080 participants) at the 6- to 12-month follow-up. Meta-analyses showed no significant differences between the intervention and comparator at the postintervention assessment (t = −1.95, p = 0.067), the 2-week to 3-month follow-up (t = 2.08, p = 0.093), or the 6- to 12-month follow-up (t = −2.82, p = 0.067) (Figure 3).

The 95% PIs were −1.19 to 0.71, −0.11 to 0.65, and −0.99 to 0.51 at the three time points, respectively. Because each PI included 0, the effect on anxiety symptoms in a similar future study could favor either the intervention or the comparator. Heterogeneity was substantial (I² = 78%) at the postintervention assessment, negligible (I² = 0%) at the 2-week to 3-month follow-up, and moderate (I² = 45%) at the 6- to 12-month follow-up. Subgroup and meta-regression analyses were performed to explore the sources of heterogeneity.

Subgroup analyses were conducted for the three time points (Table 3 and Figures S27–S49). Significant subgroup differences (p < 0.1) in anxiety symptoms at the 2-week to 3-month and 6-month follow-ups were found for the nature of participants, age group, type of AI chatbot, and use of ITT/MDM. At the 6-month follow-up, trials that used Deprexis among people aged 41–50 years in Europe who had depression, alone or with other health problems, showed a larger effect on anxiety symptoms (g = −0.37, 95% CI: −0.70 to −0.05) than the other subgroups. At the 2-week to 3-month follow-up, trials using ITT or MDM showed a smaller effect size on anxiety symptoms (g = 0.05, 95% CI: −1.67 to 1.76) than their counterparts.

The univariate meta-regression analyses showed that publication year (β = −0.001, p = 0.993), duration of intervention in days (β = −0.007, p = 0.056), sample size (β < −0.001, p = 0.671), attrition rate (β = −0.007, p = 0.339), and the proportion of males (β = −0.006, p = 0.407) were not significantly associated with the effect on anxiety symptoms (Table 4). Therefore, the high heterogeneity could not be explained by these covariates.

3.6. Stress Symptoms

To evaluate the effect of the intervention on stress symptoms, eight arms of seven RCTs [7, 23, 27, 41, 44, 47, 74] (421 participants) were pooled at the postintervention assessment and six arms of five RCTs [7, 23, 41, 44, 74] (312 participants) at the 2-week to 3-month follow-up (Figure 4). The meta-analyses showed no significant differences between the intervention and comparator at either time point (t = −1.18 to 2.04, p = 0.098–0.277).

Subgroup analyses were performed for the two time points (Table 3 and Figures S50–S67). Significant subgroup differences (p < 0.1) in stress symptoms at the postintervention and follow-up assessments were found for the nature of participants, type of comparator, type of psychotherapy, type of platform, and embodiment. Trials conducted among participants with conditions other than stress or distress had a larger effect on reducing stress symptoms at the postintervention assessment (g = −0.32, 95% CI: −4.67 to 4.02) and at follow-up (g = −0.20, 95% CI: −0.78 to 0.39) than their counterparts. At the postintervention assessment, larger effects were also observed for interventions based on CBT (g = −0.33, 95% CI: −0.79 to 0.12), interventions delivered via non-Internet platforms (g = −0.33, 95% CI: −0.79 to 0.12), and trials with passive controls (g = −0.40, 95% CI: −1.07 to 0.27) than for their counterparts. At follow-up, trials conducted outside Europe (g = −0.20, 95% CI: −0.78 to 0.39) and interventions using an embodied chatbot (g = −0.20, 95% CI: −0.78 to 0.39) had a larger effect on stress symptoms than their counterparts.

3.7. Depressive, Anxiety, and Stress Symptoms

Three RCTs [8, 41, 74] examined the effect of the intervention on the total scores of depressive, anxiety, and stress symptoms, measured with the 21-item Depression, Anxiety, and Stress Scale (DASS-21) [75], in 295 participants at the postintervention assessment and the 2-week to 3-month follow-up (Figure 5). The meta-analyses revealed no significant differences between the two groups (t = 1.34–1.46, p = 0.281–0.311).

3.8. Overall Evidence

The GRADE criteria were used to evaluate 10 outcomes of this review (Table S7), and the certainty of evidence ranged from very low to moderate. Evidence was downgraded for inconsistency, indirectness, and imprecision because of high heterogeneity, varied populations and interventions, small samples, and wide confidence intervals. Because more than 10 trials were available for depressive and anxiety symptoms at the postintervention assessment, funnel plots and Egger's test were used to assess publication bias. The funnel plots were symmetrical and Egger's tests were nonsignificant (p = 0.091–0.983; Figures S68 and S69), indicating no evidence of publication bias.
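Egger's test, used above to check funnel-plot asymmetry, is an ordinary least-squares regression of each trial's standardized effect (g/SE) on its precision (1/SE). An intercept that differs from zero suggests small-study effects. The following is a minimal standard-library sketch that tests the intercept against a t distribution with k − 2 degrees of freedom. The t distribution function repeats the closed-form series for integer degrees of freedom so that the block runs on its own. The input data are hypothetical.

```python
import math


def t_cdf(x, df):
    """CDF of Student's t for integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(x) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    total, term = 1.0, 1.0
    if df % 2 == 1:
        for k in range(1, (df - 3) // 2 + 1):
            term *= (2 * k) / (2 * k + 1) * c * c
            total += term
        a = (2 / math.pi) * (theta + (s * c * total if df > 1 else 0.0))
    else:
        for k in range(1, (df - 2) // 2 + 1):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    return 0.5 + 0.5 * a if x >= 0 else 0.5 - 0.5 * a


def egger_test(effects, ses):
    """Egger's regression: OLS of g/SE on 1/SE; returns intercept, its SE, t, and two-sided p."""
    k = len(effects)
    if k < 3:
        raise ValueError("need at least 3 studies")
    z = [e / s for e, s in zip(effects, ses)]      # standardized effects
    prec = [1 / s for s in ses]                     # precisions
    mean_x, mean_z = sum(prec) / k, sum(z) / k
    sxx = sum((x - mean_x) ** 2 for x in prec)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(prec, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    rss = sum((zi - intercept - slope * x) ** 2 for x, zi in zip(prec, z))
    se_int = math.sqrt(rss / (k - 2) * (1 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_int
    return intercept, se_int, t_stat, 2 * (1 - t_cdf(abs(t_stat), k - 2))


# Hypothetical trials: Hedges' g and standard errors
print(egger_test([-0.9, -0.7, -0.5, -0.45, -0.3, -0.35], [0.40, 0.30, 0.20, 0.18, 0.12, 0.10]))
```

With few trials the test has low power, which is why it is usually run only when at least 10 trials are available, as in this review.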

4. Discussion

4.1. Summary of Findings

From 13,546 records identified in the 12 databases, three clinical trial registries, and other sources through a three-step comprehensive search, we included 30 RCTs involving 6100 participants across nine countries. Our review showed that, compared with comparators, AI-based psychotherapeutic interventions significantly reduced depressive symptoms at the postintervention assessment with a medium effect size and at the 6- to 12-month follow-up with a small effect size. No significant effect was found on anxiety, stress, or the total scores of depressive, anxiety, and stress symptoms at the postintervention assessment or at any follow-up. Subgroup analyses revealed significant differences in the reduction of psychological symptoms at various time points based on the nature of participants, age group, type of AI chatbot, type of psychotherapy, type of platform, embodiment, type of comparator, ITT/MDM, and protocol/registration. The random-effects univariate meta-regression did not detect any significant covariate for depressive or anxiety symptoms at the postintervention assessment. Across the five RoB 2.0 domains, most (79.1%) trials with ITT analysis and fewer than half (48.6%) of trials with per-protocol analysis were rated as having a low risk of bias. No publication bias was detected for depressive and anxiety symptoms at the postintervention assessment. According to the GRADE criteria, the certainty of evidence ranged from very low to moderate for the 10 psychological outcomes.

4.2. Depressive Symptoms

In line with a previous meta-analysis [18], we found that depressive symptoms were significantly reduced following AI-based psychotherapeutic interventions at the postintervention assessment. Our results also indicated a significant effect at the 6- to 12-month follow-up, suggesting that AI-based psychotherapeutic interventions may have both immediate and longer-term effects. AI chatbots can be designed to deliver various psychotherapies according to different psychological principles, such as CBT [40], method of levels therapy [44], or problem-solving therapy [66]. Users may engage with the intervention through text-based or voice-activated conversations [70], and such interactions can offer psychological, relational, and emotional support [76]. Chatbots can also provide initial counseling, guide users to a self-help library, and direct users to appropriate services [74]. Chatbots use AI algorithms to interpret user dialogues and conduct useful interactions, and they may have a low attrition rate owing to increased engagement and motivation [17]. Therefore, AI-based psychotherapeutic interventions can ameliorate depressive symptoms. However, because only seven trials included a 6- to 12-month follow-up, no firm conclusion about the long-term effect of the intervention can be drawn.

In our review, most interventions used rule-based response generation, and less than half used NLP. Rule-based response generation relies on simple dialogue components governed by rules: the agent follows a predefined decision tree and communicates in a scripted manner [74]. Conversely, generative response generation is more complex and relies on ML to construct dialogues, generating candidate answers and enhancing conversational proficiency [11]. With the increasing integration of AI technology into psychotherapy [77], future interventions could consider advanced generative deep learning techniques, which may allow AI chatbots to interact with users in an empathetic, coherent, and personalized manner [7, 74].
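To illustrate the contrast, the following is a minimal, hypothetical sketch of rule-based response generation: every dialogue node holds a scripted prompt and a fixed mapping from recognized replies to the next node, so the agent can only walk its predefined decision tree. The script text, node names, and the `step` and `run` helpers are invented for illustration and do not reproduce any chatbot from the included trials.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step of a scripted (rule-based) dialogue."""
    prompt: str
    # Maps a recognised user reply (lower-case keyword) to the next node id;
    # an empty mapping marks a leaf where the script ends.
    options: dict = field(default_factory=dict)


# Hypothetical script for illustration only.
SCRIPT = {
    "start": Node("How are you feeling today? (good / low)",
                  {"good": "good", "low": "low"}),
    "low": Node("Sorry to hear that. Try a breathing exercise or a thought record? "
                "(breathing / thoughts)",
                {"breathing": "breathing", "thoughts": "thoughts"}),
    "good": Node("Great! Would you like to log your mood? (yes / no)",
                 {"yes": "log", "no": "end"}),
    "breathing": Node("Breathe in for 4 seconds, hold for 4, breathe out for 6."),
    "thoughts": Node("Write down the thought, then list evidence for and against it."),
    "log": Node("Mood logged. See you tomorrow!"),
    "end": Node("Okay, see you next time!"),
}

FALLBACK = "Sorry, I did not understand."


def step(node_id, user_reply):
    """Advance one turn along the decision tree; returns (next_id, bot_message)."""
    node = SCRIPT[node_id]
    key = user_reply.strip().lower()
    if key in node.options:
        next_id = node.options[key]
        return next_id, SCRIPT[next_id].prompt
    # Unrecognised input: stay on the same node and repeat its scripted prompt
    return node_id, FALLBACK + " " + node.prompt


def run(replies):
    """Play a list of user replies through the script; returns (final_id, transcript)."""
    node_id = "start"
    transcript = [SCRIPT[node_id].prompt]
    for reply in replies:
        if not SCRIPT[node_id].options:  # leaf reached: script is finished
            break
        node_id, message = step(node_id, reply)
        transcript.append(message)
    return node_id, transcript


final_id, transcript = run(["Low", "breathing"])
print(final_id)
print(transcript[-1])
```

Any reply outside the listed options simply triggers the fallback and repeats the prompt; a generative, ML-based system would instead compose a new response, which is what gives it greater conversational flexibility.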

Seven interventions in our review used embodied conversational agents, and our subgroup analysis showed a greater effect size for embodied than for nonembodied agents. An embodied conversational agent is a computer-based dialogue system with a virtual embodiment (full body or face only) that typically interacts with users through multimodal communication cues such as speech, text, animated facial expressions, or gestures [73]. Evidence shows that embodied conversational agents can build trust and rapport and create a sense of warmth, fostering companionship and long-term use [13, 78]. Future interventions could therefore consider adopting embodied agents. Only one intervention [57] used emojis (images depicting facial expressions) to let participants share and track their moods over time. Given that emojis can be used to express, imitate, and appraise varying degrees of emotion [79], more research is needed to evaluate their effectiveness.

Notably, across the eight trials with a 2-week to 3-month follow-up assessment, the interventions failed to demonstrate superior effects. Most comparators (62.5%) in these trials were active controls, such as another conversational chatbot [44, 51, 53], stress management training and CBT [47, 48], and e-books on depression [53]. This finding aligns with a previous mixed-method review [16] that reported a similar pattern. Our review revealed comparable effects between AI-based psychotherapeutic interventions and active comparators. Furthermore, only 25% of the participants assessed at the 2-week to 3-month follow-up had depressive problems. One plausible interpretation is that AI-based psychotherapeutic interventions may not alleviate depressive symptoms in people who are not depressed. However, no definitive conclusion can be drawn about treatment efficacy for depressive symptoms at the 2-week to 3-month follow-up.

Our subgroup analyses revealed that the interventions had a greater effect size among participants aged 31–40 years than among other age groups, and among participants with depression, alone or combined with other health issues. One reason could be that younger adults have greater knowledge of AI [72] and engage more in activities [80] than older adults, which may make them more likely to adhere to the interventions. Consistent with a previous review [18], the interventions significantly improved depressive symptoms in participants with depression or depression combined with other health issues, suggesting that they were more effective for participants with depression than for those with other health conditions. Hence, the interventions may be most beneficial for younger adults with depression. At the 2-week to 12-month follow-ups, our subgroup results showed significant differences based on the nature of participants, age group, embodiment, ITT/MDM, and protocol/registration, but these analyses included only 1–4 trials. These results should therefore be interpreted with caution, and further trials are needed to confirm them.
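Subgroup differences like these are commonly evaluated with a test for subgroup differences: each subgroup's pooled estimate is treated as one observation, and a between-subgroup Cochran's Q is compared against a chi-squared distribution with (number of subgroups − 1) degrees of freedom. The text does not spell out the exact procedure, so the sketch below assumes this standard inverse-variance approach; the subgroup labels and numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1.

    Uses the exact recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    """
    if df % 2 == 1:
        p, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        p, k = math.exp(-x / 2), 2
    while k < df:
        p += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return p


def subgroup_difference_test(subgroups):
    """Test for subgroup differences.

    `subgroups` maps a label to (pooled_estimate, standard_error) taken from a
    separate meta-analysis within that subgroup. Returns (Q_between, df, p_value).
    """
    est = [e for e, _ in subgroups.values()]
    w = [1 / se ** 2 for _, se in subgroups.values()]  # inverse-variance weights
    overall = sum(wi * ei for wi, ei in zip(w, est)) / sum(w)
    q_between = sum(wi * (ei - overall) ** 2 for wi, ei in zip(w, est))
    df = len(est) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical pooled g values per subgroup (not results from this review)
q, df, p = subgroup_difference_test({
    "depressed": (-0.80, 0.15),
    "other conditions": (-0.30, 0.12),
})
print(round(q, 2), df, round(p, 4))
```

With only a few trials per subgroup, each standard error is large, so this test has low power and a single trial can shift a subgroup estimate considerably, which is one reason such subgroup findings warrant caution.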

4.3. Other Psychological Outcomes

Contrary to our expectations, the meta-analyses revealed that AI-based psychotherapeutic interventions did not improve anxiety symptoms, stress symptoms, or combined depressive, anxiety, and stress symptoms at postintervention and follow-up assessments compared with comparators. These findings are inconsistent with a previous review [17]. One possible reason is that most comparators for these outcomes were active controls. Another possibility lies in the differences among depressive, anxiety, and stress symptoms [81]. Stress symptoms reflect a sense of being overwhelmed and capture chronic nonspecific arousal, tension, agitation, and irritability; anxiety symptoms reflect a sense of fear or dread and focus on autonomic arousal, physical symptoms of anxiety, and the subjective experience of anxious affect; and depressive symptoms reflect unhappiness or sadness, such as dysphoria, hopelessness, low self-esteem, anhedonia, and loss of interest [82, 83]. Because these states involve distinct cognitive processes and coping strategies, they may respond differently to the interventions [83]. At this stage, however, these explanations remain speculative; conclusions cannot be drawn, and further studies are required.

Our subgroup analyses found significant differences in the improvement of anxiety and stress symptoms at postintervention and follow-up assessments according to the nature of participants, age group, type of AI chatbot, psychotherapy, platform, comparator, and use of ITT/MDM. However, these comparisons included only 1–6 trials per subgroup, and the number of trials was uneven across subgroups, so the data should be interpreted cautiously [38]. Further research is needed to validate these results.

4.4. Strengths and Limitations

The current systematic review has several strengths. To our knowledge, it is the first to examine both the short- and long-term effects of AI-based psychotherapeutic interventions on psychological outcomes. A comprehensive search strategy covering 12 databases and three clinical trial registries was used to identify 30 RCTs and to reduce publication bias. The random-effects meta-analyses used the restricted maximum likelihood (REML) estimator [36] with the Hartung–Knapp adjustment [37]. The 95% prediction interval (PI) was reported to indicate the range of true effects expected in future settings [32], and the certainty of evidence for each outcome was assessed.
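A minimal, dependency-free sketch of this pooling pipeline is shown below, assuming per-trial effect sizes `y` (for example, Hedges' g) and sampling variances `v`. τ² is estimated by REML through the usual fixed-point iteration; the Hartung–Knapp standard error rescales the variance of the pooled estimate by the weighted residual variance and uses a t distribution with k − 1 degrees of freedom; and the 95% PI uses t with k − 2 degrees of freedom. The review cites the R packages meta and metafor [30, 31]; this is not their implementation, the t quantile is computed numerically only to avoid external libraries, and using the unadjusted standard error inside the PI is one of several conventions.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF by Simpson's rule on [0, |x|] (steps must be even), using symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1 by bisection on t_cdf (a numerical stand-in, not a library routine)."""
    lo, hi = 0.0, 1000.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reml_tau2(y, v, tol=1e-10, max_iter=1000):
    """REML between-trial variance via the fixed-point equation
    tau2 = sum(w^2 * ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w), truncated at 0."""
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        num = sum(wi * wi * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new = max(0.0, num / sum(wi * wi for wi in w) + 1 / sw)
        if abs(new - tau2) < tol:
            return new
        tau2 = new
    return tau2


def random_effects_hk(y, v, level=0.95):
    """Pooled estimate with Hartung-Knapp CI and prediction interval (needs k >= 3)."""
    k = len(y)
    if k < 3:
        raise ValueError("at least 3 trials are needed for the prediction interval")
    tau2 = reml_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Hartung-Knapp: scale 1/sum(w) by the weighted residual variance
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se_hk = math.sqrt(q / sw)
    t_ci = t_quantile(0.5 + level / 2, k - 1)
    # Prediction interval: t with k - 2 df, unadjusted SE of mu combined with tau2
    half_pi = t_quantile(0.5 + level / 2, k - 2) * math.sqrt(tau2 + 1 / sw)
    return {
        "mu": mu,
        "tau2": tau2,
        "t": mu / se_hk,
        "df": k - 1,
        "ci": (mu - t_ci * se_hk, mu + t_ci * se_hk),
        "pi": (mu - half_pi, mu + half_pi),
    }


# Hypothetical effect sizes (Hedges' g) and variances, not data from this review
res = random_effects_hk([-0.8, -0.3, -0.6, -0.1, -0.5], [0.04, 0.03, 0.05, 0.02, 0.04])
print(round(res["mu"], 3), [round(b, 3) for b in res["ci"]], [round(b, 3) for b in res["pi"]])
```

The returned `t` is the Hartung–Knapp test statistic, the kind of value reported alongside the pooled g in this review; with few trials, the t critical values are large, which widens both the CI and the PI.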

Notwithstanding these strengths, this review has several limitations. First, the psychological outcomes were self-reported, which may introduce social desirability bias. Second, the number of trials in some meta-analyses was limited, especially for follow-up assessments, which reduced statistical power. Third, the uneven number of trials across subgroups may have compromised the validity of the subgroup estimates [38]. Fourth, the included interventions were based on a wide variety of psychological principles, and six meta-analyses showed substantial heterogeneity, which limited the precision of the pooled estimates. Fifth, the certainty of evidence for six outcomes was low or very low, which reduces confidence in implementing AI-based psychotherapeutic interventions. Sixth, some trials did not report the intervention regimen, which limited comparisons of intervention features. Lastly, most trials (n = 18) were conducted in European countries, which may limit the generalizability of the findings.
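Heterogeneity of the kind noted in the fourth limitation is usually quantified with Cochran's Q and the I² statistic, I² = max(0, (Q − df)/Q) × 100%. A minimal sketch using inverse-variance weights follows; the input values are hypothetical, and because the review does not state which I² threshold it used to call heterogeneity substantial, no label is hard-coded (the Cochrane Handbook's rough guide describes 50%–90% as possibly substantial).

```python
def heterogeneity(y, v):
    """Cochran's Q (inverse-variance, fixed-effect weights), its df, and I-squared in %."""
    w = [1 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    # I^2: share of total variability attributed to between-trial differences
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and variances (not data from this review)
q, df, i2 = heterogeneity([-0.8, -0.3, -0.6, -0.1, -0.5], [0.04, 0.03, 0.05, 0.02, 0.04])
print(round(q, 2), df, round(i2, 1))
```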

4.5. Clinical Implications and Future Research

In this review, we found medium and small effects on depressive symptoms at postintervention and at the 6–12-month follow-up, respectively. Because the interventions were compared against varied comparator conditions, active control groups may have masked their effectiveness [84]. Hence, these small to medium effects may represent clinically important or minimal clinically important differences [71, 85], and participants could have experienced meaningful treatment benefits from AI-based psychotherapeutic interventions. However, because the certainty of evidence for six outcomes was low or very low, AI-based psychotherapeutic interventions can be supported only as a supplementary intervention. Given the global shortage of mental health workers, such interventions can be considered adjuncts to usual treatment during the therapeutic process. They can also be incorporated into comprehensive web applications to facilitate access to psychotherapy amid physical distancing requirements, particularly during the ongoing COVID-19 pandemic. However, designing an interface adaptable to diverse user profiles presents challenges. Technical challenges also remain in interpreting emotions in dialogue and making chatbot interactions more human-like. Privacy and security are further important issues to address during intervention development.

Although the COVID-19 pandemic has driven the use of AI technology, AI chatbots carry a risk of inappropriate use. Healthcare research teams should collaborate closely and regularly with computer scientists to refine and upgrade human–computer interactions. The subgroup results suggest that interventions could target people with depression aged 31–40 years. Substantial heterogeneity exists in some meta-analyses, reflecting variation in intervention regimens, settings, and populations. Future interventions should therefore use standardized regimens within specific populations and settings so that conclusive results can be drawn. In addition, future research should report intervention content and regimens in more detail according to the Template for Intervention Description and Replication (TIDieR) guide [86]. Given the low or very low certainty of evidence for six outcomes, well-designed RCTs are needed that minimize selection, performance, and reporting biases by concealing allocation, blinding participants, using ITT or MDM, and registering or publishing trial protocols. Future RCTs should also recruit large samples in non-European countries to improve the generalizability of the findings.

5. Conclusion

This review found that AI-based psychotherapeutic interventions significantly reduced depressive symptoms at the postintervention assessment and at the 6–12-month follow-up assessment. We found comparable effects on anxiety, stress, and combined symptoms between AI-based psychotherapeutic interventions and active comparators. AI-based psychotherapeutic interventions can supplement existing psychiatric care, particularly for people with depression aged 31–40 years. Future studies should improve the transparency of the intervention’s content and regimen. Further investigations should also use methodologically robust designs with large samples and long-term follow-up assessments to evaluate the sustainability of the intervention effects.

Conflicts of Interest

The authors declare no conflicts of interest.

Author Contributions

Ying Lau, Kin Sun Chan, Patrick Cheong-Iao Pang, and Sai Ho Wong conceptualized and designed the study. Wei How Darryl Ang and Wen Wei Ang conducted a systematic literature search with the help of a senior librarian. Sai Ho Wong, Wen Wei Ang, and Ying Lau performed title and abstract screening and data extraction and assessed the quality of the selected studies. Ying Lau, Sai Ho Wong, Wei How Darryl Ang, and Wen Wei Ang conducted data management, data analysis, and data synthesis. Ying Lau supervised the systematic review and wrote the article. All authors have read and approved the final version of the article.

Funding

No funding was used in the study.

Acknowledgments

We thank the senior librarian, Suei Nee Wong, for her support in developing the search strategy. We also thank the trial authors for providing supplementary data.

Supporting Information

Additional supporting information can be found online in the Supporting Information section.

Open Research

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supporting Information

Filename: da8930012-sup-0001-f1.docx (Word 2007 document, 15.2 MB)

Supporting Information Table S1. PRISMA checklist reporting the details of the systematic review, including title, abstract, introduction, methods, results, discussion, and other information. Table S2. Eligibility criteria for selecting randomized controlled trials based on population, intervention, comparison, outcomes, study design, years of publication, publication type, and language. Table S3. Search terms, including index and keyword terms, in PubMed, EMBASE, CINAHL, Cochrane Library, Scopus, IEEE Xplore, Web of Science, PsycINFO, and ProQuest Dissertations and Theses. Table S4. List of 10 excluded ongoing trials in various clinical trial registries, with reasons. Table S5. List of 26 excluded full-text articles, with reasons. Table S6. Summary of the 14 artificial intelligence (AI)–based chatbots in the 30 selected RCTs. Table S7. GRADE criteria for certainty of the evidence for AI-based psychotherapeutic interventions with passive control (usual or waitlist comparator). Figure S1. Summary of risk of bias (RoB 2) for 23 ITT studies and 7 per-protocol studies. Figure S2. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by nature of participants. Figure S3. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by age groups. Figure S4. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of AI chatbot. Figure S5. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by different comparators. Figure S6. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of psychotherapy. Figure S7. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of platforms.
Figure S8. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by response generation. Figure S9. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by embodiment. Figure S10. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by use of ITT or MDM. Figure S11. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by protocol publication or clinical trial registration. Figure S12. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by nature of participants. Figure S13. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by age groups. Figure S14. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by different comparators. Figure S15. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of psychotherapy. Figure S16. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of platforms. Figure S17. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by response generation. Figure S18. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by embodiment. Figure S19. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by ITT or MDM. 
Figure S20. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by protocol or registration. Figure S21. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by nature of participants. Figure S22. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by age groups. Figure S23. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by type of AI chatbot. Figure S24. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by different comparators. Figure S25. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by ITT or MDM. Figure S26. Subgroup analysis of depressive symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months to 12 months by protocol or registration. Figure S27. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by nature of participants. Figure S28. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by age groups. Figure S29. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of AI chatbot. Figure S30. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by different comparators. Figure S31. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of psychotherapy.
Figure S32. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of platforms. Figure S33. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by response generation. Figure S34. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by embodiment. Figure S35. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by ITT or MDM. Figure S36. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by protocol or registration. Figure S37. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by nature of participants. Figure S38. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by age groups. Figure S39. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by different comparators. Figure S40. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of psychotherapy. Figure S41. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of platforms. Figure S42. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by response generation. Figure S43. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by embodiment.
Figure S44. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by ITT or MDM. Figure S45. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by protocol or registration. Figure S46. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months by nature of participants. Figure S47. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months by age groups. Figure S48. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months by type of AI chatbot. Figure S49. Subgroup analysis of anxiety symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 6 months by different comparators. Figure S50. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by nature of participants. Figure S51. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by age groups. Figure S52. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by different comparators. Figure S53. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of psychotherapy. Figure S54. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by type of platforms. Figure S55. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by response generation.
Figure S56. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by embodiment. Figure S57. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by ITT or MDM. Figure S58. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at postintervention by protocol or registration. Figure S59. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by nature of participants. Figure S60. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by age groups. Figure S61. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by different comparators. Figure S62. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of psychotherapy. Figure S63. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by type of platforms. Figure S64. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by response generation. Figure S65. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by embodiment. Figure S66. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by ITT or MDM. Figure S67. Subgroup analysis of stress symptoms for AI-based psychotherapeutic interventions and comparators at follow-up 2 weeks to 3 months by protocol or registration.
Figure S68. Funnel plot for AI chatbot intervention on depression symptoms at postintervention. Figure S69. Funnel plot for AI chatbot intervention on anxiety symptoms at postintervention.


References

1 World Health Organization, World Mental Health Report: Transforming Mental Health for All, 2022, World Health Organization.
2 GBD 2019 Mental Disorders Collaborators, Global, Regional, and National Burden of 12 Mental Disorders in 204 Countries and Territories, 1990–2019: A Systematic Analysis for the Global Burden of Disease Study 2019, The Lancet Psychiatry. (2022) 9, no. 2, 137–150, https://doi.org/10.1016/S2215-0366(21)00395-3.
3 Lucas G. M., Rizzo A., and Gratch J., et al., Reporting Mental Health Symptoms: Breaking Down Barriers to Care With Virtual Human Interviewers, Frontiers in Robotics and AI. (2017) 4, 1–9, https://doi.org/10.3389/frobt.2017.00051.
4 Vaidyam A. N., Wisniewski H., Halamka J. D., Kashavan M. S., and Torous J. B., Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape, The Canadian Journal of Psychiatry. (2019) 64, no. 7, 456–464, https://doi.org/10.1177/0706743719828977.
5 Santomauro D. F., Mantilla Herrera A. M., and Shadid J., et al., Global Prevalence and Burden of Depressive and Anxiety Disorders in 204 Countries and Territories in 2020 Due to the COVID-19 Pandemic, The Lancet. (2021) 398, no. 10312, 1700–1712, https://doi.org/10.1016/S0140-6736(21)02143-7.
6 Mahmud S., Mohsin M., Dewan M. N., and Muyeed A., The Global Prevalence of Depression, Anxiety, Stress, and Insomnia among General Population During COVID-19 Pandemic: A Systematic Review and Meta-Analysis, Trends in Psychology. (2023) 31, no. 1, 143–170, https://doi.org/10.1007/s43076-021-00116-9.
7 Boucher E. M., Harake N. R., and Ward H. E., et al., Artificially Intelligent Chatbots in Digital Mental Health Interventions: A Review, Expert Review of Medical Devices. (2021) 18, no. sup1, 37–49, https://doi.org/10.1080/17434440.2021.2013200.
8 Ahmed A., Hassan A., and Aziz S., et al., Chatbot Features for Anxiety and Depression: A Scoping Review, Health Informatics Journal. (2023) 29, no. 1, https://doi.org/10.1177/14604582221146719, 14604582221146719.
9 Gual-Montolio P., Jaén I., Martínez-Borba V., Castilla D., and Suso-Ribera C., Using Artificial Intelligence to Enhance Ongoing Psychological Interventions for Emotional Problems in Real- or Close to Real-Time: A Systematic Review, International Journal of Environmental Research and Public Health. (2022) 19, no. 13, https://doi.org/10.3390/ijerph19137737, 7737.
10 Pham K. T., Nabizadeh A., and Selek S., Artificial Intelligence and Chatbots in Psychiatry, Psychiatric Quarterly. (2022) 93, no. 1, 249–253, https://doi.org/10.1007/s11126-022-09973-8.
11 Bendig E., Erb B., Schulze-Thuesing L., and Baumeister H., The Next Generation: Chatbots in Clinical Psychology and Psychotherapy to Foster Mental Health: A Scoping Review, Verhaltenstherapie. (2019) 32, no. Suppl. 1, 64–76, https://doi.org/10.1159/000501812.
12 Fitzpatrick K. K., Darcy A., and Vierhile M., Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial, JMIR Mental Health. (2017) 4, no. 2, https://doi.org/10.2196/mental.7785, e19.
13 ter Stal S., Kramer L. L., Tabak M., op den Akker H., and Hermens H., Design Features of Embodied Conversational Agents in eHealth: A Literature Review, International Journal of Human-Computer Studies. (2020) 138, https://doi.org/10.1016/j.ijhcs.2020.102409, 102409.
14 Abd-alrazaq A. A., Alajlani M., Alalwan A. A., Bewick B. M., Gardner P., and Househ M., An Overview of the Features of Chatbots in Mental Health: A Scoping Review, International Journal of Medical Informatics. (2019) 132, https://doi.org/10.1016/j.ijmedinf.2019.103978, 103978.
15 Wilson L. and Marasoiu M., The Development and Use of Chatbots in Public Health: Scoping Review, JMIR Human Factors. (2022) 9, no. 4, https://doi.org/10.2196/35882, e35882.
16 Gaffney H., Mansell W., and Tai S., Conversational Agents in the Treatment of Mental Health Problems: Mixed-Method Systematic Review, JMIR Mental Health. (2019) 6, no. 10, https://doi.org/10.2196/14166, e14166.
17 Li Y., Liang S., and Zhu B., et al., Feasibility and Effectiveness of Artificial Intelligence-Driven Conversational Agents in Healthcare Interventions: A Systematic Review of Randomized Controlled Trials, International Journal of Nursing Studies. (2023) 143, https://doi.org/10.1016/j.ijnurstu.2023.104494, 104494.
18 Lim S. M., Shiau C. W. C., Cheng L. J., and Lau Y., Chatbot-Delivered Psychotherapy for Adults With Depressive and Anxiety Symptoms: A Systematic Review and Meta-Regression, Behavior Therapy. (2022) 53, no. 2, 334–347, https://doi.org/10.1016/j.beth.2021.09.007.
19 Page M. J., McKenzie J. E., and Bossuyt P. M., et al., The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews, International Journal of Surgery. (2021) 88, https://doi.org/10.1016/j.ijsu.2021.105906, 105906.
20 Hariton E. and Locascio J. J., Randomised Controlled Trials–The Gold Standard for Effectiveness Research: Study Design: Randomised Controlled Trials, BJOG: An International Journal of Obstetrics & Gynaecology. (2018) 125, no. 13, https://doi.org/10.1111/1471-0528.15199, 1716.
21 Song F., Hooper L., and Loke Y., Publication Bias: What is it? How Do We Measure it? How Do We Avoid it?, Open Access Journal of Clinical Trials. (2013) 2013, no. 5, 71–81, https://doi.org/10.2147/OAJCT.S34419.
22 Higgins J. P., Thomas J., and Chandler J., et al., Cochrane Handbook for Systematic Reviews of Interventions, 2019, John Wiley & Sons, https://doi.org/10.1002/9781119536604.
23 Cohen J., A Coefficient of Agreement for Nominal Scales, Educational and Psychological Measurement. (1960) 20, no. 1, 37–46, https://doi.org/10.1177/001316446002000104.
24 Marston L., Introductory Statistics for Health and Nursing Using SPSS, 2010, Sage Publications, https://doi.org/10.4135/9781446221570.
25 Sterne J. A. C., Savović J., and Page M. J., et al., RoB 2: A Revised Tool for Assessing Risk of Bias in Randomised Trials, BMJ. (2019) 366, https://doi.org/10.1136/bmj.l4898, l4898.
26 Guyatt G., Oxman A. D., and Akl E. A., et al., GRADE Guidelines: 1. Introduction—GRADE Evidence Profiles and Summary of Findings Tables, Journal of Clinical Epidemiology. (2011) 64, no. 4, 383–394, https://doi.org/10.1016/j.jclinepi.2010.04.026.
27 Egger M., Smith G. D., Schneider M., and Minder C., Bias in Meta-Analysis Detected by a Simple, Graphical Test, BMJ. (1997) 315, no. 7109, 629–634, https://doi.org/10.1136/bmj.315.7109.629.
28 Zwetsloot P.-P., Van Der Naald M., and Sena E. S., et al. Standardized Mean Differences Cause Funnel Plot Distortion in Publication Bias Assessments, eLife. (2017) 6, 1–20, https://doi.org/10.7554/eLife.24260.
29 Sterne J. A., Sutton A. J., and Ioannidis J. P., et al. Recommendations for Examining and Interpreting Funnel Plot Asymmetry in Meta-Analyses of Randomised Controlled Trials, BMJ. (2011) 343, https://doi.org/10.1136/bmj.d4002, d4002.
30 Schwarzer G. and Schwarzer M. G., Package ‘Meta’, The R Foundation for Statistical Computing. (2012) 9, 27.
31 Viechtbauer W. and Viechtbauer M. W., Package ‘Metafor’, The Comprehensive R Archive Network, 2015.
32 IntHout J., Ioannidis J. P. A., Rovers M. M., and Goeman J. J., Plea for Routinely Presenting Prediction Intervals in Meta-Analysis, BMJ Open. (2016) 6, no. 7, https://doi.org/10.1136/bmjopen-2015-010247, e010247.
33 Hedges L. V. and Olkin I., Statistical Methods for Meta-Analysis, 2014, Academic Press.
34 Lakens D., Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-Tests and ANOVAs, Frontiers in Psychology. (2013) 4, 1–12, https://doi.org/10.3389/fpsyg.2013.00863.
35 Schwarzer G., Carpenter J. R., and Rücker G., Fixed Effect and Random Effects Meta-Analysis, Meta-Analysis with R, 2015, Springer International Publishing, 21–53, https://doi.org/10.1007/978-3-319-21416-0_2.
36 Jamshidian M., On Algorithms for Restricted Maximum Likelihood Estimation, Computational Statistics & Data Analysis. (2004) 45, no. 2, 137–157, https://doi.org/10.1016/S0167-9473(02)00345-6.
37 Röver C., Knapp G., and Friede T., Hartung-Knapp-Sidik-Jonkman Approach and Its Modification for Random-Effects Meta-Analysis With Few Studies, BMC Medical Research Methodology. (2015) 15, no. 1, https://doi.org/10.1186/s12874-015-0091-1, 99.
38 Richardson M., Garner P., and Donegan S., Interpretation of Subgroup Analyses in Systematic Reviews: A Tutorial, Clinical Epidemiology and Global Health. (2019) 7, no. 2, 192–198, https://doi.org/10.1016/j.cegh.2018.05.005.
39 Bring J., How to Standardize Regression Coefficients, The American Statistician. (1994) 48, no. 3, 209–213, https://doi.org/10.1080/00031305.1994.10476059.
40 Beevers C. G., Pearson R., Hoffman J., Foulser A. A., Shumake J. D., and Meyer B., Effectiveness of an Internet Intervention (Deprexis) for Depression in a United States Adult Sample: A Parallel-Group Pragmatic Randomized Controlled Trial, Journal of Consulting and Clinical Psychology. (2017) 85, no. 4, 367–380, https://doi.org/10.1037/ccp0000171.
41 Bennion M. R., Hardy G. E., Moore R. K., Kellett S., and Millings A., Usability, Acceptability, and Effectiveness of Web-Based Conversational Agents to Facilitate Problem Solving in Older Adults: Controlled Study, Journal of Medical Internet Research. (2020) 22, no. 5, https://doi.org/10.2196/16794, e16794.
42 Berger T., Hämmerli K., Gubser N., Andersson G., and Caspar F., Internet-Based Treatment of Depression: A Randomized Controlled Trial Comparing Guided with Unguided Self-Help, Cognitive Behaviour Therapy. (2011) 40, no. 4, 251–266, https://doi.org/10.1080/16506073.2011.616531.
43 Berger T., Krieger T., Sude K., Meyer B., and Maercker A., Evaluating an E-Mental Health Program ("Deprexis") as Adjunctive Treatment Tool in Psychotherapy for Depression: Results of a Pragmatic Randomized Controlled Trial, Journal of Affective Disorders. (2018) 227, 455–462, https://doi.org/10.1016/j.jad.2017.11.021.
44 Bird T., Mansell W., Wright J., Gaffney H., and Tai S., Manage Your Life Online: A Web-Based Randomized Controlled Trial Evaluating the Effectiveness of a Problem-Solving Intervention in a Student Sample, Behavioural and Cognitive Psychotherapy. (2018) 46, no. 5, 570–582, https://doi.org/10.1017/S1352465817000820.
45 Bücker L., Bierbrodt J., Hand I., Wittekind C., Moritz S., and Van Wouwe J. P., Effects of a Depression-Focused Internet Intervention in Slot Machine Gamblers: A Randomized Controlled Trial, PLoS One. (2018) 13, no. 6, https://doi.org/10.1371/journal.pone.0198859, e0198859.
46 Burton C., Szentagotai Tatar A., and McKinstry B., et al. Pilot Randomised Controlled Trial of Help4Mood, An Embodied Virtual Agent-Based System to Support Treatment of Depression, Journal of Telemedicine and Telecare. (2016) 22, no. 6, 348–355, https://doi.org/10.1177/1357633X15609793.
47 Danieli M., Ciulli T., Mousavi S. M., and Riccardi G., A Conversational Artificial Intelligence Agent for a Mental Health Care App: Evaluation Study of Its Participatory Design, JMIR Formative Research. (2021) 5, no. 12, https://doi.org/10.2196/30053, e30053.
48 Danieli M., Ciulli T., and Mousavi S. M., et al. Assessing the Impact of Conversational Artificial Intelligence in the Treatment of Stress and Anxiety in Aging Adults: Randomized Controlled Trial, JMIR Mental Health. (2022) 9, no. 9, https://doi.org/10.2196/38067, e38067.
49 Fischer A., Schröder J., and Vettorazzi E., et al. An Online Programme to Reduce Depression in Patients With Multiple Sclerosis: A Randomised Controlled Trial, The Lancet Psychiatry. (2015) 2, no. 3, 217–223, https://doi.org/10.1016/S2215-0366(14)00049-2.
50 Fitzsimmons-Craft E. E., Chan W. W., and Smith A. C., et al. Effectiveness of a Chatbot for Eating Disorders Prevention: A Randomized Clinical Trial, International Journal of Eating Disorders. (2022) 55, no. 3, 343–353, https://doi.org/10.1002/eat.23662.
51 Gaffney H., Mansell W., Edwards R., and Wright J., Manage Your Life Online (MYLO): A Pilot Trial of a Conversational Computer-Based Intervention for Problem Solving in a Student Sample, Behavioural and Cognitive Psychotherapy. (2014) 42, no. 6, 731–746, https://doi.org/10.1017/S135246581300060X.
52 Guțu S. M., Cosmoiu A., Cojocaru D., Turturescu T., Popoviciu C. M., and Giosan C., Bot to the Rescue? Effects of a Fully Automated Conversational Agent on Anxiety and Depression: A Randomized Controlled Trial, Annals of Depression and Anxiety. (2021) 8, no. 1, https://doi.org/10.26420/anndepressanxiety.2021.1107, 1107.
53 He Y., Yang L., and Zhu X., et al. Mental Health Chatbot for Young Adults With Depressive Symptoms During the COVID-19 Pandemic: Single-Blind, Three-Arm Randomized Controlled Trial, Journal of Medical Internet Research. (2022) 24, no. 11, https://doi.org/10.2196/40719, e40719.
54 Hunt M., Miguez S., Dukas B., Onwude O., and White S., Efficacy of Zemedy, A Mobile Digital Therapeutic for the Self-Management of Irritable Bowel Syndrome: Crossover Randomized Controlled Trial, JMIR mHealth and uHealth. (2021) 9, no. 5, https://doi.org/10.2196/26152, e26152.
55 Jang S., Kim J. J., Kim S. J., Hong J., Kim S., and Kim E., Mobile App-Based Chatbot to Deliver Cognitive Behavioral Therapy and Psychoeducation for Adults with Attention Deficit: A Development and Feasibility/Usability Study, International Journal of Medical Informatics. (2021) 150, https://doi.org/10.1016/j.ijmedinf.2021.104440, 104440.
56 Klein J. P., Späth C., and Schröder J., et al. Time to Remission From Mild to Moderate Depressive Symptoms: 1 Year Results From the EVIDENT-Study, An RCT of an Internet Intervention for Depression, Behaviour Research and Therapy. (2017) 97, 154–162, https://doi.org/10.1016/j.brat.2017.07.013.
57 Klos M. C., Escoredo M., Joerin A., Lemos V. N., Rauws M., and Bunge E. L., Artificial Intelligence-Based Chatbot for Anxiety and Depression in University Students: Pilot Randomized Controlled Trial, JMIR Formative Research. (2021) 5, no. 8, https://doi.org/10.2196/20678, e20678.
58 Liu H., Peng H., Song X., Xu C., and Zhang M., Using AI Chatbots to Provide Self-Help Depression Interventions for University Students: A Randomized Trial of Effectiveness, Internet Interventions. (2022) 27, https://doi.org/10.1016/j.invent.2022.100495, 100495.
59 Ly K. H., Ly A. M., and Andersson G., A Fully Automated Conversational Agent for Promoting Mental Well-Being: A Pilot RCT Using Mixed Methods, Internet Interventions. (2017) 10, 39–46, https://doi.org/10.1016/j.invent.2017.10.002.
60 Maeda E., Miyata A., and Boivin J., et al. Promoting Fertility Awareness and Preconception Health Using a Chatbot: A Randomized Controlled Trial, Reproductive BioMedicine Online. (2020) 41, no. 6, 1133–1143, https://doi.org/10.1016/j.rbmo.2020.09.006.
61 Meyer B., Berger T., Caspar F., Beevers C. G., Andersson G., and Weiss M., Effectiveness of a Novel Integrative Online Treatment for Depression (Deprexis): Randomized Controlled Trial, Journal of Medical Internet Research. (2009) 11, no. 2, https://doi.org/10.2196/jmir.1151, e15.
62 Meyer B., Bierbrodt J., and Schröder J., et al. Effects of an Internet Intervention (Deprexis) on Severe Depression Symptoms: Randomized Controlled Trial, Internet Interventions. (2015) 2, no. 1, 48–59, https://doi.org/10.1016/j.invent.2014.12.003.
63 Moritz S., Schilling L., Hauschildt M., Schröder J., and Treszl A., A Randomized Controlled Trial of Internet-Based Therapy in Depression, Behaviour Research and Therapy. (2012) 50, no. 7–8, 513–521, https://doi.org/10.1016/j.brat.2012.04.006.
64 Oh J., Jang S., Kim H., and Kim J. J., Efficacy of Mobile App-Based Interactive Cognitive Behavioral Therapy Using a Chatbot for Panic Disorder, International Journal of Medical Informatics. (2020) 140, https://doi.org/10.1016/j.ijmedinf.2020.104171, 104171.
65 Prochaska J. J., Vogel E. A., and Chieng A., et al. A Randomized Controlled Trial of a Therapeutic Relational Agent for Reducing Substance Misuse During the COVID-19 Pandemic, Drug and Alcohol Dependence. (2021) 227, https://doi.org/10.1016/j.drugalcdep.2021.108986, 108986.
66 Sandoval L. R., Buckey J. C., Ainslie R., Tombari M., Stone W., and Hegel M. T., Randomized Controlled Trial of a Computerized Interactive Media-Based Problem Solving Treatment for Depression, Behavior Therapy. (2017) 48, no. 3, 413–425, https://doi.org/10.1016/j.beth.2016.04.001.
67 Schröder J., Brückner K., and Fischer A., et al. Efficacy of a Psychological Online Intervention for Depression in People with Epilepsy: A Randomized Controlled Trial, Epilepsia. (2014) 55, no. 12, 2069–2076, https://doi.org/10.1111/epi.12833.
68 Zwerenz R., Baumgarten C., and Becker J., et al. Improving the Course of Depressive Symptoms After Inpatient Psychotherapy Using Adjunct Web-Based Self-Help: Follow-Up Results of a Randomized Controlled Trial, Journal of Medical Internet Research. (2019) 21, no. 10, https://doi.org/10.2196/13655, e13655.
69 Zwerenz R., Becker J., Knickenberg R. J., Siepmann M., Hagen K., and Beutel M. E., Online Self-Help as an Add-on to Inpatient Psychotherapy: Efficacy of a New Blended Treatment Approach, Psychotherapy and Psychosomatics. (2017) 86, no. 6, 341–350, https://doi.org/10.1159/000481177.
70 Dosovitsky G., Pineda B. S., Jacobson N. C., Chang C., Escoredo M., and Bunge E. L., Artificial Intelligence Chatbot for Depression: Descriptive Study of Usage, JMIR Formative Research. (2020) 4, no. 11, https://doi.org/10.2196/17065, e17065.
71 Dettori J. R., Norvell D. C., and Chapman J. R., Clinically Important Difference: 4 Tips Toward a Better Understanding, Global Spine Journal. (2022) 12, no. 6, 1297–1298, https://doi.org/10.1177/21925682221092721.
72 Cinalioglu K., Elbaz S., Sekhon K., Su C. L., Rej S., and Sekhon H., Exploring Differential Perceptions of Artificial Intelligence in Health Care Among Younger Versus Older Canadians: Results From the 2021 Canadian Digital Health Survey, Journal of Medical Internet Research. (2023) 25, https://doi.org/10.2196/38169, e38169.
73 Cassell J., Embodied Conversational Agents: Representation and Intelligence in User Interfaces, AI Magazine. (2001) 22, no. 4, 67–67, https://doi.org/10.1609/aimag.v22i4.1593.
74 Cameron G., Cameron D., and Megaw G., et al. Towards a Chatbot for Digital Counselling, Proceedings of the 31st International BCS Human Computer Interaction Conference (HCI’17), 2017, BCS Learning & Development Ltd., 1–7.
75 Ng F., Trauer T., Dodd S., Callaly T., Campbell S., and Berk M., The Validity of the 21-Item Version of the Depression Anxiety Stress Scales as a Routine Clinical Outcome Measure, Acta Neuropsychiatrica. (2007) 19, no. 5, 304–310, https://doi.org/10.1111/j.1601-5215.2007.00217.x.
76 Ho A., Hancock J., and Miner A. S., Psychological, Relational, and Emotional Effects of Self-Disclosure After Conversations With a Chatbot, Journal of Communication. (2018) 68, no. 4, 712–733, https://doi.org/10.1093/joc/jqy026.
77 Ray A., Bhardwaj A., Malik Y. K., Singh S., and Gupta R., Artificial Intelligence and Psychiatry: An Overview, Asian Journal of Psychiatry. (2022) 70, https://doi.org/10.1016/j.ajp.2022.103021, 103021.
78 Loveys K., Sebaratnam G., Sagar M., and Broadbent E., The Effect of Design Features on Relationship Quality With Embodied Conversational Agents: A Systematic Review, International Journal of Social Robotics. (2020) 12, no. 6, 1293–1312, https://doi.org/10.1007/s12369-020-00680-7.
79 Kaye L. K., Malone S. A., and Wall H. J., Emojis: Insights, Affordances, and Possibilities for Psychological Science, Trends in Cognitive Sciences. (2017) 21, no. 2, 66–68, https://doi.org/10.1016/j.tics.2016.10.007.
80 Toglia J., Askin G., Gerber L. M., Jaywant A., and O’Dell M. W., Participation in Younger and Older Adults Post-Stroke: Frequency, Importance, and Desirability of Engagement in Activities, Frontiers in Neurology. (2019) 10, https://doi.org/10.3389/fneur.2019.01108, 1108.
81 Plant J. and Stephenson J., Beating Stress, Anxiety and Depression: Groundbreaking Ways to Help You Feel Better, 2009, Piatkus.
82 Holt N., Bremner A., Sutherland E., Vliek M., Passer M., and Smith R., EBOOK: Psychology: The Science of Mind and Behaviour, 4e, 2019, McGraw Hill.
83 Jovanović V., Gavrilov-Jerković V., and Lazić M., Can Adolescents Differentiate Between Depression, Anxiety and Stress? Testing Competing Models of the Depression Anxiety Stress Scales (DASS-21), Current Psychology. (2021) 40, no. 12, 6045–6056, https://doi.org/10.1007/s12144-019-00540-2.
84 Kraiss J., Viechtbauer W., and Black N., et al. Estimating the True Effectiveness of Smoking Cessation Interventions Under Variable Comparator Conditions: A Systematic Review and Meta-Regression, Addiction. (2023) 118, no. 10, 1835–1850, https://doi.org/10.1111/add.16222.
85 Klukowska A. M., Vandertop W. P., Schröder M. L., and Staartjes V. E., Calculation of the Minimum Clinically Important Difference (MCID) Using Different Methodologies: Case Study and Practical Guide, European Spine Journal. (2024) 33, no. 9, 3388–3400, https://doi.org/10.1007/s00586-024-08369-5.
86 Hoffmann T. C., Glasziou P. P., and Boutron I., et al. Better Reporting of Interventions: Template for Intervention Description and Replication (TIDieR) Checklist and Guide, BMJ. (2014) 348, https://doi.org/10.1136/bmj.g1687, g1687.

Artificial Intelligence–Based Psychotherapeutic Intervention on Psychological Outcomes: A Meta-Analysis and Meta-Regression
