Volume 23, Issue 3, pp. 444-445
Letter to the Editor

The Framework for AI Tool Assessment in Mental Health (FAITA - Mental Health): a scale for evaluating AI-powered mental health tools

Ashleigh Golden
Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA

Elias Aboujaoude
Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
Cedars-Sinai Medical Center, Los Angeles, CA, USA
First published: 16 September 2024

Supplementary information including the FAITA - Mental Health scale is available at https://www.FAITAmentalhealthscale.com.

Even within the ever-evolving landscape of digital mental health interventions, the advent of generative artificial intelligence (GAI), large language models (LLMs), and generative pre-trained transformers (GPTs) represents a paradigm shift. These technologies bring the promise of scalable and personalized diagnostics, psychoeducation and treatment that may help close a stubborn access-to-care gap [1]. At the same time, the risk to patients’ health from unmonitored AI-powered care, and to users’ data from insecure platforms, presents unprecedented challenges. The enthusiasm and fear that AI mental health offerings simultaneously generate make a comprehensive tool for their systematic assessment a timely necessity.

To our knowledge, no comprehensive scale exists for systematically evaluating AI interventions in mental health. Abbasian et al [2] suggested helpful metrics for assessing AI health care conversations, without explicitly tailoring them to mental health. The AI scholar L. Eliot [3] advocated rating mental health chatbots by their autonomy, or degree of independence from human oversight. Pfohl et al [4] put the focus squarely on evaluating equity and bias. These efforts highlight the need for a comprehensive toolbox for evaluating AI interventions in mental health – one that encompasses autonomy and equity, but also efficacy, user experience, safety and ethical integrity, among other crucial dimensions [5].

Evaluative digital mental health tools that predate the rise of AI provide valuable lessons. The now discontinued nonprofit One Mind PsyberGuide [6] offered reviews of digital mental health apps with a focus on three dimensions: credibility, user experience, and transparency. This framework seemed to fulfill an important role across several constituencies: Psihogios et al [7] praised it in their paper on pediatric mobile health apps; Nesamoney [8] endorsed it for helping app developers and designers; and Garland et al [9] described it as more comprehensive and user-friendly than other app review platforms, including that of the American Psychological Association.

PsyberGuide is a reasonable starting point for an assessment framework for AI-powered mental health tools. Besides short app reviews by users and lengthier expert reviews, it offered scoring guidelines for its dimensions. Given the importance of AI tools “learning” from ongoing feedback and reviews, and of a scoring system that facilitates comparisons across AI offerings, it provides a helpful basis for our framework.

Here we introduce the Framework for AI Tool Assessment in Mental Health (FAITA - Mental Health), a structured scale developed by updating PsyberGuide's “credibility”, “user experience” and “transparency” dimensions for the AI “age”, and incorporating three crucial new dimensions: “user agency”, “equity and inclusivity” and “crisis management” (see supplementary information for the full structured FAITA - Mental Health form).

Our framework reflects awareness of both the potential and challenges of AI tools, and emphasizes evidence base, user-centric design, safety, personalization, cultural sensitivity, and the ethical use of technology. Ultimately, the framework aims to promote “best practices” and to guide industry development of AI technologies that benefit users while respecting their rights. Additionally, the framework seeks to be sufficiently flexible to accommodate continued evolution in the field and, with some minor modifications, adaptation to other medical disciplines impacted by AI (e.g., “FAITA - Genetics”).

The framework's first dimension, “credibility”, evaluates AI-powered mental health tools according to their scientific underpinnings and their capacity to help users achieve their goals. Integrating the three subdimensions of “proposed goal”, “evidence-based content” and “retention”, this dimension advocates for interventions that have clear and measurable goals, are grounded in validated research and practices, and can keep users meaningfully engaged over time. Each subdimension is awarded up to 2 points, for a maximum dimension score of 6 for the most “credible” tool.

The second dimension for assessing AI mental health tools, “user experience”, addresses more complex interactions than those encountered in static mental health apps. As such, PsyberGuide's “user experience” dimension – with its focus on engagement, functionality and esthetics – was found to be insufficient, and three new subdimensions were incorporated: “personalized adaptability”, to evaluate the AI's ability to improve from user feedback over time; “quality of interactions”, to evaluate the naturalness of exchanges; and “mechanisms for feedback”, to underscore the importance of users’ ability to report issues, suggest improvements, and seek assistance. Each subdimension is awarded up to 2 points, for a maximum “user experience” dimension score of 6.

The third dimension, “user agency”, is new and underlines the importance of empowering users to manage their personal data and treatment choices. It is divided into two subdimensions. The first, “user autonomy, data protection, and privacy”, focuses on control over personal health data, clearly worded and user-friendly consent processes, robust data protection protocols, secure storage, and users' ability to actively manage their data. The second, “user empowerment”, focuses on users’ self-efficacy and capacity for self-management, gauging whether AI interventions include tools that support users' independence and encourage the application of skills learned with the tool to real-life contexts, in ways that prevent dependency on the tool. Each subdimension is awarded up to 2 points, for a maximum “user agency” dimension score of 4.

The fourth dimension, “equity and inclusivity”, is also new and consists of two subdimensions: “cultural sensitivity and inclusivity”, which assesses a tool's capability to engage with users from diverse cultural backgrounds and emphasizes the need for content recognizing cultural and other identity differences; and “bias and fairness”, which addresses the tool's commitment to diversify its training material and remove biases that might impact fairness and equity. Each subdimension is awarded up to 2 points, for a maximum “equity and inclusivity” dimension score of 4.

The fifth dimension, “transparency”, remains from PsyberGuide, but now extends beyond data management to include the AI's ownership, funding, business model, development processes, and primary stakeholders. It highlights the importance of providing clear and comprehensive information about operational and business practices, so that users are better equipped to make informed decisions on using such technologies. It also aims to help developers adhere to best practices by disclosing information regarding their tools’ intention and governance. The “transparency” dimension carries a maximum score of 2.

Finally, the new sixth dimension of “crisis management” evaluates the safeguarding of user well-being and whether the mental health AI tool provides immediate, effective support in emergencies. It emphasizes comprehensive safety protocols and crisis management features that not only steer users to relevant local resources during crises, but also facilitate follow-through with these resources. The “crisis management” dimension carries a maximum score of 2.
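
To make the scoring structure concrete, the sketch below encodes the six dimensions and their subdimensions as a simple data structure and sums subdimension ratings (each 0-2) into an overall total. It is a minimal illustration written in Python for this letter; the class, variable and function names are ours and purely hypothetical, not part of the published scale or its supplementary materials.

from dataclasses import dataclass

@dataclass
class Dimension:
    name: str
    subdimensions: list[str]       # each subdimension is rated 0-2
    max_per_subdimension: int = 2

    @property
    def max_score(self) -> int:
        return len(self.subdimensions) * self.max_per_subdimension

# The six FAITA - Mental Health dimensions as described above; "transparency"
# and "crisis management" are modeled here as single-item dimensions (maximum 2 each).
FAITA_MENTAL_HEALTH = [
    Dimension("credibility", ["proposed goal", "evidence-based content", "retention"]),
    Dimension("user experience", ["personalized adaptability", "quality of interactions",
                                  "mechanisms for feedback"]),
    Dimension("user agency", ["user autonomy, data protection, and privacy",
                              "user empowerment"]),
    Dimension("equity and inclusivity", ["cultural sensitivity and inclusivity",
                                         "bias and fairness"]),
    Dimension("transparency", ["transparency"]),
    Dimension("crisis management", ["crisis management"]),
]

def total_score(ratings: dict[str, dict[str, int]]) -> int:
    """Sum subdimension ratings (each clamped to 0-2) across all six dimensions."""
    total = 0
    for dim in FAITA_MENTAL_HEALTH:
        for sub in dim.subdimensions:
            rating = ratings.get(dim.name, {}).get(sub, 0)
            total += max(0, min(rating, dim.max_per_subdimension))
    return total

# Maximum possible total: 6 + 6 + 4 + 4 + 2 + 2 = 24
assert sum(d.max_score for d in FAITA_MENTAL_HEALTH) == 24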

Integrating GAI, LLMs and GPTs into mental health care heralds a promising but complicated new era. The promise of these technologies for delivering personalized, accessible and scalable mental health support is immense. So, unfortunately, are the challenges. We developed the FAITA - Mental Health to equip users, clinicians, researchers, and industry and public health stakeholders with a scale for comprehensively evaluating the quality, safety, integrity and user-centricity of AI-powered mental health tools.

With an overall score ranging from 0 to 24, this scale attempts to capture the complexities of AI-driven mental health care, while accommodating ongoing evolution in the field and possible adaptations to other medical disciplines. Formal research is required to empirically test its strengths, weaknesses, and most pertinent components.
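
Continuing the illustrative Python sketch above, a completed set of subdimension ratings (the numbers below are fictitious and do not describe any real tool) reduces to a single 0-24 total that can be compared across AI offerings.

# Hypothetical ratings for a fictitious tool, keyed by dimension and subdimension.
ratings_example = {
    "credibility": {"proposed goal": 2, "evidence-based content": 1, "retention": 2},
    "user experience": {"personalized adaptability": 2, "quality of interactions": 2,
                        "mechanisms for feedback": 1},
    "user agency": {"user autonomy, data protection, and privacy": 2, "user empowerment": 1},
    "equity and inclusivity": {"cultural sensitivity and inclusivity": 1, "bias and fairness": 1},
    "transparency": {"transparency": 2},
    "crisis management": {"crisis management": 1},
}
print(total_score(ratings_example))  # prints 18 (out of a possible 24)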
