Volume 41, Issue 2 pp. 187-193
ORIGINAL ARTICLE

Evaluation of validity and reliability of AI chatbots as public sources of information on dental trauma

Ashish J. Johnson, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Tarun Kumar Singh (Corresponding Author), All India Institute of Medical Sciences (AIIMS), Bathinda, India

Aakash Gupta, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Hariram Sankar, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Ikroop Gill, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Madhav Shalini, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Neeraj Mohan, Maulana Azad Institute of Dental Science, New Delhi, India

Correspondence: Tarun Kumar Singh ([email protected])
First published: 17 October 2024

Funding: The authors received no specific funding for this work.

ABSTRACT

Aim

This study aimed to assess the validity and reliability of four AI chatbots (Bing, ChatGPT 3.5, Google Gemini, and Claude AI) in addressing frequently asked questions (FAQs) related to dental trauma.

Methodology

A set of 30 FAQs on dental trauma was initially compiled from the responses of the four AI chatbots. A panel of expert endodontists and maxillofacial surgeons then refined these to a final set of 20 questions. Each question was submitted to each chatbot three times, yielding 240 responses in total. Responses were evaluated with the Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree), and any disagreements in scoring were resolved through evidence-based discussion. Each question's responses were classified as valid or invalid against two thresholds: a low threshold (scores of ≥ 4 for all three responses) and a high threshold (scores of 5 for all three responses). A chi-squared test was used to compare the validity of the responses between the chatbots, and Cronbach's alpha was calculated to assess reliability as the consistency of each chatbot's repeated responses.
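For illustration, the analysis described above can be sketched in a few lines of Python. This is not the authors' analysis code: the scores below are hypothetical placeholder values (the study's actual data are in Data S1), and only the chatbot names and the scoring rules follow the paper.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical GQS scores: 20 questions x 3 repeated responses per chatbot,
# on the study's 5-point Likert scale. Placeholder values only.
rng = np.random.default_rng(0)
scores = {
    "ChatGPT 3.5": rng.integers(3, 6, size=(20, 3)),
    "Google Gemini": rng.integers(3, 6, size=(20, 3)),
    "Claude AI": rng.integers(4, 6, size=(20, 3)),
    "Bing": rng.integers(2, 6, size=(20, 3)),
}

def classify_valid(q_scores, threshold):
    # A question counts as valid only if all three repeated responses
    # meet the threshold (>= 4 for the low bar, 5 for the high bar).
    return np.all(q_scores >= threshold, axis=1)

def cronbach_alpha(items):
    # Standard formula: alpha = k/(k-1) * (1 - sum of item variances
    # / variance of totals), treating each of the k = 3 repetitions
    # as an "item" and each question as a case.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

valid_low = {name: classify_valid(s, 4) for name, s in scores.items()}
for name, s in scores.items():
    print(f"{name}: alpha = {cronbach_alpha(s):.2f}, "
          f"valid (low threshold) = {valid_low[name].sum()}/20")

# Chi-squared test on the valid/invalid counts across the four chatbots;
# for the high threshold, classify_valid(s, 5) would be used instead.
table = np.array([[v.sum(), (~v).sum()] for v in valid_low.values()])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```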

Conclusion

The results indicate that Claude AI demonstrated superior validity and reliability compared with ChatGPT 3.5 and Google Gemini, whereas Bing was found to be less reliable. These findings underscore the need for authorities to establish strict guidelines to ensure the accuracy of medical information provided by AI chatbots.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The data that support the findings of this study are available in Data S1 of this article.
