Volume 41, Issue 2 pp. 187-193
ORIGINAL ARTICLE

Evaluation of validity and reliability of AI chatbots as public sources of information on dental trauma

Ashish J. Johnson, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Tarun Kumar Singh (Corresponding Author), All India Institute of Medical Sciences (AIIMS), Bathinda, India

Aakash Gupta, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Hariram Sankar, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Ikroop Gill, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Madhav Shalini, All India Institute of Medical Sciences (AIIMS), Bathinda, India

Neeraj Mohan, Maulana Azad Institute of Dental Science, New Delhi, India

Correspondence: Tarun Kumar Singh ([email protected])
First published: 17 October 2024

Funding: The authors received no specific funding for this work.

ABSTRACT

Aim

This study aimed to assess the validity and reliability of four AI chatbots (Bing, ChatGPT 3.5, Google Gemini, and Claude AI) in addressing frequently asked questions (FAQs) related to dental trauma.

Methodology

A set of 30 FAQs on dental trauma was initially compiled from the responses of the four AI chatbots. A panel of expert endodontists and maxillofacial surgeons then refined these to a final set of 20 questions. Each question was submitted to each chatbot three times, yielding 240 responses in total. Responses were evaluated with the Global Quality Score (GQS) on a 5-point Likert scale (5: strongly agree; 4: agree; 3: neutral; 2: disagree; 1: strongly disagree), and any disagreements in scoring were resolved through evidence-based discussion. Each question's responses were classified as valid or invalid against two thresholds: a low threshold (scores of ≥ 4 for all three responses) and a high threshold (scores of 5 for all three responses). A chi-squared test was used to compare the validity of the responses between the chatbots, and Cronbach's alpha was calculated to assess reliability as the consistency of each chatbot's repeated responses.
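For illustration, the analysis described above can be sketched in a few lines of Python. This is not the authors' analysis code: the scores below are hypothetical placeholder values (the study's actual data are in Data S1), and only the chatbot names and the scoring rules follow the paper.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical GQS scores: 20 questions x 3 repeated responses per chatbot,
# on the study's 5-point Likert scale. Placeholder values only.
rng = np.random.default_rng(0)
scores = {
    "ChatGPT 3.5": rng.integers(3, 6, size=(20, 3)),
    "Google Gemini": rng.integers(3, 6, size=(20, 3)),
    "Claude AI": rng.integers(4, 6, size=(20, 3)),
    "Bing": rng.integers(2, 6, size=(20, 3)),
}

def classify_valid(q_scores, threshold):
    # A question counts as valid only if all three repeated responses
    # meet the threshold (>= 4 for the low bar, 5 for the high bar).
    return np.all(q_scores >= threshold, axis=1)

def cronbach_alpha(items):
    # Standard formula: alpha = k/(k-1) * (1 - sum of item variances
    # / variance of totals), treating each of the k = 3 repetitions
    # as an "item" and each question as a case.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

valid_low = {name: classify_valid(s, 4) for name, s in scores.items()}
for name, s in scores.items():
    print(f"{name}: alpha = {cronbach_alpha(s):.2f}, "
          f"valid (low threshold) = {valid_low[name].sum()}/20")

# Chi-squared test on the valid/invalid counts across the four chatbots;
# for the high threshold, classify_valid(s, 5) would be used instead.
table = np.array([[v.sum(), (~v).sum()] for v in valid_low.values()])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```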

Conclusion

The results indicate that Claude AI demonstrated superior validity and reliability compared with ChatGPT 3.5 and Google Gemini, whereas Bing was found to be less reliable. These findings underscore the need for authorities to establish strict guidelines to ensure the accuracy of medical information provided by AI chatbots.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The data that support the findings of this study are available in Data S1 of this article.
