Evidence-Based Potential of Generative Artificial Intelligence Large Language Models on Dental Avulsion: ChatGPT Versus Gemini
Corresponding Author
Taibe Tokgöz Kaplan
Department of Pedodontics, Faculty of Dentistry, Karabuk University, Karabük, Turkey
Correspondence:
Taibe Tokgöz Kaplan ([email protected])
Funding: The authors received no specific funding for this work.
ABSTRACT
Background
In this study, the accuracy and comprehensiveness of the answers given to questions about dental avulsion by two artificial intelligence-based language models, ChatGPT and Gemini, were comparatively evaluated.
Materials and Methods
Based on the guidelines of the International Association of Dental Traumatology (IADT), a total of 33 questions about dental avulsion were prepared, including multiple-choice, binary (true/false), and open-ended questions, posed as technical questions and patient questions. The questions were submitted to ChatGPT and Gemini. Responses were recorded and scored by four pediatric dentists. Statistical analyses, including intraclass correlation coefficient (ICC) analysis, were performed to determine the agreement and accuracy of the responses. The significance level was set at p < 0.050.
Results
The mean score of the Gemini model was statistically significantly higher than that of ChatGPT (p = 0.001). ChatGPT gave more correct answers to open-ended and true/false (T/F) questions on dental avulsion and showed its lowest accuracy on multiple-choice questions (MCQs). The median scores of Gemini's responses did not differ significantly across question types (p = 0.088). When ChatGPT and Gemini were compared with the Mann–Whitney U test without distinguishing between question types, Gemini's answers were statistically significantly more accurate (p = 0.004).
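The pooled comparison reported above can be illustrated with a minimal sketch of a two-sided Mann–Whitney U test. Everything specific here is an assumption made for illustration: the scores are hypothetical placeholders rather than the study's data, the function name `mann_whitney_u` is invented for this sketch, and the normal approximation with tie correction may differ from the procedure in the authors' statistical software, which the source does not name.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (u_x, u_y, p_value)."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign average ranks to tied values and accumulate the tie term.
    ranks = [0.0] * len(combined)
    tie_term = 0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j

    # U statistic for each group from the rank sum of the first group.
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Normal approximation of the null distribution of U.
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a difference
        return u1, u2, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, u2, p_value


# Hypothetical rater scores, NOT the study's data.
chatgpt_scores = [3, 4, 2, 5, 3, 4, 3, 2]
gemini_scores = [4, 5, 4, 5, 3, 5, 4, 4]

u_chatgpt, u_gemini, p = mann_whitney_u(chatgpt_scores, gemini_scores)
print(f"U (ChatGPT) = {u_chatgpt}, U (Gemini) = {u_gemini}, p = {p:.3f}")
```

A rank-based test is a natural choice for ordinal rater scores because it does not assume normally distributed data. For small samples an exact test is generally preferable to the normal approximation used in this sketch.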
Conclusions
When evaluated against the IADT guideline for dental avulsion, the Gemini and ChatGPT language models show promise. Before LLMs can be successfully incorporated into practice, additional research, clinical validation, and improvements to the models are needed.
Conflicts of Interest
The authors declare no conflicts of interest.
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supporting Information
Filename | Description |
---|---|
edt12999-sup-0001-AppendixS1.docx (Word 2007 document, 60.1 KB) | Appendix S1. |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.