Volume 35, Issue 5 pp. 876-878
Short Report

Will code one day run a code? Performance of language models on ACEM primary examinations and implications

Jesse Smith MD

Corresponding Author

Emergency Medicine Registrar

Eastern Health Emergency Medicine Program, Eastern Health, Melbourne, Victoria, Australia

Correspondence: Dr Jesse Smith, Emergency Department, Box Hill Hospital, 5 Arnold Street, Box Hill, VIC 3128, Australia. Email: [email protected]

Philip MC Choi MBChB, FRACP

Stroke Neurologist

Department of Neuroscience, Eastern Health, Melbourne, Victoria, Australia

Eastern Health Clinical School, Monash University, Melbourne, Victoria, Australia

Paul Buntine MBBS (Hons), FACEM, MClinRes

Director of Emergency Medicine Research

Eastern Health Emergency Medicine Program, Eastern Health, Melbourne, Victoria, Australia

Department of Neuroscience, Eastern Health, Melbourne, Victoria, Australia

First published: 06 July 2023
Citations: 1

Abstract

Objective

Large language models (LLMs) have demonstrated mixed results in their ability to pass various specialist medical examinations, and their performance within the field of emergency medicine remains unknown.

Methods

We explored the performance of three prevalent LLMs (OpenAI's GPT series, Google's Bard, and Microsoft's Bing Chat) on a practice ACEM primary examination.

Results

All LLMs achieved a passing score, with GPT 4.0 outperforming the average candidate.

Conclusion

By passing the ACEM primary examination, large language models show potential as tools for medical education and practice. However, limitations exist and are discussed.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.