Volume 45, Issue 4 e16112
ORIGINAL ARTICLE

Exploring the potential of large language models in identifying metabolic dysfunction-associated steatotic liver disease: A comparative study of non-invasive tests and artificial intelligence-generated responses

Wanying Wu

Department of Cardiology, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China

Department of Guangdong Provincial Key Laboratory of Coronary Heart Disease Prevention, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China

Yuhu Guo

Faculty of Science and Engineering, The University of Manchester, Manchester, UK

Qi Li

Department of Neurology, The First Affiliated Hospital of Hebei North University, Zhangjiakou, China

Congzhuo Jia

Corresponding Author

Department of Cardiology, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China

Department of Guangdong Provincial Key Laboratory of Coronary Heart Disease Prevention, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China

Correspondence

Congzhuo Jia, Department of Cardiology, Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China.

First published: 11 November 2024

Wanying Wu and Yuhu Guo contributed equally to this work and share first authorship.

Handling Editor: Luca Valenti.

Abstract

Background and Aims

This study sought to assess the capabilities of large language models (LLMs) in identifying clinically significant metabolic dysfunction-associated steatotic liver disease (MASLD).

Methods

We included participants from the National Health and Nutrition Examination Survey (NHANES) 2017–2018. The validity and reliability of MASLD diagnosis by GPT-3.5 and GPT-4 were quantitatively examined and compared with those of the Fatty Liver Index (FLI) and the United States FLI (USFLI). Receiver operating characteristic (ROC) curve analysis was performed to assess the accuracy of MASLD diagnosis across the different scoring systems. Additionally, the potential of GPT-4V for clinical diagnosis was evaluated using ultrasound images from patients with MASLD, so that LLM capabilities were assessed in interpreting both textual and visual data.

Results

GPT-4 demonstrated performance in MASLD diagnosis comparable to that of FLI and USFLI, with AUROC values of .831 (95% CI .796–.867), .817 (95% CI .797–.837) and .827 (95% CI .807–.848), respectively. Based on clinician evaluation, GPT-4 showed a trend towards greater accuracy, clinical relevance and efficiency than GPT-3.5. Additionally, Pearson's r values between GPT-4 and FLI and between GPT-4 and USFLI were .718 and .695, respectively, indicating strong and moderate correlations. Moreover, GPT-4V showed potential in recognizing features of hepatic ultrasound images but had limited interpretive accuracy in diagnosing MASLD compared with skilled radiologists.
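The AUROC and Pearson's r statistics reported in these results can be illustrated from per-participant scores. The minimal pure-Python sketch below assumes each method (for example, a GPT-4 output converted to a numeric score, FLI or USFLI) yields one number per participant and that MASLD status is coded 1/0; how the study converted GPT-4 responses into scores is not described in this abstract, and the toy data are made up. AUROC is computed as the probability that a randomly chosen MASLD case scores higher than a randomly chosen non-case, with ties counting one half (the Mann–Whitney formulation).

```python
import math
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U / (n_pos * n_neg))."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case (1) and one non-case (0)")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # case correctly ranked above non-case
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired sequences of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy, made-up data: 1 = MASLD, 0 = no MASLD.
    labels = [1, 1, 1, 0, 0, 0]
    llm_scores = [0.9, 0.7, 0.4, 0.5, 0.2, 0.1]
    fli_scores = [82.0, 65.0, 48.0, 40.0, 25.0, 12.0]
    print(f"AUROC (LLM): {auroc(llm_scores, labels):.3f}")
    print(f"AUROC (FLI): {auroc(fli_scores, labels):.3f}")
    print(f"Pearson r (LLM vs FLI): {pearson_r(llm_scores, fli_scores):.3f}")
```

The pairwise loop costs O(cases × non-cases), which is adequate for a sketch; in practice `sklearn.metrics.roc_auc_score` and `scipy.stats.pearsonr` compute the same quantities. Confidence intervals like those reported (for example, by DeLong's method or bootstrapping) would require additional code not shown here.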

Conclusions

GPT-4 achieved performance comparable to that of traditional risk scores in diagnosing MASLD and offered greater convenience, versatility and user-friendly outputs. The integration of GPT-4V highlights the capacity of LLMs to handle both textual and visual medical data, reinforcing their broad utility in healthcare practice.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the National Health and Nutrition Examination Survey (NHANES) at https://www.cdc.gov/nchs/nhanes/index.htm. These data were derived from the following resource available in the public domain: NHANES 2017–2018, https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/default.aspx?BeginYear=2017.