Volume 45, Issue 4 e16139
ORIGINAL ARTICLE

A Machine Learning Model to Predict De Novo Hepatocellular Carcinoma Beyond Year 5 of Antiviral Therapy in Patients With Chronic Hepatitis B

Yeonjung Ha

Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea

Seungseok Lee

Department of Biomedical Engineering, College of Electronics and Informatics, Kyung Hee University, Yongin-si, Gyeonggi-do, South Korea

Jihye Lim

Division of Gastroenterology and Hepatology, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, South Korea

Kwanjoo Lee

Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea

Young Eun Chon

Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea

Joo Ho Lee

Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea

Kwan Sik Lee

Department of Gastroenterology, CHA Bundang Medical Center, CHA University, Seongnam-si, Gyeonggi-do, South Korea

Kang Mo Kim

Asan Liver Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

Ju Hyun Shim

Asan Liver Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

Danbi Lee

Asan Liver Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

Dong Keon Yon

Center for Digital Health, Medical Research Institute, Kyung Hee University Medical Center, Kyung Hee University, Seoul, South Korea

Corresponding Author

Jinseok Lee

Department of Biomedical Engineering, College of Electronics and Informatics, Kyung Hee University, Yongin-si, Gyeonggi-do, South Korea

Correspondence:

Jinseok Lee ([email protected])

Han Chu Lee ([email protected])

Corresponding Author

Han Chu Lee

Asan Liver Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea

First published: 18 December 2024

Handling Editor: Alejandro Forner

Funding: The authors received no specific funding for this work.

Yeonjung Ha and Seungseok Lee contributed equally to this study.

ABSTRACT

Background and Aims

This study aimed to develop and validate a machine learning (ML) model to predict hepatocellular carcinoma (HCC) occurring after the first 5 years of entecavir (ETV) or tenofovir (TFV) therapy in patients with chronic hepatitis B (CHB).

Methods

Patients with CHB who were treated with ETV/TFV for > 5 years and were not diagnosed with HCC during the first 5 years of therapy were selected from two hospitals. We used 36 variables for model development, including baseline characteristics (age, sex, cirrhosis, and type of antiviral agent) and laboratory values (at baseline, at 5 years, and the changes between baseline and 5 years). Five ML algorithms were applied to the training dataset and internally validated using a test dataset. External validation was also performed.
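
As a rough illustration of how the three groups of laboratory variables described above might be assembled, the following minimal sketch derives baseline, 5-year, and change features for one patient. The laboratory names (`platelet`, `alt`), the function name `build_features`, and the definition of change as the 5-year value minus the baseline value are assumptions made for illustration; the abstract does not list the individual variables or state how changes were computed.

```python
from typing import Dict


def build_features(baseline: Dict[str, float], year5: Dict[str, float]) -> Dict[str, float]:
    """Combine baseline and 5-year laboratory values into one feature row.

    Assumption: "change" is the 5-year value minus the baseline value.
    Only laboratory tests measured at both time points are used.
    """
    features: Dict[str, float] = {}
    for name in sorted(baseline.keys() & year5.keys()):
        features[f"{name}_baseline"] = baseline[name]
        features[f"{name}_year5"] = year5[name]
        features[f"{name}_change"] = year5[name] - baseline[name]
    return features


# Hypothetical patient: the values and test names are illustrative only.
example = build_features(
    baseline={"platelet": 150.0, "alt": 80.0},
    year5={"platelet": 170.0, "alt": 30.0},
)
```

Each laboratory test contributes three columns in this sketch, so a model can use both the on-treatment level and its trajectory over the first 5 years.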

Results

In years 5–15, a total of 279/5908 (4.7%) and 25/562 (4.5%) patients developed HCC in the derivation and external validation cohorts, respectively. In the training dataset (n = 4726), logistic regression showed the highest area under the receiver operating characteristic curve (AUC) of 0.803 and a balanced accuracy of 0.735, outperforming the other individual ML algorithms. An ensemble model combining logistic regression and random forest performed best (AUC, 0.811 and balanced accuracy, 0.754). The results from the test dataset (n = 1182) verified the good performance of the ensemble model (AUC, 0.784 and balanced accuracy, 0.712). External validation confirmed the predictive accuracy of the ensemble model (AUC, 0.862 and balanced accuracy, 0.771). A web-based calculator was developed (http://ai-wm.khu.ac.kr/HCC/).
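
To make the two reported metrics concrete, the sketch below computes AUC and balanced accuracy for a toy set of predictions and combines two models by averaging their predicted probabilities. The averaging rule (soft voting), the 0.5 decision threshold, the function names, and the toy data are all assumptions for illustration; the abstract does not state how the logistic regression and random forest outputs were combined or which threshold was used.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Mean of sensitivity and specificity at the given threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return 0.5 * (tp / n_pos + tn / n_neg)


def soft_vote(p_a: Sequence[float], p_b: Sequence[float]) -> List[float]:
    """Assumed ensemble rule: average the two models' predicted probabilities."""
    return [(a + b) / 2.0 for a, b in zip(p_a, p_b)]


# Toy data (not from the study): 1 = developed HCC, 0 = did not.
labels = [1, 0, 1, 0, 0, 1]
p_lr = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7]  # hypothetical logistic regression output
p_rf = [0.7, 0.4, 0.8, 0.2, 0.3, 0.5]  # hypothetical random forest output
p_ens = soft_vote(p_lr, p_rf)

auc_lr = roc_auc(labels, p_lr)
auc_ens = roc_auc(labels, p_ens)
bacc_lr = balanced_accuracy(labels, p_lr)
bacc_ens = balanced_accuracy(labels, p_ens)
```

Balanced accuracy is a natural companion to AUC here because fewer than 5% of patients developed HCC, so plain accuracy would be dominated by the majority class.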

Conclusions

The proposed ML model showed good performance in predicting HCC risk beyond year 5 of ETV/TFV therapy and could therefore facilitate individualised HCC surveillance based on risk stratification.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The datasets used and/or analysed during this study are available from the corresponding author upon reasonable request.
