Volume 15, Issue 4 pp. 338-348
ORIGINAL ARTICLE
Open Access

Development and validation of risk prediction models for large for gestational age infants using logistic regression and two machine learning algorithms

Ning Wang
Department of Endocrinology, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Haonan Guo
Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Yingyu Jing
Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Yifan Zhang
Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Bo Sun
Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China

Xingyan Pan
Xi'an Jiaotong University, Xi'an, China

Huan Chen
Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Jing Xu
Department of Endocrinology, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Mengjun Wang
Department of Endocrinology, Xi'an, China

Xi Chen
Department of Epidemiology and Statistics, School of Public Health, Medical College, Zhejiang University, Hangzhou, China

Lin Song (Corresponding Author)
Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China

Wei Cui (Corresponding Author)
Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China

Correspondence

Lin Song, Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China.
Email: [email protected]

Wei Cui, Department of Endocrinology and Second Department of Geriatrics, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
Email: [email protected]

First published: 08 March 2023

Ning Wang, Haonan Guo and Yingyu Jing contributed equally to this work.

Abstract

Background

Large for gestational age (LGA) is an adverse pregnancy outcome that endangers the life and health of mothers and offspring. We aimed to establish prediction models for LGA in late pregnancy.

Methods

Data were obtained from an established cohort of 1285 Chinese pregnant women. LGA was defined as a birth weight above the 90th percentile of the Chinese birth weight distribution for newborns of the same sex and gestational age. Women with gestational diabetes mellitus (GDM) were classified into three subtypes according to indexes of insulin sensitivity and insulin secretion. Models were established using logistic regression and two machine learning algorithms (decision tree and random forest), and were validated internally using data from the same cohort.
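
The LGA definition above can be sketched in code as follows. This is a minimal illustration and not the authors' implementation: the function name `is_lga`, the table layout, and every number in `P90_PLACEHOLDER` are assumptions made for the example, and the placeholder thresholds are not taken from the Chinese birth weight reference used in the study.

```python
from typing import Dict, Tuple

# HYPOTHETICAL reference table: (sex, completed gestational week) -> 90th
# percentile birth weight in grams. These values are placeholders for
# illustration only; they are NOT the Chinese reference standard.
P90_PLACEHOLDER: Dict[Tuple[str, int], float] = {
    ("male", 39): 3800.0,
    ("female", 39): 3700.0,
    ("male", 40): 3900.0,
    ("female", 40): 3800.0,
}


def is_lga(
    birth_weight_g: float,
    sex: str,
    gestational_week: int,
    p90_table: Dict[Tuple[str, int], float],
) -> bool:
    """Return True when the birth weight is strictly above the 90th
    percentile for newborns of the same sex and gestational age."""
    key = (sex, gestational_week)
    if key not in p90_table:
        # Refuse to guess when the reference has no matching stratum.
        raise KeyError(f"no reference percentile for {key}")
    return birth_weight_g > p90_table[key]
```

In practice the lookup table would be replaced by the published sex- and gestational-age-specific reference curves; the strict comparison mirrors the ">90th percentile" wording of the definition.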

Results

A total of 139 newborns were diagnosed as LGA after birth. The logistic regression model, which consisted of eight commonly used clinical indicators (including lipid profile) and GDM subtypes, had an area under the curve (AUC) of 0.760 (95% confidence interval [CI] 0.706–0.815) in the training set and 0.748 (95% CI 0.659–0.837) in the internal validation set. The two machine learning models included all the variables. The decision tree model had AUCs of 0.813 (95% CI 0.786–0.839) in the training set and 0.779 (95% CI 0.735–0.824) in the internal validation set; the random forest model had AUCs of 0.854 (95% CI 0.831–0.877) and 0.808 (95% CI 0.766–0.850), respectively.
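
For readers unfamiliar with how an AUC and its confidence interval are obtained from predicted risks, a minimal sketch follows. It is not the authors' code: the abstract does not state how the 95% CIs were computed, so the percentile bootstrap used here is an assumption, and the function names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    roc_auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data, made up for illustration (1 = LGA, 0 = not LGA).
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7, 0.5, 0.9]
toy_auc = roc_auc(toy_labels, toy_scores)  # 14 of 16 pairs ordered correctly: 0.875
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores)
```

The pairwise formulation is quadratic in sample size, which is acceptable for a cohort of this scale; a rank-based implementation gives the same value more efficiently.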

Conclusion

We established and validated three LGA risk prediction models to identify pregnant women at high risk of LGA early in the third trimester. The models showed good predictive performance and could guide early prevention strategies.

CONFLICT OF INTEREST STATEMENT

The authors declare no potential conflicts of interest.

DATA AVAILABILITY STATEMENT

The data associated with the paper are not publicly available but are available from the corresponding author on reasonable request.