Development and validation of risk prediction models for large for gestational age infants using logistic regression and two machine learning algorithms
使用Logistic回归和两种机器学习算法开发和验证大于胎龄儿风险预测模型
Ning Wang, Haonan Guo and Yingyu Jing contributed equally to this work.
Abstract
Background
Large for gestational age (LGA) is an adverse pregnancy outcome that endangers the life and health of mothers and offspring. We aimed to establish prediction models for LGA in late pregnancy.
Methods
Data were obtained from an established cohort of 1285 Chinese pregnant women. LGA was defined as a birth weight above the 90th percentile of the Chinese birth weight distribution for newborns of the same sex and gestational age. Women with gestational diabetes mellitus (GDM) were classified into three subtypes according to indexes of insulin sensitivity and insulin secretion. Models were built using logistic regression and two machine learning algorithms (decision tree and random forest) and were validated internally on these data.
Results
A total of 139 newborns were diagnosed as LGA after birth. For the logistic regression model, which consisted of eight commonly used clinical indicators (including lipid profile) and GDM subtypes, the area under the curve (AUC) was 0.760 (95% confidence interval [CI] 0.706–0.815) in the training set and 0.748 (95% CI 0.659–0.837) in the internal validation set. For the two machine learning models, which included all the variables, the AUCs in the training set and the internal validation set were 0.813 (95% CI 0.786–0.839) and 0.779 (95% CI 0.735–0.824) for the decision tree model, and 0.854 (95% CI 0.831–0.877) and 0.808 (95% CI 0.766–0.850) for the random forest model.
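As an illustration of how an AUC and its 95% CI of the kind reported here can be obtained, the following is a minimal sketch using only the Python standard library. It is not the authors' procedure: the abstract does not state how the confidence intervals were computed, so the percentile bootstrap used here is an assumption, and the labels and scores are synthetic values generated for demonstration only. The function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic demonstration data (NOT study data): about 11% positives,
# with predicted risk scores shifted upward for positive cases.
rng = random.Random(42)
labels = [1 if rng.random() < 0.11 else 0 for _ in range(300)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

point = auc(labels, scores)
lo, hi = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

In practice the scores would be the predicted probabilities from a fitted model evaluated separately on the training and internal validation sets; other interval methods (for example DeLong's) would give somewhat different bounds.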
Conclusion
We established and validated three LGA risk prediction models to identify pregnant women at high risk of LGA in the early third trimester. The models showed good predictive power and could guide early prevention strategies.
摘要
背景:大于胎龄儿(Large for gestational age, LGA)是危害母儿生命健康的不良妊娠结局之一。本研究旨在建立妊娠晚期LGA的预测模型。
方法:研究对象来自一个1285名中国孕妇队列。LGA诊断为大于同性别新生儿胎龄对应的中国出生体重分布的第90百分位数。根据胰岛素敏感性和胰岛素分泌指标将GDM孕妇分为3型。通过logistic回归和决策树/随机森林算法建立模型, 并通过上述数据进行验证。
结果:139例新生儿出生后被诊断为LGA。由7项常用临床指标(包括血脂)和GDM亚型组成的logistic回归模型的训练集AUC为0.760 (95% CI 0.706 ~ 0.815), 内部验证集AUC为0.748 (95% CI 0.659 ~ 0.837)。两种机器学习算法建立的预测模型中, 决策树模型的训练集和内部验证集的AUC分别为0.813 (95%CI 0.786 ~ 0.839)和0.779 (95%CI 0.735 ~ 0.824), 随机森林模型AUC分别为0.854 (95%CI 0.831 ~ 0.877)和0.808 (95%CI 0.766 ~ 0.850)。
结论:本研究建立并验证了3种LGA风险预测模型, 可在孕晚期筛选出LGA高风险孕妇, 预测效能较好, 可指导早期预防策略。
CONFLICT OF INTEREST STATEMENT
The authors declare no potential conflicts of interest.
Open Research
DATA AVAILABILITY STATEMENT
The data associated with the paper are not publicly available but are available from the corresponding author on reasonable request.