Volume 17, Issue 1 e70049
REVIEW ARTICLE
Open Access

Overcoming Missing Data: Accurately Predicting Cardiovascular Risk in Type 2 Diabetes, A Systematic Review

Wenhui Ren
Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China

Keyu Fan
Department of Anesthesiology, Peking University People's Hospital, Beijing, China

Zheng Liu
Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China

Yanqiu Wu
Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China

Haiyan An (Corresponding Author)
Department of Anesthesiology, Peking University People's Hospital, Beijing, China

Huixin Liu (Corresponding Author)
Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China

Correspondence: Haiyan An ([email protected]) and Huixin Liu ([email protected])
First published: 22 January 2025

Funding: This work was supported by the National Natural Science Foundation of China (NSFC [No. 81602939]) and the Research and Development Fund of Peking University People's Hospital [RDX2023-11, RDGS2022-03].

Wenhui Ren and Keyu Fan have contributed equally to this work and share first authorship.

ABSTRACT

Understanding is limited regarding strategies for addressing missing values when developing and validating models to predict cardiovascular disease (CVD) in type 2 diabetes mellitus (T2DM). This study aimed to investigate the presence of and approaches to missing data in these prediction models. The MEDLINE electronic database was systematically searched for English-language studies from inception to June 30, 2024. The percentages of missing values, missingness mechanisms, and missing data handling strategies in the included studies were extracted and summarized. This review included 51 articles published between 2001 and 2024: 19 studies focused solely on prediction model development, while 16 additionally incorporated internal validation and 16 incorporated external validation. Most articles reported missing data in the development (n = 40/51) and external validation (n = 12/16) stages. Furthermore, missing data were addressed in 74.5% of development studies and 68.8% of validation studies. Imputation was the predominant method employed for both development (27/40) and validation (7/12) purposes, followed by deletion (17/40 and 4/12, respectively). During the model development phase, the number of studies that reported missing data increased from 9 of 15 before 2016 to 31 of 36 in 2016 and subsequent years. Although missing values have received much attention in CVD risk prediction models in patients with T2DM, most studies lack adequate reporting on the methodologies used to address them. Enhancing the quality assurance of prediction models necessitates greater clarity and the use of suitable methodologies to handle missing data effectively.

Summary

  • In this systematic review of 51 studies, missing values were handled in 74.5% (38/51) of development studies and 68.8% (11/16) of validation studies.
  • Imputation emerged as the predominant missing data handling method employed for both development and validation purposes, followed by deletion.
  • While missing values in CVD prediction models for patients with T2DM have been studied, better clarity and appropriate methodologies for addressing missingness are still needed to improve prediction model quality.

1 Introduction

Type 2 diabetes mellitus (T2DM) ranks as the eighth most prevalent cause of mortality and morbidity, emerging as a significant global public health issue [1]. Cardiovascular disease (CVD) continues to be the leading cause of death and disability in individuals with T2DM [2]. Accumulating evidence shows that patients with T2DM exhibit a 2–4 times greater likelihood of developing CVD than those without T2DM, resulting in a higher disease burden, hospitalization, and treatment costs [3, 4]. Therefore, timely identification of individuals at high risk for CVD is essential for initiating early prevention strategies, providing prompt treatment, and ensuring effective disease management. Various risk prediction models have been developed and validated for CVD in patients with T2DM [5-7]. However, the predictive performance and quality of such models are uneven and depend heavily on the quality and availability of datasets.

Missing data is a pervasive and significant challenge during the development of prediction models, especially when utilizing extensive datasets. Missingness can result in biased estimates, diminished statistical power, and inaccurate conclusions, thereby compromising the predictive performance of the model and diminishing its decision-making efficacy in clinical practice. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) initiative recommends that authors transparently report the occurrence and extent of missing values in both the development and validation sets, and detail the methods used to address any missing data during the analysis [8]. Such reporting allows other researchers and policymakers to adequately assess the potential usefulness of prediction models. However, compliance with these reporting guidelines appears to be limited among studies that apply prediction models, with missing data often inadequately addressed or disregarded.

Multiple approaches have been developed to address missingness, including complete case analysis (CCA) and imputation-based methods. Each of these strategies is valid only under specific circumstances [9, 10]. Moreover, many reviews of prediction models have found that the approaches used for addressing missing values are inadequately explained or applied unclearly [11]. A recent review examined the patterns of missing data and strategies for addressing them in predictive research for undiagnosed T2DM, whereas the handling of missing data in prediction models for CVD among individuals with T2DM remains unexplored [12].

Therefore, this study aimed to systematically review prediction models in individuals with T2DM that included CVD as the primary outcome and to analyze the presence of missing data, the missingness scenarios, and the application of different strategies for processing missing data.

2 Methods

2.1 Literature Search

The search strategy was designed for the MEDLINE electronic database from its inception until June 30, 2024, using variations of search terms including “diabetes mellitus type 2,” “cardiovascular diseases,” and “prediction model” (Supporting Information: Appendix). The references cited in the articles were meticulously examined using the snowballing approach to identify additional relevant research.

2.2 Study Selection

Figure 1 displays this study's Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram. The following eligibility criteria were applied: (1) patients diagnosed with T2DM; (2) studies focused on developing prediction models; (3) the outcome(s) of the predictive model was CVD or included CVD; and (4) publication in English. Reviews, studies that reported only the external validation stage, and papers for which full-text versions were unavailable were excluded. Two reviewers (W.H.R. and K.Y.F.) independently reviewed the article titles and abstracts. Discrepancies were evaluated by another reviewer (H.X.L.).

Figure 1. Flow diagram of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).

2.3 Data Extraction and Analysis

A standardized data extraction template was developed to catalogue basic publication information (i.e., publication year and study design) and the characteristics of the predictive models (i.e., data source, study population description, sample size, outcomes, statistical models, and predictive model stage). Whether and how well the included studies reported on and addressed the issue of missing data were also summarized based on previous studies [8], which included information on the missingness of the study variables (i.e., missingness mechanisms and percentage of missing values) and on missing data handling strategies (i.e., deletion, imputation-based methods, and non-imputation-based methods).

3 Results

The preliminary literature search identified 471 articles. After removing duplicate entries and applying the inclusion and exclusion criteria, 51 articles were included for further review (Figure 1).

3.1 Study Characteristics

Table S1 presents an overview of the characteristics of the 51 included studies, which were published between 2001 and 2024 [5, 13-62]. Nineteen of the 51 studies focused only on developing the prediction models [15, 18, 19, 21-25, 31, 35, 46, 48, 49, 52, 55-57, 59, 61], 16 articles on model development also included an internal validation stage [13, 16, 20, 28, 33, 34, 36, 37, 40, 43, 45, 51, 54, 60, 62, 32], and 16 articles conducted additional external validation [5, 14, 17, 26, 27, 29, 30, 38, 39, 41, 42, 44, 47, 50, 53, 58]. Sample sizes at the development stage of the prediction models ranged from 151 to 1 297 131.

Cohort studies comprised 39 of the articles included in this review [5, 13-15, 18, 20-22, 24, 26-28, 30-34, 36, 37, 42-49, 51, 53-58, 61, 62, 39, 59], while six [16, 17, 23, 40, 59, 60] and eight studies [19, 25, 29, 35, 38, 41, 50, 52] developed risk prediction models based on cross-sectional studies and randomized controlled trials, respectively. Accordingly, seven studies focused on developing diagnostic prediction models [25, 26, 35, 40, 44, 47, 59], while the remaining studies primarily centered on prognostic prediction models [13-24, 27-34, 36-38, 45, 46]. Regarding model types, most studies (35/51) applied the Cox proportional hazards regression model [13-15, 17-20, 22, 24, 25, 27-31, 33-36, 38, 41, 42, 45, 48-58, 61], followed by a logistic regression model (10/51) [16, 23, 26, 32, 37, 39, 40, 43, 44, 59]. The other models included decision tree [24, 39, 44], knowledge learning symbiosis [46], random survival forest methods [47], and deep neural network [60].

3.2 Presence and Mechanisms for Handling Missing Data

Overall, 78.4% (40/51) of the included studies reported missing values at the prediction model development stage [5, 13-17, 20, 21, 24, 26, 27, 29, 30, 32, 33, 35-37, 39-54, 56-58, 61, 62, 18], while 75% (12/16) of studies reported missingness at the external validation stage [14, 26, 27, 30, 39, 41, 42, 44, 50, 53, 58, 5]. Additionally, 22 studies reported the percentage of missing data [5, 14, 17, 20, 21, 26, 29, 32, 33, 39, 40, 42-44, 46, 47, 50-54, 58, 61], most often as the overall amount of missingness (n = 6) [29, 42-44, 51, 52] or the frequency of missingness (n = 16) [5, 14, 17, 20, 21, 26, 32, 33, 39, 40, 46, 47, 53, 54, 58, 61]. The extent of missingness ranged from < 1% to 97.81%. Only four studies described the mechanisms of missing data, specifically addressing data missing at random (MAR) (n = 2) and missing not at random (MNAR) (n = 2) [15, 17, 40, 58] (Table S2).

3.3 Strategies for Handling Missing Data

A total of 38 of the 40 articles that reported missing values also described the strategies applied to handle them [5, 13-18, 20, 21, 26, 27, 29, 30, 32, 33, 35-37, 39, 45-54, 56-58, 61, 62, 40, 42-44], as did 11 of the studies that included an external validation stage [5, 14, 26, 27, 30, 39, 42, 44, 50, 53, 58] (Table S2).

3.3.1 Development Stage

In development studies, the most common approach for missing values was imputation (27/38) [5, 14-16, 18, 20, 21, 26, 27, 30, 32, 33, 35, 39, 40, 42, 43, 45, 47-50, 53, 54, 57, 58, 62], followed by deletion (17/38) [13, 17, 29, 36, 37, 39, 44, 46, 47, 51-53, 56, 61, 20, 30, 43]. Simple imputation (SI) [15, 16, 18, 21, 27, 39, 49, 54, 57, 62] and multiple imputation (MI) [5, 14, 20, 30, 32, 33, 35, 36, 40, 42, 43, 50, 53, 58, 26] were reported in 10 and 15 studies, respectively, while one study utilized both SI and MI [48]. The last observation carried forward (n = 3) [27, 49, 62] was the most common method among studies using SI; the other methods included mean, median, and mode imputation. Multivariate imputation by chained equations (MICE) (6/15) was reported as the common MI method for addressing missing values [5, 26, 32, 48, 53, 58]. One study reported a fully conditional specification method [20]. Among studies that used deletion methods, 13 of the 17 articles applied CCA to handle missing data [13, 17, 20, 29, 30, 36, 37, 39, 43, 44, 46, 51, 61]. Four studies [39, 43, 44, 47] applied a machine learning method (two used random forest imputation [44, 47]) to address missing data (Figure 2).

Figure 2. Missing data imputation approaches at the development stage of the prediction models (n = 40).
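To make the distinction between these approaches concrete, the sketch below (Python, using pandas and scikit-learn) illustrates simple imputation (median and last observation carried forward) alongside a MICE-style chained-equations imputation. It is a minimal, hypothetical example: the DataFrame, its columns, and the number of imputations are invented for illustration and do not reproduce the implementation of any included study; scikit-learn's IterativeImputer is used only as a convenient stand-in for a chained-equations procedure such as MICE.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

# Toy data with missing values; column names are hypothetical predictors.
df = pd.DataFrame({
    "sbp": [120, np.nan, 135, 150, np.nan],
    "hba1c": [7.1, 8.2, np.nan, 9.0, 7.5],
    "ldl": [2.6, 3.1, 2.9, np.nan, 3.4],
})

# Simple imputation: replace each missing value with the column median.
median_imputed = pd.DataFrame(
    SimpleImputer(strategy="median").fit_transform(df), columns=df.columns
)

# Last observation carried forward (only sensible for longitudinal data).
locf_imputed = df.ffill()

# MICE-style multiple imputation: generate several completed datasets by
# chained equations, drawing from the posterior so the imputations differ.
completed_datasets = []
for seed in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed_datasets.append(
        pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
    )
# Each completed dataset would then be analysed separately and the resulting
# estimates pooled (e.g., with Rubin's rules) to reflect imputation uncertainty.
```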

3.3.2 External Validation Stage

Of the 11 studies with an external validation stage that reported missingness [14, 26, 27, 30, 39, 41, 42, 44, 50, 53, 58], six applied imputation methods to handle missing data [14, 27, 30, 42, 50, 53], three used deletion strategies [26, 44, 58], and one applied both methods [39]. Among the five articles that employed the MI strategy [14, 30, 42, 50, 53], only one reported the use of MICE [53]. Another study reported mean, median, and mode imputations [39]. In addition, CCA was applied in the four studies that utilized the deletion approach [26, 39, 44, 58]. Among the 11 studies [14, 26, 27, 30, 39, 41, 42, 44, 50, 53, 58] that included both development and external validation stages, six [14, 27, 39, 42, 44, 50] demonstrated a consistent approach to addressing missing data. Only two studies utilized imputation-based methods during the development stage and opted for deletion strategies during the external validation stage [26, 58] (Figure 3).

Figure 3. Missing data imputation approaches at the external validation stage of the prediction models (n = 12).

3.4 Trends in Missing Data Processing Before and After 2015

An examination was conducted of the status of missing data reporting and of changes in missing data handling strategies in the included studies before and after the release of TRIPOD. Before 2016, during the model development phase, nine [17, 18, 27, 30, 33, 45, 49, 61, 62] of fifteen [17, 18, 23, 25, 27, 28, 30, 31, 33, 34, 45, 49, 59, 61, 62] studies reported missing data, with seven [18, 27, 30, 33, 45, 49, 62] utilizing imputation-based techniques (including two employing MI [30, 33]), two utilizing deletion methods [17, 61], and one combining deletion and imputation approaches [30]. In 2016 and subsequent years, 31 [5, 13-16, 20, 21, 24, 26, 29, 32, 35-37, 39-44, 46-48, 50-54, 56-58] of 36 [5, 13-16, 19-22, 24, 26, 29, 32, 35-44, 46-48, 50-58, 60] studies reported missing data, with 15 [5, 14-16, 21, 26, 32, 35, 36, 40, 42, 50, 54, 57, 58] employing imputation-based techniques (including 10 utilizing MI [5, 14, 26, 32, 35, 36, 40, 42, 50, 58]), eight utilizing deletion methods [13, 29, 37, 44, 46, 51, 52, 56], and five employing a combination of deletion and imputation strategies [20, 39, 43, 48, 53]. Regarding the external validation stage, of the three papers published before 2016, two [27, 30] reported missing data and utilized imputation-based methods. From 2016 onward, 10 [14, 26, 39, 41, 42, 44, 50, 53, 54, 58] of 13 [5, 14, 26, 29, 38, 39, 41, 42, 44, 47, 50, 53, 58] papers reported missing data, with four employing imputation-based techniques [14, 42, 50, 53], three utilizing deletion methods [26, 44, 58], and one employing a combination of deletion and imputation strategies [39]. Among the papers that utilized imputation-based methods for addressing missing data, four studies employed the MI approach [14, 42, 50, 53].

4 Discussion

Prediction models are being increasingly developed to support clinical decision-making. Missing data is prevalent during the development and validation of prediction models, necessitating careful handling to prevent any negative impact on their quality. The present review's findings revealed that most (78.4%, 40/51) of the included studies on CVD risk among patients with T2DM reported missing data, and 74.5% (38/51) applied missing data handling approaches. Imputation was a commonly utilized approach, although most studies lacked detailed statistical information on the methods used. Notably, the frequency of missing data reports and the utilization of MI methods during the model development stage in the included articles demonstrated an increase subsequent to the publication of the TRIPOD guidelines. However, a comprehensive understanding is lacking regarding the proper selection and application of different methods, which may have implications for model development efficacy.

Missing predictor values pose obstacles to the development and practical implementation of predictive models, specifically affecting their predictive abilities. Interestingly, the missing data problem has garnered greater focus during the development stage than during the external validation and implementation stages [63, 64]. This phenomenon was also observed in our review, in which 78.4% (40/51) of the included studies reported missing data during the development stage, while 75% (12/16) reported it during the external validation stage. This difference may be associated with the lack of predictor variable management during the validation and implementation of a predictive model in novel patient populations.

Elucidating missing data characteristics is imperative to understanding data quality and making informed decisions regarding the selection of suitable data processing techniques. The missing data percentages and mechanisms are particularly important in evaluating the associated circumstances [65, 66]. In this review, only 22 studies that reported missing data included the overall amount (n = 6) or the frequency of missingness (n = 16). The missing data percentage varied from less than 1% to 97.81%, showing great heterogeneity. This may be attributed primarily to the intricate nature of the data sources used to develop the models, which encompassed cohort data, field survey data, and electronic medical records. Missing data mechanisms can be categorized as missing completely at random (MCAR), MAR, or MNAR, depending on the reasons for the missing data [67].
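As a purely illustrative sketch of the kind of descriptive summary this requires, the per-variable percentage of missing values can be tabulated before any handling strategy is chosen; the file name and dataset below are hypothetical, not data from the included studies.

```python
import pandas as pd

# Hypothetical development dataset; the file name and columns are placeholders.
df = pd.read_csv("t2dm_development_cohort.csv")

# Percentage of missing values per candidate predictor, highest first.
missing_pct = df.isna().mean().mul(100).round(2).sort_values(ascending=False)
print(missing_pct)

# Number of complete cases that would remain under complete case analysis.
print(f"Complete cases: {df.dropna().shape[0]} of {len(df)}")
```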

Analyzing missing data characteristics enables researchers to make informed decisions regarding appropriate handling strategies based on assumptions about the data. MCAR rarely holds in practice and typically involves a negligible amount of missing data, whereas MAR and MNAR scenarios are far more likely. In this review, missingness mechanisms were reported in only four studies, suggesting that many researchers continue to overlook the significance of accurately reporting missing data.
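The practical difference between these mechanisms can be illustrated with a small simulation; the variables, the age-HbA1c relationship, and the missingness models below are hypothetical and purely didactic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
age = rng.normal(60, 10, n)
hba1c = 6 + 0.03 * (age - 60) + rng.normal(0, 0.5, n)  # invented relationship
full = pd.DataFrame({"age": age, "hba1c": hba1c})

# MCAR: every HbA1c value has the same 20% chance of being missing.
mcar = full.copy()
mcar.loc[rng.random(n) < 0.2, "hba1c"] = np.nan

# MAR: the chance that HbA1c is missing depends on the observed age
# (here, older patients are measured less often).
mar = full.copy()
p_missing = 1 / (1 + np.exp(-(age - 65) / 5))
mar.loc[rng.random(n) < p_missing, "hba1c"] = np.nan

# Complete-case means: roughly unbiased under MCAR, biased downward under MAR,
# because the remaining cases over-represent younger patients.
print(full["hba1c"].mean(), mcar["hba1c"].mean(), mar["hba1c"].mean())
```

Under MAR, an imputation model that conditions on age can recover the full-data distribution, which is the setting in which MI is expected to outperform CCA.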

To correct the bias resulting from improper handling of missing data, TRIPOD emphasizes the significance of including missing data processing as a crucial study component [8]. As mentioned earlier, the rise in missing data reports and the use of MI methods in the model development phase post-TRIPOD suggests that adherence to the TRIPOD guidelines may enhance the quality of prediction model studies, particularly in processing missing data. Evidence regarding missing data strategies dates to the 1970s, and many mature statistical methods have been proposed, including CCA and imputation-based methods [68]. CCA relies on complete cases only and involves removing all records with missing values to obtain a complete dataset for analysis. This method is user-friendly and widely adopted by clinical researchers [68]. A previous systematic review of 152 prediction model studies utilizing machine learning suggested that the predominant method for addressing missing values is deletion (65/96), mostly via CCA (43/96), followed by imputation [69]. However, its drawbacks include alteration of the original data distribution, potentially leading to increased bias, larger standard errors, and imprecision in the regression coefficient estimates.
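A minimal sketch of CCA on a toy dataset (all values hypothetical) makes this trade-off visible: the method is trivial to apply, but the analyzable sample shrinks with every incompletely observed predictor.

```python
import numpy as np
import pandas as pd

# Hypothetical records with missing predictor values.
df = pd.DataFrame({
    "age": [55, 62, np.nan, 70, 58],
    "sbp": [130, np.nan, 145, 150, 128],
    "cvd": [0, 1, 0, 1, 0],
})

# Complete case analysis: drop every row with any missing predictor.
cca_df = df.dropna(subset=["age", "sbp"])
print(f"Rows retained: {len(cca_df)} of {len(df)}")

# Unless the data are missing completely at random, the retained rows may no
# longer represent the target population, biasing coefficient estimates.
```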

Imputation aims to fill in missing values by exploiting the relationships within and between variables and includes SI and MI. MI is the most popular missing data handling method in medical statistics. Our findings also indicated that imputation-based methods were more widely used than deletion strategies in both the development and external validation stages. This suggests that imputation-based approaches to missing data are progressively gaining acceptance among medical researchers and broadening in scope. However, one study suggested that MI is more suitable for research data, such as those from cohort studies, which exhibit a low percentage of missing data and generally reasonable levels of MAR [11]. When using “real-world” data, such as those from electronic health systems, where missingness is typically informative, occurs at a high rate, and follows the MNAR mechanism, including missing indicators could substantially improve predictive model performance [70, 71].
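The missing-indicator idea referred to above can be sketched as follows; the EHR-style variables are hypothetical, and scikit-learn's SimpleImputer is used here only as one convenient way to append indicator columns alongside the imputed values.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical EHR-style data in which a missing test often means it was not ordered.
ehr = pd.DataFrame({
    "egfr": [85, np.nan, 60, np.nan, 92],
    "nt_probnp": [np.nan, 300, np.nan, np.nan, 150],
})

# add_indicator=True appends one binary column per feature with missing values,
# so a downstream model can use "value was unmeasured" as predictive information.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X = imputer.fit_transform(ehr)

# Both toy columns contain missing values, so two indicator columns are appended.
columns = list(ehr.columns) + [f"{c}_was_missing" for c in ehr.columns]
print(pd.DataFrame(X, columns=columns))
```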

Selecting an appropriate missing data handling method during the development and validation stages of a predictive model depends on the overall missingness scenario, which encompasses the missingness mechanism, the missingness pattern, and the percentage of missing values. Researchers therefore ought to account for these factors comprehensively when addressing missing values in predictive models. Compared with CCA, MI can reduce bias and enhance the model's predictive performance, particularly when the missing data mechanism is MAR. Other recently developed approaches address missing data by incorporating missing indicator variables, employing pattern-mixture models, utilizing tree-based methods, or applying machine learning techniques to circumvent inference challenges associated with missing data [64]. Furthermore, a lack of consistency in the strategies for addressing missing data during the development and validation stages can lead to over- or underestimation of the performance of clinical prediction models [11, 72]. In this review, seven studies applied the same strategies to address missing data among the 13 studies that incorporated both development and external validation stages. In addition, studies with external validation provided fewer details about their missing data strategies than were reported for model development. This may be attributed to the lack of emphasis on external validation and prediction model implementation, as evidenced by previous research findings [64]. The TRIPOD statement recommends the inclusion of information regarding the missing data percentage, the rationale for missingness, and notable distinctions between patients with complete versus incomplete data. Arguably, the level of reporting is a critical foundation for evaluating a predictive model's efficacy, and the provision of such details is a crucial requirement for enabling study replication. However, most studies that applied methods to handle missing data lacked sufficient detail. Thus, the methodological quality of predictive models must urgently be improved to promote good practices for handling missing data.
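One way to keep the strategy consistent across stages, sketched below under assumed file and column names, is to fit the imputation model on the development data only and then apply it unchanged to the external validation data, mirroring how the prediction model itself is transported.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical cohorts; file names and the predictor list are placeholders.
dev = pd.read_csv("development_cohort.csv")
val = pd.read_csv("external_validation_cohort.csv")
predictors = ["age", "sbp", "hba1c", "ldl"]

# Fit the imputation model on the development data only...
imputer = IterativeImputer(random_state=0).fit(dev[predictors])

# ...then apply the same fitted imputer to both stages.
dev_imputed = pd.DataFrame(imputer.transform(dev[predictors]), columns=predictors)
val_imputed = pd.DataFrame(imputer.transform(val[predictors]), columns=predictors)

# The prediction model is developed on dev_imputed and evaluated on val_imputed,
# avoiding the mismatch that arises when, for example, imputation is used at
# development but complete case analysis at external validation.
```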

This review had some limitations. Some relevant studies might have been missed owing to reliance on a single database; nevertheless, we believe that the review covers most of the relevant literature through the supplementary snowballing search. Furthermore, studies that reported only the external validation or deployment stages were not included. However, because our analysis compared missing data handling strategies across the different stages of prediction modeling within a single research framework, we believe its findings remain broadly generalizable. Moreover, the analysis focused exclusively on models designed to predict CVD risk in patients with T2DM, which may differ from models used to predict CVD in other populations.

5 Conclusion

The issue of missing data at the development stage of CVD risk prediction models for patients with T2DM has received much attention; however, insufficient attention has been paid to the validation stage. Notably, there is still room for improvement in selecting appropriate methods and in reporting the details of how missing data are handled during the development and validation of CVD risk prediction models in patients with T2DM. Additionally, the missingness mechanism should be carefully considered.

Author Contributions

All the authors contributed to the conception or design of this study. Wenhui Ren, Keyu Fan, Zheng Liu, and Yanqiu Wu were involved in collecting and analyzing the data. Wenhui Ren, Keyu Fan, Huixin Liu, and Haiyan An wrote and prepared the manuscript. All the authors revised the manuscript critically and approved the final version. Additionally, Huixin Liu and Haiyan An are the guarantors, who had full access to all of the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC [No. 81602939]) and the Research and Development Fund of Peking University People's Hospital [RDX2023-11, RDGS2022-03].

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
