Volume 2, Issue 2 pp. 140-147
ORIGINAL ARTICLE
Open Access

Prediction of death rates for cardiovascular diseases and cancers

Oleg Gaidai

Shanghai Engineering Research Center of Marine Renewable Energy, College of Engineering Science and Technology, Shanghai Ocean University, Shanghai, China

Contribution: Conceptualization (equal)

Yihan Xing (Corresponding Author)

Department of Mechanical and Structural Engineering and Materials Science, University of Stavanger, Stavanger, Norway

Correspondence: Yihan Xing, Department of Mechanical and Structural Engineering and Materials Science, University of Stavanger, Kjell Arholms gate 41, Stavanger 4021, Norway.

Email: [email protected]

Contribution: Validation (equal)

Rajiv Balakrishna

Department of Mechanical and Structural Engineering and Materials Science, University of Stavanger, Stavanger, Norway

Contribution: Investigation (equal)

Jiayao Sun

School of Naval Architecture & Ocean Engineering, Jiangsu University of Science and Technology, Zhenjiang, China

Contribution: Methodology (equal)

Xiaolong Bai

School of Naval Architecture & Ocean Engineering, Jiangsu University of Science and Technology, Zhenjiang, China

Contribution: Validation (equal)

First published: 09 February 2023
Citations: 17

Abstract

Background

To estimate cardiovascular and cancer death rates by regions and time periods.

Design

Novel statistical methods were used to analyze clinical surveillance data.

Methods

A multicenter, population-based medical survey was performed. Annual recorded deaths from cardiovascular diseases were analyzed for all 195 countries of the world. It is challenging to model such data; few mathematical models can be applied because cardiovascular disease and cancer data are generally not normally distributed.

Results

A novel approach to assessing the biosystem reliability is introduced and has been found to be particularly suitable for analyzing multiregion environmental and healthcare systems. While traditional methods for analyzing temporal observations of multiregion processes do not deal with dimensionality efficiently, our methodology has been shown to be able to cope with this challenge.

Conclusions

Our novel methodology can be applied to public health and clinical survey data.

Abbreviations

  • CVD: cardiovascular disease
  • MDOF: multidegree of freedom

    1 BACKGROUND

    Cardiovascular disease (CVD) refers to a range of diseases affecting the heart and blood vessels, including hypertension (high blood pressure), coronary heart disease and heart attacks, cerebrovascular diseases (e.g., stroke), heart failure, and various other heart diseases. Cancers are defined by the National Cancer Institute as diseases in which abnormal cells can divide and infiltrate nearby tissues. Cancers can arise in many parts of the body; thus, there is a wide range of cancer types, some of which spread to other parts of the body through the blood and lymph systems. CVD and cancer are the leading causes of death worldwide; therefore, analyzing their bivariate statistics is important. This study is concerned with public health systems rather than health at the level of the individual. The research is not clinical in nature; the goal is to estimate the burden imposed by CVD and cancer on public health systems in different countries at any given time. We analyze mortality data from the literature for both CVDs [1-8] and cancer [9-29].

    Assessing the reliability of healthcare systems and estimating excess mortality from CVDs using conventional statistical methods are challenging [30-35]. To achieve the latter goal over large areas, degrees of freedom are typically calculated for random variables governing dynamic biological systems. In principle, the reliability of a complex biological system can be accurately estimated if there are sufficient measurements or by using Monte Carlo simulations. For CVDs and cancers, however, data are scarce before 1990 [30]. Against this background, we introduce a novel method for assessing the reliability of biological and healthcare systems, to aid prediction and management of excess mortality from CVD. This study focused on cross-correlations in CVD and cancer deaths among countries within the same climatic zone. Worldwide health data and related research are readily available online [30].

    Lifetime data analysis with the application of extreme value theory is widespread in the fields of medicine and engineering [30]. A recent paper presented the arguments for and against using the upper distribution of life expectancy data [1]. A bivariate lifetime distribution is often assumed when analyzing statistical data [3]. A new approach that uses Clayton, Gumbel, and inverse Gaussian power variance functions, as well as conditional sampling and numerical approximation, was applied for survival analysis [2]. However, few studies have aimed to predict excess CVD and cancer mortality; this paper aims to address this gap.

    In this paper, excess mortality from CVD is viewed as an unexpected event that may occur in any country at any time. The nondimensional factor urn:x-wiley:27709191:media:cai247:cai247-math-0001 is used to predict CVD risk. Biological systems are influenced by environmental parameters that can be modeled as ergodic processes. The CVD and cancer incidence data for 195 countries during the period 1990–2019 were retrieved [30]. The biological system under consideration herein can be regarded as a multidegree of freedom (MDOF) dynamic system with highly interrelated regional components/dimensions. This study focused on predicting excess mortality rather than symptoms.

    2 METHODS

    Consider an MDOF biosystem subjected to random ergodic environmental influences. An alternative is to view the process as being dependent on specific environmental parameters whose variation in time may itself be modeled as an ergodic process. The MDOF biomedical response vector process urn:x-wiley:27709191:media:cai247:cai247-math-0002 is measured and/or simulated over a sufficiently long time interval urn:x-wiley:27709191:media:cai247:cai247-math-0003. Unidimensional global maxima over the entire time span urn:x-wiley:27709191:media:cai247:cai247-math-0004 are denoted as urn:x-wiley:27709191:media:cai247:cai247-math-0005, urn:x-wiley:27709191:media:cai247:cai247-math-0006, urn:x-wiley:27709191:media:cai247:cai247-math-0007. By a sufficiently long time urn:x-wiley:27709191:media:cai247:cai247-math-0008, one primarily means a value of urn:x-wiley:27709191:media:cai247:cai247-math-0009 that is large relative to the autocorrelation time of the dynamic system.

    Let urn:x-wiley:27709191:media:cai247:cai247-math-0010 be the consecutive (in time) local maxima of the bioprocess urn:x-wiley:27709191:media:cai247:cai247-math-0011 at monotonically increasing discrete time instants urn:x-wiley:27709191:media:cai247:cai247-math-0012 in urn:x-wiley:27709191:media:cai247:cai247-math-0013. An analogous definition follows for the other MDOF biological system response components urn:x-wiley:27709191:media:cai247:cai247-math-0014 with urn:x-wiley:27709191:media:cai247:cai247-math-0015 urn:x-wiley:27709191:media:cai247:cai247-math-0016, and so on. For simplicity, all urn:x-wiley:27709191:media:cai247:cai247-math-0017 components, and therefore their maxima, are assumed to be nonnegative. The aim is to estimate the system failure probability
    urn:x-wiley:27709191:media:cai247:cai247-math-0018()
    with
    urn:x-wiley:27709191:media:cai247:cai247-math-0019()
    being the probability of nonexceedance for response components urn:x-wiley:27709191:media:cai247:cai247-math-0020, urn:x-wiley:27709191:media:cai247:cai247-math-0021, urn:x-wiley:27709191:media:cai247:cai247-math-0022, … critical values; urn:x-wiley:27709191:media:cai247:cai247-math-0023 denotes the logical «or» operation; and urn:x-wiley:27709191:media:cai247:cai247-math-0024 is the joint probability density of the global maxima over the entire time span urn:x-wiley:27709191:media:cai247:cai247-math-0025.

    In practice, however, it is not feasible to estimate the latter joint probability distribution directly urn:x-wiley:27709191:media:cai247:cai247-math-0026 due to its high dimensionality and available data set limitations. In other words, the time instant when either urn:x-wiley:27709191:media:cai247:cai247-math-0027 exceeds, urn:x-wiley:27709191:media:cai247:cai247-math-0028 exceeds, urn:x-wiley:27709191:media:cai247:cai247-math-0029 exceeds, and so on, the system is regarded as immediately failed. Fixed failure levels urn:x-wiley:27709191:media:cai247:cai247-math-0030, urn:x-wiley:27709191:media:cai247:cai247-math-0031, urn:x-wiley:27709191:media:cai247:cai247-math-0032, … are, of course, individual for each unidimensional response component of urn:x-wiley:27709191:media:cai247:cai247-math-0033. urn:x-wiley:27709191:media:cai247:cai247-math-0034, urn:x-wiley:27709191:media:cai247:cai247-math-0035, urn:x-wiley:27709191:media:cai247:cai247-math-0036, and so on, see Naess and Gaidai [32] and Naess and Moan [49].

    Next, the local maxima time instants urn:x-wiley:27709191:media:cai247:cai247-math-0037 are sorted in monotonically nondecreasing order into one single merged synthetic time vector urn:x-wiley:27709191:media:cai247:cai247-math-0038. Note that urn:x-wiley:27709191:media:cai247:cai247-math-0039, urn:x-wiley:27709191:media:cai247:cai247-math-0040. In this case, urn:x-wiley:27709191:media:cai247:cai247-math-0041 represents a local maximum of one of the MDOF biosystem response components, either urn:x-wiley:27709191:media:cai247:cai247-math-0042, urn:x-wiley:27709191:media:cai247:cai247-math-0043, or urn:x-wiley:27709191:media:cai247:cai247-math-0044, and so on. This means that, given the urn:x-wiley:27709191:media:cai247:cai247-math-0045 time record, one just needs to continuously and simultaneously screen for unidimensional response component local maxima and record any exceedance of the MDOF limit vector urn:x-wiley:27709191:media:cai247:cai247-math-0046 in any of its components urn:x-wiley:27709191:media:cai247:cai247-math-0047. The local unidimensional response component maxima are merged into one temporally nondecreasing vector urn:x-wiley:27709191:media:cai247:cai247-math-0048 in accordance with the merged time vector urn:x-wiley:27709191:media:cai247:cai247-math-0049. That is to say, each local maximum urn:x-wiley:27709191:media:cai247:cai247-math-0050 is the actual encountered local maximum corresponding to either urn:x-wiley:27709191:media:cai247:cai247-math-0051, urn:x-wiley:27709191:media:cai247:cai247-math-0052, or urn:x-wiley:27709191:media:cai247:cai247-math-0053, and so on. Finally, the unified limit vector urn:x-wiley:27709191:media:cai247:cai247-math-0054 is introduced, with each component urn:x-wiley:27709191:media:cai247:cai247-math-0055 being either urn:x-wiley:27709191:media:cai247:cai247-math-0056, urn:x-wiley:27709191:media:cai247:cai247-math-0057, or urn:x-wiley:27709191:media:cai247:cai247-math-0058, and so on, depending on which of urn:x-wiley:27709191:media:cai247:cai247-math-0059 or urn:x-wiley:27709191:media:cai247:cai247-math-0060 or urn:x-wiley:27709191:media:cai247:cai247-math-0061, and so forth, corresponds to the current local maximum with the running index urn:x-wiley:27709191:media:cai247:cai247-math-0062.
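
    As a minimal sketch of the merging step described above, the following Python snippet extracts local maxima from each response component and sorts them into one time-ordered record, carrying along the failure level that applies to each entry. The function names, the two-component toy data, and the limit values are hypothetical and are not taken from the paper; keeping simultaneous maxima in component order is likewise an assumption.

```python
import numpy as np


def local_maxima(t, x):
    """Return (times, values) of interior local maxima of a sampled series."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # A point counts as a local maximum if it rises above its left neighbour
    # and is not below its right neighbour.
    idx = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    return t[idx], x[idx]


def merge_local_maxima(t, components, limits):
    """Merge the local maxima of all components into one time-ordered record.

    components: dict mapping a component name to its sampled series
    limits:     dict mapping a component name to its failure level
    Returns merged times, maxima, the limit attached to each maximum,
    and the name of the component each maximum came from.
    """
    times, values, lims, labels = [], [], [], []
    for name, x in components.items():
        tm, vm = local_maxima(t, x)
        times.extend(tm)
        values.extend(vm)
        lims.extend([limits[name]] * len(tm))
        labels.extend([name] * len(tm))
    # Stable sort keeps component order for maxima that share a time instant.
    order = np.argsort(times, kind="stable")
    return (
        np.asarray(times)[order],
        np.asarray(values)[order],
        np.asarray(lims)[order],
        [labels[i] for i in order],
    )


# Hypothetical two-component example (values are made up).
years = np.arange(1990, 1998)
series = {
    "x": [1, 3, 2, 4, 1, 1, 2, 1],
    "y": [2, 1, 1, 5, 6, 2, 3, 3],
}
levels = {"x": 5.0, "y": 8.0}

merged_t, merged_r, merged_eta, merged_from = merge_local_maxima(years, series, levels)
for row in zip(merged_t, merged_r, merged_eta, merged_from):
    print(row)
```

    Dividing each merged maximum by its attached limit gives a nondimensional record in which a value above 1 signals an exceedance, regardless of which component produced it.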

    Next, a scaling parameter urn:x-wiley:27709191:media:cai247:cai247-math-0063 is introduced to artificially and simultaneously decrease the limit values for all biosystem response components; namely, the new MDOF limit vector urn:x-wiley:27709191:media:cai247:cai247-math-0064 with urn:x-wiley:27709191:media:cai247:cai247-math-0065, urn:x-wiley:27709191:media:cai247:cai247-math-0066, urn:x-wiley:27709191:media:cai247:cai247-math-0067, … is introduced. The unified limit vector urn:x-wiley:27709191:media:cai247:cai247-math-0068 is introduced, with each component urn:x-wiley:27709191:media:cai247:cai247-math-0069 being either urn:x-wiley:27709191:media:cai247:cai247-math-0070, urn:x-wiley:27709191:media:cai247:cai247-math-0071, or urn:x-wiley:27709191:media:cai247:cai247-math-0072, and so on. The latter automatically defines the probability urn:x-wiley:27709191:media:cai247:cai247-math-0073 as a function of urn:x-wiley:27709191:media:cai247:cai247-math-0074; note that urn:x-wiley:27709191:media:cai247:cai247-math-0075 from Equation (1). The nonexceedance probability urn:x-wiley:27709191:media:cai247:cai247-math-0076 can now be estimated as follows:
    urn:x-wiley:27709191:media:cai247:cai247-math-0078()
    In practice, the dependency between neighboring urn:x-wiley:27709191:media:cai247:cai247-math-0079 values is not always negligible; thus, the following one-step (i.e., “conditioning level”; urn:x-wiley:27709191:media:cai247:cai247-math-0080) memory approximation is introduced
    urn:x-wiley:27709191:media:cai247:cai247-math-0081()
    for urn:x-wiley:27709191:media:cai247:cai247-math-0082 (called here conditioning level urn:x-wiley:27709191:media:cai247:cai247-math-0083). The approximation introduced by Equation (4) may be further expressed as
    urn:x-wiley:27709191:media:cai247:cai247-math-0084()
    where urn:x-wiley:27709191:media:cai247:cai247-math-0085 (called conditioning level urn:x-wiley:27709191:media:cai247:cai247-math-0086), and so on. The motivation is to monitor each independent failure that happened locally first in time, thus avoiding cascading local intercorrelated exceedances [36-48].
    Equation (5) presents subsequent refinements of the statistical independence assumption. The latter type of approximations enables capturing the statistical dependence effect between neighboring maxima with increased accuracy. Since the original MDOF bioprocess urn:x-wiley:27709191:media:cai247:cai247-math-0087 was assumed ergodic and therefore stationary, probability urn:x-wiley:27709191:media:cai247:cai247-math-0088 for urn:x-wiley:27709191:media:cai247:cai247-math-0089 will be independent of urn:x-wiley:27709191:media:cai247:cai247-math-0090 but only dependent on conditioning level urn:x-wiley:27709191:media:cai247:cai247-math-0091. Thus, the nonexceedance probability can be approximated as in the Naess–Gaidai method, see [32, 49], where:
    urn:x-wiley:27709191:media:cai247:cai247-math-0092()
    Note that Equation (6) follows from Equation (1) by neglecting urn:x-wiley:27709191:media:cai247:cai247-math-0093, as the design failure probability is usually very small. Further, it is assumed that urn:x-wiley:27709191:media:cai247:cai247-math-0094. Note that Equation (5) is similar to the well-known mean up-crossing rate equation for the probability of exceedance [32, 49]. Convergence is observed with respect to the conditioning parameter urn:x-wiley:27709191:media:cai247:cai247-math-0095:
    urn:x-wiley:27709191:media:cai247:cai247-math-0096()
    Note that Equation (6) for urn:x-wiley:27709191:media:cai247:cai247-math-0097 turns into the well-known nonexceedance probability relationship with the mean up-crossing rate function
    urn:x-wiley:27709191:media:cai247:cai247-math-0098()
    where urn:x-wiley:27709191:media:cai247:cai247-math-0099 is the mean up-crossing rate of the response level urn:x-wiley:27709191:media:cai247:cai247-math-0100 for the nondimensional vector urn:x-wiley:27709191:media:cai247:cai247-math-0101 assembled above from the scaled MDOF biosystem response urn:x-wiley:27709191:media:cai247:cai247-math-0102. The proposed methodology can also treat nonstationary cases, as the following illustration shows. Consider a scatter diagram of urn:x-wiley:27709191:media:cai247:cai247-math-0103 bioenvironmental states, with each short-term bioenvironmental state having probability urn:x-wiley:27709191:media:cai247:cai247-math-0104, so that urn:x-wiley:27709191:media:cai247:cai247-math-0105. The corresponding long-term equation is then
    urn:x-wiley:27709191:media:cai247:cai247-math-0106()
    with urn:x-wiley:27709191:media:cai247:cai247-math-0107 being the same function as in Equation (7) but corresponding to a specific short-term environmental state with the number urn:x-wiley:27709191:media:cai247:cai247-math-0108. Note that this statistical model has already been validated [47, 50-52].
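
    One way to make the conditioning-level idea of this section concrete is an empirical count: among the merged maxima, count those that exceed their scaled limit while the preceding k − 1 maxima did not. The Python sketch below assumes this counting estimator, which resembles the averaged conditional exceedance rate used in the Naess–Gaidai literature, and should not be read as the authors' implementation. The names r, eta, lam, and k, and the toy data, are illustrative only.

```python
import numpy as np


def conditional_exceedance_rate(r, eta, lam, k):
    """Empirical exceedance rate at conditioning level k (assumed estimator).

    r:   merged local maxima, in temporal order
    eta: limit attached to each entry of r
    lam: scaling factor applied to all limits at once
    k:   conditioning level; k = 1 ignores the history, k = 2 requires the
         previous maximum to be a non-exceedance, and so on
    """
    r = np.asarray(r, dtype=float)
    eta = np.asarray(eta, dtype=float)
    exceed = r > lam * eta
    n = exceed.size
    if k < 1 or k > n:
        raise ValueError("k must satisfy 1 <= k <= len(r)")
    # Candidate exceedances at positions k-1 .. n-1.
    hits = exceed[k - 1:].copy()
    # Keep only those whose k-1 predecessors all stayed within their limits.
    for lag in range(1, k):
        hits &= ~exceed[k - 1 - lag: n - lag]
    return hits.sum() / (n - k + 1)


# Hypothetical nondimensional record with two clustered exceedances.
r = [0.2, 0.9, 0.95, 0.3, 0.85, 0.1]
eta = np.ones(len(r))

for k in (1, 2):
    print(k, conditional_exceedance_rate(r, eta, lam=0.8, k=k))
```

    In the toy record, the second of two neighbouring exceedances is discarded once k = 2, which is the intended effect of conditioning: clustered, intercorrelated exceedances are not counted as separate failures.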

    3 RESULTS

    Prediction of CVD and cancer has long been a target in the fields of epidemiology and mathematical biology. Public health systems are dynamic, highly nonlinear, multidimensional, and spatially diverse systems that are challenging to analyze. Previous studies have used a variety of approaches to predict CVD and cancer cases. In this section, the above-described methodology is applied to real-world CVD data sets for all countries of the world.

    The statistical data in the present section are from the “Our World in Data” website [30], which provides annual CVD death rates for all countries for the period 1990–2019. The death rates for the 195 countries (components urn:x-wiley:27709191:media:cai247:cai247-math-0109) constitute 195-dimensional (195D) data for a dynamic biological system.

    General failure limits (urn:x-wiley:27709191:media:cai247:cai247-math-0110), that is, CVD thresholds, are less intuitive than setting failure limits for each individual country according to its population, such that urn:x-wiley:27709191:media:cai247:cai247-math-0111 are equal to the annual death rate of a given country. The death rate for cancer is lower than that for CVD, but it is typically more painful to die from cancer. In this paper, the “failure limit” for cancer is lowered fourfold to match that for CVD.

    Next, the local maxima from all nondimensionalized time series data are merged into a single time series using Equation (5):
    urn:x-wiley:27709191:media:cai247:cai247-math-0112()

    Each maximum, such as urn:x-wiley:27709191:media:cai247:cai247-math-0113, is inserted into the single time series according to its temporal occurrence (denoted by subscript urn:x-wiley:27709191:media:cai247:cai247-math-0114).

    Figure 1 presents the annual deaths from CVD and cancer by country and year. Figure 2 presents the number of new deaths as a 195D vector urn:x-wiley:27709191:media:cai247:cai247-math-0115. Data for Uzbekistan were excluded from the analysis because they were regarded as outliers. urn:x-wiley:27709191:media:cai247:cai247-math-0116 was assembled from different regional components, that is, CVD data sets. Index urn:x-wiley:27709191:media:cai247:cai247-math-0117 is a running index of local maxima encountered in the “non-decreasing” time series.

    Figure 1. Annual deaths from cardiovascular disease and cancer as a percentage of the population for 195 countries.
    Figure 2. Left: Cross-correlations between cardiovascular disease (CVD) and cancer cases as a percentage of the population. Right: Annual death rates as a 195-dimensional vector urn:x-wiley:27709191:media:cai247:cai247-math-0118, as a percentage of the population of the corresponding country. The cancer rate was increased fourfold to match that of CVD.

    Overall, there is a clear East–West divide in the CVD death rates. Rates across North America and Western/Northern Europe tended to be lower than those across Eastern Europe, Asia, and Africa. For most of Latin America, the rates were moderate. As an example, in France, the age-standardized CVD death rate was around 86 per 100,000 in 2017, while across Eastern Europe, it was around five times higher (400–500 per 100,000). Uzbekistan had the highest rate of 724 per 100,000.

    Figure 3 presents the predicted annual CVD death rates (percentage relative to the entire population of a given country) over 100 years, extrapolated from Equation (10). urn:x-wiley:27709191:media:cai247:cai247-math-0119 was used as a cut-off value. The 95% confidence intervals (CIs) were calculated. According to Equation (5), urn:x-wiley:27709191:media:cai247:cai247-math-0120 is directly related to the target failure probability (urn:x-wiley:27709191:media:cai247:cai247-math-0121) derived from Equation (1). Therefore, system failure probability can be estimated as urn:x-wiley:27709191:media:cai247:cai247-math-0122. Note that, in Equation (6), urn:x-wiley:27709191:media:cai247:cai247-math-0123 corresponds to the total number of local maxima in response vector urn:x-wiley:27709191:media:cai247:cai247-math-0124. Conditioning parameter urn:x-wiley:27709191:media:cai247:cai247-math-0125 was found to be sufficient because of the convergence of urn:x-wiley:27709191:media:cai247:cai247-math-0126 (see Equation 6). In Figure 3, the 95% CIs are relatively narrow, which represents an advantage of the proposed method. Table 1 compares 100-year predictions based on data for 15- and 30-year periods. The 15-year data set was derived from the full 30-year data set by omitting odd years. The 95% CIs were wider for the truncated data set, as expected.

    Figure 3. Death rate predictions over 100 years extrapolated from urn:x-wiley:27709191:media:cai247:cai247-math-0127. The critical level is indicated by a star. The 95% confidence intervals are indicated by dotted lines. The percentage of the population is represented by the horizontal axis. Left: Predictions based on 30 years of data; Right: predictions based on 15 years of data.
    Table 1. Predicted cardiovascular disease death rates over 100 years based on 30- and 15-year data sets.
    Data set           Predicted death rate (%)   95% CI, lower bound   95% CI, upper bound
    30-year data set   0.942                      0.909                 0.966
    15-year data set   0.914                      0.879                 0.949
    • Abbreviation: CI, confidence interval.
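
    The widening of the confidence interval for the truncated data set can be checked directly from the values in Table 1. The short Python calculation below is only arithmetic on the tabulated numbers; the variable and function names are illustrative.

```python
# Table 1 values: (predicted death rate %, 95% CI lower bound, 95% CI upper bound)
table1 = {
    "30-year": (0.942, 0.909, 0.966),
    "15-year": (0.914, 0.879, 0.949),
}


def ci_width(row):
    """Width of the 95% confidence interval for one table row."""
    _, lower, upper = row
    return upper - lower


def inside(value, row):
    """True if value lies within the confidence interval of the given row."""
    _, lower, upper = row
    return lower <= value <= upper


widths = {name: ci_width(row) for name, row in table1.items()}
for name, width in widths.items():
    print(f"{name} data set: CI width = {width:.3f}")

# Does each point estimate fall inside the other data set's interval?
print(inside(table1["15-year"][0], table1["30-year"]))
print(inside(table1["30-year"][0], table1["15-year"]))
```

    The interval width grows from about 0.057 to about 0.070 percentage points when half of the years are omitted, and each point estimate lies inside the interval obtained from the other data set.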

    The predicted average annual CVD death rate over the next 100 years, across all years and countries, was found to be below 1%. Our methodology uses available data efficiently by assuming that healthcare system data sets are multidimensional and extrapolates death rates even when the data set is relatively limited. The predicted nondimensional factor urn:x-wiley:27709191:media:cai247:cai247-math-0128, indicated by the star in Figure 3, represents the probability of excess CVD mortality for any given country. Our method could be applied to predict cancer clusters, rather than merely death rates over time, which would be of high practical importance.

    4 CONCLUSIONS

    Traditional methods for assessing the reliability of healthcare systems on the basis of time series data do not efficiently deal with systems characterized by high dimensionality and cross-correlations. The main advantage of our methodology is its ability to assess the reliability of high-dimensional nonlinear dynamic systems. Despite its simplicity, the novel multidimensional modeling strategy introduced herein can be used for accurate forecasting of CVD death rates in individual countries.

    We analyzed 195D data, that is, CVD and cancer death rates for 195 countries worldwide, for the period 1990–2019. A novel method for analyzing the reliability of a multidimensional biosystem was applied and the mechanisms of the proposed method were described in detail. Direct measurements and Monte Carlo simulations are both suitable for assessing the reliability of dynamic biological systems; however, the complexity and high dimensionality of such systems necessitate the further development of robust and accurate techniques that can use limited data sets in an efficient manner.

    This study predicted an average annual death rate for CVD over a 100-year period of about 1% across countries and years. Under current national health management approaches, CVDs will continue to represent a threat to the health of the world population.

    This study introduced a general-purpose, robust, and easy-to-apply method for analyzing the reliability of multidimensional systems. The method has previously been validated by application to a wide range of simulation models but only in the context of one-dimensional systems; in general, highly accurate predictions were obtained. Both measurement and numerically simulated time series data can be analyzed. Applying the method to the data set used in this study yielded reasonable confidence intervals, indicating that it could serve as a useful tool for reliability studies of various nonlinear dynamic biological systems. Finally, the suggested methodology has many potential public health applications beyond the prediction of CVD death rates.

    AUTHOR CONTRIBUTIONS

    Oleg Gaidai: Conceptualization (equal). Yihan Xing: Validation (equal). Rajiv Balakrishna: Investigation (equal). Jiayao Sun: Methodology (equal). Xiaolong Bai: Validation (equal).

    ACKNOWLEDGMENTS

    None.

      CONFLICT OF INTEREST STATEMENT

      The authors declare no conflict of interest.

      ETHICS STATEMENT

      Not applicable.

      INFORMED CONSENT

      Not applicable.

      DATA AVAILABILITY STATEMENT

      Data sets analyzed during the current study are available online at https://ourworldindata.org/causes-of-death (“Our World in Data” [30]).
