Volume 32, Issue 9 pp. 1032-1048
ORIGINAL ARTICLE
Open Access

Assessing heterogeneity of electronic health-care databases: A case study of background incidence rates of venous thromboembolism

Martin Russek

Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands

Chantal Quinten

Corresponding Author

Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands

Correspondence

Chantal Quinten, European Medicines Agency Domenico Scarlattilaan 6 1083 Amsterdam, The Netherlands.

Valentijn M. T. de Jong

Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands

Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands

Catherine Cohet

Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands

Xavier Kurz

Data Analytics and Methods Task Force, European Medicines Agency, Amsterdam, The Netherlands

First published: 17 April 2023

Abstract

Purpose

Heterogeneous results from multi-database studies have been observed, for example, in the context of generating background incidence rates (IRs) for adverse events of special interest for SARS-CoV-2 vaccines. In this study, we aimed to explore different between-database sources of heterogeneity influencing the estimated background IR of venous thromboembolism (VTE).

Methods

Through forest plots and random-effects models, we performed a qualitative and quantitative assessment of heterogeneity of VTE background IRs derived from 11 databases in 6 European countries, using age- and gender-stratified background IRs for the years 2017–2019 estimated in two studies. Sensitivity analyses were performed to assess the impact of selection criteria on the variability of the reported IRs.

Results

A total of 54 257 284 subjects were included in this study. Age–gender pooled VTE IRs varied from 5 to 421/100 000 person-years and increased with increasing age for both genders. Wide confidence intervals (CIs) demonstrated considerable within-data-source heterogeneity. Selecting databases with similar characteristics had only a minor impact on the variability, as shown in forest plots and by the magnitude of the I² statistic, which remained large. Including only databases with primary care and hospital data resulted in a noticeable decrease in heterogeneity.

Conclusions

Large variability in IRs between data sources and within age group and gender strata warrants stratification and limits the feasibility of a meaningful pooled estimate. More detailed knowledge of data characteristics, operationalisation of case definitions and cohort population might support an informed choice of adequate databases for calculating reliable estimates.

Key Points

  1. Using a multi-database approach provides a more accurate picture of true IRs, as there may be large clinical differences underpinning the variability in the estimates across different databases.
  2. After mitigating unwanted heterogeneity through harmonization of database characteristics, some heterogeneity might still be present, but this should be considered a source of knowledge; our study confirmed prior knowledge that VTE background IRs differ depending on the age and gender of the individual.
  3. The level of the heterogeneity in estimates depends on differences in database characteristics. In our study, databases collecting data from different parts of the health-care systems were the largest contributors to heterogeneity in estimates.
  4. When heterogeneity is present, a careful trade-off has to be made for the choice of IR, between stratified estimates or a pooled estimate, to support use in pharmacoepidemiological and regulatory evaluation.
  5. To attenuate heterogeneity, a pre-screening of database characteristics through a meta-dataset and adequate analytical tools at study design stage might be considered.

Plain Language Summary

Real-world data collected in everyday clinical practice can complement information used in regulatory decision-making and provide evidence to support the benefit-risk assessment of medicines. To increase the value of real-world data for regulatory decision-making, regulators pool information from multiple databases to obtain a more accurate picture of the outcome of interest. However, study outcomes often vary when data from different databases are used; this variability, also called heterogeneity, poses challenges for interpretation and communication. In this study, we examined incidence rates of venous thromboembolism, identified as a potential side effect of some COVID-19 vaccines, derived from multiple databases. We investigated how differences in database characteristics might cause variation between rate estimates and concluded that the largest contributor to heterogeneity was the use of data from different health-care settings. Understanding which database characteristics contribute to variability can help to mitigate it. This can be done by selecting databases with similar data characteristics, such as harmonised codes to refer to clinical outcomes and comparable selection criteria for participants, and by using appropriate statistical methods to analyse the variability. Overall, our study provides an overview of the complexity of real-world evidence and can be used to better understand and analyse sources of variability.

1 INTRODUCTION

In the past two decades, the use of large health-care databases has increased greatly.1 Regulatory agencies such as the European Medicines Agency (EMA) and the US Food and Drug Administration (FDA) have highlighted the value of real-world data (RWD) in medicines regulation.2, 3 In Europe, the launch of the Data Analysis and Real World Interrogation Network (DARWIN EU)4 and the European Health Data Space are shifting the landscape of real-world evidence (RWE) generation towards multi-database studies.

While using RWD in regulatory decision-making already offers a number of advantages, those benefits can be increased by using more than one data source.5 Trivially, incorporating data from multiple data sources in an analysis will increase the sample size. This can be crucial in situations with low event counts, such as estimating the incidence rates (IRs) of a rare disease. While observational data are more generalizable to the real world than data from randomised controlled trials, generalizability can be increased even further by covering a broader and more representative population. This may mitigate selection biases that are specific to single databases and allows the quantification of true differences between populations.

Even though the benefits of using multiple data sources are undeniable, data from those sources should not be pooled without a preliminary assessment of the suitability of pooling, owing to inherent differences in their characteristics. Simulations have shown increased risks of false-positive and false-negative safety signals when pooling data from multiple databases.6 Such between-database heterogeneity can take multiple forms. Some forms are desirable for understanding true differences in outcome event rates, while others make the interpretation of results highly challenging, as well as decisions on selecting suitable background rates for specific purposes such as observed-to-expected analyses for vaccines.7

Sources of heterogeneity can be categorized into three types: measurement heterogeneity, information heterogeneity (both may be considered methodological heterogeneity), and true heterogeneity (also called clinical heterogeneity).8 While measurement (e.g., clinical classification systems) and information heterogeneity (e.g., granularity of clinical codes) can generally be considered undesirable, clinical heterogeneity has its value, for example, by improving external validity of results or understanding differences in prescription patterns or impact of risk minimisation measures (RMMs) between different geographical regions, health-care systems or behaviours. However, to understand heterogeneity, it is important to use appropriate tools to detect, report and account for it.

During the COVID-19 pandemic, RWD rapidly provided impactful evidence on the safety and effectiveness of therapeutics and vaccines.9 This included the generation of background IRs for adverse events of special interest (AESIs) for COVID-19 vaccines.10 These background rates continue to be used in observed-to-expected analyses to estimate the expected number of cases in the general population before the introduction of COVID-19 vaccination, or during SARS-CoV-2 circulation in non-vaccinated populations.

The list of AESIs included the concepts of deep vein thrombosis (DVT) and pulmonary embolism (PE), which together make up venous thromboembolism (VTE).11 EMA pharmacovigilance activities identified VTE as a possible adverse event of Jcovden (formerly COVID-19 Vaccine Janssen)12 and listed VTE as an adverse event of Vaxzevria.13 Background rates were used to calculate the excess number of cases potentially linked with these vaccines. However, the reported background rates showed large differences between EU countries, as reflected in national health-care records.14, 15

The objective of this study was to explore data characteristics that give rise to heterogeneity in IRs, using both descriptive and statistical measures, with VTE as a case study. This investigation should also support the selection of adequate statistical methods for handling heterogeneity when pooling observations to derive meaningful pooled estimates for regulatory decision-making.

2 METHODS

2.1 Data

To demonstrate an analytical workflow for handling heterogeneity between databases, we selected VTE, a safety concern listed for a class of COVID-19 vaccines, as a case study.

In order to assess potential adverse reactions related to approved COVID-19 vaccines in the EU, EMA funded two studies through large research consortia to generate aggregated background IRs of AESIs, including VTE: the ACCESS project with University Medical Center Utrecht and the EU PE&PV Research Network,15, 16 and a study by ERASMUS University Medical Center.17 Both consortia reported background IRs from multiple databases stratified by age group and gender, using the same eight age categories. IRs were estimated by dividing the number of incident cases by the total person-time at risk, with individuals entering the study cohort on their first visit after January 1, 2017, and being followed until the outcome, exit from the database or the end of the study period. The study period covered 2017–2020. In the ACCESS protocol, the study population included all individuals who were observed in the databases for at least 1 day during the study period (January 1, 2017 to the last date available) and who had at least 1 year of data availability before cohort entry, except for individuals <1 year of age with data available since birth. In the ERASMUS protocol, the study population was defined slightly differently: people observed on January 1, 2017, January 1, 2018, or January 1, 2019 had to have been observed continuously for at least 365 days, with no event, before this observation date. Ninety-five percent CIs were calculated using an exact method described by Ulm.18
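The IR calculation and its exact 95% CI can be sketched as follows. This is a minimal illustration, not the consortia's code: it assumes the case count in a stratum is Poisson distributed and obtains exact (Garwood-type) limits by inverting the Poisson distribution with bisection, which should give the same limits as the chi-square formulation described by Ulm. The function names and the example counts are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed in log space for numerical stability."""
    if k < 0:
        return 0.0
    if mu == 0.0:
        return 1.0
    log_terms = [i * math.log(mu) - mu - math.lgamma(i + 1) for i in range(k + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def _solve_mu(k: int, target: float, upper: float) -> float:
    """Find mu in [0, upper] with poisson_cdf(k, mu) == target (the CDF decreases in mu)."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid  # CDF still too large: the root lies at a larger mean
        else:
            hi = mid
    return (lo + hi) / 2


def incidence_rate(cases: int, person_years: float, per: float = 100_000, alpha: float = 0.05):
    """IR per `per` person-years with an exact two-sided (1 - alpha) Poisson CI."""
    rate = cases / person_years * per
    search_max = 2 * cases + 20  # comfortably above the upper limit for any count
    # Lower limit: P(X >= cases) = alpha / 2; upper limit: P(X <= cases) = alpha / 2
    lower_mu = 0.0 if cases == 0 else _solve_mu(cases - 1, 1 - alpha / 2, search_max)
    upper_mu = _solve_mu(cases, alpha / 2, search_max)
    return rate, lower_mu / person_years * per, upper_mu / person_years * per


# Hypothetical stratum: 10 VTE cases over 100 000 person-years
print(incidence_rate(10, 100_000))  # approximately (10.0, 4.80, 18.39)
```

Inverting the Poisson distribution directly keeps the sketch within the Python standard library; with SciPy available, the same limits could be obtained from chi-square quantiles.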

Case definitions for VTE were developed independently by the two research groups. ACCESS used the CodeMapper tool19 to find harmonized definitions across coding systems; the full list of included concepts and details of how it was generated are publicly available.20 ERASMUS performed its analyses using the OMOP common data model, in which the clinical codes of the databases it has access to were mapped to the SNOMED system, ensuring harmonized case definitions; the list of clinical codes included by ERASMUS can be accessed in the ATLAS application.21 Table A1 in the appendix shows the ICD10 codes included by ACCESS and ERASMUS. Only ICD10 codes are shown because both research organisations harmonized their definitions across coding systems. For both PE and DVT, the ACCESS definition includes a broader range of concepts. For PE, the additional concepts relate to septic PE; for DVT, they mostly correspond to phlebitis, thrombophlebitis and DVT related to pregnancy. As described in a report by the FDA,22 there is no clear guidance on whether these concepts should be included.

A short overview of the databases is provided in Table 1. Further details, including demographic characteristics and total population, are provided in the corresponding published reports.15, 16 All databases are listed in the ENCePP research database, which also lists research publications in which they have been used. For three databases (PHARMO, BIFAP, and SIDIAP), the IR had been estimated both on the total population and on the subset of subjects with linked primary care and hospital records (PC-H linkage). The primary analysis used the total population estimates.

TABLE 1. Overview of main characteristics by data source.
Database | Consortium | Region | Type of data source: primary care (PC) or hospital (H) data | Study population | Clinical classification coding system
PHARMO | ACCESS | Netherlands | H | 9 184 832 | ICD10
PHARMO | ACCESS | Netherlands | PC and H | 496 197 | ICPC (PC), ICD10 (H)
BIFAP | ACCESS | Spain | PC | 10 266 468 | SNOMED, ICD9
BIFAP | ACCESS | Spain | PC and H | 4 423 843 | SNOMED, ICD9 (PC), ICD10 (H)
SIDIAP | ACCESS | Catalonia (Spain) | PC | 6 205 573 | ICD10
SIDIAP | ACCESS | Catalonia (Spain) | PC and H | 1 758 239 | ICD10
FISABIO | ACCESS | Valencia region (Spain) | PC and H | 5 886 560 | ICD9, ICD10
PEDIANET | ACCESS | Italy | PC | 181 290 | ICD9
ARS | ACCESS | Tuscany (Italy) | PC and H | 3 067 602 | ICD9
CPRD GOLD | ACCESS | United Kingdom | PC | 4 688 710 | READ, SNOMED
CPRD GOLD | ERASMUS | United Kingdom | PC | 3 913 071 | READ
IQVIA DA Germany | ERASMUS | Germany | PC | 8 459 098 | ICD10
IQVIA LPD France | ERASMUS | France | PC | 3 951 633 | ICD10
IPCI | ERASMUS | Netherlands | PC | 1 299 288 | ICPC
IQVIA LPD Italy | ERASMUS | Italy | PC | 1 066 230 | ICD9
SIDIAP | ERASMUS | Catalonia (Spain) | PC and H | 1 909 814 | ICD10
  • Abbreviations: ARS, Agenzia Regionale di Sanità Toscana; BIFAP, Base de Datos para la Investigación Farmacoepidemiológica en Atención Primaria; CPRD, Clinical Practice Research Datalink; DA, Disease Analyzer; FISABIO, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunitat Valenciana; H, hospital; IPCI, Integrated Primary Care Information; LPD, Longitudinal Patient Data; PC, primary care; SIDIAP, Sistema d'Informació per al Desenvolupament de la Investigació en Atenció Primària.

The data covered the years 2017–2019. ACCESS reported IRs by year; to match the data structure of ERASMUS, which reported only combined estimates for all years, the yearly ACCESS rates were pooled on the basis of counts. Data from the Danish registries (DCE-AU) were reported only for the years 2010–2013 and were therefore not included in the analysis. For the PHARMO database, only data for 2017 and 2019 were reported, owing to an error in the imputation of a subset of the 2018 data; BIFAP data with hospital linkage were reported only for 2017–2018.
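Pooling yearly estimates ‘on the basis of counts’ can be read as summing cases and person-time over the years before dividing, rather than averaging the yearly rates. A minimal sketch under that assumption, with hypothetical numbers and a hypothetical function name, shows why the two approaches differ when person-time varies between years:

```python
def pool_years(yearly):
    """Pool (cases, person_years) pairs over calendar years into one IR per 100 000 PY."""
    total_cases = sum(cases for cases, _ in yearly)
    total_py = sum(py for _, py in yearly)
    return total_cases / total_py * 100_000


# Hypothetical stratum reported by year (2017, 2018, 2019) as (cases, person-years);
# 2018 has little person-time, e.g. because only part of that year's data is usable.
yearly = [(30, 200_000), (12, 50_000), (45, 300_000)]
pooled = pool_years(yearly)  # 87 cases / 550 000 PY
naive_mean = sum(c / py * 100_000 for c, py in yearly) / len(yearly)
print(round(pooled, 1), round(naive_mean, 1))  # 15.8 18.0
```

Pooling the counts weights each year by its person-time, whereas the simple mean of the yearly rates gives the short 2018 follow-up the same weight as the other years.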

2.2 Analysis

We used forest plots to visualize heterogeneity, displaying estimated IRs as squares with CIs for each database. The size of the squares is proportional to the precision of the estimate.

A random-effects meta-analysis of the log-transformed IRs, with the between-database variance estimated by restricted maximum likelihood (REML), was performed to calculate a summary estimate and to quantify the level of heterogeneity. A random-effects model allows the true IR to differ between databases, which is more realistic than assuming that the true value of the estimand is exactly the same for each database.
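A random-effects meta-analysis of log IRs with a REML estimate of τ² can be sketched in a few lines. This is a generic illustration, not the implementation in the R package meta used by the authors: it assumes the within-database variance of a log IR is approximated by 1/cases (the delta-method variance of a log Poisson rate) and estimates τ² with a standard REML fixed-point iteration, truncated at zero. The function name and the counts are hypothetical, and the interval shown is a plain Wald-type 95% CI.

```python
import math


def reml_random_effects(log_ir, var, tol=1e-10, max_iter=1000):
    """Pool log IRs with a random-effects model; tau^2 is estimated by REML.

    log_ir: per-database log incidence rates; var: their within-database variances.
    Returns (pooled log IR, its standard error, tau^2).
    """
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (v + tau2) for v in var]  # random-effects weights
        mu = sum(wi * yi for wi, yi in zip(w, log_ir)) / sum(w)
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, log_ir, var))
        # REML fixed-point update, truncated at zero
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1.0 / (v + tau2) for v in var]
    mu = sum(wi * yi for wi, yi in zip(w, log_ir)) / sum(w)
    return mu, math.sqrt(1.0 / sum(w)), tau2


# Hypothetical (cases, person-years) for three databases in one age-gender stratum
data = [(120, 1_000_000), (340, 1_500_000), (95, 400_000)]
log_ir = [math.log(c / py * 100_000) for c, py in data]
var = [1.0 / c for c, _ in data]  # approximate variance of a log Poisson rate
mu, se, tau2 = reml_random_effects(log_ir, var)
low, high = math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se)
print(f"pooled IR {math.exp(mu):.1f} per 100 000 PY (95% CI {low:.1f} to {high:.1f}), tau^2 = {tau2:.3f}")
```

Working on the log scale keeps the pooled IR positive after back-transformation and makes the approximate variance depend only on the number of cases.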

To quantify the absolute magnitude of this heterogeneity, we reported estimates of τ², measuring the ‘dispersion of true effect sizes between studies in terms of the scale of the effect size’,23 and I², measuring the proportion of variation in the observed effects that is due to variation in true effects, that is, due to inherent differences between the investigated data sources rather than sampling error.24 Borenstein et al.18 stress that I² is a proportion rather than an absolute value. I² therefore expresses the level of heterogeneity relative to statistical (sampling) variability, not the amount of heterogeneity itself. In addition, a prediction interval was calculated.25 Such an interval combines uncertainty due to sampling variation and due to heterogeneity to provide an approximate range of true values.
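The heterogeneity measures described above can be sketched as follows, assuming a random-effects fit on the log scale is already available (pooled log IR, its standard error and τ²). I² is computed here from Cochran's Q, which is one common definition, and the prediction interval uses the common t-based form with k − 2 degrees of freedom; the caller supplies the t quantile because the Python standard library has no t distribution. This is an illustration rather than the authors' code, and the function name and inputs are hypothetical.

```python
import math


def heterogeneity_summary(log_ir, var, mu, se, tau2, t_crit):
    """Cochran's Q, the Q-based I^2 and a prediction interval for the log IR.

    mu, se and tau2 come from a random-effects fit on the log scale; t_crit is the
    97.5% quantile of a t distribution with k - 2 degrees of freedom.
    """
    k = len(log_ir)
    w = [1.0 / v for v in var]  # fixed-effect (inverse-variance) weights for Q
    mu_fixed = sum(wi * yi for wi, yi in zip(w, log_ir)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, log_ir))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # The prediction interval adds the between-database variance to the SE of the mean
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return q, i2, (mu - half_width, mu + half_width)


# Hypothetical example with three databases, so t_crit = t(0.975, df=1) = 12.706;
# with 11 databases it would be t(0.975, df=9), about 2.262.
log_ir = [math.log(10), math.log(20), math.log(40)]
q, i2, (pi_low, pi_high) = heterogeneity_summary(log_ir, [0.01] * 3, math.log(20), 0.4, 0.47, 12.706)
print(round(q, 2), round(i2, 3), math.exp(pi_low), math.exp(pi_high))
```

Exponentiating the limits returns the prediction interval to the IR scale; with few databases the t quantile makes the interval very wide.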

Finally, based on the available metadata characteristics of each included database (see Table 1), several supplementary analyses were performed in which databases were omitted or selected according to selected criteria. The aim was to reduce potentially unwanted measurement and information heterogeneity and thereby assess true differences in subject-level data (i.e., true heterogeneity) more accurately. These exploratory analyses allowed us to determine the contribution of each database characteristic to the heterogeneity in IRs across databases.

The following supplementary analyses were performed:
  1. Restricting the analysis to databases with linkage between primary care and hospital data, since only these can provide information on the influence of the health-care setting (i.e., type of data source). Data sources with only primary care data might underestimate the IR when VTE is diagnosed in hospital (in-patient care), and data sources with only hospital data may underestimate events diagnosed in out-patient care.
  2. Restricting the analysis to databases using the same clinical classification system. Here we included only databases that used the International Classification of Diseases, 10th revision (ICD10),26 to code VTE, as it is the most widely used vocabulary among the available data sources.
  3. Restricting the analysis to databases with a homogeneous case definition. Table A1 in the appendix specifies the ICD10 codes used to identify VTE in the two studies. We performed separate analyses by study, that is, ACCESS and ERASMUS, to explore differences in case definitions and population selection criteria.

The analyses were performed using the R software27 package meta.28
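Selecting databases on the basis of their metadata, as in the supplementary analyses above, can be sketched as a filter over the characteristics in Table 1. This is an illustrative sketch with hypothetical names (`Source`, `select`); the rows are transcribed from Table 1, but the exact subsets used for each figure (for example, how duplicated databases and PC-H subsets were handled) may differ from what a simple filter returns.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Source:
    name: str
    consortium: str
    setting: str  # "PC", "H" or "PC and H"
    population: int
    coding: Tuple[str, ...]


# Rows transcribed from Table 1
TABLE_1 = [
    Source("PHARMO", "ACCESS", "H", 9_184_832, ("ICD10",)),
    Source("PHARMO", "ACCESS", "PC and H", 496_197, ("ICPC", "ICD10")),
    Source("BIFAP", "ACCESS", "PC", 10_266_468, ("SNOMED", "ICD9")),
    Source("BIFAP", "ACCESS", "PC and H", 4_423_843, ("SNOMED", "ICD9", "ICD10")),
    Source("SIDIAP", "ACCESS", "PC", 6_205_573, ("ICD10",)),
    Source("SIDIAP", "ACCESS", "PC and H", 1_758_239, ("ICD10",)),
    Source("FISABIO", "ACCESS", "PC and H", 5_886_560, ("ICD9", "ICD10")),
    Source("PEDIANET", "ACCESS", "PC", 181_290, ("ICD9",)),
    Source("ARS", "ACCESS", "PC and H", 3_067_602, ("ICD9",)),
    Source("CPRD GOLD", "ACCESS", "PC", 4_688_710, ("READ", "SNOMED")),
    Source("CPRD GOLD", "ERASMUS", "PC", 3_913_071, ("READ",)),
    Source("IQVIA DA Germany", "ERASMUS", "PC", 8_459_098, ("ICD10",)),
    Source("IQVIA LPD France", "ERASMUS", "PC", 3_951_633, ("ICD10",)),
    Source("IPCI", "ERASMUS", "PC", 1_299_288, ("ICPC",)),
    Source("IQVIA LPD Italy", "ERASMUS", "PC", 1_066_230, ("ICD9",)),
    Source("SIDIAP", "ERASMUS", "PC and H", 1_909_814, ("ICD10",)),
]


def select(sources: List[Source], keep: Callable[[Source], bool]) -> List[Source]:
    """Return the sources that satisfy a metadata criterion."""
    return [s for s in sources if keep(s)]


pc_h = select(TABLE_1, lambda s: s.setting == "PC and H")  # linked primary care and hospital data
icd10_only = select(TABLE_1, lambda s: s.coding == ("ICD10",))  # coded exclusively in ICD10
erasmus = select(TABLE_1, lambda s: s.consortium == "ERASMUS")
print([s.name for s in pc_h])
```

Keeping the criteria as small predicates makes each sensitivity analysis a one-line, prespecifiable rule over a shared meta-dataset.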

3 RESULTS

A total of 60 080 169 subjects were included across the 13 databases (counting CPRD GOLD and SIDIAP once per consortium). See Tables A2 and A3 for an overview of the reported IRs stratified by age category and gender, by database and by study.

Two databases (CPRD GOLD and SIDIAP) were used by both consortia. Because the consortia defined the study cohort differently, the cohort entry criteria were not identical. After removing the duplicated databases, 54 257 284 subjects from 11 databases were included in the main analysis, collectively representing all age and gender subgroups from six countries.
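The subject counts above can be reproduced from Table 1 with simple arithmetic. The sketch assumes, as the totals imply, that the counts use the total-population rows (not the PC-H subsets of PHARMO, BIFAP and SIDIAP) and that the ERASMUS copies of CPRD GOLD and SIDIAP are the duplicates removed; it also computes the relative difference in CPRD GOLD cohort size between the two studies.

```python
# Total-population rows of Table 1 (PC-H subsets excluded), in subjects
total_population = {
    ("PHARMO", "ACCESS"): 9_184_832,
    ("BIFAP", "ACCESS"): 10_266_468,
    ("SIDIAP", "ACCESS"): 6_205_573,
    ("FISABIO", "ACCESS"): 5_886_560,
    ("PEDIANET", "ACCESS"): 181_290,
    ("ARS", "ACCESS"): 3_067_602,
    ("CPRD GOLD", "ACCESS"): 4_688_710,
    ("CPRD GOLD", "ERASMUS"): 3_913_071,
    ("IQVIA DA Germany", "ERASMUS"): 8_459_098,
    ("IQVIA LPD France", "ERASMUS"): 3_951_633,
    ("IPCI", "ERASMUS"): 1_299_288,
    ("IQVIA LPD Italy", "ERASMUS"): 1_066_230,
    ("SIDIAP", "ERASMUS"): 1_909_814,
}

all_13 = sum(total_population.values())
# Databases used by both consortia: keep the ACCESS version, drop the ERASMUS duplicate
duplicates = {("CPRD GOLD", "ERASMUS"), ("SIDIAP", "ERASMUS")}
main_11 = sum(n for key, n in total_population.items() if key not in duplicates)

# Relative difference in CPRD GOLD cohort size, using the ACCESS cohort as reference
cprd_diff = 1 - total_population[("CPRD GOLD", "ERASMUS")] / total_population[("CPRD GOLD", "ACCESS")]
print(all_13, main_11, round(100 * cprd_diff, 1))  # 60080169 54257284 16.5
```

The CPRD GOLD cohorts differ by about 16.5% relative to the ACCESS cohort, consistent with the roughly 17% difference noted in the Discussion.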

As a first step, we displayed the age–gender-database-specific IR estimates, based on total population estimates, in a forest plot (Figure 1). The forest plot showed a relatively large amount of heterogeneity between databases within strata of age group and gender. While IRs in the 0–19 age group were of the same order of magnitude, heterogeneity increased with increasing age and increasing IRs. In the 80+ age group, estimates ranged from <100 to >1 000 per 100 000 person-years. Heterogeneity did not appear to differ much between genders. Two databases, LPD France and PHARMO, showed considerably lower estimates than the other databases in most age groups. The Dutch IPCI database, on the other hand, showed the highest estimates, especially for younger age groups and women. The ranking of databases was generally consistent across strata; that is, a database with high (or low) estimates relative to the other databases in one stratum tended to be high (or low) in the other strata as well.

FIGURE 1. Age–gender-stratified IR estimates and 95% CIs* for VTE by database and pooled. *Some CIs are not visible because they are narrower than the square marking the estimate.

In addition to the age–gender-database-specific IRs, age–gender IRs were calculated from meta-analyses. The pooled IR of VTE ranged from 5 to 421 per 100 000 person-years, depending on the age–gender stratum. The wide CIs of the summary (i.e., pooled) estimates indicated large patient-level differences even within each stratum. Table A4 in the appendix displays the pooled age–gender IR estimates and CIs for VTE from the meta-analyses.

Figure A1 in the appendix shows the calculated prediction intervals for the primary analysis. The prediction intervals for each age–gender group were notably wide, confirming the substantial population-level heterogeneity observed across data sources.

Table 2 shows the estimated I² and τ² values by age–gender stratum. The I² values indicated that most of the observed variability is due to differences between databases rather than random sampling error. Supporting the impression from the forest plot, I² increased with age in both sexes. There was no apparent age-related trend in τ², but τ² estimates were lower for males than for females in every age group except 0–19.

TABLE 2. Age–gender-stratified I² and τ² estimates from meta-analyses.
Age (years) | Gender | I² | τ²
0–19 | Female | 0.872 | 0.253
0–19 | Male | 0.921 | 0.376
20–29 | Female | 0.983 | 0.531
20–29 | Male | 0.954 | 0.381
30–39 | Female | 0.992 | 0.626
30–39 | Male | 0.981 | 0.423
40–49 | Female | 0.996 | 0.600
40–49 | Male | 0.991 | 0.367
50–59 | Female | 0.997 | 0.517
50–59 | Male | 0.995 | 0.299
60–69 | Female | 0.998 | 0.473
60–69 | Male | 0.996 | 0.309
70–79 | Female | 0.998 | 0.527
70–79 | Male | 0.997 | 0.358
80+ | Female | 0.998 | 0.735
80+ | Male | 0.997 | 0.625

3.1 Sensitivity analyses

In the first sensitivity analysis, we restricted the analysis to databases with PC-H linkage (Figure 2). The forest plot shows a relatively large decrease in heterogeneity across all age groups. It became apparent that this restriction primarily excludes databases with low estimates. Table A5 in the appendix, which lists I² and τ² estimates for all sensitivity analyses, shows lower I² estimates, especially for younger age groups, and considerably lower τ² values for all age groups.

Restricting the analysis to databases using ICD10 codes to identify VTE did not reduce heterogeneity (Figure 3). Both the range and the distribution of estimates were similar to the primary analysis, as were the estimates of I² and τ².

Figures 4a and 4b show forest plots of the analyses considering data from ACCESS and ERASMUS separately. Because these analyses include estimates from both the total population and the subpopulation with PC-H linkage, there is some dependence between the estimates of BIFAP, SIDIAP, and PHARMO. Within ACCESS, the hospital database PHARMO showed far lower estimates than all other databases, which could be linked to an oversampling of the denominator. Apart from this, there seemed to be some visual reduction in heterogeneity. Data from ERASMUS and ACCESS showed a similar amount of heterogeneity; the forest plots indicate that the ERASMUS estimates are spread more evenly, whereas within ACCESS the PHARMO hospital data differ strongly from the other estimates.

FIGURE 2. Age–gender-database-specific IR estimates and 95% CIs* for VTE in databases with PC-H linkage. *Some CIs are not visible because they are narrower than the square marking the estimate.
FIGURE 3. Age–gender-database-specific IR estimates and 95% CIs* for VTE in databases using ICD10. *Some CIs are not visible because they are narrower than the square marking the estimate.
FIGURE 4. Age–gender-database-specific IR estimates and 95% CIs* for VTE in databases from (a) ACCESS and (b) ERASMUS. *Some CIs are not visible because they are narrower than the square marking the estimate.

4 DISCUSSION

This study explored heterogeneity in background IRs of VTE reported from 11 data sources spanning six European countries and derived from two observational studies, focusing on the database as a source of heterogeneity. By investigating data source characteristics that may introduce differences in estimated IRs, we aimed to assess the amount of unwanted (i.e., methodological) heterogeneity or uncertainty between data sources and thereby support more valid conclusions in safety surveillance activities. Data sources used in this study were mostly from primary care settings, some with linkage to hospital data. The study used aggregated background IRs of VTE, considered a relevant AESI for a class of EU-approved COVID-19 vaccines.

Substantial heterogeneity in background IRs was observed between all included data sources, in addition to within-data-source differences across age groups and genders. In our study, age was the main contributor to heterogeneity: background rates increased with increasing age, with no clear pattern in IRs between males and females. The increase in IRs with age is in line with another study on VTE,29 which also reported a difference between genders: IRs increased markedly with age for both men and women, and the overall age-adjusted IR was higher for men (130 per 100 000) than for women (110 per 100 000). The heterogeneity observed across age–gender strata is a source of information that leads to a better understanding of the burden of VTE in the general population. However, as the summary estimates and CIs demonstrate, substantial heterogeneity between data sources remained within each stratum, suggesting possible unobserved patient-level heterogeneity; a single estimate for each stratum might therefore be inaccurate.

To understand the contribution of database characteristics to the reported heterogeneity, we performed several exploratory analyses. Our databases included data derived from both hospital and primary care settings. In all data sources, when estimating background rates, it is important to consider how the population denominator was derived. When data are linked between the two settings, depending on the linkage mechanism, there is a risk of capturing only those subjects with a recorded hospital visit, which could bias the estimates. Restricting the analysis to databases with PC-H linkage resulted in a moderate decrease in the reported variability. In contrast, restricting the analysis to databases using the ICD10 vocabulary did not decrease heterogeneity, indicating that the vocabulary used for the clinical classification of VTE is not a major source of heterogeneity. Differences in background rates remained between the two studies, even though the time period over which the rates were collected, the age–gender subgroup definitions and the analytical methods were similar. A closer comparison of the methodology of the two studies revealed differences in case definitions, with some clinical codes included in only one of the studies (Table A1 in the appendix). The inclusion and exclusion criteria for individuals also differed, leading to non-identical study populations even within the same data source. Since we did not find any systematic differences in estimates between the two consortia, it is unlikely that differences in case definition or inclusion criteria had a large influence on the observed heterogeneity.

When heterogeneity was quantified with the I² statistic, values close to 100% were obtained. These large I² values are not surprising: the large sample sizes in every database imply small within-database variance estimates, and because I² measures between-database variability relative to this sampling variability, it approaches 100% even for moderate absolute heterogeneity. The I² estimates in the 0–19 age group in particular seem to be influenced by this: owing to a larger sample size, the variance is lower than in the other age groups, leading to I² estimates that appear too high in comparison with the other age groups when looking at the forest plot.
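The dependence of I² on within-database precision can be illustrated with the standard approximation I² ≈ τ²/(τ² + s²), where s² is a typical within-database variance. In this sketch, τ² is held fixed at a value of the order seen in Table 2 and s² is approximated by 1/cases, the delta-method variance of a log Poisson rate; the function name and the case counts are hypothetical.

```python
def i2_from_variances(tau2: float, typical_within_var: float) -> float:
    """Approximate I^2 as the share of total variance due to between-database variance."""
    return tau2 / (tau2 + typical_within_var)


# Same between-database variance, increasingly precise databases
for cases in (10, 100, 1_000, 10_000):
    print(cases, round(i2_from_variances(0.3, 1 / cases), 4))
# 10 0.75
# 100 0.9677
# 1000 0.9967
# 10000 0.9997
```

With thousands of cases per database, I² is close to 1 whatever the absolute spread, which is why τ² is the more informative measure of how far the database-specific IRs actually differ.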

In this study, we calculated pooled estimates for the primary analyses. However, when large heterogeneity is present, focusing on a pooled estimate is not advisable, because the pooled estimate will depend largely on the particular choice of databases and the relative weights assigned to each database.30 Following the classification in Deeks et al.,8 it is not advised to combine the estimates if I² is estimated to be larger than 90%. In addition, when reporting the results after combining estimates, attention needs to be given to uncertainty quantification. In addition to the common risk of misinterpreting CIs,31 CIs from random-effects meta-analyses are easily misinterpreted as quantifying the dispersion of database-specific effects. However, these CIs only represent the uncertainty in estimating the mean effect size and do not take into account the variability due to different database characteristics. Consequently, the CI of a pooled estimate is usually much narrower than the range of observed estimates; it is the prediction interval that approximates this range.

A major strength of this study is the use of data aggregated from a large number of data sources independently provided by two research consortia using the same calculation method to estimate the IRs. This enabled the exploration of database-specific aspects related to heterogeneity. From 11 databases, 54 257 284 subjects contributed to the main analysis; with the databases spanning a large part of Europe, this can be considered a representative sample of the total population.

The above exploratory analyses show that even when certain database characteristics are harmonized, substantial heterogeneity remains. A limitation of our study is that only aggregated data were available, which prevented us from investigating potential sources of heterogeneity attributable to patient characteristics; for instance, comorbidities may influence IRs.32 A final limitation is that clinical validation of the diagnosis codes was not performed in these studies, which might have led to different frequencies of misclassification or underreporting of VTE cases and may affect estimates of heterogeneity.33

This study highlights the challenges posed by the varying levels of available information about database characteristics and the difficulty of identifying sufficiently detailed information about the data sources. For example, some differences can only be explored through subject matter expertise about the corresponding health-care systems. Health-care systems might differ between regions, implying possible differences in the probability of recording certain events even in the same health-care setting. The process of clinical coding could also influence the quality of recording, with different levels of quality control or incentives for correct coding. Given the large level of observed heterogeneity, an important recommendation is to use the same databases when comparing estimates at different time points, for example, pre- and post-exposure IRs of AESIs.

The above considerations highlight the need for a careful assessment of the suitability of databases for inclusion in multi-database studies. In the two studies, a variety of databases was included because many different AESIs were considered simultaneously. For a study of a specific outcome, more specific restrictions on the choice of databases should be set a priori. In our study, we observed that the type of data source is one of the most important considerations. Based on subject matter knowledge or available validation studies, researchers should evaluate in which type of setting the outcome can be estimated most accurately.

Our analysis also highlights the importance of harmonized case definitions and cohort inclusion and exclusion criteria when selecting adequate data sources for multi-database studies. This is illustrated by CPRD: both study groups used this data source to address the same objective, yet the total number of individuals included in the study cohort differed by 17%, most likely because of differences in the operationalization of case definitions.

It is important that the methods used for detecting and addressing heterogeneity are specified before starting any meta-analysis. When the forest plot shows outlying rates, it can be tempting to exclude the corresponding databases from the analysis without further investigating the causes of the outliers. This practice is, however, likely to introduce bias and should be avoided in most situations. Criteria for excluding certain databases should be specified before performing the analysis, but even then it is advisable to also present the results including the excluded databases, as a sensitivity analysis. Similarly, the choice of method for handling heterogeneity should be prespecified, conditional on the outcome of the method for detecting heterogeneity. It is also preferable to prespecify the method for estimating the meta-analysis model and its statistical heterogeneity (e.g., REML), the methods for quantifying CIs (e.g., the Hartung-Knapp and Sidik-Jonkman modifications to the Wald method) and the method for computing prediction intervals.34
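The practical difference between the Wald CI and the Hartung-Knapp-Sidik-Jonkman (HKSJ) CI can be sketched as follows. The sketch assumes log-scale IRs with known within-database variances and estimates τ² with DerSimonian-Laird for brevity (REML would require an iterative solver); the HKSJ step rescales the variance of the pooled mean by the weighted residual dispersion and uses a t distribution with k - 1 df. The function names, the hard-coded t quantiles and the input values are hypothetical and serve only as illustration.

```python
import math

# Standard tabulated 97.5% quantiles of Student's t for small df.
T975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}
Z975 = 1.959964  # standard normal 97.5% quantile


def dl_tau2(y, v):
    """DerSimonian-Laird estimate of the between-database variance tau^2."""
    w = [1 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)


def wald_vs_hksj(y, v):
    """Pooled IR with a Wald CI and an HKSJ CI, computed on the log scale."""
    k = len(y)
    tau2 = dl_tau2(y, v)
    w = [1 / (vi + tau2) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se_wald = math.sqrt(1 / sum(w))
    # HKSJ: rescale by the weighted residual dispersion and use t with k-1 df.
    q_hk = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se_hk = math.sqrt(q_hk / sum(w))
    wald = (math.exp(mu - Z975 * se_wald), math.exp(mu + Z975 * se_wald))
    hk = (math.exp(mu - T975[k - 1] * se_hk), math.exp(mu + T975[k - 1] * se_hk))
    return math.exp(mu), wald, hk


# Hypothetical log IRs and within-database variances for five databases.
log_ir = [math.log(x) for x in [5.7, 2.9, 8.5, 1.7, 14.0]]
var = [0.044, 0.026, 0.20, 0.12, 0.026]
ir, wald, hk = wald_vs_hksj(log_ir, var)
print(f"IR {ir:.2f}; Wald {wald[0]:.2f}-{wald[1]:.2f}; HKSJ {hk[0]:.2f}-{hk[1]:.2f}")
```

With few databases the HKSJ interval is usually wider than the Wald interval, but it can be narrower when the observed dispersion is small; some analysts then apply an ad hoc lower bound of 1 on the scaling factor `q_hk`. This is one more choice worth prespecifying.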

When background rates are intended to support safety signal detection activities, the level of heterogeneity across multiple databases should be explored, to avoid misleading recommendations. One current initiative is the DIVERSE project, which aims to develop guidelines for the identification, collection and reporting of heterogeneity in multi-database studies.35 In addition, EMA's list of metadata for Real World Data catalogues,36 which will form the basis of a catalogue of RWD sources, will provide researchers with standardized, relevant information about databases to use for RWE studies. Other approaches would be to develop a set of metrics to measure database heterogeneity, or to develop phenotype libraries to identify important variables in different databases. For instance, Ostropolets et al.22 quantified factors influencing IRs through a set of sensitivity analyses using patient-level data. Finally, the use of SNOMED CT (systematized nomenclature of medicine – clinical terms),37 a terminology that can be cross-mapped to other classifications and code systems, may reduce the variability among estimates derived from different data sources. However, mapping the original coding systems to SNOMED may not reduce heterogeneity as such, but may merely conceal heterogeneity introduced by the use of different classification systems to operationalize the case definition of VTE. This is illustrated by the ERASMUS data sources, for which large variability in the estimates remained even after conversion to SNOMED.

5 CONCLUSION

Our study revealed large variability in the estimated age–gender-stratified background IRs of VTE between databases, demonstrating the presence of one or several sources of heterogeneity. Restricting the databases to similar health-care settings reduced the variability in the reported rates. Nevertheless, variability remained, most likely driven by analytical heterogeneity arising from differences in the case definitions and population cohorts defined in the protocols used by the two study groups. Using HARPER (harmonized protocol template to enhance reproducibility)38 to operationalize code definitions should support the creation of unambiguous clinical code lists in studies integrating data from multiple data sources.

Our study can help to better understand the complexity of RWE and illustrates the importance of a cautious selection of databases based on their characteristics, so that the observed heterogeneity reflects true differences, ultimately improving the reliability of RWE. Our findings should be considered in the context of similar analyses with other databases and in other settings.

AUTHOR CONTRIBUTIONS

All authors meet the ICMJE criteria; all contributed to the development of the study and to the writing and approval of the manuscript.

ACKNOWLEDGEMENTS

The authors thank Karin Hedenmalm and Luca Giraldi for their critical review of the manuscript.

    DISCLAIMER

    The views expressed in this article are the personal views of the authors and may not be understood or quoted as being made on behalf of or reflecting the position of the European Medicines Agency or one of its committees or working parties.

    APPENDIX A

    Age–gender-stratified 95% CIs and prediction intervals* for VTE, by database and pooled. *Some prediction intervals are not visible in the figure because they are small relative to the size of the plotted squares.
    TABLE A1. ICD-10 codes used by ACCESS and ERASMUS to diagnose VTE.
    Coding system Code Code name
    ACCESS
    ICD10CM I26 pulmonary (acute) (artery)(vein) thromboembolism
    ICD10CM I26 Pulmonary embolism
    ICD10CM I80 Phlebitis and thrombophlebitis
    ICD10CM I81 Portal vein thrombosis
    ICD10CM I82 Other venous embolism and thrombosis
    ICD10CM O08.2 Embolism following ectopic and molar pregnancy
    ICD10CM O22.3 Deep phlebothrombosis in pregnancy
    ICD10CM O87.1 Deep phlebothrombosis in the puerperium
    Coding system Code Code name
    ERASMUS
    ICD10 I26 Pulmonary embolism
    ICD10CM I26 Pulmonary embolism
    ICD10 I26.0 Pulmonary embolism with mention of acute cor pulmonale
    ICD10CM I26.0 Pulmonary embolism with acute cor pulmonale
    ICD10CM I26.02 Saddle embolus of pulmonary artery with acute cor pulmonale
    ICD10CM I26.09 Other pulmonary embolism with acute cor pulmonale
    ICD10 I26.9 Pulmonary embolism without mention of acute cor pulmonale
    ICD10CM I26.9 Pulmonary embolism without acute cor pulmonale
    ICD10CM I26.92 Saddle embolus of pulmonary artery without acute cor pulmonale
    ICD10CM I26.93 Single subsegmental pulmonary embolism without acute cor pulmonale
    ICD10CM I26.94 Multiple subsegmental pulmonary emboli without acute cor pulmonale
    ICD10CM I26.99 Other pulmonary embolism without acute cor pulmonale
    ICD10CM I80.21 Phlebitis and thrombophlebitis of iliac vein
    ICD10CM I80.219 Phlebitis and thrombophlebitis of unspecified iliac vein
    ICD10 I82.2 Embolism and thrombosis of vena cava
    ICD10CM I82.2 Embolism and thrombosis of vena cava and other thoracic veins
    ICD10CM I82.21 Embolism and thrombosis of superior vena cava
    ICD10CM I82.210 Acute embolism and thrombosis of superior vena cava
    ICD10CM I82.211 Chronic embolism and thrombosis of superior vena cava
    ICD10CM I82.22 Embolism and thrombosis of inferior vena cava
    ICD10CM I82.220 Acute embolism and thrombosis of inferior vena cava
    ICD10CM I82.221 Chronic embolism and thrombosis of inferior vena cava
    ICD10 I82.3 Embolism and thrombosis of renal vein
    ICD10CM I82.3 Embolism and thrombosis of renal vein
    ICD10CM I82.4 Acute embolism and thrombosis of deep veins of lower extremity
    ICD10CM I82.40 Acute embolism and thrombosis of unspecified deep veins of lower extremity
    ICD10CM I82.401 Acute embolism and thrombosis of unspecified deep veins of right lower extremity
    ICD10CM I82.402 Acute embolism and thrombosis of unspecified deep veins of left lower extremity
    ICD10CM I82.403 Acute embolism and thrombosis of unspecified deep veins of lower extremity, bilateral
    ICD10CM I82.409 Acute embolism and thrombosis of unspecified deep veins of unspecified lower extremity
    ICD10CM I82.41 Acute embolism and thrombosis of femoral vein
    ICD10CM I82.411 Acute embolism and thrombosis of right femoral vein
    ICD10CM I82.412 Acute embolism and thrombosis of left femoral vein
    ICD10CM I82.413 Acute embolism and thrombosis of femoral vein, bilateral
    ICD10CM I82.419 Acute embolism and thrombosis of unspecified femoral vein
    ICD10CM I82.42 Acute embolism and thrombosis of iliac vein
    ICD10CM I82.421 Acute embolism and thrombosis of right iliac vein
    ICD10CM I82.422 Acute embolism and thrombosis of left iliac vein
    ICD10CM I82.423 Acute embolism and thrombosis of iliac vein, bilateral
    ICD10CM I82.429 Acute embolism and thrombosis of unspecified iliac vein
    ICD10CM I82.43 Acute embolism and thrombosis of popliteal vein
    ICD10CM I82.431 Acute embolism and thrombosis of right popliteal vein
    ICD10CM I82.432 Acute embolism and thrombosis of left popliteal vein
    ICD10CM I82.433 Acute embolism and thrombosis of popliteal vein, bilateral
    ICD10CM I82.439 Acute embolism and thrombosis of unspecified popliteal vein
    ICD10CM I82.44 Acute embolism and thrombosis of tibial vein
    ICD10CM I82.441 Acute embolism and thrombosis of right tibial vein
    ICD10CM I82.442 Acute embolism and thrombosis of left tibial vein
    ICD10CM I82.443 Acute embolism and thrombosis of tibial vein, bilateral
    ICD10CM I82.449 Acute embolism and thrombosis of unspecified tibial vein
    ICD10CM I82.49 Acute embolism and thrombosis of other specified deep vein of lower extremity
    ICD10CM I82.491 Acute embolism and thrombosis of other specified deep vein of right lower extremity
    ICD10CM I82.492 Acute embolism and thrombosis of other specified deep vein of left lower extremity
    ICD10CM I82.493 Acute embolism and thrombosis of other specified deep vein of lower extremity, bilateral
    ICD10CM I82.499 Acute embolism and thrombosis of other specified deep vein of unspecified lower extremity
    ICD10CM I82.4Y Acute embolism and thrombosis of unspecified deep veins of proximal lower extremity
    ICD10CM I82.4Y1 Acute embolism and thrombosis of unspecified deep veins of right proximal lower extremity
    ICD10CM I82.4Y2 Acute embolism and thrombosis of unspecified deep veins of left proximal lower extremity
    ICD10CM I82.4Y3 Acute embolism and thrombosis of unspecified deep veins of proximal lower extremity, bilateral
    ICD10CM I82.4Y9 Acute embolism and thrombosis of unspecified deep veins of unspecified proximal lower extremity
    ICD10CM I82.4Z Acute embolism and thrombosis of unspecified deep veins of distal lower extremity
    ICD10CM I82.4Z1 Acute embolism and thrombosis of unspecified deep veins of right distal lower extremity
    ICD10CM I82.4Z2 Acute embolism and thrombosis of unspecified deep veins of left distal lower extremity
    ICD10CM I82.4Z3 Acute embolism and thrombosis of unspecified deep veins of distal lower extremity, bilateral
    ICD10CM I82.4Z9 Acute embolism and thrombosis of unspecified deep veins of unspecified distal lower extremity
    ICD10CM I82.5 Chronic embolism and thrombosis of deep veins of lower extremity
    ICD10CM I82.50 Chronic embolism and thrombosis of unspecified deep veins of lower extremity
    ICD10CM I82.501 Chronic embolism and thrombosis of unspecified deep veins of right lower extremity
    ICD10CM I82.502 Chronic embolism and thrombosis of unspecified deep veins of left lower extremity
    ICD10CM I82.503 Chronic embolism and thrombosis of unspecified deep veins of lower extremity, bilateral
    ICD10CM I82.509 Chronic embolism and thrombosis of unspecified deep veins of unspecified lower extremity
    ICD10CM I82.59 Chronic embolism and thrombosis of other specified deep vein of lower extremity
    ICD10CM I82.591 Chronic embolism and thrombosis of other specified deep vein of right lower extremity
    ICD10CM I82.592 Chronic embolism and thrombosis of other specified deep vein of left lower extremity
    ICD10CM I82.593 Chronic embolism and thrombosis of other specified deep vein of lower extremity, bilateral
    ICD10CM I82.599 Chronic embolism and thrombosis of other specified deep vein of unspecified lower extremity
    ICD10CM I82.5Y Chronic embolism and thrombosis of unspecified deep veins of proximal lower extremity
    ICD10CM I82.5Y1 Chronic embolism and thrombosis of unspecified deep veins of right proximal lower extremity
    ICD10CM I82.5Y2 Chronic embolism and thrombosis of unspecified deep veins of left proximal lower extremity
    ICD10CM I82.5Y3 Chronic embolism and thrombosis of unspecified deep veins of proximal lower extremity, bilateral
    ICD10CM I82.5Y9 Chronic embolism and thrombosis of unspecified deep veins of unspecified proximal lower extremity
    ICD10CM I82.62 Acute embolism and thrombosis of deep veins of upper extremity
    ICD10CM I82.621 Acute embolism and thrombosis of deep veins of right upper extremity
    ICD10CM I82.622 Acute embolism and thrombosis of deep veins of left upper extremity
    ICD10CM I82.623 Acute embolism and thrombosis of deep veins of upper extremity, bilateral
    ICD10CM I82.629 Acute embolism and thrombosis of deep veins of unspecified upper extremity
    ICD10CM I82.81 Embolism and thrombosis of superficial veins of lower extremities
    ICD10CM I82.811 Embolism and thrombosis of superficial veins of right lower extremity
    ICD10CM I82.812 Embolism and thrombosis of superficial veins of left lower extremity
    ICD10CM I82.813 Embolism and thrombosis of superficial veins of lower extremities, bilateral
    ICD10CM I82.819 Embolism and thrombosis of superficial veins of unspecified lower extremity
    ICD10CM I82.A Embolism and thrombosis of axillary vein
    ICD10CM I82.A1 Acute embolism and thrombosis of axillary vein
    ICD10CM I82.A11 Acute embolism and thrombosis of right axillary vein
    ICD10CM I82.A12 Acute embolism and thrombosis of left axillary vein
    ICD10CM I82.A13 Acute embolism and thrombosis of axillary vein, bilateral
    ICD10CM I82.A19 Acute embolism and thrombosis of unspecified axillary vein
    ICD10CM I82.B Embolism and thrombosis of subclavian vein
    ICD10CM I82.B1 Acute embolism and thrombosis of subclavian vein
    ICD10CM I82.B11 Acute embolism and thrombosis of right subclavian vein
    ICD10CM I82.B12 Acute embolism and thrombosis of left subclavian vein
    ICD10CM I82.B13 Acute embolism and thrombosis of subclavian vein, bilateral
    ICD10CM I82.B19 Acute embolism and thrombosis of unspecified subclavian vein
    ICD10CM I82.C Embolism and thrombosis of internal jugular vein
    ICD10CM I82.C1 Acute embolism and thrombosis of internal jugular vein
    ICD10CM I82.C11 Acute embolism and thrombosis of right internal jugular vein
    ICD10CM I82.C12 Acute embolism and thrombosis of left internal jugular vein
    ICD10CM I82.C13 Acute embolism and thrombosis of internal jugular vein, bilateral
    ICD10CM I82.C19 Acute embolism and thrombosis of unspecified internal jugular vein
    a ACCESS included all subcodes of the listed codes, whereas ERASMUS used only the listed codes and no unlisted subcodes.
    b ICD-10 codes included by ACCESS but not by ERASMUS: I26.01; I26.90; I80 and its subcodes, except I80.21 and I80.219; I81 and its subcodes; I82, I82.0, I82.1, I82.29, I82.290 and I82.291; I82.51–I82.56, I82.5Z and their subcodes; I82.6–I82.7 and I82.9 and their subcodes, except I82.62 and its subcodes; I82.B2, I82.C2 and their subcodes; O08.2, O22.3 and O87.1.
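The two matching rules described in footnote a can produce different case counts from the same records. The sketch below contrasts them: one rule treats each listed code as covering all of its subcodes (the ACCESS approach), the other accepts only codes listed verbatim (the ERASMUS approach). The function names, the dot-stripping normalization and the abbreviated code lists are assumptions for illustration; this is not either consortium's actual implementation.

```python
def _norm(code):
    """Normalize an ICD-10(-CM) code for comparison: drop the dot, upper-case."""
    return code.replace(".", "").strip().upper()


def match_with_subcodes(code, code_list):
    """ACCESS-style rule: a listed code also captures all of its subcodes."""
    c = _norm(code)
    return any(c.startswith(_norm(x)) for x in code_list)


def match_listed_only(code, code_list):
    """ERASMUS-style rule: only codes listed verbatim are captured."""
    return _norm(code) in {_norm(x) for x in code_list}


# Abbreviated code lists taken from Table A1 (not the full definitions).
ACCESS = ["I26", "I80", "I81", "I82", "O08.2", "O22.3", "O87.1"]
ERASMUS = ["I26", "I26.0", "I26.9", "I82.2", "I82.3", "I82.4"]

for rec in ["I26.01", "I81", "I82.290", "I82.4"]:
    print(rec, match_with_subcodes(rec, ACCESS), match_listed_only(rec, ERASMUS))
```

Records coded as I26.01, I81 or I82.290 count as VTE cases under the first rule but not under the second, whereas I82.4 is captured by both. This is one concrete mechanism behind the analytical heterogeneity discussed above.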
    TABLE A2. Age–gender-stratified IRs per 100 000 person-years (with 95% CIs) for VTE for the databases provided by ERASMUS. IR_lwr and IR_upr are the lower and upper 95% confidence limits.
    Age 0–19 0–19 20–29 20–29 30–39 30–39 40–49 40–49 50–59 50–59 60–69 60–69 70–79 70–79 80+ 80+
    Gender Female Male Female Male Female Male Female Male Female Male Female Male Female Male Female Male
    SIDIAP_H IR_upr 8.401 8.558 42.831 34.043 65.147 50.089 83.982 114.790 142.046 214.288 257.495 351.061 479.374 495.524 708.428 611.455
    IR_lwr 3.683 3.935 27.519 20.090 49.371 35.527 66.827 94.446 117.867 184.456 222.594 309.225 425.596 437.671 637.169 528.101
    IR 5.691 5.921 34.551 26.391 56.852 42.344 75.039 104.247 129.535 198.953 239.569 329.646 451.886 465.925 672.090 568.635
    SIDIAP IR_upr 3.965 2.158 26.031 16.999 35.844 33.165 52.101 71.404 80.333 129.702 156.578 209.455 325.792 319.963 506.255 432.463
    IR_lwr 2.114 0.874 19.069 11.348 28.958 26.360 44.267 62.204 69.831 115.971 140.480 189.719 298.792 290.318 469.686 387.497
    IR 2.933 1.412 22.349 13.962 32.264 29.616 48.065 66.685 74.945 122.692 148.365 199.404 312.073 304.870 487.713 409.517
    LPD ITALY IR_upr 18.533 20.361 29.521 35.718 56.030 60.506 108.185 102.275 129.371 173.693 222.919 254.114 482.714 409.460 728.965 633.780
    IR_lwr 3.125 3.973 13.952 15.924 35.379 34.264 83.512 72.635 103.875 136.809 186.667 206.169 424.357 341.139 648.693 522.077
    IR 8.515 9.882 20.679 24.377 44.830 46.019 95.254 86.515 116.101 154.431 204.192 229.207 452.832 374.134 687.952 575.907
    LPD FRANCE IR_upr 3.078 2.431 14.368 12.986 20.481 22.188 28.467 43.741 36.220 65.696 61.570 90.441 100.744 114.658 153.878 150.461
    IR_lwr 0.803 0.474 7.661 5.259 13.299 12.565 20.533 31.514 27.612 51.922 49.968 74.395 83.309 93.896 127.632 118.459
    IR 1.674 1.180 10.629 8.496 16.604 16.875 24.259 37.256 31.699 58.507 55.543 82.125 91.717 103.890 140.297 133.747
    IPCI IR_upr 18.781 9.74 124.661 53.590 213.464 95.729 326.663 207.130 310.978 333.740 424.226 481.229 740.676 603.106 886.734 837.853
    IR_lwr 10.165 3.853 92.163 33.065 171.910 66.879 280.467 168.438 268.142 288.068 369.025 421.215 655.350 522.530 761.507 689.366
    IR 13.990 6.308 107.500 42.416 191.848 80.346 302.907 187.038 288.966 310.275 395.905 450.474 697.035 561.737 822.337 760.906
    DA Germany IR_upr 8.974 4.870 50.190 41.029 78.340 71.979 102.490 121.669 118.153 166.851 175.389 226.173 278.467 278.472 394.823 319.480
    IR_lwr 5.575 2.439 40.803 30.197 67.861 58.776 91.478 106.356 108.009 152.551 161.889 208.809 260.228 258.221 367.633 290.473
    IR 7.126 3.502 45.315 35.305 72.960 65.128 96.867 113.819 112.995 159.581 168.538 217.361 269.231 268.203 381.046 304.717
    CPRD IR_upr 13.048 4.148 94.268 32.978 118.780 90.853 133.897 145.978 156.620 197.799 268.255 324.970 477.566 488.495 717.938 622.249
    IR_lwr 8.636 1.849 77.761 23.299 101.457 74.995 116.096 126.803 138.019 176.771 240.892 294.459 435.655 443.471 652.590 547.731
    IR 10.674 2.831 85.718 27.827 109.863 82.640 124.759 136.138 147.099 187.064 254.297 309.433 456.250 465.575 684.679 584.100
    TABLE A3. Age–gender-stratified IRs per 100 000 person-years (with 95% CIs) for VTE for the databases provided by ACCESS. IR_lwr and IR_upr are the lower and upper 95% confidence limits.
    Age 0–19 0–19 20–29 20–29 30–39 30–39 40–49 40–49 50–59 50–59 60–69 60–69 70–79 70–79 80+ 80+
    Gender Female Male Female Male Female Male Female Male Female Male Female Male Female Male Female Male
    IT_PEDIANET IR_upr 2.781 3.872
    IR_lwr 0 0.018
    IR 0 0.695
    UK_CPRD IR_upr 4.880 3.650 64.120 30.740 128.380 80.630 168.040 140.790 200.070 196.850 325.070 305.510 541.160 416.750 639.560 457.340
    IR_lwr 2.580 1.740 51.520 22.360 111.950 67.700 149.330 123.930 180.140 177.260 296.480 277.500 499.000 377.750 583.400 399.390
    IR 3.600 2.570 57.570 26.300 119.960 73.950 158.480 132.160 189.910 186.860 310.530 291.250 519.760 396.890 610.990 427.630
    SIDIAP IR 2.930 1.410 22.350 13.960 32.260 29.620 48.060 66.690 74.940 122.690 148.370 199.400 312.070 304.870 487.710 409.520
    NL_PHARMO_PCHOSP IR_upr 33.180 26.760 117.900 79.800 216.990 127.630 300.050 224.150 321.150 325.390 436.910 549.590 768.650 613.540 850.010 832.610
    IR_lwr 10.810 7.470 60.230 33.710 136.590 66.960 211.410 148.350 234.340 238.820 327.790 426.980 601.030 459.250 625.020 555.830
    IR 19.780 14.960 85.540 53.180 173.360 93.730 252.860 183.360 275.210 279.620 379.440 485.410 681.000 532.240 731.090 683.890
    NL_PHARMO_HOSP IR_upr 5.490 3.320 14.800 7.100 19.500 13.270 29.410 28.410 40.710 65.520 73.750 109.220 124.250 146.060 87.750 94.100
    IR_lwr 3.240 1.660 10.540 4.110 14.850 9.190 23.840 22.650 34.100 56.820 63.780 96.600 109.440 128.960 75.850 78.940
    IR 4.260 2.390 12.530 5.460 17.050 11.090 26.520 25.410 37.300 61.050 68.630 102.760 116.670 137.310 81.640 86.280
    IT_ARS IR_upr 7.960 9.680 24.190 31.090 63.250 46.430 88.570 96.200 113.730 179.620 221.040 308.280 483.490 521.680 884.070 828.570
    IR_lwr 3.600 4.950 13.700 19.510 47.120 32.580 73.010 79.640 96.410 157.230 194.180 274.950 442.240 475.000 824.970 755.530
    IR 5.470 7.030 18.400 24.800 54.740 39.050 80.510 87.630 104.800 168.140 207.290 291.260 462.520 497.930 854.140 791.420
    ES_SIDIAP_PCHOSP IR_upr 13.200 12.910 66.990 42.210 131.210 83.290 172.120 163.870 253.610 301.970 448.040 467.950 850.560 727.520 1154.280 924.290
    IR_lwr 6.920 6.880 46.800 26.820 106.880 64.330 146.590 139.830 219.720 265.000 397.970 413.990 771.920 648.100 1055.760 811.170
    IR 9.690 9.550 56.220 33.870 118.580 73.350 158.970 151.490 236.210 283.030 422.450 440.350 810.520 686.950 1104.190 866.350
    ES_SIDIAP_PC IR_upr 8.370 7.380 55.390 33.200 102.730 65.050 146.040 131.130 203.940 232.820 370.020 360.970 677.690 515.690 924.290 688.350
    IR_lwr 5.730 4.970 45.710 25.880 91.470 56.240 133.670 119.820 187.910 215.680 345.810 335.930 640.150 479.820 878.070 636.240
    IR 6.960 6.090 50.380 29.370 96.980 60.520 139.750 125.380 195.800 224.130 357.760 348.280 658.720 497.510 900.960 661.910
    ES_FISABIO IR_upr 8.930 11.360 44.490 30.200 84.110 66.510 111.720 122.540 157.240 246.670 303.970 405.680 591.720 614.000 995.190 935.500
    IR_lwr 6.110 8.240 35.450 22.890 73.280 56.930 100.300 110.840 142.870 228.450 282.010 378.810 557.590 576.050 943.420 871.940
    IR 7.420 9.700 39.780 26.350 78.560 61.580 105.900 116.580 149.930 237.430 292.840 392.070 574.470 594.800 969.050 903.300
    ES_BIFAP_PCHOSP IR_upr 10.140 9.450 58.450 37.620 118.620 72.350 167.140 134.890 228.930 246.740 420.460 392.630 722.730 578.150 891.520 684.130
    IR_lwr 5.900 5.460 44.020 26.210 100.760 58.420 147.770 117.620 205.620 222.610 384.980 358.170 670.840 528.900 839.780 627.510
    IR 7.810 7.250 50.850 31.530 109.420 65.110 157.230 126.030 217.040 234.440 402.430 375.100 696.430 553.110 865.360 655.360
    ES_BIFAP_PC IR_upr 7.980 5.980 49.790 28.510 100.760 63.040 147.680 117.380 204.810 206.660 362.580 320.880 625.820 456.320 798.300 583.870
    IR_lwr 5.890 4.230 42.430 23.010 91.690 55.840 137.640 108.470 192.540 194.280 344.240 303.160 598.780 431.280 768.780 551.730
    IR 6.880 5.050 46.000 25.650 96.150 59.360 142.590 112.860 198.600 200.400 353.320 311.920 612.190 443.670 783.430 567.630
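Tables A2 and A3 report only the IR and its 95% confidence limits per stratum. To meta-analyse such aggregated results, the standard error on the log scale has to be back-calculated from the reported limits. The sketch below assumes that the intervals are approximately symmetric on the log scale (exact Poisson limits are not exactly symmetric, so the result is an approximation); the function name `log_se_from_ci` is hypothetical. It also shows why strata with a zero lower limit, such as IT_PEDIANET females aged 0–19, cannot enter a log-scale analysis directly.

```python
import math
from statistics import NormalDist


def log_se_from_ci(lower, upper, level=0.95):
    """Approximate SE of log(IR) from reported confidence limits.

    Assumes the interval is roughly symmetric on the log scale. Returns None
    when the lower limit is 0 (log undefined), as for strata with zero cases.
    """
    if lower <= 0:
        return None
    z = NormalDist().inv_cdf(0.5 + level / 2)  # 1.96 for a 95% interval
    return (math.log(upper) - math.log(lower)) / (2 * z)


# SIDIAP_H, 0-19 years, female (Table A2): IR 5.691, CI 3.683-8.401.
se = log_se_from_ci(3.683, 8.401)
print(f"log IR = {math.log(5.691):.3f}, SE = {se:.3f}")
# IT_PEDIANET, 0-19 years, female (Table A3): lower limit is 0.
print(log_se_from_ci(0, 2.781))  # None
```

For zero-event strata, one option is to model the underlying counts directly (for example, with a Poisson model) rather than their log rates, but that requires event counts and person-time, which are not reported in these tables.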
    TABLE A4. Pooled age–gender-stratified IR estimates per 100 000 person-years, with 95% CIs, for VTE from the meta-analyses.
    Age Gender Estimate Lower Upper
    0–19 Female 5.697 4.095 7.924
    0–19 Male 4.322 2.911 6.418
    20–29 Female 32.773 20.772 51.707
    20–29 Male 21.825 14.778 32.231
    30–39 Female 62.316 38.087 101.957
    30–39 Male 44.485 29.621 66.807
    40–49 Female 93.155 57.581 150.707
    40–49 Male 89.761 61.579 130.841
    50–59 Female 118.074 75.574 184.476
    50–59 Male 157.258 111.909 220.984
    60–69 Female 203.979 133.110 312.582
    60–69 Male 242.021 171.372 341.798
    70–79 Female 372.142 237.184 583.891
    70–79 Male 340.595 234.922 493.800
    80+ Female 495.147 290.933 842.703
    80+ Male 421.791 258.186 689.069
    TABLE A5. Age–gender-stratified I2 (expressed as a proportion) and τ2 estimates from the meta-analyses for the different sensitivity analyses.
    PC-H linkage ICD-10 code ACCESS ERASMUS
    Age Gender I2 τ2 I2 τ2 I2 τ2 I2 τ2
    0–19 Female 0.771 0.145 0.894 0.506 0.851 0.177 0.944 0.521
    0–19 Male 0.539 0.019 0.928 0.588 0.927 0.309 0.896 0.479
    20–29 Female 0.942 0.293 0.961 0.328 0.975 0.345 0.988 0.573
    20–29 Male 0.731 0.042 0.936 0.245 0.950 0.351 0.966 0.290
    30–39 Female 0.971 0.178 0.985 0.481 0.990 0.443 0.994 0.570
    30–39 Male 0.904 0.084 0.954 0.208 0.980 0.383 0.983 0.250
    40–49 Female 0.985 0.184 0.991 0.485 0.993 0.406 0.996 0.554
    40–49 Male 0.958 0.070 0.976 0.204 0.990 0.322 0.990 0.235
    50–59 Female 0.988 0.149 0.994 0.463 0.995 0.366 0.995 0.392
    50–59 Male 0.970 0.042 0.991 0.218 0.994 0.213 0.992 0.199
    60–69 Female 0.989 0.086 0.997 0.445 0.996 0.321 0.995 0.305
    60–69 Male 0.968 0.035 0.995 0.286 0.995 0.205 0.994 0.215
    70–79 Female 0.988 0.045 0.998 0.468 0.997 0.335 0.996 0.304
    70–79 Male 0.951 0.014 0.997 0.320 0.999 0.226 0.994 0.232
    80+ Female 0.967 0.019 0.998 0.368 0.998 0.631 0.994 0.228
    80+ Male 0.970 0.018 0.997 0.343 0.997 0.521 0.991 0.222
