Genetic Factors Influencing Cerebral Small Vessel Disease and Their Link to Recovery Outcomes Following Ischemic Stroke: A Two-Sample Mendelian Randomization Study
Abstract
Objective: Observational studies have reported an association between cerebral small vessel disease (CSVD) and outcomes after ischemic stroke. This study aimed to clarify the causal relationship between genetic predisposition to CSVD phenotypes and functional recovery after ischemic stroke using Mendelian randomization (MR).
Methods: We employed instrumental variables derived from genome-wide association studies (GWAS) of individuals of European ancestry to represent magnetic resonance imaging (MRI)-detected CSVD phenotypes, including white matter hyperintensities, cerebral microbleeds, and perivascular spaces. Data on functional outcomes after ischemic stroke were obtained from the Genetics of Ischemic Stroke Functional Outcome (GISCOME) network. The primary analysis, conducted as a two-sample MR study, utilized the inverse-variance weighted approach, which was further supplemented by additional MR techniques in sensitivity analyses to validate the robustness of our findings. The Steiger directionality test was applied to evaluate the direction of the causal relationship.
Results: In the primary analysis, no significant causal associations were found between genetic markers for CSVD phenotypes and poor functional outcomes (modified Rankin Scale ≥ 3) following ischemic stroke. The odds ratios (95% confidence intervals) for the different phenotypes were as follows: 0.90 (0.49–1.64) for white matter hyperintensity volume, 1.12 (0.85–1.49) for cerebral microbleeds, 3.42 (0.79–14.85) for white matter perivascular spaces, 0.02 (0.01–6.08) for basal ganglia perivascular spaces, and 1.02 (0.01–249.21) for hippocampal perivascular spaces. Sensitivity analyses supported the reliability of these results, showing no evidence of statistical heterogeneity or directional pleiotropy. Furthermore, the Steiger directionality test confirmed the accuracy of the inferred causal directions between CSVD phenotypes and functional outcomes.
Conclusion: This MR study found no evidence that genetic liability to CSVD phenotypes has a direct causal effect on functional outcomes after ischemic stroke.
1. Introduction
Cerebral small vessel disease (CSVD) is a group of pathological processes affecting the small arteries, arterioles, venules, and capillaries in the brain. It is more frequently observed in magnetic resonance imaging (MRI) scans of patients with ischemic stroke than in the general population [1]. CSVD encompasses a spectrum of MRI markers, including lacunes, white matter (WM) hyperintensities (WMHs), cerebral microbleeds (CMBs), and perivascular spaces (PVS) [2]. These markers reflect distinct but interrelated pathophysiological mechanisms, such as chronic hypoperfusion, endothelial dysfunction, and blood–brain barrier disruption. WMHs are associated with demyelination and gliosis, CMBs indicate prior microhemorrhages, and PVS represent impaired interstitial fluid drainage. Recent studies emphasize that these features often co-occur and that their combined burden, referred to as total CSVD burden, may better predict clinical outcomes than individual markers alone [2]. Furthermore, the variability in imaging definitions and measurement techniques contributes to heterogeneity in research findings, underscoring the need for standardized approaches and causal inference methods.
Previous studies have identified a paradox in ischemic stroke patients: Individual CSVD markers, such as lacune count, WMH, and PVS, are associated with poststroke functional outcomes [3, 4]. However, other research presents conflicting evidence, highlighting the predictive role of CMBs and the presence of lacunes for poststroke outcomes, leading to inconsistent conclusions [5, 6]. Establishing causal relationships between specific CSVD phenotypes and stroke outcomes is challenging, as observational studies are often confounded by biases and reverse causality.
Mendelian randomization (MR), which utilizes genetic variants as instrumental variables, provides a method to address these limitations. This approach is particularly effective in mitigating confounding biases and reverse causality due to the random allocation of genetic variants during meiosis [7]. While MR has been used to explore the relationship between genetically inferred MRI-detected CSVD phenotypes (such as WMH, CMBs, and PVS) and the risk of ischemic stroke [8–10], its application in studying poststroke functional outcomes remains unexplored.
In this study, we performed a two-sample MR analysis to investigate the causal effects of genetically predicted CSVD phenotypes on functional outcomes after ischemic stroke. Additionally, subgroup analyses were conducted to examine the impact of different CMB subtypes on these outcomes. Our findings provide valuable insights into the potential role of specific CSVD phenotypes in the secondary prevention of ischemic stroke.
2. Materials and Methods
2.1. Data Availability and Ethics Statement
All data used in this study are publicly available from genome-wide association studies (GWAS). As the study used publicly available, deidentified summary-level data, no additional ethical approval was required.
2.2. Study Design
This study employed a two-sample MR to assess the causal relationship between MRI-defined CSVD markers and functional outcomes following ischemic stroke. We selected GWAS data from populations primarily of European descent to minimize bias from population stratification. The study design is illustrated in Figure 1 and was conducted in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization (STROBE-MR) guidelines [11].

2.3. Outcome Data Source
The primary outcome was poor functional outcome after ischemic stroke, defined as a modified Rankin Scale (mRS) score of 3–6 (moderate to severe disability or death) measured 3 months poststroke. An mRS score of 0–2 was considered a good functional outcome. Genetic association estimates for the 3-month poststroke mRS were obtained from the Genetics of Ischemic Stroke Functional Outcome (GISCOME) network’s GWAS dataset, which collated data from 12 European-ancestry cohorts [12, 13]. The mRS score at 3 months is a measure of short-term functional prognosis that is commonly adopted in clinical research and practice. In this GWAS, the outcome was dichotomized (mRS scores of 3–6 vs. 0–2) and adjusted for age, sex, ancestry, and initial stroke severity as measured by the NIH Stroke Scale (NIHSS) [13]. However, treatment strategies such as thrombolysis or mechanical thrombectomy were not uniformly available across cohorts and were not adjusted for in the GWAS summary statistics, which may contribute to residual confounding. The study included 6021 ischemic stroke patients, of whom 2280 (37.9%) experienced poor outcomes (mRS scores of 3–6) [13].
2.4. Exposure Data Source
Our exposure data were derived from the largest and most recent GWAS of MRI-detected CSVD phenotypes: WMH volume (N = 48,454), CMBs (N = 23,032), and PVS in the white matter (WM-PVS, N = 38,598), basal ganglia (BG; BG-PVS, N = 38,903), and hippocampus (HIP; HIP-PVS, N = 38,871) [8, 10, 14]. We used summary statistics from GWAS of participants predominantly of European ancestry, with a small proportion (5.8%) from other ancestries in the CMBs dataset (see Table S1).
The WMH volume data were obtained from individuals in the UK Biobank and the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium [15], measured using T1, T2, and fluid-attenuated inversion recovery (FLAIR) sequences, typically with fully automated software. The WMH volume data were transformed using a rank-based inverse-normal transformation (Table S1) [8]. Notably, measures such as mean diffusivity and fractional anisotropy, which indicate WM lesions from diffusion tensor imaging, were excluded due to their advanced technical requirements, which are not commonly available in primary hospitals.
CMBs were identified as small, hypointense lesions on susceptibility-weighted imaging or T2∗-weighted gradient echo sequences, excluding participants with dementia or stroke from the GWAS summary [14]. CMB classification was based on their location: lobar (absence in deep or infratentorial regions) and mixed (presence in deep or infratentorial regions, possibly combined with lobar CMBs) [14].
PVS are fluid-filled spaces that match the signal of cerebrospinal fluid, appearing round, ovoid, or linear depending on the slice direction, typically under 3 mm in diameter, without a hyperintense rim on T2-weighted or FLAIR sequences, and located in areas supplied by perforating arteries [2]. The PVS burden was quantified in the cerebral WM, BG, and HIP using either visual semiquantitative rating scales or automated methods [10]. To standardize across different scales, the PVS burden was categorized as “extensive” versus other categories, based on a cutoff near the top quartile of the scale distribution within each cohort [10].
2.5. Genetic Instruments for MRI-Detected CSVD Phenotype
For each CSVD phenotype, we selected single nucleotide polymorphisms (SNPs) that reached genome-wide significance (p < 5 × 10⁻⁸) in the GWAS summary data. When too few SNPs met this threshold, a relaxed significance level of p < 1 × 10⁻⁶ was applied. All instruments were subjected to linkage disequilibrium (LD) clumping (r² < 0.01, 10,000 kb window) to ensure independence. While external validation of these SNPs in separate cohorts was not performed, they have been previously reported and replicated in peer-reviewed studies. All selected SNPs had F-statistics > 10, indicating strong associations with CSVD phenotypes, which helps minimize the risk of weak instrument bias.
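The selection rule described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Snp` record, the `min_count` fallback rule, and the approximation F ≈ (beta / se)² are assumptions made for the example, the relaxed threshold is taken as 1 × 10⁻⁶ following the value quoted in the Results, and LD clumping is omitted because it requires an LD reference panel.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8  # primary threshold stated in the text
RELAXED_P = 1e-6      # relaxed threshold, value quoted in the Results
MIN_F = 10.0          # conventional weak-instrument cutoff


@dataclass
class Snp:
    rsid: str    # hypothetical identifier
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta
    pval: float  # SNP-exposure association p value


def f_statistic(snp: Snp) -> float:
    """Approximate per-SNP F-statistic as (beta / se)^2 (an assumption)."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps, min_count=3):
    """Keep genome-wide significant SNPs; if fewer than `min_count`
    (a hypothetical rule), fall back to the relaxed threshold.
    SNPs with F <= 10 are dropped in either case."""
    strict = [s for s in snps if s.pval < GENOME_WIDE_P]
    if len(strict) >= min_count:
        pool = strict
    else:
        pool = [s for s in snps if s.pval < RELAXED_P]
    return [s for s in pool if f_statistic(s) > MIN_F]
```

In practice the candidate list would be clumped before or after this filter, so that only independent variants are retained.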
Additionally, SNPs associated with CSVD that were also linked to waist-to-hip ratio or smoking—known risk factors for poststroke outcomes in previous MR analyses—were excluded to prevent confounding effects [16, 17]. This exclusion was based on a significance level of p < 0.05, adjusted for the number of SNPs after clumping. The GWAS on waist-to-hip ratio and smoking were restricted to European ancestry to ensure consistency [18, 19], with details presented in Table S1.
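One way to read this exclusion rule is as a Bonferroni-style cutoff of 0.05 divided by the number of clumped instruments. The sketch below follows that reading, which is an assumption; the function name and the input layout (trait name mapped to per-SNP p values) are hypothetical.

```python
def exclude_confounder_snps(instrument_ids, confounder_pvals, alpha=0.05):
    """Drop instruments associated with any confounder trait.

    instrument_ids: list of SNP identifiers remaining after clumping.
    confounder_pvals: dict mapping trait name -> dict of rsid -> p value.
    Returns the kept identifiers and the adjusted threshold used.
    """
    if not instrument_ids:
        return [], alpha
    threshold = alpha / len(instrument_ids)  # adjust for number of SNPs
    kept = []
    for rsid in instrument_ids:
        # A SNP absent from a confounder GWAS is treated as not associated.
        flagged = any(
            pvals.get(rsid, 1.0) < threshold
            for pvals in confounder_pvals.values()
        )
        if not flagged:
            kept.append(rsid)
    return kept, threshold
```

With four instruments, for example, the cutoff becomes 0.0125, so a SNP with p = 0.02 for waist-to-hip ratio would be retained while one with p = 0.001 would be removed.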
2.6. MR Analyses
Before conducting MR analyses, we harmonized the effects of SNPs on the studied exposures and outcomes to align effect sizes with the same effect allele, assuming all alleles were on the forward strand across datasets. Proxy SNPs were not used as instrumental variables for SNPs missing in the ischemic stroke outcome dataset.
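The harmonization step can be illustrated with a short sketch. It assumes, as stated above, that all alleles are reported on the forward strand, so the only correction needed is a sign flip when the effect and other alleles are swapped between datasets. The record layout and function names are hypothetical; SNPs missing from the outcome dataset are dropped rather than replaced by proxies.

```python
def align_outcome_beta(exposure, outcome):
    """Return the outcome beta aligned to the exposure effect allele,
    or None when the allele pairs cannot be matched.

    Each record is a dict with keys 'effect_allele', 'other_allele', 'beta'.
    """
    exp_pair = (exposure["effect_allele"].upper(), exposure["other_allele"].upper())
    out_pair = (outcome["effect_allele"].upper(), outcome["other_allele"].upper())
    if exp_pair == out_pair:
        return outcome["beta"]
    if exp_pair == out_pair[::-1]:
        return -outcome["beta"]  # alleles swapped: flip the sign
    return None  # incompatible alleles (no strand flipping attempted)


def harmonize(exposure_snps, outcome_snps):
    """Both inputs map rsid -> record. Returns rsid -> (beta_exp, beta_out)
    for SNPs present in both datasets with compatible alleles."""
    harmonized = {}
    for rsid, exp in exposure_snps.items():
        out = outcome_snps.get(rsid)
        if out is None:
            continue  # no proxy SNP is substituted
        beta_out = align_outcome_beta(exp, out)
        if beta_out is not None:
            harmonized[rsid] = (exp["beta"], beta_out)
    return harmonized
```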
The primary analysis employed the random-effects inverse variance weighted (IVW) method [20]. For instrumental variables represented by a single SNP, the Wald method was applied [20]. Sensitivity analyses included the weighted median [21], MR-Egger regression [22], MR-Robust Adjusted Profile Score (MR-RAPS) [23], and MR-PRESSO methods [24], with the MR-Egger intercept test used to detect potential directional pleiotropy. The MR-PRESSO method was employed to identify potential outliers [24]. Cochran’s Q statistic was used to assess SNP heterogeneity, with p < 0.05 considered significant. Leave-one-out plots and funnel plots were used to evaluate the influence of individual SNPs and assess potential pleiotropy.
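For readers less familiar with these estimators, the following sketch shows the Wald ratio for a single SNP and a multiplicative random-effects IVW estimate together with Cochran's Q. It uses the standard summary-data formulas with first-order weights and is offered as an illustration under those assumptions, not as a reproduction of the TwoSampleMR implementation; the function names are hypothetical.

```python
import math


def wald_ratio(bx, by, se_y):
    """Single-SNP causal estimate and its first-order standard error."""
    return by / bx, se_y / abs(bx)


def ivw_random_effects(bx, by, se_y):
    """Multiplicative random-effects IVW estimate.

    bx, by: SNP-exposure and SNP-outcome effects (harmonized).
    se_y: standard errors of the SNP-outcome effects.
    Returns (estimate, standard error, Cochran's Q).
    """
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # inverse variance of each ratio
    ratios = [y / x for x, y in zip(bx, by)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    # Cochran's Q: weighted squared deviations of the ratio estimates.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    se_fixed = math.sqrt(1.0 / total)
    k = len(bx)
    # Inflate the fixed-effect SE when there is overdispersion (Q > k - 1).
    inflation = max(1.0, math.sqrt(q / (k - 1))) if k > 1 else 1.0
    return estimate, se_fixed * inflation, q
```

Under the null of no heterogeneity, Q is compared with a chi-square distribution on k − 1 degrees of freedom, where k is the number of SNPs.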
Odds ratios (ORs) with 95% confidence intervals (CIs) for poor functional outcome after ischemic stroke were expressed per 1 standard deviation increase in WMH volume and for the presence versus absence of CMBs and of extensive PVS. The analysis of CMBs was further stratified into lobar and mixed CMB subtypes.
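The conversion from an MR estimate on the log-odds scale to an OR with a 95% CI can be sketched as below. The sketch assumes a normal approximation for the estimate; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds estimate and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def two_sided_p(beta, se):
    """Two-sided p value under a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def se_from_ci(lower, upper, z=Z_95):
    """Recover the log-scale SE from a reported 95% CI for an OR."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)
```

Because the interval is symmetric on the log scale, very wide intervals on the OR scale (as seen for phenotypes with one or two instruments) correspond to large standard errors of the underlying estimate.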
For each exposure-outcome pair, the Steiger directionality test was conducted to confirm the direction of the causal relationship, aiming to prevent reverse causation [25]. With adjustments for five CSVD phenotypes and one outcome, a Bonferroni-adjusted p value of < 0.01 was considered statistically significant, while p < 0.05 indicated nominal significance.
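The logic of the directionality test can be sketched as a comparison of the variance the instruments explain in the exposure and in the outcome. The version below compares the two correlations with a Fisher z transformation for independent samples, which is our understanding of the usual approach and should be read as an assumption; the function names are hypothetical.

```python
import math


def steiger_direction(r2_exposure, n_exposure, r2_outcome, n_outcome):
    """Compare variance explained in the exposure and in the outcome.

    Returns (direction_supported, z, p): direction_supported is True when
    the instruments explain more variance in the exposure than the outcome.
    """
    z_exp = math.atanh(math.sqrt(r2_exposure))  # Fisher z of the correlation
    z_out = math.atanh(math.sqrt(r2_outcome))
    se = math.sqrt(1.0 / (n_exposure - 3) + 1.0 / (n_outcome - 3))
    z = (z_exp - z_out) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal p value
    return r2_exposure > r2_outcome, z, p


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

With five phenotypes and one outcome, `bonferroni_threshold(0.05, 5)` reproduces the 0.01 threshold used here.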
Statistical analyses were performed using R Version 4.3.1 and the following packages: TwoSampleMR (0.5.10), mr.raps (0.2), and MR-PRESSO (1.0).
3. Results
The GWAS datasets used in this study are summarized in Table S1; the number of SNPs remaining after clumping and after removal of variants associated with waist-to-hip ratio or smoking is shown in Table S2. For CMBs, only two SNPs met the genome-wide significance criterion, necessitating a lenient threshold (p < 1 × 10⁻⁶) for CMBs and their subtypes. After harmonization, the number of independent SNPs selected as instruments was 16 for WMH volume, 6 for CMBs, 12 for WM-PVS, 1 for BG-PVS, and 2 for HIP-PVS, with 9 for lobar CMBs and 8 for mixed CMBs.
The characteristics of the included SNPs associated with MRI-defined CSVD phenotypes are presented in Table S3. The genetic variants used as instruments for CSVD phenotypes explained varying percentages of variance: 1.96% for WMH volume, 0.7% for CMBs, 1.27% for WM-PVS, 0.06% for BG-PVS, and 0.17% for HIP-PVS. The F-statistics for individual SNPs ranged from 21 to 204, indicating strong associations between the SNPs and the CSVD phenotypes and a low risk of weak instrument bias.
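As a rough guide to how such quantities are commonly derived from summary statistics, the sketch below computes the variance explained by a single SNP and the corresponding F-statistic. It assumes a standardized continuous exposure and an additive model; it is not necessarily the formula used for Table S3, and the function names and example values are hypothetical.

```python
def variance_explained(beta, eaf):
    """Approximate R^2 for one SNP: 2 * EAF * (1 - EAF) * beta^2,
    assuming beta is in standard deviation units of the exposure."""
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


def f_from_r2(r2, n, k=1):
    """F-statistic for k instruments explaining r2 of variance in n samples."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)
```

For example, a SNP with an effect allele frequency of 0.5 and an effect of 0.1 standard deviations explains 0.5% of the variance under these assumptions.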
Main MR analyses indicated no significant association between genetic predispositions for CSVD phenotypes and postischemic stroke functional outcomes. ORs (95% CI) for WMH volume, CMBs, WM-PVS, BG-PVS, and HIP-PVS were 0.90 (0.49–1.64, p = 0.73), 1.12 (0.85–1.49, p = 0.41), 3.42 (0.79–14.85, p = 0.10), 0.02 (0.01–6.08, p = 0.18), and 1.02 (0.01–249.21, p = 0.99), respectively (Figure 2, Figure S1). The Steiger directionality testing confirmed the accuracy of the causal directions inferred between CSVD phenotypes and functional outcomes (Table S4).

The MR-Egger intercept tests did not identify significant directional pleiotropy for any of the CSVD phenotypes (all p values > 0.05), and Cochran’s Q tests showed no evidence of statistical heterogeneity (all p values > 0.05) (Figure 2, Figures S1, S2, and S3). MR-PRESSO did not detect any outlier SNPs, and its global test provided no evidence of horizontal pleiotropy, with the following p values: 0.302 for WMH volume, 0.974 for CMBs, 0.968 for lobar CMBs, 0.117 for mixed CMBs, and 0.165 for WM-PVS. Together, these sensitivity analyses support the robustness of the MR estimates.
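The MR-Egger intercept referred to here comes from a weighted regression of SNP-outcome effects on SNP-exposure effects in which the intercept is not constrained to zero. The sketch below is a plain weighted least-squares illustration under standard assumptions (weights of 1 / se², SNPs oriented so that the exposure effect is positive, at least three SNPs); it is not the TwoSampleMR code, and the function name is hypothetical.

```python
import math


def egger_intercept(bx, by, se_y):
    """Weighted regression of by on bx with an intercept.

    Returns (intercept, intercept_se, slope). An intercept far from zero
    relative to its SE suggests directional pleiotropy.
    """
    if len(bx) < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    # Orient every SNP so that its exposure effect is positive.
    signs = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1.0 / s ** 2 for s in se_y]

    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    denom = sw * sxx - sx ** 2

    slope = (sw * sxy - sx * sy) / denom
    intercept = (sxx * sy - sx * sxy) / denom
    # The residual standard error inflates the SE only when it exceeds 1.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma = math.sqrt(rss / (len(x) - 2))
    intercept_se = math.sqrt(sxx / denom) * max(1.0, sigma)
    return intercept, intercept_se, slope
```

The intercept divided by its standard error is then referred to a t distribution with k − 2 degrees of freedom to obtain the p value reported for the intercept test.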
4. Discussion
This study represents the first systematic exploration of the causal relationship between CSVD phenotypes, as identified by MRI, and functional outcomes after ischemic stroke. While previous MR studies have demonstrated a significant causal relationship between WMH and PVS and the risk of developing ischemic stroke [16, 17], our findings extend this line of inquiry by focusing on poststroke recovery. Notably, we did not observe a significant causal association between the genetic predispositions to CSVD phenotypes—including WMH, CMBs, and PVS across different brain regions—and poor functional outcomes after ischemic stroke. These results suggest that while genetic liability to CSVD may influence stroke risk, it does not appear to have a direct impact on recovery once a stroke has occurred.
This contrasts with findings from several observational studies that have reported associations between CSVD features and poststroke outcomes. Several factors may account for the discrepancies between our results and those of prior observational research. First, observational studies are inherently susceptible to confounding and reverse causation, whereas MR mitigates these biases by leveraging genetic variants as instrumental variables. Second, our analysis was based on summary-level data from populations of predominantly European ancestry, which may limit the generalizability of our findings. In contrast, previous observational studies often included more ethnically diverse cohorts, which could contribute to the observed differences.
Nevertheless, our negative findings should be interpreted with caution, considering both methodological and statistical limitations. Although the genetic instruments used in our study met genome-wide significance thresholds, they may not fully capture the complex, polygenic nature of CSVD, potentially limiting our power to detect modest causal effects. This issue is particularly relevant for phenotypes such as CMBs and hippocampal PVS, where the limited number of available SNPs may have substantially reduced the statistical power, thereby increasing the risk of false-negative results. Moreover, differences between our findings and prior observational studies may reflect variations in study design (MR vs. observational), differences in population characteristics, or the use of genetically predicted proxies rather than direct imaging-based measures.
Additionally, poststroke recovery is influenced by a wide range of clinical and environmental factors—such as initial stroke severity, rehabilitation quality, and comorbid conditions—which may overshadow the contribution of genetic predisposition alone. Collectively, these factors highlight the inherent challenges in inferring causality in complex cerebrovascular traits.
GWAS data for lacunes, a key imaging marker of CSVD strongly associated with stroke prognosis, were not available. The absence of lacune-related genetic data limited the comprehensiveness of the CSVD phenotype assessment and might have led to an incomplete evaluation of its impact on stroke recovery. Future GWAS focused explicitly on lacunar infarctions or composite CSVD scores would be valuable for a more accurate evaluation of causality.
Furthermore, our study was unable to perform ischemic stroke subtype-specific analyses because the GWAS dataset used (GISCOME network) did not provide outcome data stratified by stroke etiology (e.g., large artery atherosclerosis and small vessel occlusion). Since CSVD may have different effects across stroke subtypes, this limitation could have masked potential subtype-specific associations and led to an underestimation of the causal impact.
Importantly, the null results do not negate the clinical relevance of CSVD. Genetic predisposition likely represents only one dimension of CSVD, while acquired vascular risk factors (e.g., hypertension, diabetes, and inflammation) and environmental factors might have stronger impacts on stroke outcomes. Additionally, CSVD-related mechanisms such as chronic hypoperfusion and impaired fluid clearance may still influence stroke recovery, suggesting that CSVD-targeted interventions remain worthy of exploration.
The robustness of our findings is supported by the absence of horizontal pleiotropy as confirmed by MR-Egger intercept and MR-PRESSO tests. Nevertheless, undetected biases cannot be entirely excluded, emphasizing that MR results should always be interpreted within a broader epidemiological context.
Our study has several limitations. The study population consisted predominantly of individuals of European ancestry. The genetic architecture of CSVD may vary across different ethnicities, and the findings may not be fully generalizable to non-European populations. Additionally, although adjustments for major confounders were made in the original GWAS, residual confounding factors, such as differences in acute stroke treatment strategies, could not be completely excluded. This might have introduced bias and affected the reliability of the causal estimates. Future studies should include diverse ethnic groups, incorporate comprehensive treatment data, and use larger GWAS datasets to validate our findings and better understand the role of CSVD in stroke recovery.
Furthermore, observational studies conducted in both European [3, 26] and Asian populations [5, 27] have shown a link between the total CSVD score, which combines lacunes, WMH, CMBs, and PVS, and poststroke functional outcomes. Future research should focus on developing GWAS data for the total CSVD score to explore the genetic basis of CSVD more thoroughly and its effects on stroke outcomes.
5. Conclusions
While no causal relationship was established between genetic predispositions for CSVD phenotypes and poststroke functional outcomes in our study, the findings highlight the critical need for ongoing research. Future studies with larger sample sizes and more ethnically diverse populations are needed to better explore the potential genetic underpinnings of CSVD and its role in poststroke recovery. Given the multifactorial nature of stroke outcomes, integrative approaches combining genetic, imaging, and clinical data may offer more comprehensive insights.
Conflicts of Interest
The authors declare no conflicts of interest.
Author Contributions
Zeyu Jiang: writing—original draft, visualization, project administration, methodology, formal analysis, data curation. Shuhan Pan: formal analysis, methodology. Kun Zhao: visualization, project administration. Jian Sun: writing—review and editing, conceptualization. All authors read and approved the final manuscript.
Funding
No funding was received for this research.
Acknowledgments
The authors express their gratitude to the investigators, staff, and participants of the contributing cohorts within the CHARGE consortium, from which the results presented in this publication were derived. Furthermore, acknowledgment is extended to the GISCOME network and the ISGC Cerebrovascular Disease Knowledge Portal for their provision of GWAS (Genome-Wide Association Studies) summary data.
Open Research
Data Availability Statement
The data that support the findings of this study are available in dbGaP at https://www.ncbi.nlm.nih.gov/gap/ (reference number: phs002227.v1.p1). These data were derived from the following resources available in the public domain: Cerebrovascular Disease KP genetic association datasets (https://cd.hugeamp.org/datasets.html); GWAS Catalog (https://www.ebi.ac.uk/gwas/).