On the role of marginal confounder prevalence – implications for the high-dimensional propensity score algorithm
Abstract
Purpose
The high-dimensional propensity score algorithm attempts to improve control of confounding in typical treatment effect studies in pharmacoepidemiology and is increasingly being used for the analysis of large administrative databases. Within this multi-step variable selection algorithm, the marginal prevalence of non-zero covariate values is considered to be an indicator for a count variable's potential confounding impact. We investigate the role of the marginal prevalence of confounder variables on potentially caused bias magnitudes when estimating risk ratios in point exposure studies with binary outcomes.
Methods
We apply the law of total probability in conjunction with an established bias formula to derive and illustrate relative bias boundaries with respect to marginal confounder prevalence.
Results
We show that maximum possible bias magnitudes can occur at any marginal prevalence level of a binary confounder variable. In particular, we demonstrate that, in case of rare or very common exposures, low and high prevalent confounder variables can still have large confounding impact on estimated risk ratios.
Conclusions
Covariate pre-selection by prevalence may lead to sub-optimal confounder sampling within the high-dimensional propensity score algorithm. While we believe that the high-dimensional propensity score has important benefits in large-scale pharmacoepidemiologic studies, we recommend omitting the prevalence-based empirical identification of candidate covariates. Copyright © 2015 John Wiley & Sons, Ltd.