Volume 24, Issue 9 pp. 1004-1007
Brief Report

On the role of marginal confounder prevalence – implications for the high-dimensional propensity score algorithm

Tibor Schuster (corresponding author)

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada

Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada

Correspondence to: T. Schuster, Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Purvis Hall, 1020 Pine Ave. West, Montreal, Quebec, Canada, H3A 1A2. E-mail: [email protected]

Menglan Pang

Centre for Clinical Epidemiology, Lady Davis Institute, Jewish General Hospital, Montreal, Quebec, Canada

Robert W. Platt

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada

Department of Pediatrics, McGill University, Montreal, Quebec, Canada

First published: 10 April 2015

Abstract

Purpose

The high-dimensional propensity score algorithm attempts to improve confounding control in typical treatment effect studies in pharmacoepidemiology and is increasingly used to analyze large administrative databases. Within this multi-step variable selection algorithm, the marginal prevalence of non-zero covariate values is treated as an indicator of a count variable's potential confounding impact. We investigate how the marginal prevalence of a confounder relates to the magnitude of bias it can cause when risk ratios are estimated in point-exposure studies with binary outcomes.
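To make the prevalence-based pre-selection concrete, here is a minimal sketch assuming the simplest possible rule: compute, for each count covariate, the fraction of patients with a non-zero value and keep the top_k covariates by that fraction. The function names, the top_k parameter and the toy codes are hypothetical illustrations; the published algorithm may differ in detail (for example in how ties or very common codes are handled).

```python
from typing import Dict, List, Sequence


def marginal_prevalence(counts: Sequence[int]) -> float:
    """Fraction of patients with a non-zero count for one covariate (e.g. one code)."""
    if len(counts) == 0:
        raise ValueError("counts must contain at least one patient")
    return sum(1 for c in counts if c > 0) / len(counts)


def preselect_by_prevalence(
    covariates: Dict[str, Sequence[int]], top_k: int
) -> List[str]:
    """Hypothetical pre-selection step: keep the top_k covariates with the
    highest marginal prevalence of non-zero values (simplest assumed rule)."""
    ranked = sorted(
        covariates,
        key=lambda name: marginal_prevalence(covariates[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: per-patient counts for three hypothetical codes.
toy = {
    "code_A": [0, 1, 2, 0],  # prevalence 0.50
    "code_B": [0, 0, 0, 1],  # prevalence 0.25
    "code_C": [1, 1, 1, 1],  # prevalence 1.00
}
print(preselect_by_prevalence(toy, top_k=2))  # ['code_C', 'code_A']
```

This ranking uses only each covariate's marginal prevalence; it never looks at how the covariate is distributed across exposure groups or at its association with the outcome.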

Methods

We apply the law of total probability together with an established bias formula to derive and illustrate bounds on the relative bias as a function of marginal confounder prevalence.
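A worked version of this derivation follows, assuming that the established bias formula is Bross's multiplicative bias factor for a binary confounder C (the abstract does not name the formula). The notation P_E, P_C, p_1, p_0 and RR_CD is introduced here for illustration only.

```latex
% Assumed notation: P_E = Pr(E=1), P_C = Pr(C=1),
% p_1 = Pr(C=1 | E=1), p_0 = Pr(C=1 | E=0), RR_{CD} = confounder-outcome risk ratio.
\begin{align*}
  P_C &= P_E\, p_1 + (1 - P_E)\, p_0
      && \text{(law of total probability)} \\
  B(p_1, p_0) &= \frac{p_1 (RR_{CD} - 1) + 1}{p_0 (RR_{CD} - 1) + 1}
      && \text{(assumed Bross factor; } RR_{\text{obs}} = B \cdot RR_{\text{true}} \text{)} \\
  p_0 &= \frac{P_C - P_E\, p_1}{1 - P_E},
      \qquad \max\!\Bigl(0, \tfrac{P_C - (1 - P_E)}{P_E}\Bigr) \le p_1 \le \min\!\Bigl(1, \tfrac{P_C}{P_E}\Bigr) \\
  B_{\max} &= 1 + \frac{P_C}{P_E}\,(RR_{CD} - 1)
      && \text{if } P_C \le P_E,\ RR_{CD} > 1 \quad (p_1 = P_C / P_E,\ p_0 = 0)
\end{align*}
```

Because p_0 falls as p_1 rises along the constraint, B is monotone in p_1, so its extremes lie at the ends of the feasible range; the relative bias is B - 1. When P_C = P_E, the bound equals RR_CD however small P_C is, so under these assumptions a rare confounder paired with an equally rare exposure can carry the full confounder-outcome association.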

Results

We show that the maximum possible bias can occur at any level of marginal prevalence of a binary confounder. In particular, we demonstrate that when the exposure is rare or very common, confounders with low or high prevalence can still have a large confounding impact on estimated risk ratios.
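This result can be checked numerically. The sketch below assumes that the established bias formula is Bross's multiplicative bias factor for a binary confounder (the abstract does not name the formula) and uses the law of total probability to find the range of bias factors compatible with a given exposure prevalence and marginal confounder prevalence. The function names and the illustrative values (a confounder-outcome risk ratio of 10 and prevalences of 0.01, 0.5 and 0.99) are assumptions, not the paper's.

```python
from typing import Tuple


def bross_bias_factor(p1: float, p0: float, rr_cd: float) -> float:
    """Assumed Bross-type bias factor (observed RR / true RR) for a binary confounder C.

    p1 = P(C=1 | exposed), p0 = P(C=1 | unexposed),
    rr_cd = confounder-outcome risk ratio.
    """
    return (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)


def bias_factor_bounds(p_c: float, p_e: float, rr_cd: float) -> Tuple[float, float]:
    """Smallest and largest bias factor compatible with marginal confounder
    prevalence p_c and exposure prevalence p_e."""
    if not (0.0 < p_e < 1.0 and 0.0 <= p_c <= 1.0):
        raise ValueError("need 0 < p_e < 1 and 0 <= p_c <= 1")
    # Law of total probability: p_c = p_e * p1 + (1 - p_e) * p0, with p0, p1 in [0, 1].
    p1_low = max(0.0, (p_c - (1.0 - p_e)) / p_e)
    p1_high = min(1.0, p_c / p_e)

    def p0_given(p1: float) -> float:
        # Solve the constraint for p0 and clamp tiny rounding errors into [0, 1].
        return min(1.0, max(0.0, (p_c - p_e * p1) / (1.0 - p_e)))

    # Along the constraint the bias factor is monotone in p1,
    # so its extremes sit at the two ends of the feasible range of p1.
    b_a = bross_bias_factor(p1_low, p0_given(p1_low), rr_cd)
    b_b = bross_bias_factor(p1_high, p0_given(p1_high), rr_cd)
    return min(b_a, b_b), max(b_a, b_b)


for p_e in (0.01, 0.5, 0.99):
    for p_c in (0.01, 0.5, 0.99):
        lo, hi = bias_factor_bounds(p_c, p_e, rr_cd=10.0)
        print(f"P(E)={p_e:.2f}  P(C)={p_c:.2f}  bias factor in [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions, with a rare exposure (P(E) = 0.01) a confounder of prevalence 0.01 can still produce a bias factor of up to 10, whereas with P(E) = 0.5 the same confounder is capped at 1.18.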

Conclusions

Pre-selecting covariates by prevalence may lead to sub-optimal confounder selection within the high-dimensional propensity score algorithm. While we believe that the high-dimensional propensity score has important benefits in large-scale pharmacoepidemiologic studies, we recommend omitting the prevalence-based empirical identification of candidate covariates. Copyright © 2015 John Wiley & Sons, Ltd.
