Volume 71, Issue 2 pp. 344-353
BIOMETRIC METHODOLOGY

Generalized multilevel function-on-scalar regression and principal component analysis

Jeff Goldsmith

Corresponding Author

Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York, U.S.A.

email: [email protected]

Vadim Zipunnikov

Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, U.S.A.

Jennifer Schrack

Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, U.S.A.

Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, U.S.A.

First published: 25 January 2015

Summary

This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly 600 subjects over 5 days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. In simulations designed to mimic the application, the method has good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application, we identify effects of age and BMI on the time-specific change in probability of being active over a 24-hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
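
To make the data structure in the summary concrete, the following is a minimal simulation sketch, not the authors' code. It generates binary curves whose success probability depends on a fixed-effect coefficient function for one scalar covariate, plus a subject-level and a subject-day-level deviation, each built from a single basis function. The logit link, the specific coefficient and basis functions, the score variances, and the name `simulate_binary_curves` are all assumptions chosen for illustration; the summary only states that responses follow an exponential family distribution and that deviations are decomposed at two levels.

```python
import numpy as np


def simulate_binary_curves(n_subj=20, n_days=5, n_grid=48, seed=0):
    """Simulate binary curves under an assumed logit-link multilevel model."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)  # time of day rescaled to [0, 1]

    # Hypothetical fixed-effect coefficient functions (intercept, one covariate).
    beta0 = -0.5 + np.sin(2 * np.pi * t)
    beta1 = 0.5 * np.cos(2 * np.pi * t)

    # Hypothetical basis functions: one per level (subject, subject-day).
    psi_subject = np.sqrt(2) * np.sin(2 * np.pi * t)
    psi_day = np.sqrt(2) * np.cos(4 * np.pi * t)

    x = rng.normal(size=n_subj)                        # scalar covariate per subject
    c_subject = rng.normal(scale=1.0, size=n_subj)     # subject-level scores
    c_day = rng.normal(scale=0.5, size=(n_subj, n_days))  # subject-day scores

    # Linear predictor with shape (n_subj, n_days, n_grid):
    # fixed effects + subject deviation + subject-day deviation.
    eta = (
        beta0[None, None, :]
        + x[:, None, None] * beta1[None, None, :]
        + c_subject[:, None, None] * psi_subject[None, None, :]
        + c_day[:, :, None] * psi_day[None, None, :]
    )
    prob = 1.0 / (1.0 + np.exp(-eta))  # inverse logit (assumed link)
    y = rng.binomial(1, prob)          # 1 = active, 0 = inactive
    return t, x, prob, y


t, x, prob, y = simulate_binary_curves()
print(y.shape)  # (subjects, days, grid points)
```

In this sketch the subject-level score is shared across all days of a subject, which induces within-subject correlation, while the subject-day score varies by day; a fitted model of the kind described would estimate the coefficient and basis functions rather than fix them in advance.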
