Autoregressive and moving average models for zero-inflated count time series
Vurukonda Sathish
Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
Corresponding Author
Siuli Mukhopadhyay
Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India
Correspondence Siuli Mukhopadhyay, Department of Mathematics, Indian Institute of Technology Bombay, Mumbai 400 076, India.
Email: [email protected]
Rashmi Tiwari
Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India
Funding information: Department of Science and Technology India, EMR/2016/005142; Wadhwani Research Centre for Bio-Engineering
Abstract
Zero inflation is a common nuisance when monitoring disease progression over time. This article proposes a new observation-driven model for zero-inflated and overdispersed count time series. Conditional on the past history of the process and the available covariate information, the counts are assumed to follow a mixture of a Poisson distribution and a distribution degenerate at zero, with a time-dependent mixing probability $\pi_t$. Since count data usually suffer from overdispersion, a Gamma distribution is used to model the excess variation, resulting in a zero-inflated negative binomial regression model with mean parameter $\lambda_t$. Linear predictors with autoregressive and moving average (ARMA) type terms, covariates, seasonality and trend are fitted to $\lambda_t$ and $\pi_t$ through canonical link generalized linear models. Estimation is done by maximum likelihood, aided by iterative algorithms such as Newton-Raphson (NR) and Expectation-Maximization (EM). Theoretical results on the consistency and asymptotic normality of the estimators are given. The proposed model is illustrated through in-depth simulation studies and two disease datasets.
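The abstract describes the model only in words. Below is a minimal Python sketch, under stated assumptions, of the kind of conditional log-likelihood that maximum likelihood would target. Assumptions (not taken from the article): the negative binomial is parametrised by a size parameter `k`, so that Var(Y_t) = λ_t + λ_t²/k; λ_t uses a log link and π_t a logit link; and the ARMA-type term is a hypothetical GARMA-style ARMA(1,1) recursion on log(1 + y_{t-1}). The function names `zinb_logpmf` and `conditional_loglik`, the log(1 + y) transform, the start-up value and all parameter names are illustrative choices, not the authors' exact specification.

```python
import math
from typing import Sequence, Tuple


def zinb_logpmf(y: int, lam: float, pi: float, k: float) -> float:
    """Log P(Y = y) under a zero-inflated negative binomial.

    With probability pi the count is a structural zero; otherwise it is a
    Poisson(lam * G) count with G ~ Gamma(shape=k, rate=k), i.e. negative
    binomial with mean lam and variance lam + lam**2 / k.
    """
    log_nb = (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + lam))
        + y * math.log(lam / (k + lam))
    )
    if y == 0:
        # a zero can come from either mixture component
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    return math.log(1.0 - pi) + log_nb


def conditional_loglik(
    y: Sequence[int],
    x: Sequence[float],
    beta: Tuple[float, float],
    phi: float,
    theta: float,
    gamma: Tuple[float, float],
    k: float,
) -> float:
    """Sum of log P(y_t | past, x_t) for t = 1, ..., n - 1.

    HYPOTHETICAL predictors (illustration only, not the authors' form):
      log(lam_t)  = b0 + b1*x_t + phi*ys_{t-1} + theta*(ys_{t-1} - eta_{t-1})
      logit(pi_t) = g0 + g1*ys_{t-1},  with ys_t = log(1 + y_t).
    """
    b0, b1 = beta
    g0, g1 = gamma
    eta_prev = math.log(1.0 + y[0])  # assumed start-up value for the MA term
    total = 0.0
    for t in range(1, len(y)):
        ys = math.log(1.0 + y[t - 1])  # transformed lagged count
        eta = b0 + b1 * x[t] + phi * ys + theta * (ys - eta_prev)
        lam = math.exp(eta)  # log link (canonical for the Poisson mean)
        pi = 1.0 / (1.0 + math.exp(-(g0 + g1 * ys)))  # logit link for pi_t
        total += zinb_logpmf(y[t], lam, pi, k)
        eta_prev = eta
    return total


if __name__ == "__main__":
    y = [0, 0, 3, 1, 0, 0, 5, 2, 0, 1]
    x = [0.0, 0.2, 0.5, 0.9, 1.0, 0.8, 0.4, 0.1, 0.0, 0.3]
    ll = conditional_loglik(y, x, beta=(0.3, 0.5), phi=0.4, theta=0.2,
                            gamma=(-1.0, -0.5), k=2.0)
    print(f"conditional log-likelihood: {ll:.4f}")
```

Because the moving-average term makes λ_t depend recursively on earlier predictors, analytic NR derivatives would also have to be propagated through that recursion; for quick experiments, handing the negative of `conditional_loglik` to a generic numerical optimiser is a simpler, if slower, alternative. An EM variant would instead treat the unobserved "structural zero" indicator as missing data.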