Yule-Walker Estimation for the Moving-Average Model
Abstract
The standard Yule-Walker equations, as they are known for an autoregression, are generalized to involve the moments of a moving-average process indexed on any number of dimensions. Once observations become available, new moments estimators are set to imitate the theoretical equations. These estimators are not only consistent but also asymptotically normal for any number of indexes. Their variance matrix resembles a standard result from maximum Gaussian likelihood estimation. A simulation study is added to conclude on their efficiency.
1. Introduction
At first, processes taking place on more than one dimension were considered on surfaces or in three-dimensional space for the sake of spatial dynamics. Next, the time series Autoregressive Moving-Average (ARMA) model was also used to describe the covariance dependence of spatial point processes. Nevertheless, the assumptions of causality and invertibility were based on conventional orderings of the vector indexes, as introduced by Whittle [1] and generalized by Guyon [2], and they did not look natural on the spatial axes. The more general bilateral equations considered by Whittle [1] also gave rise to serious problems in the estimation of the parameters. Besag’s [3] automodels, as well as Besag’s [4] equations for observations on irregular sites, moved in a new direction, but the weaknesses in the estimation remained.
The inclusion of the time axis has made the spatial analysis more convenient in two ways. On the one hand, when the observations are collected on irregular sites, the asymptotic properties of the estimators for the spatial dependence parameters can now be established as only the number of time points increases to infinity; for example, Dimitriou-Fakalou [5] has proposed a methodology on how to proceed. On the other hand, when the regular recordings increase along all the spatial axes as well as the time axis, the unilateral ordering of indexes can now be arranged in a meaningful way, and the causal and invertible ARMA model may be preferred as a natural spatiotemporal equation.
Special cases of the ARMA model are the pure autoregressive and moving-average equations, both of which have useful properties that are reflected in the spectral density of interest. The autoregressive process uses a spectral density with a “finite denominator”, which for Gaussian processes [3] translates into a finite conditional expectation of any value based on all other values. A moving-average process uses a “finite numerator” in the spectral density, resulting in an autocovariance function with nonzero values on a finite set of “lags” only.
For a causal and finite autoregressive equation, the method of moments estimators according to the Yule-Walker equations almost coincide with the least squares or maximum Gaussian likelihood estimators, and their consistency as well as their asymptotic normality may be established. Nevertheless, if the estimation for a pure moving-average or ARMA multi-indexed process takes place using its infinite autoregressive representation, a problem known as the “edge-effect” [2] resurfaces and the asymptotic normality cannot be guaranteed. A modification of the Gaussian likelihood, written from the autoregressive representation of each observation based on its “previous” ones, has been proposed by Yao and Brockwell [6] to confine the edge-effect under a weak condition of a finite second moment, but this works for processes indexed on two dimensions only.
In this paper, for the multidimensional processes that may be described by an invertible and finite moving-average equation, a new method of estimation is proposed that imitates the Yule-Walker equations. The finite number of nonzero autocovariances of the moving-average process is used to eliminate the edge-effect. As a result, the proposed estimators are not only consistent but also asymptotically normal under relaxed conditions of finite second moments.
2. Theoretical Yule-Walker Equations
3. Method of Moments Estimators
For a set 𝒮 ⊂ ℤ^d of finite cardinality N, observations {Y(v), v ∈ 𝒮} from (2.2) are available, with true parameters that are to be estimated. The following condition is assumed to hold.
Condition C1. The parameter space Θ ⊂ ℝ^q is a compact set containing the true value θ0 as an inner point. Further, for any θ ∈ Θ, the moving-average model (2.2) is invertible.
Based on 𝒮, a set ℱv is defined for any v ∈ ℤ^d: it holds that i ∈ ℱv if and only if v − i ∈ 𝒮. Next, the corrected set 𝒮* with N* elements is defined so that v ∈ 𝒮* if v + i ∈ 𝒮 for every i ∈ ℱ. Thus, for any v ∈ 𝒮*, it holds that ℱ ⊆ ℱv. The reduction from 𝒮 to 𝒮* is essential, as it guarantees later that the edge-effect fades away: the set 𝒮* ⊂ 𝒮 includes the locations v for which all the “neighbors” v − i with E(Y(v)Y(v − i)) ≠ 0 are available. Since the source of the edge-effect is the rate at which the absolute bias tends to zero, the use of 𝒮* guarantees that whatever “is missing” from the finite sample does not add at all to the bias of the estimators.
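The construction of the corrected set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `corrected_set`, the representation of sites and lags as integer tuples, and the two-dimensional lag set are hypothetical choices made here and do not come from the paper.

```python
from itertools import product

def corrected_set(sites, lags):
    """Keep the sites v for which v + i is also a site, for every lag i."""
    sites = set(sites)
    return {
        v for v in sites
        if all(tuple(a + b for a, b in zip(v, i)) in sites for i in lags)
    }

# One-dimensional illustration: sites 1..N with the lag set {-1, 0, 1}.
N = 10
sites_1d = {(v,) for v in range(1, N + 1)}
lags_1d = {(-1,), (0,), (1,)}
interior_1d = corrected_set(sites_1d, lags_1d)

# Two-dimensional illustration on a 5 x 5 grid.
# The lag set below is hypothetical and chosen for demonstration only.
sites_2d = set(product(range(1, 6), repeat=2))
lags_2d = set(product((-1, 0, 1), repeat=2))
interior_2d = corrected_set(sites_2d, lags_2d)
```

In the one-dimensional illustration the two end points are dropped, and in the two-dimensional one only the inner 3 × 3 block of the grid survives.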
Proposition 3.1. The polynomials and θ0(z)^(−1) = 1 + ∑_{i>0} Θ_{i,0} z^i with ∑_{i>0} |Θ_{i,0}| < ∞ as well as γ0(z) = θ0(z)θ0(z^(−1)) ≡ ∑_{j∈ℱ} γ_{j,0} z^j and are considered. For any , such that it holds that and θ(z)^(−1) = 1 + ∑_{i>0} Θ_i z^i with ∑_{i>0} |Θ_i| < ∞, as well as γ(z) = θ(z)θ(z^(−1)) ≡ ∑_{j∈ℱ} γ_j z^j and , the unique solution that satisfies the q* equations , n = 1, …, q, and , n, m = 1, …, q, n > m, is , n = 1, …, q.
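The coefficients γj of the product θ(z)θ(z^(−1)) can be computed by pairing every two moving-average coefficients and recording the difference of their lags. The following Python sketch illustrates this under assumptions made here: the polynomial is taken to be θ(z) = 1 + ∑ θi z^i with unit error variance, it is stored as a dictionary keyed by lag tuples, and the function name and numerical coefficients are hypothetical.

```python
from collections import defaultdict

def autocovariance_coefficients(theta):
    """Coefficients of theta(z) * theta(z^{-1}), keyed by lag tuple.

    `theta` maps each lag tuple i to the coefficient of z^i,
    including the leading coefficient 1 at the zero lag.
    """
    gamma = defaultdict(float)
    for i, a in theta.items():
        for k, b in theta.items():
            lag = tuple(x - y for x, y in zip(i, k))
            gamma[lag] += a * b
    return dict(gamma)

# One-dimensional case: theta(z) = 1 + 0.5 z.
gamma_1d = autocovariance_coefficients({(0,): 1.0, (1,): 0.5})

# Two-dimensional case with hypothetical coefficients,
# theta(z1, z2) = 1 + 0.3 z1 + 0.2 z2.
gamma_2d = autocovariance_coefficients(
    {(0, 0): 1.0, (1, 0): 0.3, (0, 1): 0.2}
)
```

The keys of the returned dictionary form the finite set of lags with nonzero autocovariance, which is symmetric around the zero lag.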
Example 3.2. For the simplest one-dimensional case, when q* = q = 1, it is demonstrated in detail how to compute the new estimator; some other cases will be considered briefly later. Suppose that {Y(1), …, Y(N)} are available from Y(v) = e(v) + θe(v − 1), where |θ| < 1, θ ≠ 0, and {e(v)} are uncorrelated random variables with unit variance. The original set 𝒮 ≡ {1, 2, …, N} is corrected to 𝒮* ≡ {2, 3, …, N − 1}, which is the maximal set such that, for all v ∈ 𝒮*, it holds that E(Y(v)Y(v*)) = 0 for every v* ∉ 𝒮. Note that the “lags” at which the autocorrelations of Y are not equal to zero form the set ℱ = {1, 0, −1}; further, for any v ∈ ℤ it holds that ℱv = {v − 1, …, v − N} and, thus, for v ∈ 𝒮* it is true that ℱ ⊆ ℱv.
The autocovariance of the relevant auto-regression at the “lag” j ∈ ℤ is (−θ)^|j|/(1 − θ^2). As a result, the estimator will be the solution of the following equation:
4. Properties of Estimators
The asymptotic normality of standard estimators for the parameters of stationary processes indexed on ℤ^d, d ≥ 2, has not been established yet. More specifically, as Guyon [2] demonstrated, the bias of the genuine Gaussian likelihood estimators, computed from N regular recordings increasing to infinity on all dimensions and at equal speeds, is of order N^(−1/d). Nevertheless, in order to secure the asymptotic normality, the absolute bias multiplied by N^(1/2) should tend to zero, which is only true for d = 1, as in Brockwell and Davis [9].
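The arithmetic behind this statement can be written out as a short worked step. It only combines the two rates quoted above, a bias of order N^(−1/d) and a normalization by N^(1/2), and it is meant as an illustration rather than a formal argument.

```latex
\[
  N^{1/2}\cdot N^{-1/d} \;=\; N^{\frac{d-2}{2d}}
  \qquad\Longrightarrow\qquad
  \begin{cases}
    N^{-1/2} \to 0,            & d = 1,\\[2pt]
    N^{0} = 1,                 & d = 2,\\[2pt]
    N^{\frac{d-2}{2d}} \to \infty, & d \ge 3,
  \end{cases}
  \qquad \text{as } N \to \infty .
\]
```

Hence the normalized bias vanishes for d = 1 only; it stays of constant order for d = 2 and grows with N for d ≥ 3.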
Regarding the proposed estimators of this paper, their bias will stem from (3.3) as in the expected value of , n = 1, …, q, which expresses what is “missing” from the sample. Even when the bias is multiplied by (N*)^(1/2), the random variable , n = 1, …, q has a zero mean due to the selection 𝒮* and the fact that the “finite” autocovariance function of a moving-average process is being used. Thus, the edge-effect will not be an obstacle to establishing the asymptotic normality of the estimators.
A series of ARMA model-based arguments, together with a weak condition of finite second moments, has been used before by Yao and Brockwell [6] to deal with the edge-effect for two-dimensional processes. Guyon [2], on the other hand, used the periodogram as in the Gaussian likelihood of observations proposed by Whittle [1] and required the fourth moments to be finite. Dahlhaus and Künsch [10] improved certain weaknesses of Guyon's [2] modification of the likelihood but paid the price of having to reduce the dimensionality to secure their results. The proposed estimators of this paper also use the weak condition of finite second moments, which is a great advantage since they refer to moving-average processes on any number of dimensions.
Condition C2. (i) For a set 𝒮 ≡ 𝒮N ⊂ ℤ^d of cardinality N, it is written N → ∞ if the side length M of the minimal hypercube including 𝒮, say 𝒮 ⊆ 𝒞M, and the side length m of the maximal hypercube included in 𝒮, say 𝒞m ⊆ 𝒮, are such that M, m → ∞. (ii) Further, as M, m → ∞, the ratio M/m remains bounded.
Theorem 4.1. If {ε(v)} ~ IID(0, σ^2), then under Conditions C1 and C2(i) and as N → ∞, it holds that
Proposition 4.2. Let the polynomial . If {ε(v)} ~ IID(0, σ^2), then under Conditions C1 and C2(i) and as N → ∞, it holds that
Theorem 4.3. For {W(v)} ~ IID(0,1), the auto-regression {η(v)} is defined by θ0(B)η(v) ≡ W(v). Also the vector and the variance matrix
5. Empirical Results
In this section, an empirical comparison of the proposed estimators with maximum Gaussian likelihood estimators for moving-average processes is presented in detail. The theoretical case in favour of the new estimators has already been made, since their asymptotic normality on any number of dimensions has been established based on finite second moments only. As a result, the rate at which the bias tends to zero is not expected to cause them to perform worse than maximum likelihood estimators, especially when the dimensionality is large.
On the other hand, Theorem 4.3 attributes to the new estimators the variance matrix when, according to Hannan [11], efficient estimation results in with Wq ≡ Var (ξ) and the same notation as in Theorem 4.3. Wq and are defined similarly but they are not, in general, equal. It seems that, as the number of moving-average parameters of the process increases, the elements of the two types of matrices get closer. Nevertheless, as the pure moving-average model will be preferred when it is parsimonious, it is examined here whether the new estimators are efficient in that case.
The investigation has started with the one-dimensional case by generating {Y(1), …, Y(N)} from the model with one parameter only. The moments estimator has been approximated as in the example earlier, while the minimizer of with respect to the parameter θ has been considered to be the Gaussian likelihood estimator. The true values θ = 0.5, −0.8 have been considered and the sample size has been set equal to N = 30, 100, 200, 300. Very encouraging results for the efficiency of the proposed estimator are presented in Table 1: even when the sample size is still small, extreme differences in the variances of the two types of estimators cannot be detected. It is the bias of the moments estimator that seems to be the only reason why it might be outperformed by the likelihood estimator in terms of the Mean Squared Error (MSE). Nevertheless, the bias tends to zero very fast, as one would expect, and, eventually, the new estimator for θ = 0.5 performs better altogether.
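Table 1. Bias, variance (Var), and Mean Squared Error (MSE) of the moments estimator (MME) and the Gaussian likelihood estimator (MLE) in the one-dimensional case.

| | θ = 0.5 | | | θ = −0.8 | | |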
---|---|---|---|---|---|---|
N = 30 | Bias | Var | MSE | Bias | Var | MSE |
MME | −0.403345 | 0.00020972 | 0.162897 | 0.0931052 | 0.00097912 | 0.00964771 |
MLE | 0.01795 | 0.00015984 | 0.00048204 | 0.020949 | 0.00062347 | 0.00104331 |
N = 100 | Bias | Var | MSE | Bias | Var | MSE |
MME | −0.123812 | 0.00021444 | 0.0155439 | 0.0217928 | 0.00097912 | 0.00145404 |
MLE | 0.00521 | 0.00031329 | 0.00034043 | 0.00599 | 0.00063936 | 0.00067524 |
N = 200 | Bias | Var | MSE | Bias | Var | MSE |
MME | −0.0422653 | 0.00021173 | 0.00199808 | 0.0123275 | 0.00097912 | 0.00113109 |
MLE | −0.00116 | 0.00022068 | 0.00022202 | 0.00275 | 0.00062348 | 0.00063104 |
N = 300 | Bias | Var | MSE | Bias | Var | MSE |
MME | −0.00708814 | 0.00023768 | 0.00028792 | −0.00166597 | 0.00097912 | 0.00098189 |
MLE | 0.00313 | 0.00029131 | 0.00036115 | 0.00386 | 0.00065544 | 0.00067034 |
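The three summary columns of such a table can be obtained from Monte Carlo replications as sketched below in Python. This is a hedged illustration only: the function names are hypothetical, Gaussian errors are an assumption made here, and the estimator used is a classical lag-one moment estimator that inverts ρ(1) = θ/(1 + θ^2). It is a stand-in and not the estimator proposed in this paper.

```python
import math
import random

def simulate_ma1(theta, n, rng):
    """Draw Y(1), ..., Y(n) from Y(v) = e(v) + theta * e(v - 1), unit-variance Gaussian e."""
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[v] + theta * e[v - 1] for v in range(1, n + 1)]

def lag1_moment_estimate(y):
    """Stand-in estimator (not the paper's): solve rho(1) = theta / (1 + theta^2)."""
    n = len(y)
    c0 = sum(x * x for x in y) / n
    c1 = sum(y[v] * y[v - 1] for v in range(1, n)) / n
    rho = max(-0.499, min(0.499, c1 / c0))  # clip so that a real root exists
    if rho == 0.0:
        return 0.0
    # Root of rho * theta^2 - theta + rho = 0 with absolute value below one.
    return (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)

def summarize(estimates, theta_true):
    """Return (bias, variance, mse) over the replications."""
    r = len(estimates)
    mean = sum(estimates) / r
    bias = mean - theta_true
    var = sum((t - mean) ** 2 for t in estimates) / r
    mse = sum((t - theta_true) ** 2 for t in estimates) / r
    return bias, var, mse

rng = random.Random(0)  # fixed seed for reproducibility
estimates = [lag1_moment_estimate(simulate_ma1(0.5, 100, rng)) for _ in range(200)]
bias, var, mse = summarize(estimates, 0.5)
```

With the variance computed using the divisor equal to the number of replications, the identity MSE = Var + Bias^2 holds exactly, which gives a quick consistency check on any row of reported values.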
Table 2 verifies once more the conclusions drawn from the one-dimensional case. It is safe to consider that the new estimators are efficient, and this is very apparent when the parameters are, in absolute value, further away from zero (β = 0.5 and γ = 0.45). The striking case with the small sample size of around 50 points observed in the plane reveals that the variances of the moments estimators may even happen to be smaller. On the other hand, the bias heavily affects the MSE results for small sample sizes. Nevertheless, as the sample size increases, the absolute bias of the likelihood estimators does not seem to decrease at all, whereas the bias of the proposed estimators quickly approaches zero. As a result, the new estimators eventually match the MSE performance of the standard estimators.
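Table 2. Bias, variance (Var), and Mean Squared Error (MSE) of the moments estimators (MME) and the Gaussian likelihood estimators (MLE) for observations in the plane.

| | β = 0.5 | | | γ = 0.45 | | |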
---|---|---|---|---|---|---|
n = 7 | Bias | Var | MSE | Bias | Var | MSE |
MME | 0.02842 | 0.00010879 | 0.00091649 | −0.07858 | 0.00001209 | 0.0061869 |
MLE | 0.0006 | 0.0002023 | 0.00020266 | −0.00604 | 0.00006244 | 0.00009892 |
n = 30 | Bias | Var | MSE | Bias | Var | MSE |
MME | 0.0149 | 0.00037173 | 0.00059374 | −0.00746 | 0.0002023 | 0.00025795 |
MLE | −0.00184 | 0.0003022 | 0.00030558 | 0.00066 | 0.00016793 | 0.00016837 |
n = 100 | Bias | Var | MSE | Bias | Var | MSE |
MME | 0.00576 | 0.00025984 | 0.00029302 | 0.0065 | 0.00023986 | 0.00028211 |
MLE | −0.00044 | 0.00022068 | 0.00022087 | −0.00024 | 0.0002023 | 0.00020236 |
| | β = 0.15 | | | γ = −0.3 | | |
n = 30 | Bias | Var | MSE | Bias | Var | MSE |
MME | 0.02638 | 0.00004406 | 0.00073996 | 0.03274 | 0.00001688 | 0.00108879 |
MLE | 0.00012 | 0.00002248 | 0.00002249 | 0.00004 | 0.00010879 | 0.00010879 |
n = 100 | Bias | Var | MSE | Bias | Var | MSE |
MME | −0.0422653 | 0.00021173 | 0.00199808 | 0.0123275 | 0.00097912 | 0.00113109 |
MLE | −0.00058 | 0.00002248 | 0.00002281 | 0.00026 | 0.00000402 | 0.00008408 |
Acknowledgments
Some results of this paper are part of the Ph.D. thesis titled “Statistical inference for spatial and spatiotemporal processes,” which has been submitted to the University of London. The author would like to thank the Leverhulme Trust, which funded her studies, and Professor Q. Yao.