Volume 44, Issue 4 pp. 374-392
Original Article
Open Access

Regime switching models for circular and linear time series

Andrew Harvey

Faculty of Economics, University of Cambridge, Cambridge, UK

Dario Palumbo

Corresponding Author

Department of Economics, Ca' Foscari University of Venice, Venice, Italy

Homerton College, University of Cambridge, Cambridge, UK

Correspondence to: Department of Economics, Ca' Foscari University of Venice, Fondamenta San Giobbe, Cannaregio 873, Venice 30121, Italy.

First published: 16 January 2023

Abstract

The score-driven approach to time series modelling is able to handle circular data and switching regimes with intra-regime dynamics. Furthermore, it enables a dynamic model to be fitted to a linear and a circular variable whose joint distribution is defined on a cylinder. The viability of the new method is illustrated by estimating models for hourly data on wind direction and speed in Galicia, north-west Spain. The modelling of intra-regime dynamics is shown to be of critical importance.

1 INTRODUCTION

Many areas of environmental statistics involve applications where circular data are collected and statistically analysed. For example, modelling and forecasting wind direction is relevant for tracking pollution and wildfires. Harvey et al. (2022) show how a score-driven approach provides a solution to time series modelling of circular data. The present article extends this approach to handle data from time series where there is switching between different regimes and shows how dynamic bivariate models for speed and direction can be constructed.

The viability and effectiveness of the new methods are illustrated with data on wind direction and speed at a wind farm site in Galicia, north-west Spain. The observations were taken every minute over the month of January 2004. These data are used by García-Portugués et al. (2013) in a study of pollution. Figure 1 shows the time series of observations, measured in radians from 0 to $2\pi$, obtained by taking the last observation in each hour. Zero radians corresponds to due east, with the subsequent coding being anti-clockwise, so $\pi/2$ is north, $\pi$ is west and $3\pi/2$ is south. Because of circularity, some of the measurements in the north-east (NE) orbit appear at the top, close to $2\pi$, rather than near the bottom. Serious distortions can arise if circularity is not taken into account and standard linear time series methods are used.

Figure 1. Hourly wind direction and speed in January 2004 at a site in Galicia

García-Portugués et al. (2013) note that the prevailing wind comes from two directions: SW and NE. The two dominant directions are apparent in Figure 1, with NE being a little less than one radian and SW around four. Hence a regime switching model may be appropriate.

Holzmann et al. (2006) propose a switching regime model for wind direction, and the same approach is used by Zucchini et al. (2016, pp. 228–35) for modelling the change in direction of flight for fruit flies. The basic formulation assumes a finite number of regimes and introduces dynamics by a Markov chain in which there is a fixed probability of staying in the current regime or moving to another; see Hamilton (1989, 1994). The regime is not observed: hence the term hidden Markov model (HMM), as in Zucchini et al. (2016). The probability of being in a particular regime is given by a filter that depends on past observations. These probabilities yield a conditional mixture distribution for the current observation and hence a likelihood function. Catania (2021) proposes a different approach that bypasses the hidden Markov chain and instead sets up filters for the regime probabilities in the conditional distribution directly by using their scores. He calls these dynamic adaptive mixture models (DAMMs).

The score-driven approach leads naturally to a solution of how to model intra-regime dynamics. The key point is that the forcing variables in the dynamic equations are functions of past observations weighted according to the probability that they are in a given regime. Intra-regime dynamics have not been modelled in this way in standard HMMs. Yet in many environmental applications there are marked intra-regime dynamics. The evidence presented here shows that neglecting these dynamics can be a major drawback.

Figure 1 also shows wind speed. Wind speed is a (non-negative) linear variable, and a joint distribution for a linear and a circular variable is defined on a cylinder. Abe and Ley (2017) propose a general distribution for cylindrical data. A key feature of their distribution is that the circular concentration is allowed to increase with the linear component, a phenomenon first identified in the seminal article by Fisher and Lee (1992, p. 666). The challenge is to move from the static to the dynamic case. We show here that a score-driven approach facilitates the construction of a model that allows the location of the circular variable and the level (scale) of the linear variable to change over time. The concentration may also change.

The bivariate score-driven model can be incorporated in a regime switching model. Switching bivariate models have been used by Lagona et al. (2015) for modeling the joint distribution of wave height and direction in the Adriatic; they employ an HMM for fitting a cylindrical distribution when there are three distinct regimes. Other applications involving speed and direction include tracking and forecasting the movements of animals, boats and wildfires, as in García-Portugués et al. (2014).

Section 2 reviews the score-driven model for circular data proposed by Harvey et al. (2022) and fits it to wind direction in Galicia. The theory underlying regime switching models is described in Section 3, where it is shown how the score-driven approach set out in Section 2 is able to model dynamic location and concentration within regimes. The methods are applied to the Galicia data on wind direction in Section 4. Dynamic cylindrical distributions are described in Section 5 and extended to handle switching regimes. These time series models are fitted to the Galicia data on direction and speed in Section 6. Section 7 concludes.

2 SCORE-DRIVEN MODELS FOR CIRCULAR DATA

Circular observations measured in radians are usually taken to have a von Mises (vM) distribution with probability density function (PDF)
$$f(y;\mu,\upsilon) = \frac{1}{2\pi I_0(\upsilon)}\exp\{\upsilon\cos(y-\mu)\}, \qquad -\pi \le y, \mu < \pi, \quad \upsilon \ge 0, \qquad (1)$$
where $I_k(\upsilon)$ denotes a modified Bessel function of order $k$, $\mu$ is the mean direction and $\upsilon$ is a non-negative concentration parameter that is inversely related to dispersion. When $\upsilon = 0$ the distribution is uniform, whereas $y$ is approximately $N(\mu, 1/\upsilon)$ for large $\upsilon$. The maximum likelihood (ML) estimator of location, $\mu$, is the sample mean direction, $y_d$; see Mardia and Jupp (2000). A class of general circular distributions, in which the cardioid and wrapped Cauchy are special cases, is described in Jones and Pewsey (2005).
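
The density (1) and the mean-direction estimator can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy and SciPy; the helper names `vm_logpdf` and `mean_direction` are hypothetical. The small example shows the kind of distortion mentioned in the introduction: two angles either side of due west have a linear average that points due east.

```python
import numpy as np
from scipy.special import i0e

def vm_logpdf(y, mu, upsilon):
    """Log of the von Mises density (1).

    i0e(v) = exp(-v) * I0(v), so ln(2*pi*I0(v)) = ln(2*pi*i0e(v)) + v;
    this avoids overflow of I0 for large concentrations.
    """
    return upsilon * (np.cos(y - mu) - 1.0) - np.log(2.0 * np.pi * i0e(upsilon))

def mean_direction(y):
    """Sample mean direction, the ML estimator of mu for any fixed upsilon > 0."""
    y = np.asarray(y, dtype=float)
    return np.arctan2(np.sin(y).sum(), np.cos(y).sum())

# Two angles either side of due west (+/- pi in the [-pi, pi) coding)
angles = np.array([3.0, -3.0])
print(mean_direction(angles))   # close to +/- pi, i.e. west
print(angles.mean())            # 0.0, i.e. east: the linear average is misleading
```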

Circular time series models currently in use are almost all based on the autoregressive-moving average (ARMA) type models proposed in Fisher and Lee (1994) or are regime switching models; see Pewsey and García-Portugués (2021, section 10.1). Score-driven models offer considerable advantages and allow regime switching to be combined with intra-regime dynamics.

2.1 Dynamic Location

Data generated by a time series model over the real line, that is $-\infty < z_t < \infty$, can be converted into wrapped circular time series observations in the range $[-\pi, \pi)$ by letting
$$y_t = \{(z_t + \pi) \bmod 2\pi\} - \pi, \qquad t = 1, \ldots, T; \qquad (2)$$
see Breckling (1989) and Fisher and Lee (1994). The score-driven model for circular data is
$$z_t = \mu_{t|t-1} + \varepsilon_t, \qquad -\pi \le \varepsilon_t < \pi, \qquad t = 1, \ldots, T, \qquad (3)$$
where the $\varepsilon_t$s are independent and identically distributed (IID) random variables from a circular distribution with location zero and $\mu_{t|t-1}$ is a filter for location at time $t$ based on information available at time $t-1$. The basic stationary first-order filter is
$$\mu_{t+1|t} = (1-\phi)\omega + \phi\mu_{t|t-1} + \kappa u_t, \qquad |\phi| < 1, \qquad t = 1, \ldots, T, \qquad (4)$$
where $\mu_{1|0} = \omega$ is the unconditional location of $\mu_{t|t-1}$ and the forcing variable, $u_t$, is defined as being (proportional to) the conditional score for location, that is $\partial \ln f(y_t; \mu_{t|t-1}, \upsilon)/\partial\mu_{t|t-1}$, $t = 1, \ldots, T$.

A defining property of a (continuous) circular distribution is that it satisfies the periodicity condition $f(z \pm 2\pi k; \psi) = f(z; \psi)$, where $k$ is an integer and $\psi$ denotes parameters. Provided the derivatives of the log-density with respect to the elements of $\psi$ are continuous, they too are circular in that the periodicity condition is satisfied. Thus the path of $\mu_{t|t-1}$ is the same irrespective of whether $u_t$ is defined in terms of $z_t$ or $y_t$. The conditional distribution of $y_t$ in a model defined by (3), (4) and (2) is therefore the same as that of $z_t$, and so the likelihood function of the wrapped observations, the $y_t$s, is the same as that of the unobserved variables, the $z_t$s.

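In the case of the von Mises distribution, that is $\varepsilon_t \sim vM(0, \upsilon)$ in (3), the score is
$$u_t = \upsilon\sin(z_t - \mu_{t|t-1}) = \upsilon\sin(y_t - \mu_{t|t-1}), \qquad u_t \sim IID\bigl(0, \upsilon A(\upsilon)\bigr), \qquad (5)$$
where $A(\upsilon) = I_1(\upsilon)/I_0(\upsilon)$.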
The general continuous circular distribution of Jones and Pewsey (2005) has a score that, like (5), is invariant to wrapping as well as being IID.
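
The wrapping map (2), the filter (4) driven by the vM score (5), and the invariance argument above can be checked numerically. The following is a minimal sketch assuming NumPy and SciPy; the function names, the simulated random-walk data and the value $\phi = 0.99$ are illustrative assumptions, while $\kappa$ and $\upsilon$ are set to the estimates quoted in Section 2.4 purely for scale. The sketch also compares a numerical integral of the score variance with the closed form $\upsilon A(\upsilon)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0e, i1e

def wrap(z):
    """Map real-valued z into [-pi, pi), as in (2)."""
    return np.mod(np.asarray(z, dtype=float) + np.pi, 2.0 * np.pi) - np.pi

def vm_location_filter(y, omega, phi, kappa, upsilon):
    """First-order filter (4) with the vM score (5).

    Returns mu_{t|t-1} for t = 1, ..., T+1; mu itself is deliberately not wrapped.
    """
    mu = np.empty(len(y) + 1)
    mu[0] = omega                                   # mu_{1|0} = omega
    for t, yt in enumerate(y):
        u = upsilon * np.sin(yt - mu[t])            # score (5)
        mu[t + 1] = (1.0 - phi) * omega + phi * mu[t] + kappa * u
    return mu

def score_variance(upsilon):
    """Variance of u_t in (5): numerical integral and the closed form upsilon * A(upsilon)."""
    c = 2.0 * np.pi * i0e(upsilon)
    integrand = lambda e: (upsilon * np.sin(e)) ** 2 * np.exp(upsilon * (np.cos(e) - 1.0)) / c
    numeric, _ = quad(integrand, -np.pi, np.pi)
    closed_form = upsilon * i1e(upsilon) / i0e(upsilon)   # A = I1/I0
    return numeric, closed_form

# An unwrapped series z_t and its wrapped version y_t give the same filtered path
rng = np.random.default_rng(0)
z = np.cumsum(rng.normal(scale=0.3, size=200)) + rng.vonmises(0.0, 4.78, size=200)
y = wrap(z)
path_z = vm_location_filter(z, omega=0.0, phi=0.99, kappa=0.19, upsilon=4.78)
path_y = vm_location_filter(y, omega=0.0, phi=0.99, kappa=0.19, upsilon=4.78)
print(np.max(np.abs(path_z - path_y)))   # rounding error only
print(score_variance(4.78))
```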

Harvey et al. (2022) provide further details and derive the asymptotic distribution of the ML estimators of ϕ , κ and ω in (4) for the stationary vM model.

2.2 Tests

The Lagrange multiplier, or score, test against serial correlation in location is based on the portmanteau or Box–Ljung statistic constructed from the autocorrelations of the scores; see Harvey (2013, section 2.5). For a vM distribution with $\upsilon > 0$, the scores under the null hypothesis of constant location are proportional to $\sin(y_t - y_d)$. Hence the sample autocorrelations correspond to the circular autocorrelations (CACFs) in Jammalamadaka and SenGupta (2001, pp. 176–9).
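
A sketch of this portmanteau test on the circular scores, assuming NumPy and SciPy. The helpers `circular_scores` and `box_ljung` are hypothetical, and the data are simulated (a random-walk location plus vM noise), not the Galicia series; the strongly persistent location should produce a very large statistic.

```python
import numpy as np
from scipy.stats import chi2

def circular_scores(y):
    """Scores under constant location, up to scale: sin(y_t - y_d)."""
    y = np.asarray(y, dtype=float)
    y_d = np.arctan2(np.sin(y).sum(), np.cos(y).sum())   # sample mean direction
    return np.sin(y - y_d)

def box_ljung(u, P):
    """Box-Ljung statistic from the first P sample autocorrelations of u."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    d = u - u.mean()
    denom = d @ d
    r = np.array([d[tau:] @ d[:T - tau] / denom for tau in range(1, P + 1)])
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, P + 1)))
    return Q, r

# Wrapped series whose location follows a random walk: strong serial correlation
rng = np.random.default_rng(0)
T = 500
z = np.cumsum(rng.normal(scale=0.3, size=T)) + rng.vonmises(0.0, 4.0, size=T)
y = np.mod(z + np.pi, 2 * np.pi) - np.pi
Q, r = box_ljung(circular_scores(y), P=5)
print(Q, chi2.sf(Q, df=5))   # large Q, nominal p-value near zero
```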

2.3 Heteroscedasticity

Score-driven models can be extended to allow for dynamic heteroscedasticity by setting up a filter for the conditional concentration. Thus $\varepsilon_t$ in (3) is distributed as $vM(0, \upsilon_{t|t-1})$, with the dynamics dependent on the score with respect to $\upsilon_{t|t-1}$, that is
$$u_t^{\upsilon} = \cos(y_t - \mu_{t|t-1}) - A(\upsilon_{t|t-1}).$$
The scores are a martingale difference sequence with mean zero and variance $1 - A(\upsilon_{t|t-1})^2 - A(\upsilon_{t|t-1})/\upsilon_{t|t-1}$. An exponential link function can be used to ensure the concentration remains positive, so that $\upsilon_{t|t-1} = \exp(\zeta_{t|t-1})$. The first-order dynamic model for $\zeta_{t|t-1}$ is then
$$\zeta_{t+1|t} = (1-\phi_\zeta)\omega_\zeta + \phi_\zeta\zeta_{t|t-1} + \kappa_\zeta u_t^{\zeta}, \qquad t = 1, \ldots, T, \qquad (6)$$
where $u_t^{\zeta} = \upsilon_{t|t-1}u_t^{\upsilon}$ and $\zeta_{1|0} = \omega_\zeta$. The log-likelihood function is
$$\ln L(\psi) = -\sum_{t=1}^{T}\ln\bigl(2\pi I_0(\upsilon_{t|t-1})\bigr) + \sum_{t=1}^{T}\upsilon_{t|t-1}\cos(y_t - \mu_{t|t-1}),$$
where $\psi$ denotes the parameters in the dynamic equations for $\upsilon_{t|t-1}$ as well as $\mu_{t|t-1}$. The forcing variable for location is now $u_t = \upsilon_{t|t-1}\sin(y_t - \mu_{t|t-1})$.
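
The heteroscedastic model can be sketched as a single pass that updates the location filter (4) and the log-concentration filter (6) and accumulates $\ln L(\psi)$. This assumes NumPy and SciPy, and the function names are hypothetical; in practice the negative of this function would be passed to a numerical optimizer, which is not shown.

```python
import numpy as np
from scipy.special import i0e, i1e

def A(v):
    """A(v) = I1(v) / I0(v), using exponentially scaled Bessel functions."""
    return i1e(v) / i0e(v)

def vm_hetero_loglik(y, omega, phi, kappa, omega_z, phi_z, kappa_z):
    """Log-likelihood of the vM model with filter (4) for location and filter (6)
    for log-concentration zeta_{t|t-1}, both driven by their conditional scores."""
    mu, zeta = omega, omega_z                  # mu_{1|0}, zeta_{1|0}
    loglik = 0.0
    for yt in y:
        v = np.exp(zeta)                       # upsilon_{t|t-1}
        e = yt - mu
        # ln f = v cos(e) - ln(2 pi I0(v)), with I0(v) = i0e(v) exp(v)
        loglik += v * (np.cos(e) - 1.0) - np.log(2.0 * np.pi * i0e(v))
        u_mu = v * np.sin(e)                   # forcing variable for location
        u_zeta = v * (np.cos(e) - A(v))        # u^zeta_t = upsilon_{t|t-1} * u^upsilon_t
        mu = (1.0 - phi) * omega + phi * mu + kappa * u_mu
        zeta = (1.0 - phi_z) * omega_z + phi_z * zeta + kappa_z * u_zeta
    return loglik
```

With $\kappa = \kappa_\zeta = 0$ both filters stay at their unconditional values, so the function reduces to the static vM log-likelihood, which is a convenient check.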

2.4 Application to Circular Data from Galicia

When the basic first-order dynamic model, (4), is fitted to the Galicia data, the result is $\tilde\phi = 1.0$, $\tilde\kappa = 0.19$ and $\tilde\upsilon = \exp(\tilde\omega_\zeta) = 4.78$. The maximized log-likelihood is $\ln L = -510.62$. The residual CACF shows that considerable serial correlation remains, and the fit, as measured by dispersion (circular variance), is no better than that of a random walk in which $\mu_{t|t-1} = y_{t-1}$; see Mardia and Jupp (2000, pp. 18–19 and 30). Adding heteroscedasticity gives a better fit, but although this reduces the serial correlation in dispersion, the serial correlation in location increases.

3 SWITCHING REGIMES

The plot of wind direction in Galicia provides a clear motivation for a switching model. The first subsection below gives a short description of the static mixture model, and the second reviews the dynamic adaptive mixture model (DAMM) of Catania (2021). The third subsection then shows how the score-driven approach can handle intra-regime dynamics, and the fourth sets out the associated diagnostic tests. The last subsection discusses circular observations.

3.1 Static Mixture Model

The PDF of a mixture of K distributions is
$$f(y_t;\psi) = \sum_{i=1}^{K}\xi_i f_i(y_t;\psi_i), \qquad \sum_{i=1}^{K}\xi_i = 1, \qquad t = 1, \ldots, T,$$
where $0 \le \xi_i \le 1$, $i = 1, 2, \ldots, K$, with $\xi_i$ denoting the probability of being in the $i$th regime; the parameters in the $i$th regime are contained in the vector $\psi_i$, and $\psi$ includes all these parameters, together with $\xi_i$, $i = 1, 2, \ldots, K-1$. When the observations are independent, the probability of being in regime $i$ given observation $y_t$ is
$$\xi_i(y_t) = \xi_i f_i(y_t;\psi_i)/f(y_t;\psi), \qquad i = 1, 2, \ldots, K. \qquad (7)$$

The maximum likelihood (ML) estimates for the parameters in each of the ψ i s , together with the unconditional probabilities, ξ i , i = 1 , , K , can be computed iteratively by what turns out to be a special case of the EM algorithm; see Hamilton (1994, pp. 688–9), or Zucchini et al. (2016).
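
The EM iterations can be sketched for a static von Mises mixture. This is an illustrative implementation under our own assumptions (NumPy and SciPy, hypothetical function names, simulated data with parameter values close to row (1) of Table II), not the code used for the article. The E-step computes the probabilities (7); the M-step uses weighted mean directions and inverts $A(\upsilon) = I_1(\upsilon)/I_0(\upsilon)$ with a root finder to update each concentration.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import i0e, i1e

def vm_logpdf(y, mu, v):
    return v * (np.cos(y - mu) - 1.0) - np.log(2.0 * np.pi * i0e(v))

def inv_A(R):
    """Solve A(v) = I1(v)/I0(v) = R for the concentration v (0 < R < 1)."""
    return brentq(lambda v: i1e(v) / i0e(v) - R, 1e-8, 1e4)

def em_vm_mixture(y, mu, v, xi, n_iter=200):
    """EM iterations for a static K-component von Mises mixture.

    mu, v, xi are length-K starting values; returns the estimates and the
    regime probabilities xi_i(y_t) of (7) as a T x K array.
    """
    y = np.asarray(y, dtype=float)
    mu, v, xi = (np.array(a, dtype=float) for a in (mu, v, xi))
    for _ in range(n_iter):
        # E-step: xi_i(y_t) = xi_i f_i(y_t) / f(y_t), computed on the log scale
        logw = np.log(xi) + vm_logpdf(y[:, None], mu, v)
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted mean direction and concentration in each regime
        n = w.sum(axis=0)
        S, C = w.T @ np.sin(y), w.T @ np.cos(y)
        mu = np.arctan2(S, C)
        v = np.array([inv_A(r) for r in np.hypot(S, C) / n])
        xi = n / len(y)
    return mu, v, xi, w

# Simulated two-regime data (regime 1 location 4.051 rad, coded in [-pi, pi))
rng = np.random.default_rng(0)
T = 3000
in_regime_1 = rng.random(T) < 0.66
y = np.where(in_regime_1,
             rng.vonmises(4.051 - 2 * np.pi, 12.68, T),
             rng.vonmises(1.055, 2.08, T))
mu_hat, v_hat, xi_hat, post = em_vm_mixture(y, mu=[-2.0, 1.0], v=[5.0, 5.0], xi=[0.5, 0.5])
print(mu_hat, v_hat, xi_hat)
```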

3.2 Dynamic Mixture Model

In the dynamic mixture model, the PDF of y t , conditional on information at time t 1 , is
$$f_{t|t-1}(y_t;\psi) = \sum_{i=1}^{K}\xi_{i,t|t-1}f_{i,t|t-1}(y_t;\psi_i), \qquad t = 1, \ldots, T, \qquad (8)$$
where $0 \le \xi_{i,t|t-1} \le 1$, $i = 1, 2, \ldots, K$, and $\sum_{i=1}^{K}\xi_{i,t|t-1} = 1$. The probabilities, $\xi_{i,t|t-1}$, are given by filters constructed from past observations. There are two ways in which this may be done. The first is implicitly, with the filters derived as a consequence of setting up an HMM for the unobserved probabilities, $\xi_{i,t}$; see Hamilton (1994, pp. 690–3). The second is by explicitly writing down a filter, as in the DAMM. Here we concentrate on the DAMM option because it is relatively simple and transparent. It also embodies a score-driven approach and as such it leads naturally to the formulation of dynamic nonlinear models for time-varying parameters within each regime.
The DAMM can be described most easily by restricting attention to the case where there are only two regimes. For simplicity of notation, the parameter vector $\psi$ (which now includes parameters associated with the switching filters) will be dropped from $f_{t|t-1}(y_t;\psi)$, and similarly for the regime conditional distributions, $f_{i,t|t-1}(y_t;\psi_i)$, $i = 1, \ldots, K$. The conditional distribution can now be written
$$f_{t|t-1}(y_t) = \xi_{t|t-1}f_{1,t|t-1}(y_t) + (1-\xi_{t|t-1})f_{2,t|t-1}(y_t), \qquad t = 1, \ldots, T, \qquad (9)$$
where $\xi_{t|t-1} = \xi_{1,t|t-1}$ is the probability of being in the first regime, implying that $1 - \xi_{t|t-1} = \xi_{2,t|t-1}$ is the probability of being in the second. The probability, $\xi_{t|t-1}$, can be confined to the range $0 < \xi_{t|t-1} < 1$ by a logistic link function, $\xi_{t|t-1} = \exp\gamma_{t|t-1}/(1 + \exp\gamma_{t|t-1})$, $-\infty < \gamma_{t|t-1} < \infty$. The dynamics of $\gamma_{t|t-1}$ are then driven by the score with respect to $\gamma_{t|t-1}$, which is
$$w_t = \frac{\partial\ln f_{t|t-1}}{\partial\gamma_{t|t-1}} = \frac{\partial\ln f_{t|t-1}}{\partial\xi_{t|t-1}}\,\frac{\partial\xi_{t|t-1}}{\partial\gamma_{t|t-1}} = \frac{f_{1,t|t-1} - f_{2,t|t-1}}{f_{t|t-1}}\,\frac{\exp\gamma_{t|t-1}}{(1+\exp\gamma_{t|t-1})^2},$$
or equivalently
$$w_t = \frac{f_{1,t|t-1} - f_{2,t|t-1}}{f_{t|t-1}}\,\xi_{t|t-1}(1-\xi_{t|t-1}). \qquad (10)$$
The basic first-order dynamic equation is
$$\gamma_{t+1|t} = (1-\phi_\gamma)\omega_\gamma + \phi_\gamma\gamma_{t|t-1} + \kappa_\gamma w_t, \qquad t = 1, \ldots, T, \qquad (11)$$
where $\gamma_{1|0}$ is set to the unconditional mean, $\omega_\gamma$, and the condition $|\phi_\gamma| < 1$ is all that is required to ensure that $\gamma_{t+1|t}$, and hence $\xi_{t+1|t}$, is stationary. No restrictions are imposed on $\kappa_\gamma$. As $\kappa_\gamma \to \infty$ there will be an abrupt change in regime when $w_t$ changes sign, whereas when $\kappa_\gamma$ is close to zero any change will be gradual.
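
A minimal sketch of the pure two-regime DAMM recursion (9)–(11), assuming NumPy. The regime densities $f_{i,t|t-1}(y_t)$ are passed in as arrays, which corresponds to regimes with fixed parameters; the function names are hypothetical. The helper `switching_score` returns $w_t$ as in (10), which can be compared with a finite-difference derivative of $\ln f_{t|t-1}$ with respect to $\gamma_{t|t-1}$.

```python
import numpy as np

def logistic(g):
    return 1.0 / (1.0 + np.exp(-g))

def switching_score(f1, f2, g):
    """ln f_{t|t-1} from (9) and the score w_t from (10), given gamma_{t|t-1} = g."""
    xi = logistic(g)
    f = xi * f1 + (1.0 - xi) * f2
    return np.log(f), (f1 - f2) / f * xi * (1.0 - xi)

def damm_filter(f1, f2, omega_g, phi_g, kappa_g):
    """Pure two-regime DAMM: switching filter (11) driven by w_t.

    f1, f2 hold the regime densities f_{i,t|t-1}(y_t).
    Returns xi_{t|t-1} for t = 1, ..., T and the log-likelihood.
    """
    g = omega_g                              # gamma_{1|0} = omega_gamma
    xi_path = np.empty(len(f1))
    loglik = 0.0
    for t in range(len(f1)):
        xi_path[t] = logistic(g)
        lf, w = switching_score(f1[t], f2[t], g)
        loglik += lf
        g = (1.0 - phi_g) * omega_g + phi_g * g + kappa_g * w
    return xi_path, loglik
```

Setting $\kappa_\gamma = 0$ keeps $\xi_{t|t-1}$ fixed at the logistic transform of $\omega_\gamma$, so the recursion nests the static mixture.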

3.3 Dynamics Within Regimes

When the location parameter within the $i$th regime is time-varying, its dynamics can be captured by a filter that depends on the score. The PDFs in (8) now depend on $\mu_{i,t|t-1}$, so $f_{i,t|t-1}(y_t;\psi_i)$ becomes $f_{i,t|t-1}(y_t;\mu_{i,t|t-1},\psi_i)$ and $\psi_i$ is redefined accordingly. Differentiating the logarithm of (8) gives
$$\frac{\partial\ln f_{t|t-1}}{\partial\mu_{i,t|t-1}} = \frac{\partial\ln f_{t|t-1}}{\partial f_{t|t-1}}\,\frac{\partial f_{t|t-1}}{\partial f_{i,t|t-1}}\,\frac{\partial f_{i,t|t-1}}{\partial\ln f_{i,t|t-1}}\,\frac{\partial\ln f_{i,t|t-1}}{\partial\mu_{i,t|t-1}} = \xi_{i,t|t}(y_t)\,\frac{\partial\ln f_{i,t|t-1}}{\partial\mu_{i,t|t-1}} = u_{i,t}, \qquad i = 1, \ldots, K, \qquad (12)$$
where, following on from (7),
$$\xi_{i,t|t} = \xi_{i,t|t-1}f_{i,t|t-1}/f_{t|t-1}, \qquad i = 1, 2, \ldots, K, \qquad (13)$$
is the estimated probability of being in the $i$th regime given $y_t$ and $\xi_{i,t|t-1}$. The smaller is $\xi_{i,t|t}$, the more the contribution of $y_t$ to the corresponding score is downweighted. These scores drive dynamic filters such as
$$\mu_{i,t+1|t} = (1-\phi_i)\omega_i + \phi_i\mu_{i,t|t-1} + \kappa_i u_{i,t}, \qquad |\phi_i| < 1, \qquad i = 1, \ldots, K, \qquad (14)$$
with $\mu_{i,1|0} = \omega_i$, $i = 1, \ldots, K$.
ML estimates are obtained by maximizing the log-likelihood function
$$\ln L(\psi) = \sum_{t=1}^{T}\ln f_{t|t-1} = \sum_{t=1}^{T}\ln\sum_{i=1}^{K}\xi_{i,t|t-1}f_{i,t|t-1}, \qquad (15)$$
where $\psi$ now includes $\omega_i$, $\phi_i$, $\kappa_i$, $i = 1, \ldots, K$, as well as the corresponding parameters for any dynamic dispersion models, the parameters in (11) and any constant shape parameters.
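
Putting (8) and (10)–(15) together for two von Mises regimes gives the log-likelihood sketch below. It assumes NumPy and SciPy, uses hypothetical function names, and keeps the concentrations constant within regimes. The location scores are the vM scores weighted by the filtered probabilities $\xi_{i,t|t}$, as in (12) and (13). Maximizing it would require a numerical optimizer, which is omitted.

```python
import numpy as np
from scipy.special import i0e

def vm_pdf(y, mu, v):
    return np.exp(v * (np.cos(y - mu) - 1.0)) / (2.0 * np.pi * i0e(v))

def damm_vm_loglik(y, omega_g, phi_g, kappa_g, omega, phi, kappa, v):
    """Two-regime vM DAMM with score-driven location in each regime.

    omega, phi, kappa: length-2 parameters of the location filters (14);
    v: length-2 constant regime concentrations. Returns ln L from (15).
    """
    omega, phi, kappa, v = (np.asarray(a, dtype=float) for a in (omega, phi, kappa, v))
    g, mu = omega_g, omega.copy()              # gamma_{1|0}, mu_{i,1|0}
    loglik = 0.0
    for yt in y:
        xi = 1.0 / (1.0 + np.exp(-g))
        prior = np.array([xi, 1.0 - xi])       # xi_{i,t|t-1}
        fi = vm_pdf(yt, mu, v)                 # f_{i,t|t-1}(y_t)
        f = prior @ fi                         # mixture density (8)
        loglik += np.log(f)
        post = prior * fi / f                  # xi_{i,t|t}, eq. (13)
        w = (fi[0] - fi[1]) / f * xi * (1.0 - xi)   # switching score (10)
        u = post * v * np.sin(yt - mu)         # weighted location scores (12)
        g = (1.0 - phi_g) * omega_g + phi_g * g + kappa_g * w   # (11)
        mu = (1.0 - phi) * omega + phi * mu + kappa * u         # (14)
    return loglik
```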

Remark 1. In the classic Markov switching model, dynamics are introduced into the location and/or scale of each regime by letting them depend directly on past observations. For example, the conditional mean in the $i$th regime is often given by $\mu_i + \phi_i(y_{t-1} - \mu_i)$, $t = 1, \ldots, T$, $i = 1, 2$; see Hamilton (1994, p. 691). By contrast, the score-driven approach leads to filters driven by a forcing variable that is weighted by the probability of being in the $i$th regime. This defining feature of the score-driven model also distinguishes it from the treatment of switching conditional heteroscedasticity in the financial literature; see, for example, Haas et al. (2004) and the discussion in Catania (2021, sections 2 and 3).

3.4 Model Selection and Diagnostics

Following on from the work of Hamilton (1996), Smith (2008) finds LM tests to have the best size and power properties for Markov switching models. The structure of the DAMM is such that LM diagnostic tests are easily formulated. When a static mixture model has been fitted, evidence for serial correlation in the regime dynamics and the intra-regime dynamics is separated out. Formal tests against dynamics can be constructed and the pattern of serial correlation displayed in correlograms. As shown by the application in Section 4.1, this can be of great benefit for model specification. When dynamics have been estimated, diagnostics designed to assess the possibility of omitted dynamic effects can be formulated using the same principles.

Under the null hypothesis that the model is a static mixture with no dynamics, LM tests against dynamics in regime switching and in the parameters within each regime may be constructed. In a two-state model, the test is against dynamic switching of the form $\gamma_{t|t-1} = \omega_\gamma + \kappa_1 w_{t-1} + \cdots + \kappa_P w_{t-P}$, where $w_t$ is as in (10). When the model is static, $w_t = \xi(y_t) - \xi$, $t = 1, \ldots, T$, because, from (7), $f_1(y_t) = \xi(y_t)f(y_t)/\xi$, where, as before, the subscript is dropped from $\xi_1$. Thus, following Harvey (2013, section 2.5), the LM test of the null hypothesis that the model is static, that is $H_0: \kappa_1 = \cdots = \kappa_P = 0$ against $H_1: \kappa_i \ne 0$ for some $i = 1, \ldots, P$, is equivalent to a portmanteau $Q$-test based on the correlogram of the estimated probabilities, $\xi(y_t)$, $t = 1, \ldots, T$. The critical values are taken from a $\chi^2_P$ distribution.
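
The step from (10) to $w_t = \xi(y_t) - \xi$ in the static model takes two lines, using (7) for both regimes with $\xi_{t|t-1} = \xi$ and $f_{i,t|t-1} = f_i$:

```latex
\begin{aligned}
f_1(y_t) &= \frac{\xi(y_t)\, f(y_t)}{\xi}, \qquad
f_2(y_t) = \frac{\{1-\xi(y_t)\}\, f(y_t)}{1-\xi}, \\
w_t &= \frac{f_1(y_t) - f_2(y_t)}{f(y_t)}\,\xi(1-\xi)
     = \left\{\frac{\xi(y_t)}{\xi} - \frac{1-\xi(y_t)}{1-\xi}\right\}\xi(1-\xi) \\
    &= \xi(y_t)(1-\xi) - \{1-\xi(y_t)\}\,\xi
     = \xi(y_t) - \xi .
\end{aligned}
```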

An LM test against level dynamics in the $i$th regime can be based on the scores in expression (12), with $\xi_{i,t|t}(y_t)$ and $\mu_{i,t|t-1}$ fixed under the null hypothesis at $\xi_i(y_t)$ and $\mu_i$, respectively. When the test is against dynamics in the $i$th location only, and $\mu_j$, $j \ne i$, is fixed, the LM statistic is equivalent to the $Q$-statistic formed from the sample autocorrelations $r_i(\tau) = c_i(\tau)/c_i(0)$, where $c_i(\tau) = \sum_{t=\tau+1}^{T}u_{i,t}u_{i,t-\tau}/T$, $i = 1, \ldots, K$, $\tau = 1, \ldots, P$, that is
$$Q_i(P) = T\sum_{\tau=1}^{P}r_i^2(\tau), \qquad i = 1, 2. \qquad (16)$$
As noted by Harvey and Thiele (2016, pp. 578–9), estimating fixed parameters makes no difference to the distribution of $Q_i(P)$, which is asymptotically distributed as $\chi^2_P$ under the null hypothesis.

When dynamic models have been fitted to the regime probabilities, as in the basic DAMM, LM test statistics for location dynamics in individual regimes can be constructed. Similarly, when dynamics have been estimated within regimes, LM tests for omitted dynamics can be set up. Tests for heteroscedasticity can be formed in the same way; see, for example, Calvori et al. (2017). However, if the effect of fitting dynamics to the $\xi_i$s and to location and/or scale is ignored, simple $Q_i(P)$ tests may be used to give an indication of serial correlation in each regime. Harvey and Thiele (2016) show that this can often be a good strategy, and it is the one we adopt here.

3.5 Circular Mixture Models

Static and dynamic circular mixture models can be estimated as outlined in Sections 3.1–3.3. The variable $w_t$ driving the switching equation, (11), is obviously circular, so invariance to wrapping is retained. The scores driving the intra-regime dynamics, the $u_{i,t}$s of (12), are the vM scores of (5) weighted by $\xi_{i,t|t}$. The filtered location is given by the mean direction of the filtered locations in the individual regimes, computed as
$$\mu_{t|t-1} = \operatorname{atan2}\Bigl(\sum_{i=1}^{K}\xi_{i,t|t-1}\sin\mu_{i,t|t-1},\ \sum_{i=1}^{K}\xi_{i,t|t-1}\cos\mu_{i,t|t-1}\Bigr), \qquad t = 1, \ldots, T, \qquad (17)$$
rather than by a simple average, $\sum_{i=1}^{K}\xi_{i,t|t-1}\mu_{i,t|t-1}$. Multi-step forecasts can be computed from recursions for the $\gamma_{i,t|t-1}$s and $\mu_{i,t|t-1}$s, and the distributions of future observations can be simulated.
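
Equation (17) can be sketched as follows, assuming NumPy; the function name is hypothetical. The example shows why a probability-weighted linear average of the regime locations is inappropriate on the circle.

```python
import numpy as np

def combined_location(xi, mu):
    """Filtered location (17): mean direction of the regime locations mu_i
    weighted by the regime probabilities xi_i."""
    xi, mu = np.asarray(xi, dtype=float), np.asarray(mu, dtype=float)
    return np.arctan2(xi @ np.sin(mu), xi @ np.cos(mu))

xi = np.array([0.5, 0.5])
mu = np.array([3.0, -3.0])        # both regimes point roughly west
print(combined_location(xi, mu))  # close to +/- pi, i.e. west
print(xi @ mu)                    # 0.0, i.e. east: the simple average is wrong
```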

A portmanteau test, (16), against dynamics in the level of the $i$th regime is based on the (circular) correlogram of the residuals $\xi_i(y_t)\sin(y_t - \tilde\mu_i)$, $i = 1, \ldots, K$. Note that although the score is $\upsilon_i\xi_i(y_t)\sin(y_t - \tilde\mu_i)$, the concentration parameter, $\upsilon_i$, cancels out.
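
The statistic $Q_i(P)$ in (16), with the autocovariances $c_i(\tau)$ defined without mean correction, and the regime residuals $\xi_i(y_t)\sin(y_t - \tilde\mu_i)$ can be sketched as follows (NumPy and SciPy assumed; names hypothetical). Because $r_i(\tau)$ is a ratio, multiplying the residuals by $\upsilon_i$ leaves $Q_i(P)$ unchanged, which is why the concentration cancels out.

```python
import numpy as np
from scipy.stats import chi2

def regime_residuals(y, post_i, mu_i):
    """Residuals xi_i(y_t) * sin(y_t - mu_i) for the test against level dynamics in regime i."""
    return np.asarray(post_i, dtype=float) * np.sin(np.asarray(y, dtype=float) - mu_i)

def q_statistic(u, P):
    """Portmanteau statistic (16): Q = T * sum_{tau=1}^{P} r(tau)^2, with
    r(tau) = c(tau)/c(0) and c(tau) = sum_{t>tau} u_t u_{t-tau} / T (no mean correction)."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    c = np.array([u[tau:] @ u[:T - tau] / T for tau in range(P + 1)])
    r = c[1:] / c[0]
    return T * np.sum(r ** 2)

def q_pvalue(u, P):
    """Nominal p-value from the chi-squared distribution with P degrees of freedom."""
    return chi2.sf(q_statistic(u, P), df=P)
```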

4 SWITCHING MODELS FOR GALICIA

Tables I and II show ML estimates for switching models applied to the Galicia data. The numerical maximization was carried out in Matlab using an interior-point method that follows a barrier approach to solve the subproblems occurring in each Newton–Raphson iteration; see, for example, Waltz et al. (2006). Estimates of the asymptotic standard errors, obtained from the numerical Hessian, are shown in brackets. The parameter $\omega_\zeta$ is the logarithm of the concentration, $\upsilon$. When heteroscedastic models are fitted, it is $\omega_\zeta$ in (6).

Table I. Goodness of fit of the two-regime vM model with the estimated dynamic parameters of the switching probability from fitting: (1) the static mixture model, (2) the pure DAMM model, (3) the DAMM model with intra-regime dynamic location and (4) the DAMM model with intra-regime dynamic location and concentration

| Model | AIC | BIC | $\ln L$ | $\omega_\gamma$ | $\phi_\gamma$ | $\kappa_\gamma$ | $\xi$ |
|---|---|---|---|---|---|---|---|
| (1) | 1,643.580 | 1,666.640 | −816.790 | 0.680 (0.087) | | | 0.664 |
| (2) | 980.255 | 1,012.539 | −483.127 | 2.194 (0.808) | 0.959 (0.012) | 4.766 (0.631) | 0.900 |
| (3) | 490.554 | 541.286 | −234.277 | 1.192 (0.838) | 0.959 (0.013) | 5.774 (0.690) | 0.767 |
| (4) | 459.400 | 528.581 | −214.700 | 1.724 (0.840) | 0.944 (0.021) | 5.360 (0.895) | 0.849 |

Note: The switching parameters are shown in the last four columns, with $\xi$ denoting the logistic transformation of $\omega_\gamma$.
Table II. Estimated dynamic parameters for the two-regime vM model from fitting: (1) the static mixture model, (2) the pure DAMM model, (3) the DAMM model with intra-regime dynamic location and (4) the DAMM model with intra-regime dynamic location and concentration

| Model | $\omega_{\mu 1}$ | $\phi_{\mu 1}$ | $\kappa_{\mu 1}$ | $\omega_{\zeta 1}$ | $\phi_{\zeta 1}$ | $\kappa_{\zeta 1}$ | $\omega_{\mu 2}$ | $\phi_{\mu 2}$ | $\kappa_{\mu 2}$ | $\omega_{\zeta 2}$ | $\phi_{\zeta 2}$ | $\kappa_{\zeta 2}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) | 4.051 (0.014) | | | 2.540 (0.101) | | | 1.055 (0.056) | | | 0.735 (0.130) | | |
| (2) | 4.041 (0.014) | | | 2.437 (0.066) | | | 1.032 (0.049) | | | 0.848 (0.090) | | |
| (3) | 4.136 (0.040) | 0.914 (0.026) | 0.013 (0.002) | 3.318 (0.067) | | | 3.165 (0.403) | 0.989 (0.005) | 0.257 (0.026) | 1.243 (0.073) | | |
| (4) | 4.107 (0.038) | 0.917 (0.023) | 0.015 (0.002) | 3.060 (0.104) | 0.856 (0.069) | 0.255 (0.068) | 2.305 (0.156) | 0.983 (0.004) | 0.173 (0.023) | 1.426 (0.125) | 0.309 (0.068) | 0.774 (0.143) |

Diagnostic test statistics for assessing residual serial correlation in different components are in Table III. We originally calculated $Q$-statistics for $P = 1$, 5 and 20, but the message in all three is essentially the same, so only $P = 5$ is given here. When dynamics are fitted, the tests should not be treated formally because, as indicated earlier, the distribution is no longer $\chi^2_P$ under the null hypothesis. Furthermore, because the sample size is large, with $T = 744$, $Q$-statistics based on relatively small sample autocorrelations can still be big. Their main value is to convey a strong message about which models are most effective.

Table III. Box–Ljung test with $P = 5$, $Q(5)$, for residual correlation on the fitted scores from fitting: (1) the static mixture model, (2) the pure DAMM model, (3) the DAMM model with intra-regime dynamic location and (4) the DAMM model with intra-regime dynamic location and concentration

| Model | $\mu_1$ | $\zeta_1$ | $\mu_2$ | $\zeta_2$ | $\gamma$ |
|---|---|---|---|---|---|
| (1) | 726.152*** | 184.370*** | 794.447*** | 371.265*** | 2392.15*** |
| (2) | 852.972*** | 302.547*** | 807.591*** | 226.344*** | 13.588** |
| (3) | 4.176 | 36.536*** | 10.931* | 71.257*** | 43.851*** |
| (4) | 7.776 | 19.369*** | 27.266*** | 25.941*** | 26.291*** |

Note: (Nominal) significance levels: 0.01 ‘***’, 0.05 ‘**’, 0.1 ‘*’.

4.1 Static Mixture

Although the static mixture model can be ruled out from the correlogram of the raw data, it is nevertheless informative about the regimes. Numerically optimizing the log-likelihood function gave the values in the first row of Table II, that is $\tilde\mu_1 = 4.051\ (0.014)$, $\tilde\mu_2 = 1.055\ (0.056)$, $\tilde\upsilon_1 = 12.68$, $\tilde\upsilon_2 = 2.08$ and $\tilde\xi = 0.66$; note that $\omega_\zeta = \ln\upsilon$. The log-likelihood, in Table I, is $-816.8$, which is far lower than that obtained for the single-regime dynamic model reported in Section 2.4, and the $Q$-statistics are huge.

The plot of $\xi(y_t)$ in Figure 2 shows how the contrast between the distributions in the two regimes gives a clear indication of which regime is operative at any one time. The regimes are obviously not determined randomly, and the ACF of the $\xi_i(y_t)$s indicates that a fairly persistent first-order filter, as in (11), is likely to give a good fit. The circular ACFs (CACFs) for the individual regimes in the lower panels indicate persistent dynamics in the location. The correlations between the three scores are not far from zero.

Figure 2. Regime probabilities and ACFs for a static mixture model of wind direction in Galicia

4.2 Dynamic Mixtures

Although the tests indicate dynamics within each regime, it is useful to begin by fitting a pure DAMM, that is, one without intra-regime dynamics but with a dynamic equation for $\gamma_{t|t-1}$ as in (11). The estimates of $\omega_{\mu 1}$, $\omega_{\mu 2}$, $\omega_{\zeta 1}$ and $\omega_{\zeta 2}$ are similar to the estimates found for the static mixture model, except that the estimate of $\xi$ is, at 0.90, somewhat higher than the static estimate of 0.66. The log-likelihood is $-483.1$, so there is a clear improvement over the static mixture model.

The diagnostic test based on the switching residuals, shown in the column headed γ in Table III, indicates that most of the dynamic movements have been captured by the regime-switching equation. However, the Q -statistics for location dynamics remain very high in both regimes; indeed the correlograms are not dissimilar to those in Figure 2. Both are consistent with a first-order dynamic model, (14), for each regime.

Fitting an HMM gives a result very close to that of the DAMM. The log-likelihood is slightly lower, at $-490.5$, but the AIC and BIC are slightly smaller, reflecting the fact that the HMM has one fewer parameter. The finding that the pure switching model is inadequate is important because most, if not all, of the research in this area has been restricted to pure Markov switching models. Indeed, only three pages are devoted to intra-regime dynamics in the book by Zucchini et al. (2016, pp. 150–2).

4.3 Dynamic Mixture Model with Intra-regime Dynamics

The first-order score-driven models for the location in each regime are as in (14) with
$$u_{i,t} = \xi_{i,t|t}\,\upsilon_{i,t|t-1}\sin(y_t - \mu_{i,t|t-1}), \qquad i = 1, 2,$$
where $\upsilon_{i,t|t-1}$ is the filtered concentration when the model takes account of heteroscedasticity. The results, for the models labelled (3) and (4), show the dynamics to be fairly persistent in both regimes, with $\phi_1$ and $\phi_2$ of (14) always above 0.9. The mean of location in the second regime has increased to 3.165, but if it is constrained to its value for the static model, that is 1.05, the log-likelihood is much lower, at $-279.3$ as opposed to $-234.3$ for the unconstrained model. It seems that the parameters in the second regime are more difficult to estimate than those in the first, perhaps because the concentration is lower. Nevertheless, there is a huge increase in the likelihood as compared with the pure DAMM.

The diagnostics show that serial correlation in location has been eliminated. However, the scores for concentration indicate dynamics. When, in the last line, the model is extended to allow for heteroscedasticity, there is a further improvement in goodness of fit and the level in regime 2 falls to 2.31. On the other hand the underlying probability of being in regime 1, that is ξ , rises from 0.767 to 0.849.

Finally the likelihoods for the two switching models with intra-regime dynamics are, as expected, much bigger than those of the corresponding single regime models and, despite the extra parameters, the AICs and BICs are much smaller.
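
The AIC and BIC columns of Table I can be reproduced from the log-likelihoods with $T = 744$, using $\text{AIC} = -2\ln L + 2k$ and $\text{BIC} = -2\ln L + k\ln T$. The parameter counts $k$ used here (5, 7, 11 and 15) are our inference from the non-empty columns of Tables I and II, not figures stated in the text; with them the tabulated criteria are matched to within rounding.

```python
import math

T = 744  # hourly observations in January 2004

# (log-likelihood, number of parameters) for the models in Table I; the parameter
# counts are inferred from the non-empty columns of Tables I and II.
models = {
    "(1) static mixture": (-816.790, 5),
    "(2) pure DAMM": (-483.127, 7),
    "(3) DAMM, dynamic location": (-234.277, 11),
    "(4) DAMM, dynamic location and concentration": (-214.700, 15),
}

def aic_bic(loglik, k, T):
    """Akaike and Bayesian (Schwarz) information criteria."""
    return -2.0 * loglik + 2.0 * k, -2.0 * loglik + k * math.log(T)

for name, (ll, k) in models.items():
    aic, bic = aic_bic(ll, k, T)
    print(f"{name}: AIC = {aic:.3f}, BIC = {bic:.3f}")
```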

5 MODELING THE CYLINDER

A bivariate distribution for a circular and a linear variable takes the form of a cylinder. This section shows how a dynamic model can be constructed. The last subsection makes the extension to a bivariate regime switching model.

5.1 Weibull–von Mises (Abe-Ley) Distribution

The Weibull–Sine Skewed–von Mises (WeiSSVM) distribution proposed by Abe and Ley (2017) combines a von Mises distribution for a circular variable, $y$, with a Weibull distribution for a non-negative linear variable, $x$. The skewing term will be dropped here to simplify the exposition, giving the Weibull–von Mises (WeiVM) distribution
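$$f(y,x) = \frac{\alpha\exp(-\alpha\lambda)}{2\pi\cosh\upsilon}\,x^{\alpha-1}\exp\bigl\{-(x/\exp\lambda)^{\alpha}\bigl(1 - \tanh\upsilon\cos(y-\mu)\bigr)\bigr\}, \qquad -\pi \le y, \mu < \pi, \quad x \ge 0, \quad \alpha > 0, \quad \upsilon \ge 0,$$
where $\exp(\lambda)$ is the scale, $\varphi$, for the linear variable and $\upsilon$ is a parameter that determines the concentration for $y$.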
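
As a sanity check, the WeiVM density should integrate to one over the cylinder: integrating over $x$ gives $1/\{1 - \tanh\upsilon\cos(y-\mu)\}$, and integrating this over $y$ gives $2\pi\cosh\upsilon$, which cancels the normalizing constant. The sketch below confirms this numerically with SciPy's `dblquad`; the helper names and parameter values are arbitrary, hypothetical choices.

```python
import numpy as np
from scipy.integrate import dblquad

def weivm_logpdf(y, x, mu, lam, upsilon, alpha):
    """Log of the Weibull-von Mises density: y circular, x > 0, exp(lam) the Weibull scale."""
    s = (x / np.exp(lam)) ** alpha
    return (np.log(alpha / (2.0 * np.pi)) - np.log(np.cosh(upsilon)) - alpha * lam
            + (alpha - 1.0) * np.log(x) - s * (1.0 - np.tanh(upsilon) * np.cos(y - mu)))

def total_probability(mu, lam, upsilon, alpha):
    """Integrate the density over the cylinder: x in (0, inf) inner, y in [-pi, pi] outer."""
    # dblquad passes the inner integration variable (here x) as the first argument
    value, _ = dblquad(lambda x, y: np.exp(weivm_logpdf(y, x, mu, lam, upsilon, alpha)),
                       -np.pi, np.pi, 0.0, np.inf)
    return value

print(total_probability(mu=1.0, lam=0.5, upsilon=1.2, alpha=2.0))   # approximately 1
```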

5.2 Dynamic Model

In the dynamic score-driven cylinder model the parameters $\mu$ and $\lambda$ change over time, and the logarithm of the joint density of the conditional WeiVM distribution is
$$\ln f(y_t, x_t; \mu_{t|t-1}, \lambda_{t|t-1}, \psi) = \ln(\alpha/2\pi) - \ln\cosh\upsilon - \alpha\lambda_{t|t-1} + (\alpha-1)\ln x_t - (x_t/\exp\lambda_{t|t-1})^{\alpha}\bigl(1 - \tanh\upsilon\cos(y_t - \mu_{t|t-1})\bigr), \qquad t = 1, \ldots, T, \qquad (18)$$
with $-\pi \le y_t < \pi$, but with no corresponding restriction on $\mu_{t|t-1}$; $\psi$ denotes the parameters $\upsilon$ and $\alpha$ and those in the dynamic equations. The conditional scores are
$$\frac{\partial\ln f_{t|t-1}}{\partial\mu_{t|t-1}} = u_t^{\mu} = \tanh(\upsilon)\,(x_t/\exp\lambda_{t|t-1})^{\alpha}\sin(y_t - \mu_{t|t-1}) \qquad (19)$$
and
$$\frac{\partial\ln f_{t|t-1}}{\partial\lambda_{t|t-1}} = u_t^{\lambda} = \alpha(x_t/\exp\lambda_{t|t-1})^{\alpha}\bigl(1 - \tanh(\upsilon)\cos(y_t - \mu_{t|t-1})\bigr) - \alpha. \qquad (20)$$
The filters for $\mu_{t|t-1}$ and $\lambda_{t|t-1}$ are driven by $u_t^{\mu}$ and $u_t^{\lambda}$, and so for first-order dynamics
$$\mu_{t+1|t} = (1-\phi_\mu)\omega_\mu + \phi_\mu\mu_{t|t-1} + \kappa_\mu u_t^{\mu}, \qquad \lambda_{t+1|t} = (1-\phi_\lambda)\omega_\lambda + \phi_\lambda\lambda_{t|t-1} + \kappa_\lambda u_t^{\lambda}, \qquad (21)$$
with $\mu_{1|0} = \omega_\mu$ and $\lambda_{1|0} = \omega_\lambda$.
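
The filters (21), with the scores (19) and (20), can be sketched as follows (NumPy assumed; names hypothetical, data and parameter values arbitrary). Since the scores depend on $y_t$ only through its sine and cosine, adding $2\pi$ to every $y_t$ leaves the log-likelihood unchanged up to rounding.

```python
import numpy as np

def weivm_scores(y, x, mu, lam, upsilon, alpha):
    """Log-density (18) and the scores (19) and (20) at one observation."""
    s = (x / np.exp(lam)) ** alpha
    th, c = np.tanh(upsilon), np.cos(y - mu)
    logf = (np.log(alpha / (2.0 * np.pi)) - np.log(np.cosh(upsilon)) - alpha * lam
            + (alpha - 1.0) * np.log(x) - s * (1.0 - th * c))
    u_mu = th * s * np.sin(y - mu)              # (19)
    u_lam = alpha * s * (1.0 - th * c) - alpha  # (20)
    return logf, u_mu, u_lam

def weivm_filter(y, x, omega_mu, phi_mu, kappa_mu, omega_lam, phi_lam, kappa_lam,
                 upsilon, alpha):
    """First-order filters (21) for direction mu and log-scale lambda; returns ln L."""
    mu, lam = omega_mu, omega_lam               # mu_{1|0}, lambda_{1|0}
    loglik = 0.0
    for yt, xt in zip(y, x):
        logf, u_mu, u_lam = weivm_scores(yt, xt, mu, lam, upsilon, alpha)
        loglik += logf
        mu = (1.0 - phi_mu) * omega_mu + phi_mu * mu + kappa_mu * u_mu
        lam = (1.0 - phi_lam) * omega_lam + phi_lam * lam + kappa_lam * u_lam
    return loglik
```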

Both $u_t^{\mu}$ and $u_t^{\lambda}$ retain the univariate circularity property of being unchanged when multiples of $2\pi$ are added to or subtracted from $y_t$. The circularity of the scores confirms that when a dynamic score-driven model for a WeiVM distribution, $f(z_t, x_t)$, allows $z_t$ to range over the whole real line, it may be wrapped, as in (2), to give a likelihood function, based on (18), that is the same as the (infeasible) likelihood function for $f(z_t, x_t)$.

Basing the dynamics for $\mu_{t|t-1}$ and $\lambda_{t|t-1}$ on scores means that their movements interact with each other in a way that makes sense given the structure of the WeiVM bivariate distribution. It follows from Abe and Ley (2017) that the distribution of $y_t$ conditional on $x_t$, together with all the information at time $t-1$, is vM with mean $\mu_{t|t-1}$ and concentration

$$
\upsilon(x_t) = (\tanh\upsilon)\left(x_t/\exp\lambda_{t|t-1}\right)^{\alpha}, \tag{22}
$$

so the more $x_t$ exceeds its expected value, the higher the concentration. Thus (19) can be written as $u_t^{\mu} = \upsilon(x_t)\sin(y_t - \mu_{t|t-1})$. When $x_t$ is close to zero, there is no clear direction, so the concentration is low. The conditional distribution of $x_t$ given $y_t$, together with all the information at time $t-1$, is Weibull with scale

$$
\varphi(y_t) = \left(1 - \tanh\upsilon\cos(y_t - \mu_{t|t-1})\right)^{-1/\alpha}\varphi_{t|t-1}, \tag{23}
$$

where $\varphi_{t|t-1} = \exp(\lambda_{t|t-1})$. Substituting in (20) gives $u_t^{\lambda} = \alpha[(x_t/\varphi(y_t))^{\alpha} - 1]$. When $y_t$ is close to $\mu_{t|t-1}$ it will boost the effect of $x_t$.
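The reparameterization in (22) and (23) can be checked directly. This Python sketch uses hypothetical function names. It computes the conditional von Mises concentration and the conditional Weibull scale, then rebuilds the scores (19) and (20) from them.

```python
import math

def conditional_concentration(x, lam, upsilon, alpha):
    """(22): von Mises concentration of y_t given x_t."""
    return math.tanh(upsilon) * (x / math.exp(lam)) ** alpha

def conditional_scale(y, mu, lam, upsilon, alpha):
    """(23): Weibull scale of x_t given y_t, with phi_{t|t-1} = exp(lam)."""
    return (1.0 - math.tanh(upsilon) * math.cos(y - mu)) ** (-1.0 / alpha) * math.exp(lam)

def scores_from_conditionals(y, x, mu, lam, upsilon, alpha):
    """Scores (19) and (20) expressed through (22) and (23)."""
    u_mu = conditional_concentration(x, lam, upsilon, alpha) * math.sin(y - mu)
    u_lam = alpha * ((x / conditional_scale(y, mu, lam, upsilon, alpha)) ** alpha - 1.0)
    return u_mu, u_lam

# A calm reading carries little directional information ...
print(conditional_concentration(0.1, lam=1.0, upsilon=1.0, alpha=2.0))
# ... and the conditional speed scale is largest when y_t equals mu_{t|t-1}
print(conditional_scale(0.0, 0.0, 1.0, 1.0, 2.0), math.exp(1.0))
```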

5.3 Heteroscedasticity

As the model stands, the concentration, $\upsilon(x_t)$, changes only with $x_t$, depending on whether $x_t$ is higher or lower than expected given $\lambda_{t|t-1}$. Using a result in Abe and Ley (2017, p. 95), the expected value of $\upsilon(x_t)$ based on information at time $t-1$ is

$$
E_{t-1}\upsilon(x_t) = \tanh(\upsilon)\,E_x\left(x_t/\exp\lambda_{t|t-1}\right)^{\alpha} = \tanh\upsilon\cosh\upsilon\,P_1^0(\cosh\upsilon) = \tanh\upsilon\cosh^2\upsilon = 0.5\sinh(2\upsilon),
$$

where $P_{\nu}^0(\cdot)$ is the associated Legendre function of the first kind with degree $\nu$ and order zero. Thus the prediction of $\upsilon(x_t)$ is constant. It does not depend on $\lambda_{t|t-1}$. So if, in the context of wind, speed has been high for some time, a value of $x_t$ lower than its expectation will imply that concentration is suddenly lower than average. This seems implausible, and it points to the need to introduce dynamic heteroscedasticity into the model by letting $\upsilon$ be dynamic. The score with respect to this new dynamic parameter, denoted $\upsilon_{t|t-1}$, is

$$
u_t^{\upsilon} = \left(x_t/\exp\lambda_{t|t-1}\right)^{\alpha}\left[1 - \tanh^2\upsilon_{t|t-1}\right]\cos(y_t - \mu_{t|t-1}) - \tanh\upsilon_{t|t-1}. \tag{24}
$$

The score $u_t^{\upsilon}$ is similar in form to the score for $\lambda_{t|t-1}$ in (20). It differs in that when $y_t$ is close to $\mu_{t|t-1}$ it increases, whereas $u_t^{\lambda}$ reacts in the opposite direction. Note that $u_t^{\lambda}$ is now defined with $\tanh\upsilon_{t|t-1}$ replacing $\tanh\upsilon$. Using the filter for $\upsilon_{t|t-1}$ now gives $E_{t-1}\upsilon(x_t) = 0.5\sinh(2\upsilon_{t|t-1})$.
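The fact that $E_{t-1}\upsilon(x_t)$ does not depend on $\lambda_{t|t-1}$ can also be checked by simulation. The Python sketch below draws from the WeiVM distribution by composition. It first samples $y$ from its wrapped Cauchy marginal (25), then samples $x$ given $y$ from a Weibull distribution with scale (23). This is one convenient sampler and not necessarily the method described by Abe and Ley (2017). Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

Given $y_t$, the quantity $(x_t/\exp\lambda)^{\alpha}$ is exponential with mean $1/(1-\tanh\upsilon\cos(y_t-\mu))$. The Monte Carlo average should therefore be close to $0.5\sinh(2\upsilon)$ for any $\lambda$ and $\alpha$.

```python
import math
import random

def sample_weivm(mu, lam, upsilon, alpha, rng):
    """Draw one (y, x) pair from the WeiVM distribution by composition."""
    if upsilon <= 0.0:
        raise ValueError("this sampler assumes upsilon > 0")
    rho = math.tanh(upsilon / 2.0)      # mean resultant length of the wrapped Cauchy (25)
    gamma = -math.log(rho)              # Cauchy scale, since rho = exp(-gamma)
    c = mu + gamma * math.tan(math.pi * (rng.random() - 0.5))
    y = (c + math.pi) % (2.0 * math.pi) - math.pi   # wrap onto [-pi, pi)
    scale = (1.0 - math.tanh(upsilon) * math.cos(y - mu)) ** (-1.0 / alpha) * math.exp(lam)
    x = rng.weibullvariate(scale, alpha)            # scale first, shape second
    return y, x

def mean_concentration_mc(lam, upsilon, alpha, n=100_000, seed=1):
    """Monte Carlo estimate of E[tanh(upsilon) * (x / exp(lam))**alpha]."""
    rng = random.Random(seed)
    th = math.tanh(upsilon)
    total = 0.0
    for _ in range(n):
        _, x = sample_weivm(0.0, lam, upsilon, alpha, rng)
        total += th * (x / math.exp(lam)) ** alpha
    return total / n

target = 0.5 * math.sinh(2.0 * 0.8)
for lam in (0.0, 2.0):
    print(lam, mean_concentration_mc(lam, upsilon=0.8, alpha=1.5), target)
```

With $\upsilon = 0.8$ the target is $0.5\sinh(1.6) \approx 1.19$. Both estimates should agree with it up to Monte Carlo error.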

The heteroscedastic dynamic model includes an equation for $\zeta_{t|t-1} = \ln\upsilon_{t|t-1}$ to complement those in (21). The information matrix for $\mu$, $\lambda$ and $\zeta$ is given in the Supporting Information. Its availability raises the possibility of pre-multiplying the scores by its inverse, as is often done in the score-driven literature.
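The following is a Python sketch of the heteroscedastic filter, which adds a third recursion for $\zeta_{t|t-1} = \ln\upsilon_{t|t-1}$ alongside (21). The article does not display this equation, so two features of the sketch are assumptions. First, the recursion is taken to have the same first-order form as (21). Second, $\zeta$ is driven by the score with respect to $\zeta$, which by the chain rule is $\upsilon_{t|t-1}u_t^{\upsilon}$, with no information-matrix scaling. Parameter names such as `omega_zeta` are hypothetical. The log-density is included so that (24) can be checked against a finite difference.

```python
import math

def weivm_logpdf(y, x, mu, lam, upsilon, alpha):
    """Log-density (18) with concentration upsilon."""
    z = (x / math.exp(lam)) ** alpha
    return (math.log(alpha / (2.0 * math.pi)) - math.log(math.cosh(upsilon))
            - alpha * lam + (alpha - 1.0) * math.log(x)
            - z * (1.0 - math.tanh(upsilon) * math.cos(y - mu)))

def hetero_scores(y, x, mu, lam, upsilon, alpha):
    """Scores (19), (20) and (24), all evaluated at the current upsilon_{t|t-1}."""
    th = math.tanh(upsilon)
    z = (x / math.exp(lam)) ** alpha
    u_mu = th * z * math.sin(y - mu)
    u_lam = alpha * z * (1.0 - th * math.cos(y - mu)) - alpha
    u_ups = z * (1.0 - th * th) * math.cos(y - mu) - th
    return u_mu, u_lam, u_ups

def hetero_filter(ys, xs, p, alpha):
    """First-order filters for mu, lam and zeta = ln(upsilon).
    p: dict with hypothetical keys omega_*, phi_*, kappa_* for * in mu, lam, zeta.
    ASSUMPTION: zeta is driven by the score w.r.t. zeta, i.e. upsilon * u_ups."""
    mu, lam, zeta = p["omega_mu"], p["omega_lam"], p["omega_zeta"]
    path = [(mu, lam, zeta)]
    for y, x in zip(ys, xs):
        ups = math.exp(zeta)
        u_mu, u_lam, u_ups = hetero_scores(y, x, mu, lam, ups, alpha)
        u_zeta = ups * u_ups  # chain rule: d ln f / d zeta = upsilon * d ln f / d upsilon
        mu = (1 - p["phi_mu"]) * p["omega_mu"] + p["phi_mu"] * mu + p["kappa_mu"] * u_mu
        lam = (1 - p["phi_lam"]) * p["omega_lam"] + p["phi_lam"] * lam + p["kappa_lam"] * u_lam
        zeta = (1 - p["phi_zeta"]) * p["omega_zeta"] + p["phi_zeta"] * zeta + p["kappa_zeta"] * u_zeta
        path.append((mu, lam, zeta))
    return path

def predicted_mean_concentration(zeta):
    """E_{t-1} upsilon(x_t) = 0.5 * sinh(2 * upsilon_{t|t-1})."""
    return 0.5 * math.sinh(2.0 * math.exp(zeta))
```

A strong wind blowing from the predicted direction raises $\zeta$. The same speed from the opposite direction lowers it.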

Remark 2. Abe and Ley (2017, pp. 96–7) give a generalization, the GGSSVM, in which the generalized gamma (GG) distribution replaces the Weibull; the circular marginal distribution is then the Jones–Pewsey distribution. Imoto et al. (2019) propose a generalized Pareto-type cylindrical distribution that can handle heavier tails. In both cases a score-driven model can again be formulated.

5.4 Forecasts

Forecasts are based on information at time $T$, so for $T+1$ we plug $\mu_{T+1|T}$, $\upsilon_{T+1|T}$ and $\lambda_{T+1|T}$ into the joint distribution. The (marginal) distribution of $y_{T+1}$, conditional on information at time $T$, is wrapped Cauchy, that is

$$
f_T(y_{T+1}) = \frac{1}{2\pi}\,\frac{1 - \tanh^2(\upsilon_{T+1|T}/2)}{1 + \tanh^2(\upsilon_{T+1|T}/2) - 2\tanh(\upsilon_{T+1|T}/2)\cos(y_{T+1} - \mu_{T+1|T})}, \tag{25}
$$

where $-\pi \le y_{T+1} < \pi$. The one-step ahead forecast for direction, $E_T(y_{T+1})$, is just the predicted location $\mu_{T+1|T}$. The marginal distribution for the linear variable is given in Abe and Ley (2017, p. 94) as

$$
f_T(x_{T+1}) = V_{T+1|T}(x_{T+1})\,\frac{\alpha}{e^{\lambda_{T+1|T}}}\left(\frac{x_{T+1}}{e^{\lambda_{T+1|T}}}\right)^{\alpha-1}\exp\left\{-\left(x_{T+1}/e^{\lambda_{T+1|T}}\right)^{\alpha}\right\},
$$

where $0 \le x_{T+1} < \infty$ and

$$
V_{T+1|T}(x_{T+1}) = \frac{I_0\left((x_{T+1}/e^{\lambda_{T+1|T}})^{\alpha}\tanh\upsilon_{T+1|T}\right)}{\cosh\upsilon_{T+1|T}},
$$

with $I_0(\cdot)$ the modified Bessel function of the first kind and order zero. The one-step ahead forecast of $x_{T+1}$ is

$$
E_T(x_{T+1}) = \exp(\lambda_{T+1|T})\,\Gamma(1 + 1/\alpha)\left[(\cosh\upsilon_{T+1|T})^{1/\alpha}P_{1/\alpha}(\cosh\upsilon_{T+1|T})\right].
$$

Except for the normalizing term $V_{T+1|T}(x_{T+1})$, the form of $f_T(x_{T+1})$ is that of a Weibull distribution. Likewise, $E_T(x_{T+1})$ is as for a Weibull distribution, apart from the term in square brackets. Multi-step forecasts can be obtained by simulation; Abe and Ley (2017, p. 94) provide details on how to simulate from the WeiSSVM distribution.
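The one-step forecasts above can be evaluated with the Python standard library alone. The sketch below codes (25) and computes $E_T(x_{T+1})$. For the Legendre function it uses Laplace's integral representation, $P_{\nu}(z) = \pi^{-1}\int_0^{\pi}(z + \sqrt{z^2-1}\cos t)^{\nu}\,dt$ for $z \ge 1$. The quadrature rule, the function names and the illustrative predicted values are our own choices and do not come from the article.

```python
import math

def wrapped_cauchy_forecast_pdf(y, mu, upsilon):
    """One-step forecast density (25) for direction."""
    r = math.tanh(upsilon / 2.0)
    return (1.0 - r * r) / (2.0 * math.pi * (1.0 + r * r - 2.0 * r * math.cos(y - mu)))

def legendre_p(nu, z, n=2000):
    """P_nu(z) for real z >= 1 from Laplace's integral, by the midpoint rule."""
    s = math.sqrt(z * z - 1.0)
    h = math.pi / n
    return h / math.pi * sum((z + s * math.cos((k + 0.5) * h)) ** nu for k in range(n))

def speed_forecast_mean(lam, upsilon, alpha):
    """E_T(x_{T+1}) = exp(lam) Gamma(1 + 1/alpha) (cosh upsilon)^(1/alpha) P_{1/alpha}(cosh upsilon)."""
    c = math.cosh(upsilon)
    return (math.exp(lam) * math.gamma(1.0 + 1.0 / alpha)
            * c ** (1.0 / alpha) * legendre_p(1.0 / alpha, c))

# Illustrative predicted values (not estimates from the article)
mu_f, lam_f, ups_f, alpha = 4.0, 2.2, 0.9, 3.2
print(wrapped_cauchy_forecast_pdf(mu_f, mu_f, ups_f))    # density at the predicted location
print(speed_forecast_mean(lam_f, ups_f, alpha))
print(math.exp(lam_f) * math.gamma(1.0 + 1.0 / alpha))   # plain Weibull mean, for comparison
```

As $\upsilon_{T+1|T} \to 0$ the term in square brackets tends to one, since $P_{\nu}(1) = 1$, and the plain Weibull mean is recovered.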

5.5 Switching Cylinders

DAMMs can be applied to multivariate series as in Catania (2021, eq. (3)). In a bivariate model the switching filter for $\xi_{t|t-1}$ depends on the joint PDF $f(y_t, x_t)$. All parameters, including those that are fixed, such as $\alpha$, are regime dependent. Following on from (17), the score for location within a regime is then given by

$$
u_{it}^{\mu} = \frac{\partial \ln f_{t|t-1}}{\partial \mu_{i,t|t-1}} = \xi_{i,t|t}(y_t, x_t)\,\upsilon_i(x_t)\sin(y_t - \mu_{i,t|t-1}), \quad i = 1, 2,
$$

where $\upsilon_i(x_t) = \tanh(\upsilon_i)(x_t/\exp\lambda_{i,t|t-1})^{\alpha_i}$, and similarly for the other scores.

The graphs in Figure 1 suggest that the regimes for direction are more clearly defined than those for speed. Thus it is worth considering whether to model regime switching only in terms of the marginal distribution of direction, $y_t$. To implement such a regime switching mechanism, the PDF in the score with respect to the dynamic switching probability, that is (11), is taken to be wrapped Cauchy, as in (25). The same density is used in the contemporaneous probability (13).
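The regime-specific location scores can be sketched in Python as follows. Equations (11) and (13) are not reproduced in this section, so the contemporaneous probability is assumed to take the usual Bayes-rule form $\xi_{i,t|t} \propto \xi_{i,t|t-1} f_i(\cdot)$.

Setting `direction_only=True` replaces the joint WeiVM density with the wrapped Cauchy marginal (25) in that update. This is our reading of the restricted mechanism just described. The regime container and the function names are hypothetical.

```python
import math

def weivm_pdf(y, x, mu, lam, upsilon, alpha):
    """Joint WeiVM density f(y, x) for x > 0."""
    z = (x / math.exp(lam)) ** alpha
    return (alpha * math.exp(-alpha * lam) / (2.0 * math.pi * math.cosh(upsilon))
            * x ** (alpha - 1.0) * math.exp(-z * (1.0 - math.tanh(upsilon) * math.cos(y - mu))))

def wrapped_cauchy_pdf(y, mu, upsilon):
    """Marginal density of direction, as in (25)."""
    r = math.tanh(upsilon / 2.0)
    return (1.0 - r * r) / (2.0 * math.pi * (1.0 + r * r - 2.0 * r * math.cos(y - mu)))

def regime_location_scores(y, x, xi_pred, regimes, direction_only=False):
    """Return (xi_{i,t|t}, u_{it}^mu) for each regime i.
    xi_pred: predicted probabilities xi_{i,t|t-1}.
    regimes: list of dicts with keys 'mu', 'lam', 'upsilon', 'alpha' holding
    mu_{i,t|t-1}, lam_{i,t|t-1}, upsilon_i and alpha_i (hypothetical container)."""
    if direction_only:
        dens = [wrapped_cauchy_pdf(y, r["mu"], r["upsilon"]) for r in regimes]
    else:
        dens = [weivm_pdf(y, x, r["mu"], r["lam"], r["upsilon"], r["alpha"]) for r in regimes]
    weighted = [p * d for p, d in zip(xi_pred, dens)]
    total = sum(weighted)
    xi_filt = [w / total for w in weighted]  # assumed Bayes-rule form of (13)
    scores = []
    for xi, r in zip(xi_filt, regimes):
        conc = math.tanh(r["upsilon"]) * (x / math.exp(r["lam"])) ** r["alpha"]  # upsilon_i(x_t)
        scores.append(xi * conc * math.sin(y - r["mu"]))
    return xi_filt, scores

# Illustrative two-regime setting (values are arbitrary)
regimes = [dict(mu=0.0, lam=1.0, upsilon=1.5, alpha=2.0),
           dict(mu=math.pi / 2, lam=0.5, upsilon=1.5, alpha=2.0)]
print(regime_location_scores(0.3, 3.0, [0.5, 0.5], regimes))
print(regime_location_scores(0.3, 3.0, [0.5, 0.5], regimes, direction_only=True))
```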

6 WIND DIRECTION AND SPEED IN GALICIA

The scatter plot of speed and direction for the Galicia data in Figure 3 highlights the regimes in direction and confirms the impression gained from Figure 1 that speed tends to be higher when the wind is coming from the SW.

Figure 3. Scatter diagram of wind speed and direction in Galicia

As might be expected from the univariate results on wind direction, a static mixture model fares badly, with $\ln L = -3732.2$. By comparison, $\ln L = -3371.2$ for the single regime model without heteroscedasticity and $\ln L = -3355.7$ with heteroscedasticity. The pure DAMM model, shown in the second line of Table IV and labelled model (6), is much better, with $\ln L = -3413.7$, but it too fails to beat the single regime models. The score-based $Q$-statistics for residual serial correlation shown in Table VI are huge.

Table IV. Goodness of fit of the two-regime WeiVM model, with the estimated dynamic parameters of the switching probability, from fitting (5) the static mixture model, (6) the pure DAMM model, (7) the DAMM model with intra-regime dynamic location and scale and (8) the DAMM model with intra-regime dynamic location, scale and concentration

| Model | AIC | BIC | ln L | $\omega_{\gamma}$ | $\phi_{\gamma}$ | $\kappa_{\gamma}$ | $\xi$ |
|---|---|---|---|---|---|---|---|
| (5) | 7,482.446 | 7,523.954 | −3,732.223 | 0.994 (0.097) | | | 0.730 |
| (6) | 6,849.460 | 6,900.193 | −3,413.730 | 1.084 (0.552) | 0.846 (0.022) | 11.079 (1.224) | 0.747 |
| (7) | 6,034.761 | 6,122.389 | −2,998.380 | 1.151 (0.664) | 0.924 (0.015) | 7.729 (1.114) | 0.760 |
| (8) | 5,992.685 | 6,098.762 | −2,973.343 | 1.669 (0.561) | 0.870 (0.020) | 27.487 (0.833) | 0.842 |
| (9) | 6,018.408 | 6,106.037 | −2,990.204 | 1.336 (0.828) | 0.966 (0.010) | 5.594 (1.076) | 0.792 |
| (10) | 5,939.650 | 6,045.727 | −2,946.825 | 0.032 (0.552) | 0.879 (0.016) | 15.945 (2.524) | 0.508 |

  • Note: Models (9) and (10) are specified in the same way as (7) and (8), except that the switching probability is driven by the marginal distribution with respect to location. The switching parameters are shown in the last four columns, with $\xi$ denoting the logistic transformation of $\omega_{\gamma}$.

The inclusion of dynamics within regimes offers considerable improvement. As before, the fit is better with heteroscedastic dynamics. The main issue to resolve is whether the dynamics in the switching equation should depend on both direction and speed or on direction only. The results favour the second possibility, especially when the dynamics include heteroscedasticity. Thus the estimates reported in the last lines of Tables IV–VI are for the preferred model. As can be seen, $\ln L = -2946.8$. There is still some residual serial correlation in some of the components, but, as noted earlier, this is not unusual with large sample sizes. The estimates of the Weibull shape parameter $\alpha$ are well above one in almost all cases.
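As a consistency check on Table IV, the reported AIC, BIC and $\ln L$ can be inverted to recover the number of estimated parameters, $k$, and the sample size, $T$. This assumes the standard definitions $\mathrm{AIC} = 2k - 2\ln L$ and $\mathrm{BIC} = k\ln T - 2\ln L$.

The article does not state $k$ or $T$, so the values recovered by this Python sketch are inferences from the table and not reported figures. They are integer $k$ (9 for the static mixture) and $T$ close to 744, the number of hours in January ($31 \times 24$).

```python
import math

# Goodness-of-fit figures transcribed from Table IV: model -> (AIC, BIC, ln L)
TABLE_IV = {
    "(5)": (7482.446, 7523.954, -3732.223),
    "(6)": (6849.460, 6900.193, -3413.730),
    "(7)": (6034.761, 6122.389, -2998.380),
    "(8)": (5992.685, 6098.762, -2973.343),
    "(9)": (6018.408, 6106.037, -2990.204),
    "(10)": (5939.650, 6045.727, -2946.825),
}

def implied_k_and_T(aic, bic, loglik):
    """Invert AIC = 2k - 2 lnL and BIC = k ln T - 2 lnL for k and T."""
    k = (aic + 2.0 * loglik) / 2.0
    T = math.exp((bic + 2.0 * loglik) / k)
    return k, T

for model, (aic, bic, ll) in TABLE_IV.items():
    k, T = implied_k_and_T(aic, bic, ll)
    print(f"{model}: k = {k:.2f}, T = {T:.1f}")
```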

Table V. Estimated dynamic parameters for the two-regime WeiVM model from fitting (5) the static mixture model, (6) the pure DAMM model, (7) the DAMM model with intra-regime dynamic location and scale and (8) the DAMM model with intra-regime dynamic location, scale and concentration

Regime 1:

| Model | $\omega_{\mu 1}$ | $\phi_{\mu 1}$ | $\kappa_{\mu 1}$ | $\omega_{\lambda 1}$ | $\phi_{\lambda 1}$ | $\kappa_{\lambda 1}$ | $\omega_{\gamma 1}$ | $\phi_{\gamma 1}$ | $\kappa_{\gamma 1}$ | $\alpha_1$ |
|---|---|---|---|---|---|---|---|---|---|---|
| (5) | 4.046 (0.012) | | | 2.418 (0.054) | | | 0.695 (0.024) | | | 2.115 (0.033) |
| (6) | 4.048 (0.011) | | | 2.422 (0.045) | | | 0.735 (0.022) | | | 2.171 (0.029) |
| (7) | 4.002 (0.032) | 0.927 (0.025) | 0.010 (0.002) | 2.218 (0.080) | 0.978 (0.005) | 0.021 (0.002) | 0.748 (0.021) | | | 3.575 (0.029) |
| (8) | 4.048 (0.001) | 0.941 (0.024) | 0.008 (0.001) | 2.161 (0.124) | 0.976 (0.000) | 0.023 (0.002) | 0.592 (0.027) | 0.912 (0.027) | 0.015 (0.003) | 3.398 (0.054) |
| (9) | 3.998 (0.030) | 0.919 (0.025) | 0.010 (0.002) | 2.274 (0.110) | 0.972 (0.007) | 0.027 (0.003) | 0.744 (0.020) | | | 3.478 (0.031) |
| (10) | 4.068 (0.042) | 0.957 (0.016) | 0.007 (0.001) | 2.326 (0.091) | 0.972 (0.008) | 0.018 (0.002) | −0.447 (0.061) | 0.941 (0.015) | 0.025 (0.005) | 3.237 (0.033) |

Regime 2:

| Model | $\omega_{\mu 2}$ | $\phi_{\mu 2}$ | $\kappa_{\mu 2}$ | $\omega_{\lambda 2}$ | $\phi_{\lambda 2}$ | $\kappa_{\lambda 2}$ | $\omega_{\gamma 2}$ | $\phi_{\gamma 2}$ | $\kappa_{\gamma 2}$ | $\alpha_2$ |
|---|---|---|---|---|---|---|---|---|---|---|
| (5) | 0.886 (0.028) | | | 1.561 (0.096) | | | 0.543 (0.051) | | | 2.021 (0.057) |
| (6) | 0.917 (0.032) | | | 1.732 (0.025) | | | 0.381 (0.047) | | | 2.002 (0.041) |
| (7) | 1.694 (0.110) | 0.992 (0.003) | 0.043 (0.007) | 1.622 (0.016) | 0.811 (0.047) | 0.089 (0.010) | 0.489 (0.041) | | | 2.329 (0.031) |
| (8) | 0.854 (0.071) | 0.982 (0.019) | 0.013 (0.004) | 1.798 (0.002) | 0.802 (0.068) | 0.046 (0.006) | 1.261 (0.133) | 0.998 (0.000) | 0.067 (0.014) | 2.342 (0.033) |
| (9) | 1.408 (0.082) | 0.967 (0.013) | 0.052 (0.011) | 1.482 (0.166) | 0.924 (0.038) | 0.051 (0.010) | 0.485 (0.053) | | | 2.481 (0.047) |
| (10) | 1.786 (0.200) | 0.993 (0.004) | 0.039 (0.007) | 1.401 (0.170) | 0.943 (0.023) | 0.038 (0.006) | −0.479 (0.053) | 0.756 (0.103) | 0.021 (0.010) | 2.559 (0.048) |

  • Note: Models (9) and (10) are specified in the same way as (7) and (8), except that the switching probability is driven by the marginal distribution with respect to location.
Table VI. Box–Ljung test with $P = 5$, $Q(5)$, for residual correlation on the fitted scores from fitting (5) the static mixture model, (6) the pure DAMM model, (7) the DAMM model with intra-regime dynamic location and scale and (8) the DAMM model with intra-regime dynamic location, scale and concentration

| Model | $\mu_1$ | $\lambda_1$ | $\zeta_1$ | $\mu_2$ | $\lambda_2$ | $\zeta_2$ | $\gamma$ |
|---|---|---|---|---|---|---|---|
| (5) | 767.826 | 774.366 | 2037.36 | 550.021 | 649.586 | 1248.86 | 2642.13 |
| (6) | 753.425 | 678.766 | 1918.03 | 587.870 | 516.355 | 1339.66 | 26.827 |
| (7) | 5.093 | 3.303 | 219.270 | 25.646 | 14.792 | 159.494 | 11.666 |
| (8) | 3.479 | 3.737 | 38.265 | 53.012 | 6.806 | 41.579 | 1.069 |
| (9) | 3.459 | 1.528 | 158.644 | 25.686 | 10.496 | 141.566 | 120.995 |
| (10) | 10.298 | 2.551 | 26.706 | 26.684 | 5.073 | 54.875 | 43.294 |

  • Note: Models (9) and (10) are specified in the same way as (7) and (8), except that the switching probability is driven by the marginal distribution with respect to location. (Nominal) significance levels: 0.01 ‘***’, 0.05 ‘**’, 0.1 ‘*’.

The observations and combined filters for direction are shown in Figure 4. The filtered estimates are smoother than the raw observations and stay well within the range $[0, 2\pi)$, whereas the observations sometimes move rapidly between the top and bottom of the graph. Note that the combined filter for mean direction is computed using the atan2 function, as in (17). The corresponding filter for scale is $\xi_{1,t|t-1}\exp(\lambda_{1,t|t-1}) + \xi_{2,t|t-1}\exp(\lambda_{2,t|t-1})$, $t = 1, \dots, T$.
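The combined filters can be sketched in Python as follows. Equation (17) is not reproduced in this section, so the direction combination is assumed to be the probability-weighted circular mean $\operatorname{atan2}(\sum_i \xi_i\sin\mu_i, \sum_i \xi_i\cos\mu_i)$, mapped to $[0, 2\pi)$. The scale combination follows the formula in the text. The function names are hypothetical.

The example shows why atan2 is needed. Two equally likely regimes at $0.1$ and $2\pi - 0.1$ combine to a direction near $0$, whereas the arithmetic mean would be $\pi$, pointing the wrong way.

```python
import math

def combined_direction(xi, mus):
    """Probability-weighted circular mean of regime locations, mapped to [0, 2*pi).
    Assumed form of the atan2 combination referred to as (17)."""
    s = sum(p * math.sin(m) for p, m in zip(xi, mus))
    c = sum(p * math.cos(m) for p, m in zip(xi, mus))
    return math.atan2(s, c) % (2.0 * math.pi)

def combined_scale(xi, lams):
    """Probability-weighted scale: sum_i xi_i * exp(lam_i)."""
    return sum(p * math.exp(l) for p, l in zip(xi, lams))

mus = [0.1, 2.0 * math.pi - 0.1]
print(combined_direction([0.5, 0.5], mus))  # near 0 (or, equivalently, near 2*pi)
print(sum(mus) / 2.0)                       # naive arithmetic mean: pi
```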

Figure 4. Wind direction (WD) and fitted wind direction (MuFilter) in Galicia with the two-regime heteroscedastic dynamic WeiVM model

Figures 5 and 6 show the filtered wind direction and speed in individual regimes. When the probability of being in a given regime is small, the underlying filter for that regime changes only gradually. For wind direction, the combined filter of Figure 4 is also shown. The combined filter for speed is not shown, as it is just the probability-weighted sum of the individual filters.

Figure 5. Filtered wind direction in individual regimes, mu1 and mu2, together with the combined filter

Figure 6. Filtered scale for wind speed in individual regimes

Remark 3. When there is no wind, it has no direction. In such cases $x_t = 0$, and so the model gives $\upsilon(x_t) = 0$, which implies that the (unobserved) wind direction is distributed uniformly. It is evident from (19) that the score for location, $u_t^{\mu}$, is zero. Thus the observation is effectively ignored, as in the naive solution for dealing with an observation that is missing. This is not the case for the scale of the linear variable, because $u_t^{\lambda} = -\alpha$. Nor is it the case for the concentration, because $u_t^{\upsilon} = -\tanh\upsilon_{t|t-1}$. As regards the likelihood, the difficulty is that $f(y_t, 0) = 0$ for $\alpha > 1$, indicating that $x_t = 0$ is impossible. For $\alpha < 1$, $f(y_t, 0) = \infty$, which is also unhelpful. Only for $\alpha = 1$ is there a viable solution, as in this case $f(y_t, 0) = \exp(-\lambda_{t|t-1})/(2\pi\cosh\upsilon)$ is finite and positive. The simplest solution is to assume there is no contribution to the likelihood.
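The treatment of calm periods in Remark 3, together with the treatment of missing observations in footnote 5, can be coded as in the following Python sketch (the function name is hypothetical). A zero speed contributes nothing to the likelihood but still yields the scores implied by (19), (20) and (24) at $x_t = 0$. A missing observation, coded here as `None`, contributes nothing and has all its scores set to zero.

```python
import math

def loglik_contribution_and_scores(y, x, mu, lam, upsilon, alpha):
    """Return (log-likelihood contribution, (u_mu, u_lam, u_ups)) for one observation.
    Missing (y or x is None): zero contribution and zero scores (footnote 5).
    Calm (x == 0): zero contribution (Remark 3); scores from (19), (20), (24) at x = 0."""
    if y is None or x is None:
        return 0.0, (0.0, 0.0, 0.0)
    th = math.tanh(upsilon)
    z = (x / math.exp(lam)) ** alpha      # equals 0.0 when x == 0 (alpha > 0)
    u_mu = th * z * math.sin(y - mu)
    u_lam = alpha * z * (1.0 - th * math.cos(y - mu)) - alpha
    u_ups = z * (1.0 - th * th) * math.cos(y - mu) - th
    if x == 0.0:
        return 0.0, (u_mu, u_lam, u_ups)
    ll = (math.log(alpha / (2.0 * math.pi)) - math.log(math.cosh(upsilon)) - alpha * lam
          + (alpha - 1.0) * math.log(x) - z * (1.0 - th * math.cos(y - mu)))
    return ll, (u_mu, u_lam, u_ups)

print(loglik_contribution_and_scores(1.0, 0.0, 0.3, 0.5, 0.9, 2.0))   # calm hour
print(loglik_contribution_and_scores(None, None, 0.3, 0.5, 0.9, 2.0)) # missing hour
```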

7 CONCLUSION

Score-driven regime switching models can be extended to handle circular observations, and diagnostic tests can be constructed. The models allow for changing concentration as well as changing location. When fitted to hourly wind direction at a site in Galicia, pure regime switching models, without intra-regime dynamics, are unable to outperform the single regime model when both location and scale are dynamic. The diagnostic test based on the switching residuals indicates that there are no omitted dynamics in the regime-switching equation. Even so, the $Q$-statistics for location dynamics are still highly significant in both regimes. Fitting a score-driven switching model with location dynamics in each regime gives a big increase in the likelihood function.

The score-driven approach is then used to construct dynamic bivariate models for circular and linear variables with a conditional cylindrical distribution. The preferred specification for the Galicia data has dynamic location and concentration for wind direction and dynamic location/scale for its speed. Estimating a restricted regime switching model, in which the regime dynamics depend only on direction, gives a good fit when heteroscedasticity is included and the resulting filter for direction tracks the observations remarkably well. Again the modelling of intra-regime dynamics is crucial.

There is further scope for research extending the score-driven approach to bivariate dynamic cylindrical models based on copulas, as used by García-Portugués et al. (2013) and Lagona (2019), and to directional data on a sphere or torus.

ACKNOWLEDGEMENTS

We are grateful to Eduardo García-Portugués for providing the Galicia data used in García-Portugués et al. (2013). Earlier versions of some of the ideas in this article were presented at the Econometric Models of Climate Change conference in Milan in August 2019, at the 22nd Oxmetrics conference at Nuffield College, Oxford, in September 2019, and at workshops in Cambridge, Bologna, École Polytechnique Fédérale de Lausanne, the QUT Centre for Data Sciences and the University of Konstanz. Later versions were given at a plenary (virtual) session of the 45th NBER-NSF conference in October 2021 at Rice University, Houston, and at the ADISTA22 (Advances in Directional Statistics) conference in Santiago de Compostela in June 2022. We are grateful to Anthony Davison, Jurgen Doornik, David Hendry, Stan Hurn, Peter Jupp, John Kent, Francisco Lagona, Rutger-Jan Lange, Christophe Ley, Ken Lindsay, Oliver Linton, Alessandra Luati, Paul Myer, Alexiy Onatski, Richard Smith, Howell Tong and two referees for helpful comments.

    DATA AVAILABILITY STATEMENT

    The data that support the findings of this study are available from the corresponding author on reasonable request.

    • 1 Score-driven time series models were developed by Harvey (2013) and Creal et al. (2013), where they were called DCS and GAS models respectively.
    • 2 The fact that the range of the observations is $[0, 2\pi)$ rather than $[-\pi, \pi)$ makes no substantive difference to the results.
    • 3 Abe and Ley (2017) have $\beta = \exp(-\lambda)$ and $\kappa = \upsilon$.
    • 4 Note that they have $\alpha$ replacing our $\alpha_{\gamma}$ and $\gamma$ replacing $\alpha$.
    • 5 There is a run of missing observations on both wind direction and speed around observation 250. In such cases we set all scores to zero; hence the slight dip in Figures 5 and 6. There are a few more missing observations around 455 and 680.
