Volume 45, Issue 6 pp. 910-930
Original Article
Open Access

Threshold Network GARCH Model

Yue Pan
Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK

Jiazhu Pan (Corresponding Author)
Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
Correspondence to: Jiazhu Pan, Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK. Email: [email protected]
First published: 13 May 2024

Abstract

The Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model and its variations have been widely adopted in the study of financial volatility, but extending GARCH-type models to high-dimensional data remains difficult because of over-parameterization and computational complexity. In this article, we propose a multi-variate GARCH-type model that simplifies the parameterization by utilizing the network structure that can be appropriately specified for certain types of high-dimensional data. Asymmetry in the dynamics of volatilities is also accommodated, as our model adopts a threshold structure. To enable our model to handle data of extremely high dimension, we investigate the near-epoch dependence (NED) of our model, and the asymptotic properties of our quasi-maximum-likelihood estimator (QMLE) are derived from the limit theorems for NED random fields. Simulations are conducted to verify our theoretical results. Finally, we fit our model to the log-returns of four groups of stocks, and the results indicate that bad news is not necessarily more influential on volatility once network effects are taken into account.

1 Introduction

To pursue maximum return or to circumvent potential risk, investors constantly revise their portfolios according to any related information. Understanding how the volatility of financial assets responds to new information is crucial in risk management and a widely studied subject in econometrics and statistics. In the literature, statistical models that describe the formation of financial risks have been developed and applied in practice. The Autoregressive Conditional Heteroscedasticity (ARCH) model was proposed by Engle (1982) for estimating the variance of the United Kingdom's inflation. In an ARCH($p$) model, the volatilities of returns are affected by up to $p$ lags of past observations. Bollerslev (1986) then proposed a generalized ARCH (GARCH) model to accommodate longer memory of past observations. It has become one of the most popular models in econometrics ever since, and numerous variations of the GARCH model have been developed for modeling volatility with complicated structures. See Teräsvirta (2009) for a survey of different GARCH-type models.

When we study the risks of multiple assets simultaneously, the conditional variances that represent individual risks are of interest, as well as the conditional covariances that represent risk-sharing relationships. On the other hand, the risk of a particular individual could be affected by its own covariates and by those of individuals closely related to it. This leads to the need to extend GARCH-type models to the multi-variate case. For an $N$-dimensional time series $\{\mathbf{y}_t\}$, a canonical expression of multi-variate GARCH would be
$$ \mathbf{y}_t=H_t^{1/2}\mathbf{z}_t,\qquad H_t=g\left(\mathbf{y}_{t-1},H_{t-1}\right), $$ (1)
where the random vector $\mathbf{z}_t$ satisfies $\mathbb{E}(\mathbf{z}_t)=\mathbf{0}$ and $\operatorname{var}(\mathbf{z}_t)=I_N$, and there could be various specifications of the function $g(\cdot)$, as it represents the structure of the conditional covariance matrix $H_t$. For more details on this subject, the excellent survey by Bauwens et al. (2006) on the family of multi-variate GARCH models is recommended.

However, in terms of parameter estimation, there are major challenges that can make multi-variate GARCH (MGARCH) models inapplicable in empirical analysis with high-dimensional data. For example, the number of parameters grows at the rate $\mathcal{O}(N^4)$ in the vectorized GARCH (VEC-GARCH) model proposed by Bollerslev et al. (1988). An over-parameterized specification causes high computational complexity and makes it problematic to derive conditions for the positive definiteness of the conditional covariance matrix $H_t$. Plenty of effort has been made in the literature: the diagonal VEC-GARCH (DVEC-GARCH) model by Bollerslev et al. (1988) and the Baba-Engle-Kraft-Kroner GARCH (BEKK-GARCH) model by Engle and Kroner (1995) were proposed with the aim of simplifying the conditions for positive definiteness by imposing structural restrictions on the conditional covariance matrix. The number of parameters can also be significantly reduced, to $\mathcal{O}(N^2)$, in the Constant Conditional Correlation GARCH (CCC-GARCH) model by Bollerslev (1990) and the Dynamic Conditional Correlation GARCH (DCC-GARCH) model by Engle (2002) and Tse and Tsui (2001). On the other hand, as an alternative way to overcome the over-parameterization problem, the idea of factor variables was introduced by Engle et al. (1990) to the multi-variate ARCH model as a dimension-reduction technique. This idea was later brought to the MGARCH model by Bollerslev and Engle (1993), with succeeding work by Pan et al. (2010), Hu and Tsay (2014) and Li et al. (2016). In certain application scenarios where there is a network structure behind the data of interest, multiple variables are connected and a multi-variate GARCH-type model can be fitted. These variations of MGARCH models alleviate the over-parameterization problem to some extent, but the number of parameters nevertheless still grows with the dimension. With this limitation, MGARCH models can only be applied to data of low dimension, such as stock indices of several markets or exchange rates of two currencies (see Karolyi, 1995 and Tse and Tsui, 2001).

Despite the aforementioned difficulties due to dimensionality, for some specific types of multi-variate data where the connections between different components are actually observable, it is still possible to simplify the model setup significantly in the following aspects:
  1. instead of considering both volatilities and co-volatilities, we focus on studying the dynamics of volatilities only;
  2. instead of parameterizing every cross-individual effect, an appropriate network structure can be embedded into the model.

In many cases such a network structure can provide sufficient information about how the influence of pulses travels through the edges between individual nodes. For instance, Nitzan and Libai (2011) found that customers connected with a defecting neighbor are 80% more likely to cancel their cellular service, and Goel and Goldstein (2014) concluded that the accuracy of individual behavior prediction can be significantly improved with network data compared with conventional marketing practices.

Zhou et al. (2020) proposed a network GARCH model (see (2) for the detailed specification) that significantly reduces the parameterization complexity: the number of parameters in their model remains fixed no matter how large the dimension $N$ is. However, they did not fully exploit this advantage, as their discussion of parameter estimation is limited to the case where $N$ is fixed. Such a setting narrows the variety of scenarios in which their model can be applied, since the size of a network is often extremely large. In a study of a social network consisting of 2982 users, Zhu et al. (2017) proposed a network AR model whose least squares estimator is proved to remain valid when both the sample size $T\to\infty$ and the dimension $N\to\infty$. Compared with their AR-type model, the unobserved volatility processes raise difficulties in extending such properties to GARCH-type models. We address this problem by treating our network model as a spatial process on a two-dimensional lattice and adopting the asymptotic theorems for random fields of Jenish and Prucha (2012) in the estimation of parameters. Since their limit theorems require NED, we will establish this property under certain restrictions on the parameters and the network structure. The idea of using limit theorems for spatial processes in the inference of high-dimensional time series has been considered by Xu et al. (2022) in the instrumental-variable quantile regression estimation of their dynamic network quantile regression model. In this article, we introduce this idea, for the first time, into the estimation of high-dimensional GARCH-type models.

Aside from data with a diverging number of dimensions, we also aim to enable our model to handle the asymmetry observed in empirical work, whereby positive and negative pulses affect volatilities differently in magnitude as well as in direction. While most GARCH-type models implicitly assume that volatility responds equally to the magnitude of positive and negative returns, Glosten et al. (1993) proposed the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model with a threshold structure, allowing the volatility to respond asymmetrically in magnitude to positive and negative pulses. The threshold GARCH (TGARCH) model of Zakoïan (1994) also accommodates asymmetry, but in the magnitude of the influence on the conditional standard deviation. The exponential GARCH (EGARCH) model of Nelson (1991) takes a log transformation of the conditional variances, lifting the restriction to non-negative coefficients in conventional GARCH-type models and making it possible to explain asymmetry in the direction in which volatility changes in response to positive and negative news.

To study the asymmetric dynamics of the volatilities of high-dimensional financial data, we propose a threshold network GARCH (TNGARCH) model in Section 2. Stationarity conditions for this model with fixed $N$ are derived in Section 3.1. In Section 3.2 we prove the $L_2$-NED of the proposed model under certain restrictions. The asymptotic properties of the QMLE are investigated in Section 4, in the case where $T\to\infty$ and $N\to\infty$ at a lower rate. We then propose a Wald statistic in Section 5.1 to test the existence of the threshold effect, and in Section 5.2 we introduce a test for high-dimensional white noise proposed by Li et al. (2019). In Section 6, our methodology is tested on simulated data generated from four different kinds of network structure. By applying our model to high-dimensional time series of log-returns in Section 7, we observe an asymmetry, different from that reported in the existing literature, in how much volatility responds to good news and bad news at the individual level. Finally, conclusions and potential directions for future research are summarized in Section 8.

2 Model Setup

Consider an undirected and unweighted network with $N$ nodes. Define the adjacency matrix $A=(a_{ij})_{1\le i,j\le N}$, where $a_{ij}=1$ if there is a connection between node $i$ and node $j$, and $a_{ij}=0$ otherwise. Self-connections are ruled out by setting $a_{ii}=0$ for every node $i$.

The connection can be defined differently according to the practical scenario: for example, two social network accounts that follow each other, or two stocks that share at least one top shareholder. Since the network is undirected, $A$ is symmetric ($a_{ij}=a_{ji}$); hence for any node $i$, the out-degree $d_i^{(\mathrm{out})}=\sum_{j=1}^N a_{ij}$ equals the in-degree $d_i^{(\mathrm{in})}=\sum_{j=1}^N a_{ji}$, and we write $d_i$ for both for convenience.
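As a concrete illustration, the adjacency matrix $A$, the degrees $d_i$ and the row-normalized weights $w_{ij}=a_{ij}/d_i$ used below can be built as in the following sketch; the 5-node edge list is a made-up example, not data from the article.

```python
import numpy as np

# A toy 5-node undirected, unweighted network (hypothetical edges).
N = 5
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1          # undirected: a_ij = a_ji, and a_ii = 0

d = A.sum(axis=1)                  # in-degree equals out-degree by symmetry
W = A / d[:, None]                 # row-normalized weights w_ij = a_ij / d_i
```

Since each row of $W$ sums to one, the network term $\lambda\sum_j w_{ij}y_{j,t-1}^2$ in the models below is an average over the neighbors of node $i$.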

For any node $i$ in this network, let $y_{it}$ be the observation at time $t$, and $h_{it}$ be the unobservable conditional heteroscedasticity of $y_{it}$, i.e. $h_{it}:=\operatorname{var}\left(y_{it}\mid\mathscr{H}_{t-1}\right)$, where $\mathscr{H}_{t-1}$ denotes the $\sigma$-algebra consisting of all available information up to time $t-1$. A network GARCH(1, 1) specification of the conditional variance incorporates the network effect:
$$ h_{it}=\omega+\alpha y_{i,t-1}^2+\lambda\sum_{j=1}^N w_{ij}y_{j,t-1}^2+\beta h_{i,t-1},\qquad i=1,2,\dots,N. $$ (2)
Model (2) indicates that the volatility $h_{it}$ of each stock $i$ is influenced not only by its own previous price change, measured by $y_{i,t-1}^2$, but also by the average (with weights $w_{ij}=a_{ij}/d_i$) of $y_{j,t-1}^2$ over all nodes $j$ connected to node $i$. To ensure the positivity of the conditional variance, it is required that $\omega>0$ while $\alpha,\lambda,\beta\ge 0$.
To model the asymmetry in volatility, our TNGARCH model adds a threshold structure to model (2). A TNGARCH(1, 1) model is specified as follows:
$$ \begin{aligned} y_{it}&=\varepsilon_{it}\sqrt{h_{it}},\\ h_{it}&=\omega+\left(\alpha^{(1)}1_{\{y_{i,t-1}\ge 0\}}+\alpha^{(2)}1_{\{y_{i,t-1}<0\}}\right)y_{i,t-1}^2+\lambda\sum_{j=1}^N w_{ij}y_{j,t-1}^2+\beta h_{i,t-1},\\ &\qquad i=1,2,\dots,N, \end{aligned} $$ (3)
where $1_{\{\cdot\}}$ is the indicator function. To ensure the positivity of $h_{it}$, the coefficients $\omega,\alpha^{(1)},\alpha^{(2)},\lambda$ and $\beta$ are subject to the same constraints as in (2). $\{\varepsilon_{it}\}$ is a white noise process satisfying the following assumption:

Assumption 1. ε i t $$ \left\{{\varepsilon}_{it}\right\} $$ is i.i.d. across i $$ i $$ and t $$ t $$ , with non-degenerate distribution, mean 0 and variance 1.

This assumption allows us to investigate, in the next section, the conditions for our model to have a unique strictly stationary solution, which serves as a precondition for further discussion on parameter estimation and statistical inference.
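Model (3) can be simulated directly from its recursion. The sketch below is an illustration of the model, not the authors' code: Gaussian innovations, a ring-shaped network and the parameter values are all assumptions made for the example (and they satisfy the stationarity condition derived in Section 3.1).

```python
import numpy as np

def simulate_tngarch(W, T, omega, a1, a2, lam, beta, burn=200, seed=0):
    """Simulate model (3): h_it = omega + (a1*1{y_{i,t-1}>=0} + a2*1{y_{i,t-1}<0}) * y_{i,t-1}^2
    + lam * sum_j w_ij * y_{j,t-1}^2 + beta * h_{i,t-1},  y_it = eps_it * sqrt(h_it)."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    y, h = np.zeros(N), np.full(N, omega)
    out = np.empty((T, N))
    for t in range(T + burn):
        alpha = np.where(y >= 0, a1, a2)          # threshold on the sign of y_{i,t-1}
        h = omega + alpha * y**2 + lam * (W @ y**2) + beta * h
        y = rng.standard_normal(N) * np.sqrt(h)   # Gaussian innovations (assumption)
        if t >= burn:
            out[t - burn] = y
    return out

# hypothetical ring network; max(a1, a2) + beta + lam = 0.85 < 1
N = 10
A = np.roll(np.eye(N, dtype=int), 1, axis=1) + np.roll(np.eye(N, dtype=int), -1, axis=1)
W = A / A.sum(axis=1, keepdims=True)
Y = simulate_tngarch(W, T=500, omega=0.1, a1=0.05, a2=0.15, lam=0.1, beta=0.6)
```

The burn-in discards the influence of the arbitrary initial values $y_{i0}=0$, $h_{i0}=\omega$.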

3 Stationarity and Near-Epoch Dependence

To derive the conditions under which model (3) is strictly stationary, we rewrite the conditional variance process in vector form
$$ \mathbf{h}_t=\omega\mathbf{1}_N+B_{t-1}\mathbf{h}_{t-1}, $$ (4)
with the following notation:
$$ \begin{aligned} \mathbf{h}_t&=\left(h_{1t},h_{2t},\dots,h_{Nt}\right)'\in\mathbb{R}^N,\\ \mathbf{1}_N&=\left(1,1,\dots,1\right)'\in\mathbb{R}^N,\\ B_{t-1}&=\alpha^{(1)}R_{t-1}E_{t-1}+\alpha^{(2)}\left(I_N-R_{t-1}\right)E_{t-1}+\lambda D^{-1}AE_{t-1}+\beta I_N,\\ R_{t-1}&=\operatorname{diag}\left\{1_{\{y_{1,t-1}\ge 0\}},1_{\{y_{2,t-1}\ge 0\}},\dots,1_{\{y_{N,t-1}\ge 0\}}\right\},\\ E_{t-1}&=\operatorname{diag}\left\{\varepsilon_{1,t-1}^2,\varepsilon_{2,t-1}^2,\dots,\varepsilon_{N,t-1}^2\right\},\\ D&=\operatorname{diag}\left\{d_1,d_2,\dots,d_N\right\}. \end{aligned} $$

In Section 3.1, we discuss the stationarity of (4) when $N$ is fixed. However, to estimate the parameters when $N\to\infty$, limit theorems based on the stationarity and ergodicity of fixed-dimensional time series are not sufficient. Therefore, in Section 3.2 we discuss near-epoch dependence for random fields, which supports the adoption of limit theorems for spatial processes in the subsequent sections.

3.1 Stationarity with N Fixed

Since $y_{it}=\varepsilon_{it}\sqrt{h_{it}}$, $y_{it}\ge 0$ is equivalent to $\varepsilon_{it}\ge 0$. Hence
$$ R_{t-1}=\operatorname{diag}\left\{1_{\{\varepsilon_{1,t-1}\ge 0\}},1_{\{\varepsilon_{2,t-1}\ge 0\}},\dots,1_{\{\varepsilon_{N,t-1}\ge 0\}}\right\}. $$
In this case, the random matrices $\{B_t\}$ are i.i.d. and model (4) is a generalized autoregressive equation in the sense of definition 1.4 in Bougerol and Picard (1992). It is easy to verify that $\mathbb{E}\left(\log^{+}\left\Vert B_0\right\Vert_{\ast}\right)<\infty$. Therefore, the top Lyapunov exponent associated with $\{B_t\}$ is well defined as
$$ \gamma:=\inf\left\{\mathbb{E}\left(\frac{1}{t+1}\log\left\Vert B_t B_{t-1}\dots B_0\right\Vert_{\ast}\right),\ t\in\mathbb{N}\right\}, $$ (5)
where $\left\Vert\cdot\right\Vert_{\ast}$ is the operator norm of $N\times N$ matrices induced by any norm $\left\Vert\cdot\right\Vert$ on $\mathbb{R}^N$ through
$$ \left\Vert M\right\Vert_{\ast}=\sup\left\{\left\Vert M\mathbf{x}\right\Vert/\left\Vert\mathbf{x}\right\Vert;\ \mathbf{x}\in\mathbb{R}^N,\ \mathbf{x}\ne 0\right\}. $$
According to theorem 3.2 in Bougerol and Picard (1992), the series
$$ \mathbf{h}_t=\omega\mathbf{1}_N+\omega\sum_{k=1}^{\infty}B_{t-1}\dots B_{t-k}\mathbf{1}_N, $$ (6)
is the unique strictly stationary and ergodic solution of model (4) if and only if the top Lyapunov exponent $\gamma<0$. Under this condition, the process $\{\mathbf{y}_t\}$, where $\mathbf{y}_t=\left(y_{1t},y_{2t},\dots,y_{Nt}\right)'\in\mathbb{R}^N$, is also strictly stationary and ergodic, since we can easily construct a continuous function $\Lambda:\mathbb{R}^N\to\mathbb{R}^N$ according to (3) such that $\mathbf{y}_t=\Lambda(\mathbf{h}_t)$. Besides, since $y_{it}=\varepsilon_{it}\sqrt{h_{it}}$, the almost sure convergence of (6) guarantees that $\mathbb{E}(h_{it})<\infty$ for any $i$. Thus $\mathbb{E}\left\Vert\mathbf{y}_t\right\Vert^2=\sum_{i=1}^N\mathbb{E}(h_{it})<\infty$, with $\left\Vert\cdot\right\Vert$ being the Euclidean norm.
By the subadditive ergodic theorem in Kingman (1973),
$$ \gamma=\lim_{t\to\infty}\frac{1}{t+1}\log\left\Vert B_t B_{t-1}\dots B_0\right\Vert_{\ast} $$
almost surely. Hence $\gamma$ can be approximated by simulation given a specific distribution of $\varepsilon_{it}$. To reduce the computational burden, we also derive a sufficient condition that is simple and much easier to verify.

Theorem 1. Under Assumption 1, model (4) has a unique strictly stationary and ergodic solution of the form (6) if

$$ \max\left\{\alpha^{(1)},\alpha^{(2)}\right\}+\beta+\lambda<1. $$ (7)
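The simulation approximation of $\gamma$ mentioned above can be sketched as follows: apply the random matrices $B_t$ to a positive vector, renormalize at each step, and average the accumulated log growth rates. Gaussian innovations, the ring network and the parameter values are assumptions made for the example; since they satisfy (7), the estimate should come out negative.

```python
import numpy as np

def top_lyapunov(W, a1, a2, lam, beta, t_max=5000, seed=1):
    """Monte-Carlo approximation of gamma = lim (1/(t+1)) log ||B_t ... B_0||,
    where B_t = (a1*R_t + a2*(I - R_t)) E_t + lam * W E_t + beta * I."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    v = np.ones(N)                      # generic positive starting vector
    acc = 0.0
    for _ in range(t_max):
        eps = rng.standard_normal(N)    # Gaussian innovations (assumption)
        alpha = np.where(eps >= 0, a1, a2)
        e2 = eps**2
        v = alpha * e2 * v + lam * (W @ (e2 * v)) + beta * v   # v <- B_t v
        nrm = np.linalg.norm(v)
        acc += np.log(nrm)              # accumulate log growth, then renormalize
        v = v / nrm
    return acc / t_max

# hypothetical ring network; parameters satisfy (7), so gamma < 0 by Theorem 1
N = 8
A = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
W = A / A.sum(axis=1, keepdims=True)
gamma = top_lyapunov(W, a1=0.05, a2=0.15, lam=0.1, beta=0.6)
```

Tracking a single renormalized vector avoids forming the full matrix product, whose entries would overflow or underflow for large $t$.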

3.2 Near-Epoch Dependence for Random Fields

Let $D:=\left\{(i,t):i\in\mathbb{N}_{+},t\in\mathbb{Z}\right\}$ be a lattice in $\mathbb{R}^2$, and let $\rho\left((i,t),(j,\tau)\right):=\max\left\{|i-j|,|t-\tau|\right\}$ measure the distance between any two locations $(i,t),(j,\tau)\in D$. Given observations $\left\{y_{it},1\le i\le N,1\le t\le T\right\}$ from model (3), these observations can be regarded as a triangular array of random fields $\left\{y_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$, with $\left\{D_{NT}, NT\ge 1\right\}$ being a sequence of finite rectangular lattices $D_{NT}:=\left\{(i,t):1\le i\le N,1\le t\le T\right\}$. The growth of the sample size is then ensured by the unbounded expansion of $D_{NT}$ as $NT\to\infty$, represented as $\left|D_{NT}\right|_c\to\infty$, where $\left|\cdot\right|_c$ denotes cardinality. The discussion of the asymptotic behavior of random fields concerns only the expansion of the sample region, so the theoretical results derived in this section apply as long as $\left|D_{NT}\right|_c=NT\to\infty$.

Let $\left\Vert\cdot\right\Vert_p$ denote the $L_p$-norm, i.e. $\left\Vert X\right\Vert_p:=\left(\mathbb{E}|X|^p\right)^{1/p}$ for an arbitrary random variable $X$. The definition of NED random fields is as follows (see definition 1 in Jenish and Prucha, 2012):

Definition 1. A triangular array of random fields $\mathcal{Y}:=\left\{y_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$ is said to be $L_p$-NED ($p\ge 1$) on $\mathcal{E}=\left\{\varepsilon_{it}:(i,t)\in D\right\}$ if $\sup_{(i,t)\in D}\left\Vert y_{it}\right\Vert_p<\infty$ and

$$ \left\Vert y_{it}-\mathbb{E}\left(y_{it}\mid\mathcal{F}_{it}(s)\right)\right\Vert_p\le d_{it}\psi(s), $$

where $\mathcal{F}_{it}(s):=\sigma\left\{\varepsilon_{j\tau}:\rho\left((i,t),(j,\tau)\right)\le s\right\}$, $\psi(s)$ is a non-negative sequence with $\lim_{s\to\infty}\psi(s)=0$, and $\left\{d_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$ is an array of finite positive constants.

Remark. If $\psi(s)=\mathcal{O}(s^{-\mu})$ for some $\mu>0$, then $\mathcal{Y}$ is said to be $L_p$-NED on $\mathcal{E}$ of size-$\mu$; if $\psi(s)=\mathcal{O}(\rho^s)$ for some $0<\rho<1$, then $\mathcal{Y}$ is said to be $L_p$-NED on $\mathcal{E}$ geometrically; if $\sup_{(i,t)\in D}d_{it}<\infty$, then $\mathcal{Y}$ is said to be uniformly $L_p$-NED on the random field $\mathcal{E}$. Note that geometric NED implies NED of size-$\mu$ for every $\mu>0$.

We need the following assumptions before discussing the NED property of $\mathcal{Y}$. Assumption 2 is needed to prove that $\sup_{(i,t)\in D}\left\Vert h_{it}\right\Vert_2<\infty$; Assumption 3 puts a restriction on the sparsity of the network: the strength of the connection between two nodes decays with their distance in case (a), or two nodes are connected only if they are sufficiently close in case (b). Similar restrictions on the network structure can be found in assumption 3 of Xu and Lee (2015) and assumption 3.2 of Xu et al. (2022).

Assumption 2. There exists $\kappa_4:=\mathbb{E}\,\varepsilon_{it}^4<\infty$ such that

$$ \kappa_4\left(\max\left\{\alpha^{(1)},\alpha^{(2)}\right\}+\beta+\lambda\right)^2<1. $$
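Both condition (7) and the moment condition of Assumption 2 are straightforward to check numerically for a given innovation distribution. A minimal sketch, assuming standard normal innovations (for which $\kappa_4=\mathbb{E}\,\varepsilon_{it}^4=3$, so Assumption 2 requires $\max\{\alpha^{(1)},\alpha^{(2)}\}+\beta+\lambda<1/\sqrt{3}\approx 0.577$, strictly stronger than (7)):

```python
def stationarity_ok(a1, a2, lam, beta):
    """Sufficient condition (7) for strict stationarity."""
    return max(a1, a2) + beta + lam < 1

def assumption2_ok(kappa4, a1, a2, lam, beta):
    """Moment condition of Assumption 2: kappa4 * (max{a1, a2} + beta + lam)^2 < 1."""
    return kappa4 * (max(a1, a2) + beta + lam) ** 2 < 1

print(assumption2_ok(3.0, 0.05, 0.15, 0.1, 0.3))   # 0.55 < 0.577  -> True
print(assumption2_ok(3.0, 0.05, 0.15, 0.1, 0.6))   # 0.85 fails    -> False
```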

Assumption 3. The row-normalized adjacency matrix $W$ satisfies one of the following conditions:

  • (a).

    $w_{ij}=\mathcal{O}\left(|i-j|^{-\frac{\mu+2}{2}}\right)$ for some $\mu>0$;

  • (b).

    $w_{ij}\ne 0$ if $|i-j|\le K$ for some constant $K\ge 1$, and $w_{ij}=0$ otherwise.

Theorem 2. If condition (7) holds, then under Assumptions 1, 2 and 3(a), $\left\{h_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$ is uniformly $L_2$-NED on $\left\{\varepsilon_{it}:(i,t)\in D\right\}$ of size-$\mu$, where $\mu$ is the constant in Assumption 3(a). Moreover, if Assumption 3(b) holds instead of 3(a), then $\left\{h_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$ is uniformly and geometrically $L_2$-NED on $\left\{\varepsilon_{it}:(i,t)\in D\right\}$.

Remark. Note that $\left\Vert y_{it}^2-\mathbb{E}\left(y_{it}^2\mid\mathcal{F}_{it}(s)\right)\right\Vert_2=\left\Vert\varepsilon_{it}^2\right\Vert_2\left\Vert h_{it}-\mathbb{E}\left(h_{it}\mid\mathcal{F}_{it}(s)\right)\right\Vert_2$; then Assumption 2 facilitates the $L_2$-NED of the $y_{it}^2$'s given the $L_2$-NED of the $h_{it}$'s. Besides, since $h_{it}\ge\omega>0$, it is easy to verify that $\sqrt{h_{it}}$ is a Lipschitz transformation of $h_{it}$ using the mean value theorem. Then proposition 2 in Jenish and Prucha (2012) allows the $\sqrt{h_{it}}$'s to inherit the NED properties of the $h_{it}$'s, so we can also verify that the $y_{it}$'s are $L_2$-NED.

4 Parameter Estimation

From model (3) we have observations $\left\{y_{it}:(i,t)\in D_{NT}, NT\ge 1\right\}$ with respect to the true parameter vector $\theta_0:=\left(\omega_0,\alpha_0^{(1)},\alpha_0^{(2)},\lambda_0,\beta_0\right)'\in\mathbb{R}^5$. Based on the infinite past of observations, the quasi log-likelihood function is
$$ L_{NT}\left(\theta\right)=\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T l_{it}\left(\theta\right),\qquad l_{it}\left(\theta\right)=\log\sigma_{it}^2\left(\theta\right)+\frac{y_{it}^2}{\sigma_{it}^2\left(\theta\right)}, $$ (8)
where $\sigma_{it}^2$ is generated from model (3) as
$$ \sigma_{it}^2=\omega+\left\{\alpha^{(1)}1_{\{y_{i,t-1}\ge 0\}}+\alpha^{(2)}1_{\{y_{i,t-1}<0\}}\right\}y_{i,t-1}^2+\lambda d_i^{-1}\sum_{j=1}^N a_{ij}y_{j,t-1}^2+\beta\sigma_{i,t-1}^2, $$
and $\theta:=\left(\omega,\alpha^{(1)},\alpha^{(2)},\lambda,\beta\right)'\in\mathbb{R}^5$ is the parameter vector.
Since the evaluation of the exact value of (8) is infeasible in practice, it is convenient to approximate (8) by
$$ \tilde{L}_{NT}\left(\theta\right)=\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\tilde{l}_{it}\left(\theta\right),\qquad \tilde{l}_{it}\left(\theta\right)=\log\tilde{\sigma}_{it}^2\left(\theta\right)+\frac{y_{it}^2}{\tilde{\sigma}_{it}^2\left(\theta\right)}, $$ (9)
where $\tilde{\sigma}_{it}^2$ is also generated from model (3) but with initial value $\tilde{\sigma}_{i0}^2=0$. The QMLE of $\theta\in\Theta$ is then given by
$$ \hat{\theta}_{NT}:=\underset{\theta\in\Theta}{\mathrm{argmin}}\ \tilde{L}_{NT}\left(\theta\right). $$
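The approximate objective (9) can be evaluated by the same recursion that defines $\tilde{\sigma}_{it}^2$. The sketch below assumes observations `Y` of shape $T\times N$ and uses the first row only as the lagged term, a choice of initialization since $y_{i0}$ is unavailable; the data in the usage are arbitrary placeholders, and the minimization over $\Theta$ can be left to a generic optimizer such as `scipy.optimize.minimize` with positivity bounds.

```python
import numpy as np

def qll_tilde(theta, Y, W):
    """Approximate quasi log-likelihood (9), with sigma_tilde_{i0}^2 = 0.
    The first row of Y serves as the lagged term, so the sum runs over t >= 2."""
    omega, a1, a2, lam, beta = theta
    T, N = Y.shape
    sig2 = np.zeros(N)                       # initial value sigma_tilde_{i0}^2 = 0
    total = 0.0
    for t in range(1, T):
        ylag2 = Y[t - 1] ** 2
        alpha = np.where(Y[t - 1] >= 0, a1, a2)
        sig2 = omega + alpha * ylag2 + lam * (W @ ylag2) + beta * sig2
        total += np.sum(np.log(sig2) + Y[t] ** 2 / sig2)
    return total / (N * (T - 1))

# hypothetical usage on arbitrary placeholder data:
rng = np.random.default_rng(0)
Yd = rng.standard_normal((300, 6))
A = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
Wd = A / A.sum(axis=1, keepdims=True)
val = qll_tilde([0.1, 0.05, 0.15, 0.1, 0.6], Yd, Wd)
```

Note that $\omega>0$ keeps every $\tilde{\sigma}_{it}^2\ge\omega$, so the logarithm is always well defined.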
To prove the asymptotic properties of $\hat{\theta}_{NT}$, we need the following assumptions in addition to those required by Theorem 2:

Assumption 4. $\Theta$ is a compact subset of $\left\{\theta:\omega>0,\alpha^{(1)}>0,\alpha^{(2)}>0,\lambda>0,\beta>0\right\}$ such that every $\theta\in\Theta$ satisfies (7) and Assumption 2, and the true parameter $\theta_0$ is an interior point of $\Theta$.

Assumption 5. $\sup_{(i,t)\in D}\sup_{\theta\in\Theta}\left\Vert\sigma_{it}^2\left(\theta\right)\right\Vert_p<\infty$ for some $p>1$.

Assumption 6. $\mathbb{E}\,\varepsilon_{it}^{4r}<\infty$ for some $r>2$, and the following bounds hold:

$$ \begin{aligned} &\sup_{(i,t)\in D}\left\Vert\sigma_{it}^2\left(\theta_0\right)\right\Vert_{2r}<\infty;\\ &\sup_{(i,t)\in D}\left\Vert\frac{\partial}{\partial\theta_k}\sigma_{it}^2\left(\theta_0\right)\right\Vert_{2r}<\infty;\\ &\sup_{(i,t)\in D}\left\Vert\frac{\partial^2}{\partial\theta_j\partial\theta_k}\sigma_{it}^2\left(\theta_0\right)\right\Vert_2<\infty, \end{aligned} $$

where $\theta_k$ denotes the $k$th component of the parameter vector $\theta$.

Assumption 7. The NED size $\mu$ in Theorem 2 satisfies $\frac{r-2}{2r-2}\mu>2$, with $r$ as in Assumption 6.

Assumption 4 is also required by Zhou et al. (2020) to prove the asymptotic properties in the case where $N$ is fixed. With both $T\to\infty$ and $N\to\infty$, the additional assumptions above are required in order to apply the limit theorems for random fields. Specifically, Assumption 5 is required for $l_{it}(\theta)$ to satisfy the boundedness condition of the law of large numbers (LLN) for random fields (assumption 2(a) in Jenish and Prucha, 2012); Assumption 6 facilitates the heredity of the NED property from $\sigma_{it}^2(\theta_0)$ to the more complicated first-order and second-order derivatives of $L_{NT}(\theta_0)$; Assumption 7 constrains the decay rate of the NED coefficients, as required by the central limit theorem (CLT) for random fields (assumption 4(c) in Jenish and Prucha, 2012). Of course, as remarked after Definition 1, geometric NED implies NED of size-$\mu$ for every $\mu>0$, so Assumption 7 is trivial under geometric NED.

Theorem 3. Under the assumptions required by Theorem 2 together with Assumptions 4 and 5, the quasi-maximum likelihood estimator $\hat{\theta}_{NT}$ is consistent, i.e.

$$ \hat{\theta}_{NT}\overset{p}{\to}\theta_0, $$

as $NT\to\infty$. If Assumptions 6 and 7 also hold, and the smallest eigenvalue $\lambda_{\min}\left(\Sigma_{NT}\right)$ of

$$ \Sigma_{NT}:=\frac{\kappa_4-1}{NT}\sum_{(i,t)\in D_{NT}}\mathbb{E}\left[\frac{1}{\sigma_{it}^4\left(\theta_0\right)}\frac{\partial}{\partial\theta}\sigma_{it}^2\left(\theta_0\right)\frac{\partial}{\partial\theta'}\sigma_{it}^2\left(\theta_0\right)\right], $$

satisfies

$$ \underset{NT\ge 1}{\operatorname{inf}}\lambda_{\min}\left(\Sigma_{NT}\right)>0, $$ (10)

then $\hat{\theta}_{NT}$ is asymptotically normal as $NT\to\infty$ and $N=o(T)$:

$$ \sqrt{NT}\,\Sigma_{NT}^{1/2}\left(\hat{\theta}_{NT}-\theta_0\right)\overset{d}{\to}\mathrm{N}\left(0,\left(\kappa_4-1\right)^2 I_5\right), $$

where $I_5$ is the $5\times 5$ identity matrix.

Remark. Condition (10) holds if the smallest eigenvalues $$ {\lambda}_{\min}^{\left(i,t\right)} $$ of $$ \mathbb{E}\left[\frac{1}{\sigma_{it}^4\left(\theta_0\right)}\frac{\partial \sigma_{it}^2\left(\theta_0\right)}{\partial\theta}\frac{\partial \sigma_{it}^2\left(\theta_0\right)}{\partial\theta'}\right] $$ satisfy $$ \inf_{NT\ge 1}\inf_{\left(i,t\right)\in D_{NT}}{\lambda}_{\min}^{\left(i,t\right)} > 0. $$

As we will show in the proof of Proposition 5.1, κ 4 $$ {\kappa}_4 $$ and N T $$ {\Sigma}_{NT} $$ above could be approximated by
κ ^ 4 : = 1 N T i = 1 N t = 1 T y i t 4 σ ˜ i t 4 ( θ ^ N T ) , $$ {\hat{\kappa}}_4:= \frac{1}{NT}\sum \limits_{i=1}^N\sum \limits_{t=1}^T\frac{y_{it}^4}{{\tilde{\sigma}}_{it}^4\left({\hat{\theta}}_{NT}\right)}, $$ (11)
and
^ N T : = κ ^ 4 1 N T i = 1 N t = 1 T 1 σ ˜ i t 4 ( θ ^ N T ) σ ˜ i t 2 ( θ ^ N T ) θ σ ˜ i t 2 ( θ ^ N T ) θ , $$ {\hat{\Sigma}}_{NT}:= \frac{{\hat{\kappa}}_4-1}{NT}\sum \limits_{i=1}^N\sum \limits_{t=1}^T\left[\frac{1}{{\tilde{\sigma}}_{it}^4\left({\hat{\theta}}_{NT}\right)}\frac{\partial {\tilde{\sigma}}_{it}^2\left({\hat{\theta}}_{NT}\right)}{\partial \theta}\frac{\partial {\tilde{\sigma}}_{it}^2\left({\hat{\theta}}_{NT}\right)}{\partial {\theta}^{\prime }}\right], $$ (12)
respectively. The latter can be calculated recursively as $$ \frac{\partial }{\partial \theta }{\tilde{\sigma}}_{it}^2\left({\hat{\theta}}_{NT}\right)={\tilde{\mathbf{u}}}_{i,t-1}+\hat{\beta}\frac{\partial }{\partial \theta }{\tilde{\sigma}}_{i,t-1}^2\left({\hat{\theta}}_{NT}\right) $$ where
u ˜ i , t 1 = 1 y i , t 1 2 1 { ε ^ i , t 1 0 } y i , t 1 2 1 { ε ^ i , t 1 < 0 } j = 1 N w i , j y j , t 1 2 σ ˜ i , t 1 2 ( θ ^ N T ) . $$ {\tilde{\mathbf{u}}}_{i,t-1}=\left(\begin{array}{l}1\\ {}{y}_{i,t-1}^2{1}_{\left\{{\hat{\varepsilon}}_{i,t-1}\ge 0\right\}}\\ {}{y}_{i,t-1}^2{1}_{\left\{{\hat{\varepsilon}}_{i,t-1}<0\right\}}\\ {}\sum \limits_{j=1}^N{w}_{i,j}{y}_{j,t-1}^2\\ {}{\tilde{\sigma}}_{i,t-1}^2\left({\hat{\theta}}_{NT}\right)\end{array}\right). $$
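For implementation, the recursion above together with (11) and (12) fits in a few lines. The sketch below is our own illustration, not the authors' code: we assume `y` and `sigma2` are N × T NumPy arrays of observations and fitted conditional variances, `W` is the row-normalized adjacency matrix with entries w_ij, and `beta_hat` is the estimate of β. Since the fitted volatility is positive, the sign of the fitted innovation equals the sign of the observation.

```python
import numpy as np

def kappa4_and_sigma_hat(y, sigma2, W, beta_hat):
    """Plug-in estimates (11) and (12), using the gradient recursion
    d sigma2_it / d theta = u_{i,t-1} + beta * (previous gradient).
    Array layout and names are our own assumptions, not the paper's code."""
    N, T = y.shape
    kappa4 = np.mean(y**4 / sigma2**2)            # kappa4-hat in (11)
    grad = np.zeros((N, 5))                       # gradients, initialized at zero
    Sigma = np.zeros((5, 5))
    for t in range(1, T):
        pos = (y[:, t - 1] >= 0).astype(float)    # sign of eps-hat = sign of y
        u = np.column_stack([
            np.ones(N),                           # d / d omega
            y[:, t - 1]**2 * pos,                 # d / d alpha(1)
            y[:, t - 1]**2 * (1.0 - pos),         # d / d alpha(2)
            W @ y[:, t - 1]**2,                   # d / d lambda
            sigma2[:, t - 1],                     # d / d beta
        ])
        grad = u + beta_hat * grad
        # accumulate (1/sigma^4) * grad grad' terms for (12)
        Sigma += (grad.T / sigma2[:, t]**2) @ grad
    Sigma *= (kappa4 - 1) / (N * T)
    return kappa4, Sigma
```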

5 Tests on Threshold Effect and Residuals

5.1 A Wald Test for the Threshold Effect

Given a null hypothesis
H 0 : Γ θ 0 = η , $$ {H}_0:\Gamma {\theta}_0=\eta, $$ (13)
where Γ $$ \Gamma $$ is an s × 5 $$ s\times 5 $$ matrix with rank s $$ s $$ and η $$ \eta $$ is an s $$ s $$ -dimensional vector, we could define a Wald test statistic as follows:
W N T : = ( Γ θ ^ N T η ) Γ N T ( κ ^ 4 1 ) 2 ^ N T 1 Γ 1 ( Γ θ ^ N T η ) , $$ {W}_{NT}:= {\left(\Gamma {\hat{\theta}}_{NT}-\eta \right)}^{\prime }{\left\{\frac{\Gamma}{NT}{\left({\hat{\kappa}}_4-1\right)}^2{\hat{\Sigma}}_{NT}^{-1}{\Gamma}^{\prime}\right\}}^{-1}\left(\Gamma {\hat{\theta}}_{NT}-\eta \right), $$ (14)
where κ ^ 4 $$ {\hat{\kappa}}_4 $$ and ^ N T $$ {\hat{\Sigma}}_{NT} $$ are defined in (11) and (12).
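As a rough sketch (function and variable names are ours, not the authors' code), the statistic (14) can be evaluated directly from the QMLE output and the plug-in estimates (11)–(12):

```python
import numpy as np

def wald_statistic(theta_hat, Gamma, eta, kappa4_hat, Sigma_hat, N, T):
    """Wald statistic (14) for H0: Gamma @ theta_0 = eta. By Proposition 5.1,
    compare against the upper-alpha quantile of chi^2 with s = rank(Gamma)
    degrees of freedom. A sketch under our own naming conventions."""
    diff = Gamma @ theta_hat - eta
    # middle matrix {Gamma/(NT) (kappa4-1)^2 Sigma^{-1} Gamma'} from (14)
    mid = Gamma @ ((kappa4_hat - 1)**2 * np.linalg.inv(Sigma_hat)) @ Gamma.T / (N * T)
    return float(diff @ np.linalg.solve(mid, diff))
```

For instance, the threshold-effect test used later in the article corresponds to `Gamma = [[0, 1, -1, 0, 0]]` and `eta = [0]`.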

By the asymptotic normality of $$ {\hat{\theta}}_{NT} $$, the statistic $$ {W}_{NT} $$ can be shown to follow a canonical asymptotic distribution, as stated in Proposition 5.1.

Proposition 5.1. Under the same assumptions as Theorem 3, as T → ∞ and N = o(T), the Wald test statistic defined in (14) asymptotically follows a χ² distribution with s degrees of freedom, i.e.

W N T d χ s 2 . $$ {W}_{NT}\overset{d}{\to }{\chi}_s^2. $$

5.2 A White Noise Test on the Residuals

There is a large literature on high-dimensional time series models, including Xu and Lee (2015), Zhu et al. (2017) and Xu et al. (2022) among others, but none of these works uses diagnostic tools to check model adequacy. In this section, we introduce a high-dimensional white noise test developed by Li et al. (2019) that can be applied to the diagnosis of high-dimensional models, including ours.

Assume we have residuals { r t : 1 t T } $$ \left\{{\mathbf{r}}_t:1\le t\le T\right\} $$ , where r t : = ( r 1 t , , r N t ) $$ {\mathbf{r}}_t:= {\left({r}_{1t},\dots, {r}_{Nt}\right)}^{\prime } $$ . We want to test whether { r t : 1 t T } $$ \left\{{\mathbf{r}}_t:1\le t\le T\right\} $$ are high-dimensional white noises, i.e. there exists a matrix P $$ P $$ such that
H 0 : r t = P z t , $$ {H}_0:{\mathbf{r}}_t=P{\mathbf{z}}_t, $$ (15)
where $$ {\mathbf{z}}_t={\left({\varepsilon}_{1t},\dots, {\varepsilon}_{Nt}\right)}^{\prime } $$. The test statistic is the sum of squared singular values of the first q lagged sample autocovariance matrices:
G q : = τ = 1 q t r ( Ŝ τ Ŝ τ ) , $$ {G}_q:= \sum \limits_{\tau =1}^q tr\left({\hat{S}}_{\tau }{\hat{S}}_{\tau}^{\prime}\right), $$ (16)
where Ŝ τ = 1 T t = 1 T r t r t τ $$ {\hat{S}}_{\tau }=\frac{1}{T}{\sum}_{t=1}^T{\mathbf{r}}_t{\mathbf{r}}_{t-\tau}^{\prime } $$ with r t = r t + T $$ {\mathbf{r}}_t={\mathbf{r}}_{t+T} $$ when t 0 $$ t\le 0 $$ .
If P is unknown, the sample covariance matrix of $$ {\mathbf{r}}_t $$ is $$ {\hat{S}}_0=\frac{1}{T}{\sum}_{t=1}^T{\mathbf{r}}_t{\mathbf{r}}_t^{\prime } $$. According to (2.8) in Li et al. (2019), we reject (15) if
G q N 2 q T ŝ 1 2 2 N 2 q T 2 ŝ 2 N T ŝ 1 2 2 > Z α , $$ \frac{G_q-\frac{N^2q}{T}{\hat{s}}_1^2}{\sqrt{\frac{2{N}^2q}{T^2}{\left({\hat{s}}_2-\frac{N}{T}{\hat{s}}_1^2\right)}^2}}>{Z}_{\alpha }, $$
where ŝ 1 = 1 N t r ( Ŝ 0 ) $$ {\hat{s}}_1=\frac{1}{N} tr\left({\hat{S}}_0\right) $$ , ŝ 2 = 1 N t r ( Ŝ 0 2 ) $$ {\hat{s}}_2=\frac{1}{N} tr\left({\hat{S}}_0^2\right) $$ and Z α $$ {Z}_{\alpha } $$ is the upper- α $$ \alpha $$ quantile of standard normal distribution.
Note that $$ \left\{{\mathbf{r}}_t:1\le t\le T\right\} $$ being white noise means that the residuals are uncorrelated over t. However, it does not imply that the residuals are uncorrelated over both i and t; the latter indicates a stronger form of adequacy for a high-dimensional model. We can assume P = I_N in the null hypothesis, and by (2.5) in Li et al. (2019), we reject $$ {H}_0:{\mathbf{r}}_t={\mathbf{z}}_t $$ if
G q N 2 q T 2 N 2 q T 2 + 4 N 3 q 2 ( κ 4 3 ) T 3 + 8 N 3 q 2 T 3 > Z α . $$ \frac{G_q-\frac{N^2q}{T}}{\sqrt{\frac{2{N}^2q}{T^2}+\frac{4{N}^3{q}^2\left({\kappa}_4-3\right)}{T^3}+\frac{8{N}^3{q}^2}{T^3}}}>{Z}_{\alpha }. $$
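Both standardised statistics can be sketched as below, under the assumption that the residual matrix `r` is stored as N × T (the layout and names are ours; see Li et al., 2019 for the exact formulas):

```python
import numpy as np

def hd_white_noise_test(r, q, kappa4=None):
    """G_q from (16) standardised two ways: the first z-score follows (2.8)
    of Li et al. (2019) (P unknown), the second follows (2.5) (P = I_N,
    requires kappa4). Reject H0 at level alpha if the z-score exceeds Z_alpha."""
    N, T = r.shape
    Gq = 0.0
    for tau in range(1, q + 1):
        # circular convention r_t = r_{t+T} for t <= 0
        S_tau = (r @ np.roll(r, tau, axis=1).T) / T
        Gq += np.trace(S_tau @ S_tau.T)
    S0 = (r @ r.T) / T
    s1 = np.trace(S0) / N
    s2 = np.trace(S0 @ S0) / N
    z_unknown = (Gq - N**2 * q / T * s1**2) / np.sqrt(
        2 * N**2 * q / T**2 * (s2 - N / T * s1**2)**2)
    z_identity = None
    if kappa4 is not None:
        z_identity = (Gq - N**2 * q / T) / np.sqrt(
            2 * N**2 * q / T**2
            + 4 * N**3 * q**2 * (kappa4 - 3) / T**3
            + 8 * N**3 * q**2 / T**3)
    return z_unknown, z_identity
```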

6 Simulation Study

6.1 Network Simulation

The symmetric matrix A in model (4) represents an undirected network structure, whose pattern varies across application scenarios. In this simulation study, we use four different mechanisms to simulate the corresponding networks. The network structure in Example 1 conforms to Assumption 3, which is required for geometric NED as shown in Theorem 2. The simulation mechanisms in Examples 2–4 test the robustness of our estimation against network structures that may violate Assumption 3.

Example 1. For each node i ∈ {1, 2, …, N}, it is connected to node j if and only if j is inside i's D-neighborhood. That is, in the adjacency matrix, a_ij = 1 if 0 < |i − j| ≤ D and a_ij = 0 otherwise. Figure 1(a) is a visualization of such a network with N = 100 and D = 10.

Figure 1. Visualized network structures with N = 100. (a) Example 1 (D = 10); (b) Example 2; (c) Example 3; (d) Example 4 (K = 10)

Example 2. (Network structure with random distribution) For each node i ∈ {1, 2, …, N}, we generate D_i from the uniform distribution U(0, 5), and then draw [D_i] samples randomly from {1, 2, …, N} to form a set S_i ([x] denotes the integer part of x). A = (a_ij) is generated by letting a_ij = 1 if j ∈ S_i and a_ij = 0 otherwise. In a network simulated with this mechanism, as indicated in Figure 1(b), there is no significantly influential node (i.e. no node with extremely large in-degree).

Example 3. (Network structure with power-law distribution) Following Clauset et al. (2009), for each node i in such a network, D_i is generated in the same way as in Example 2. Instead of selecting [D_i] samples uniformly from {1, 2, …, N}, the samples are drawn with probabilities $$ {p}_i={s}_i/{\sum}_{i=1}^N{s}_i $$, where s_i is generated from a discrete power-law distribution $$ \mathbb{P}\left\{{s}_i=x\right\}\propto {x}^{-a} $$ with scaling parameter a = 2.5. As shown in Figure 1(c), a few nodes have much larger in-degrees while most have fewer than 2. Compared to Example 2, a network with power-law distribution exhibits larger gaps between the influences of different nodes. This type of network is suitable for modeling social media such as Twitter and Instagram, where celebrities have huge influence while the ordinary majority has little.

Example 4. (Network structure with stochastic blocks) As proposed in Nowicki and Snijders (2001), in a network with stochastic block structure, all nodes are divided into blocks, and nodes from the same block are more likely to be connected than those from different blocks. To simulate this structure, the N nodes are randomly divided into K groups by assigning the labels {1, 2, …, K} to every node with equal probability. For two nodes i and j from the same group, let $$ \mathbb{P}\left({a}_{ij}=1\right)=0.5 $$, while for two nodes from different groups, $$ \mathbb{P}\left({a}_{ij}=1\right)=0.001/N $$. Hence it is very unlikely for nodes to be connected across groups. Our simulated network mimics this characteristic, as Figure 1(d) shows clear boundaries between groups. The block network is also appealing from a practical perspective; for instance, the price of one stock is highly correlated with those in the same industry sector.
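The four simulation mechanisms above can be sketched as follows. This is a minimal illustration under our own conventions; in particular, the finite support used to draw the power-law weights in Example 3 is our own truncation, and the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def neighborhood_net(N, D):
    """Example 1: a_ij = 1 iff 0 < |i - j| <= D (symmetric)."""
    idx = np.arange(N)
    A = (np.abs(idx[:, None] - idx[None, :]) <= D).astype(int)
    np.fill_diagonal(A, 0)
    return A

def random_net(N):
    """Example 2: node i links to [D_i] uniform targets, D_i ~ U(0, 5)."""
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))                 # integer part of D_i
        A[i, rng.choice(N, size=k, replace=False)] = 1
    np.fill_diagonal(A, 0)
    return A

def powerlaw_net(N, a=2.5):
    """Example 3: targets drawn with probability proportional to s_i, where
    P(s_i = x) is proportional to x^{-a} on a truncated support (our choice)."""
    support = np.arange(1, 1001)
    pmf = support**(-a)
    pmf /= pmf.sum()
    s = rng.choice(support, size=N, p=pmf)
    p = s / s.sum()
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        k = int(rng.uniform(0, 5))
        A[i, rng.choice(N, size=k, replace=False, p=p)] = 1
    np.fill_diagonal(A, 0)
    return A

def block_net(N, K):
    """Example 4: P(a_ij = 1) = 0.5 within a block, 0.001/N across blocks;
    symmetrized so that A represents an undirected network."""
    labels = rng.integers(0, K, size=N)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, 0.5, 0.001 / N)
    A = np.triu((rng.random((N, N)) < prob).astype(int), 1)
    return A + A.T
```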

In the next section, the simulation study is carried out on datasets generated according to process (3), in conjunction with the four types of adjacency matrices in Examples 1–4.

6.2 Simulation Results

Setting the true parameter θ0 as (0.1, 0.1, 0.2, 0.2, 0.2)′, we generate data according to process (3) with different sample sizes T and numbers of dimensions N. In our setting, T increases from 50 to 4000, while N increases at the relatively slower rates of O(√T) and O(T/log(T)), respectively, as shown in the following table:

T              50   100   200   500   800   1000   1500   2000   2500   3000   4000
N ≈ √T          7    10    14    22    28     31     38     44     50     54     63
N ≈ T/log(T)   12    21    37    80   119    144    205    263    319    374    482

For each combination of (T, N), M = 1000 datasets are simulated independently according to (3). Based on the mth (m = 1, 2, …, M) dataset, θ0 is estimated and the result is denoted $$ {\hat{\theta}}_m={\left({\hat{\theta}}_{km}\right)}^{\prime }={\left({\hat{\omega}}_m,{\hat{\alpha}}_m^{(1)},{\hat{\alpha}}_m^{(2)},{\hat{\lambda}}_m,{\hat{\beta}}_m\right)}^{\prime } $$. For k = 1, 2, …, 5, the following two measurements are used to evaluate the simulation results:
  1. root-mean-square error: RMSE k = M 1 m = 1 M ( θ ^ k m θ k 0 ) 2 $$ {\mathrm{RMSE}}_k=\sqrt{M^{-1}{\sum}_{m=1}^M{\left({\hat{\theta}}_{km}-{\theta}_{k0}\right)}^2} $$ ,
  2. coverage probability: CP k = M 1 m = 1 M 1 θ k 0 C I k m $$ {\mathrm{CP}}_k={M}^{-1}{\sum}_{m=1}^M{1}_{\left\{{\theta}_{k0}\in C{I}_{km}\right\}} $$ .
CI k m $$ {\mathrm{CI}}_{km} $$ is the 95% confidence interval defined as
CI k m = θ ^ km z 0 . 975 SE ^ km , θ ^ km + z 0 . 975 SE ^ km , $$ {\mathrm{CI}}_{km}=\left({\hat{\theta}}_{\mathrm{km}}-{z}_{0.975}{\hat{\mathrm{SE}}}_{\mathrm{km}},{\hat{\theta}}_{\mathrm{km}}+{z}_{0.975}{\hat{\mathrm{SE}}}_{\mathrm{km}}\right), $$
where the estimated standard error $$ {\hat{\mathrm{SE}}}_{km} $$ is calculated as the square root of the kth diagonal element of $$ {(NT)}^{-1}\left({\hat{\kappa}}_4-1\right){\hat{\Sigma}}^{-1} $$, and z_{0.975} is the 0.975 quantile of the standard normal distribution. To eliminate the effect of starting points, a different initial guess of θ is used for each m.
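The two performance measures and the confidence intervals above can be evaluated as in the following sketch, where `theta_hats` and `se_hats` are assumed to be M × 5 arrays collecting the estimates and estimated standard errors over the M replications (the layout and names are our assumptions):

```python
import numpy as np

Z975 = 1.959963984540054   # 0.975 quantile of the standard normal

def evaluate_simulation(theta_hats, se_hats, theta0):
    """Per-coordinate RMSE_k and coverage probability CP_k for the 95%
    confidence intervals CI_km = theta_hat_km +/- z_0.975 * SE_km."""
    rmse = np.sqrt(np.mean((theta_hats - theta0)**2, axis=0))
    lo = theta_hats - Z975 * se_hats   # CI lower bounds
    hi = theta_hats + Z975 * se_hats   # CI upper bounds
    cp = np.mean((lo <= theta0) & (theta0 <= hi), axis=0)
    return rmse, cp
```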

As demonstrated in line graphs (c) and (d) of Figures 2–5, the consistency of the estimator is evident, since the RMSE drops toward zero as T and N increase. Additionally, $$ \hat{\mathrm{SE}} $$ provides a reliable estimate of the true standard error, since the coverage probability (CP) converges to its theoretical value of 95% in graphs (a) and (b) of Figures 2–5. In conclusion, the asymptotic properties of our estimator in Theorem 3 are well supported by our simulation results, even for the network structures in Examples 2–4 that may violate Assumption 3.

Figure 2. Results of simulation (Example 1). (a) N ≈ √T; (b) N ≈ T/log(T); (c) N ≈ √T; (d) N ≈ T/log(T)
Figure 3. Results of simulation (Example 2). (a) N ≈ √T; (b) N ≈ T/log(T); (c) N ≈ √T; (d) N ≈ T/log(T)
Figure 4. Results of simulation (Example 3). (a) N ≈ √T; (b) N ≈ T/log(T); (c) N ≈ √T; (d) N ≈ T/log(T)
Figure 5. Results of simulation (Example 4). (a) N ≈ √T; (b) N ≈ T/log(T); (c) N ≈ √T; (d) N ≈ T/log(T)

Remark. It is worth noticing that the CPs converge more slowly in general when N = O(T/log(T)) than when N = O(√T). This phenomenon suggests that the performance of the estimator $$ \hat{\mathrm{SE}} $$ is closely related to the ratio of T to N. We repeat the simulation for 61 different combinations of (T, N), and the scatter plot in Figure 6 supports this conjecture: the convergence of $$ \hat{\mathrm{SE}} $$ appears to require only T/N → ∞, which includes the setting of Theorem 3, where T → ∞ and N → ∞ at a slower rate.

Figure 6. Coverage probabilities vary with the ratio of T and N

7 Empirical Data Analysis

In addition to the simulation studies, we test our model using real data from the Chinese Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE). The dataset consists of daily log returns of 286 stocks, observed over the two consecutive years 2019 and 2020 (T = 487, excluding closing days). These stocks come from four industry sectors as follows:
  • 75 stocks from automotive industry sector;
  • 73 stocks from financial industry sector;
  • 68 stocks from information industry sector;
  • 70 stocks from pharmaceutical industry sector.

Our model is tested within each sector, in which the number of stocks is approximately T/log(T) ≈ 79. Hence the estimates and inferences can be trusted according to the simulation study.

As an initial impression of the data from each category, time plots of the daily average log returns are presented in Figure 7. We also have shareholder information for each stock, based on which two stocks are considered connected when they share at least one common shareholder among their top ten shareholders. By this principle, four adjacency matrices are constructed and visualized in Figure 8 for the four industry sectors. Although the sparsity of these four networks is intuitively clear from Figure 8, we use the network density (ND) as a quantified measurement, defined as the ratio of the number of existing edges to the number of potential connections:
$$ \mathrm{ND} := 100\%\times \frac{\sum_{i=1}^N d_i}{N\left(N-1\right)}, $$
where d_i is the degree of node i.
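The density formula can be computed directly from an adjacency matrix; a one-line sketch (the function name is ours):

```python
import numpy as np

def network_density(A):
    """ND = 100% x (sum of degrees d_i) / (N(N-1)) for an adjacency
    matrix A with zero diagonal; a direct transcription of the formula."""
    N = A.shape[0]
    return 100.0 * A.sum() / (N * (N - 1))
```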
Figure 7. Average log returns of stocks from different industry sectors. (a) Automotive industry; (b) Financial industry; (c) Information industry; (d) Pharmaceutical industry
Figure 8. Visualization of networks for stocks from different industry sectors. (a) Automotive industry (ND = 1.26%); (b) Financial industry (ND = 8.11%); (c) Information industry (ND = 1.58%); (d) Pharmaceutical industry (ND = 2.82%)

The results of parameter estimation are summarized in Table I. It is worth noting that the estimated network effect λ for the automotive industry sector is not statistically significant, while all other estimates, whether for other coefficients or from other sectors, are significant at the 5% level. As indicated in Figure 8(a), this could be caused by the sparsity of the network structure, since the automotive industry has the lowest network density. Positive estimates of λ indicate positive correlation between the return of a stock and the returns of its neighbors. Compared with the other parameters, the estimates of β are much larger for all four categories. Strong memory of volatility has been observed in many econometric studies of daily data, and such persistence is stronger for data sampled at higher frequency, according to Nelson (1991).

Table I. Estimation results based on daily log-returns (2019 & 2020) of stocks from four industries.

Parameter   Estimate   SE        p-Value
Automotive industry
ω           0.000099   5.83e-07  <0.05
α(1)        0.199408   1.08e-02  <0.05
α(2)        0.136423   1.01e-02  <0.05
λ           0.004591   4.71e-03  0.16465
β           0.727756   1.17e-02  <0.05
Financial industry
ω           0.000043   3.12e-06  <0.05
α(1)        0.247765   1.41e-02  <0.05
α(2)        0.202237   1.47e-02  <0.05
λ           0.010469   5.35e-03  <0.05
β           0.737272   1.09e-02  <0.05
Information industry
ω           0.000105   6.39e-06  <0.05
α(1)        0.172737   9.34e-03  <0.05
α(2)        0.122312   8.86e-03  <0.05
λ           0.009475   4.03e-03  <0.05
β           0.745699   1.11e-02  <0.05
Pharmaceutical industry
ω           0.000063   4.15e-06  <0.05
α(1)        0.180950   1.05e-02  <0.05
α(2)        0.131722   1.06e-02  <0.05
λ           0.012929   4.06e-03  <0.05
β           0.753305   1.11e-02  <0.05
We now conduct a Wald test for the existence of the threshold effect based on the estimated parameters. Letting Γ := (0, 1, −1, 0, 0) and η := 0 in (13), the null hypothesis is:
H 0 : α 0 ( 1 ) = α 0 ( 2 ) . $$ {H}_0:{\alpha}_0^{(1)}={\alpha}_0^{(2)}. $$
As indicated in Table II, we can reject the null hypothesis with strong confidence and conclude that there is a highly significant threshold effect within each industry sector.
Table II. p-Values of Wald test on H0: α0(1) = α0(2)

Automotive industry   Financial industry   Information industry   Pharmaceutical industry
1.09e-10              2.16e-07             3.8e-06                3.17e-06

Using the diagnostic tool introduced in Section 5.2, we can check model adequacy by inspecting the correlations between the residual vectors $$ {\mathbf{r}}_t={\left[\frac{y_{1t}}{{\tilde{\sigma}}_{1t}\left({\hat{\theta}}_{NT}\right)},\dots, \frac{y_{Nt}}{{\tilde{\sigma}}_{Nt}\left({\hat{\theta}}_{NT}\right)}\right]}^{\prime } $$. We test the null hypothesis $$ {H}_0:{\mathbf{r}}_t=P{\mathbf{z}}_t $$ with P unknown and with P = I_N respectively; the results are summarized in Table III. In all sectors, we cannot reject the hypothesis that the residual vectors are high-dimensional white noise with $$ \mathbb{E}{\mathbf{r}}_t = 0 $$ and $$ \mathrm{Var}\left({\mathbf{r}}_t\right)=P{P}^{\prime } $$ over t. However, the stronger hypothesis $$ {H}_0:{\mathbf{r}}_t={\mathbf{z}}_t $$ is rejected, as there exist correlations between the residuals $$ \frac{y_{it}}{{\tilde{\sigma}}_{it}\left({\hat{\theta}}_{NT}\right)} $$ for different i. We might eliminate this deficiency by heterogeneous parameterization with coefficients ω_i, α_i(1), α_i(2), λ_i and β_i, or by considering a dynamic network structure. However, the purpose of introducing the network structure is to reduce the number of parameters for high-dimensional time series; besides, deriving limit theorems for models with heterogeneous parameters or a dynamic network could be theoretically challenging.

Table III. Results of high-dimensional white noise test on H 0 : r t = P z t $$ {H}_0:{\mathbf{r}}_t=P{\mathbf{z}}_t $$ with q = 3 $$ q=3 $$ and α = 0 . 01 $$ \alpha =0.01 $$
Automotive industry Financial industry Information industry Pharmaceutical industry
P $$ P $$ is unknown Not rejected Not rejected Not rejected Not rejected
P = I N $$ P={I}_N $$ Rejected Rejected Rejected Rejected

On the other hand, our results on the asymmetric effects of positive and negative news are quite different from those derived from univariate data in the literature. For instance, in a study by Engle and Ng (1993) of daily returns of the Japanese stock index TOPIX, it was found that negative news has a larger impact on future volatility. This phenomenon is plausible in a stock market, since investors lose confidence in an asset when it performs badly, adjust their portfolios, and so add more uncertainty to the future. However, this is not necessarily the case if we consider the whole system rather than one individual in isolation, ignoring the possible impact of its neighbors. In our estimation results, α(1) is uniformly larger than α(2), indicating a larger impact of good news on volatility. A more precise conclusion is that the volatility of one individual is more sensitive to its own good news, which does not actually contradict the conclusion of Engle and Ng (1993): in the univariate case, it remains unknown how much of the 'bad news' effect is contributed by bad performance from a systemic perspective. Our results show that good news has a larger 'local influence', as indicated by α(1), while bad news, despite having less 'local influence', may spread faster and have a larger 'global influence' on neighbors through network connections. This possibility suggests a future extension of our model in which the threshold effect is also applied to the coefficient λ, allowing good news and bad news to have asymmetric network effects.

8 Conclusion

In this article, we propose a TNGARCH model that takes into consideration the network effect, as well as the threshold effect of shocks on volatilities. Our model can describe asymmetric properties of volatilities of high-dimensional time series without increasing parameterization complexity. Strict stationarity when N is fixed, as well as near-epoch dependence when N → ∞, are discussed. The parameters are estimated by quasi-maximum likelihood estimation, and the consistency and asymptotic normality of the proposed estimator are proved. The results of the simulation study support the theoretical properties of the QMLE. Finally, our model is fitted to real stock data containing 286 stocks from four industries on the SSE and SZSE. The empirical results reveal that although volatility is more sensitive to bad news in the univariate case, once the network structure is considered, the majority of the revision of an individual's volatility may be due to the impact of bad news from its neighbors; hence the 'local influence' of bad news is not necessarily larger than that of good news.

There is room for extension of our methodology, which could lead to interesting topics for future research. In Theorem 3 we derived asymptotic properties when T → ∞ and N → ∞ at a slower rate; according to the simulation study, our estimation method enables reliable inference on the parameters even when the data have hundreds of dimensions. The limitation is also obvious: as shown in Figures 2–5, to get a decent approximation of the standard errors we need to collect 4000 samples even when the number of dimensions is only about 482, and the dimension could be far higher in real-world situations. User data collected from a social network often consists of millions of individual accounts, whereas it may be impossible to collect a sufficient number of samples over time even at daily frequency. Therefore our model would be applicable on a much larger scale if its statistical properties could be derived when N increases at the same, or even a higher, rate than T. As far as we know, there is no published work in the literature that solves this problem theoretically for GARCH-type models. Another limitation is that the way we consider the network effect is simplified in two respects. First, the network structure is deterministic rather than stochastic over time; embedding a random network in our model would make more sense, but it would raise the complexity of the model and may cause problems in the estimation of parameters (see Chandrasekhar and Lewis, 2011). Second, only one type of individual-to-individual relation is considered, since the network structure is constructed solely from common shareholders. Zhu et al. (2023) constructed their factor-augmented network using several types of relations, including individual-to-individual and factor-to-individual relations. Bringing more information into consideration would possibly improve the adequacy of the model, and we leave this for future research.

Acknowledgments

Many thanks to the Editor, the Co-Editor and the Referees for their insightful comments. The referees' suggestions led to significant improvements in our article. The second author's work was partially supported by the National Natural Science Foundation of China (Grant No. 12171161).

Data Availability Statement

Our dataset consists of daily log returns of 286 stocks, observed over the two consecutive years 2019 and 2020 (excluding closing days), from the Chinese Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE). The data that support the findings of this study are available on request.
