We would like to thank three anonymous referees for their helpful comments. We also benefited from discussions with Yacine Aït-Sahalia, Torben Andersen, Peter Boswijk, Rob Engle, Christian Gouriéroux, Peter Reinhard Hansen, Jean Jacod, Ilze Kalnina, Frank Kleibergen, Eben Lazarus, Jia Li, Yingying Li, Albert Menkveld, Per Mykland, George Tauchen, Michel Vellekoop, Bas Werker, Dacheng Xiu, Xiye Yang, Xinghua Zheng, and seminar participants at the 2017 North American Summer Meeting (St. Louis, June 2017), 2017 European Meeting (Lisbon, August 2017) and 2018 China Meeting (Shanghai, June 2018) of the Econometric Society, International Conference on Quantitative Finance and Financial Econometrics (Marseille, June 2018), the 12th Annual SoFiE Conference (Shanghai, June 2019), the Econometric Society and Bocconi University Virtual World Congress (August, 2020), the Econometric Society European Winter Meeting (Virtual, 2020) and the North American Winter Meeting (Virtual, 2021), the Chinese University of Hong Kong, Hong Kong University of Science and Technology, NYU Shanghai, Peking University HSBC Business School, the University of Amsterdam, and the University of Cambridge. The project is partially sponsored by the Keynes Fund (JHUL).

Abstract

We introduce the Realized moMents of Disjoint Increments (ReMeDI) paradigm to measure microstructure noise (the deviation of the observed asset prices from the fundamental values caused by market imperfections). We propose consistent estimators of arbitrary moments of the microstructure noise process based on high-frequency data, where the noise process could be serially dependent, endogenous, and nonstationary. We characterize the limit distributions of the proposed estimators and construct confidence intervals under infill asymptotics. Our simulation and empirical studies show that the ReMeDI approach is very effective at measuring the scale and the serial dependence of microstructure noise. Moreover, the estimators are quite robust to model specifications, sample sizes, and data frequencies.

1 Introduction

Economic time series are often modeled as the sum of a latent process obtained from an underlying economic model and another term that reflects a variety of adjustments to or departures from the frictionless theoretical model, thus

$Y = X + \varepsilon$ (1)

The two processes X and ε are generated by different mechanisms, and can have quite distinct statistical properties and economic interpretations. Both quantities may be of interest, since each sheds light on some underlying economic theory and its relevance for the observed data. However, because only the sum process Y is observable, estimation of and inference about the underlying signal X and the noise ε are challenging.

We are concerned with applications of this framework in financial markets where the observed asset price¹ (Y) subsumes both the market microstructure noise (ε) and the efficient price (or fundamental value) (X). The fundamental theorem of asset pricing says that X should be a semimartingale process (Delbaen and Schachermayer (1994)). In practice, however, many market frictions, such as transaction costs, price discreteness, inventory holdings, information asymmetry, or measurement errors, may cause the observed prices to deviate from this ideal price. One may also want to allow for temporary mis-pricing (French and Roll (1986)) or fad effects (Lehmann (1990)); see also O'Hara (1995) and Hasbrouck (2007) for insightful reviews. A lot of early work proceeded on the basis that the microstructure noise process was i.i.d., but recently this assumption has been shown to be too strong; both theoretically and empirically, the microstructure noise may exhibit rich dynamics depending on its origin. If the microstructure effects are negligible, the observed price should be close to the efficient price and be unpredictable. Therefore, the dispersion and persistence of the microstructure noise serve as natural measures of market quality. Market quality is of concern to regulators and practitioners as well as academics; proxies for market quality are widely used in empirical analysis; see Linton and Mahmoodzadeh (2018).

We introduce a general econometric approach to measure microstructure noise in a nonparametric setting. Specifically, we propose a new estimator of the moments of a general dependent noise process based on the observed noisy high-frequency transaction prices; we call our estimator the Realized moMents of Disjoint Increments (ReMeDI). The estimation method is based on the differencing paradigm, which is widely used in microeconometrics to eliminate nuisance parameters; see, for example, Angrist and Pischke (2008).² We build on the general setup introduced in the seminal work of Jacod, Li, and Zheng (2017). Specifically, we assume that the underlying efficient price follows a semimartingale, which may accommodate stochastic volatility, jumps, etc. We allow the microstructure noise to be weakly dependent and to have a serial correlation of an unknown form that may decay at an algebraic rate; this may capture, for instance, the effects of clustered (or hidden) order flows or herding (Park and Sabourian (2011)). The microstructure noise is allowed to have time-varying and stochastic heteroscedasticity, which allows for intraday variation in the scale of the noise. The general setting we consider allows for random and endogenous observation schemes. We develop estimators of arbitrary moments of the microstructure noise; this includes the autocovariance function of powers of the noise process as well as other quantities of interest. We derive the stable convergence in law of the estimated quantities as the sample size increases on a given domain. We provide a consistent estimator of the asymptotic variance that allows us to quantify the accuracy of our estimator.

We present some simulation studies comparing the ReMeDI approach with the method of Jacod, Li, and Zheng (2017). We find that the ReMeDI approach is relatively robust to the data frequency, the sample size, the tuning parameter, and the model specification. We provide an empirical study on an individual stock price, which reveals that the microstructure noise has nontrivial serial dependence, but that the dependence structure falls short of being long memory. This is consistent with leading microstructure models,³ and differs from the findings in Jacod, Li, and Zheng (2017).

The robustness of the ReMeDI approach as demonstrated in our simulation and empirical studies has an intuitive explanation. The differencing method works because the increments of X over disjoint intervals (the efficient returns) are small and/or uncorrelated, and what remains is attributed to ε. This property distinguishes the ReMeDI approach from alternative high-frequency estimators that rely structurally on the infill asymptotics.

There are a number of methods for estimation of the moments of noise and the parameters of the efficient price, specifically: the two-scale/multi-scale realized volatility by Zhang, Mykland, and Aït-Sahalia (2005), Zhang (2006), Aït-Sahalia, Mykland, and Zhang (2011); the optimal-sampling realized variance by Bandi and Russell (2008); the maximum likelihood estimators by Aït-Sahalia, Mykland, and Zhang (2005), Xiu (2010); the pre-averaging method developed in Jacod, Li, Mykland, Podolskij, and Vetter (2009), see also Li (2013); and the realized kernel by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008). Most of this literature only considers i.i.d. microstructure noise.

Several recent papers explore richer microstructure models by allowing for autocorrelated noise. The estimators of the second moments of noise in Da and Xiu (2019) and Li, Laeven, and Vellekoop (2020) are by-products of the integrated volatility estimators in the presence of autocorrelated noise. In a recent seminal paper, Jacod, Li, and Zheng (2017) introduced the first feasible procedure, called the local averaging (LA) method, to estimate arbitrary moments of microstructure noise using high-frequency data. They also introduced a general framework allowing for a stochastic observation scheme and a microstructure noise with a semimartingale “size process.” We follow their general setup and derive asymptotic properties of our estimators under this general framework. We differentiate our paper from Jacod, Li, and Zheng (2017) as follows. First, the ReMeDI method is based on differencing, while the LA method is based on deviations from local averages; both ideas are widely used in other contexts such as panel data and semiparametric estimation to eliminate nuisance parameters. Second, the ReMeDI approach works beyond the infill framework. Specifically, in the working paper version, Li and Linton (2019), we proved that the ReMeDI estimator is consistent and has an associated CLT in a long-span, non-infill setting. In this case, the method works provided the efficient price is a martingale, in which case its increments are uncorrelated at any horizon. The LA method, however, is inconsistent when applied to low-frequency data. Next, the finite sample performance of the LA estimators heavily depends on the sample size and the noise-to-signal ratio (the ratio of noise variance to the integrated volatility of the efficient price); see an analysis in Jacod, Li, and Zheng (2017). 
This can cause problems in implementations with real data.⁴ The bias of the ReMeDI estimators, by contrast, depends only on the slope of the autocovariance function of the microstructure noise, and in short-memory contexts this bias can be very small. Last, the ReMeDI approach has two further practical advantages: it is computationally very efficient,⁵ and it is very robust to a wide range of tuning parameters.

2 Continuous-Time Framework and Assumptions

We follow the general framework of Jacod, Li, and Zheng (2017) to specify the continuous-time efficient price process, the observation scheme, and the microstructure noise.⁶

2.1 Efficient Price Process

We assume that the efficient price process X is an Itô semimartingale defined on a filtered probability space $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0002$ with the Grigelionis representation

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0003$ (2)

where W, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0004$ are a Wiener process and a Poisson random measure on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0005$ and E, respectively. Here, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0006$ is a measurable Polish space on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0007$ and the predictable compensator of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0008$ is $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0009$ for some given σ-finite measure on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0010$; see Jacod and Shiryaev (2003) for a detailed introduction to the last two integrals. Moreover, X satisfies the following regularity condition:⁷

Assumption H. The process b is locally bounded, the process σ is càdlàg, there is a localizing sequence $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0011$ of stopping times, and, for each n, a deterministic nonnegative function $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0012$ on E satisfying $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0013$ such that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0014$ for all $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0015$ satisfying $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0016$.

The efficient price process is very general; it allows for stochastic volatility and jumps in both the price and volatility processes.

2.2 Observation Scheme

For each n, let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0017$ be a sequence of random finite observed times (usually when a transaction or quote occurs) with $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0018$ , where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0019$ is the set of nonnegative integers. We denote

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0020$ (3)

Here, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0021$ is the stochastic number of observations recorded on the interval $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0022$ for $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0023$ , while $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0024$ is the ith spacing of the observation times. For any process V, we denote $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0025$ .

Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0026$ be a positive sequence of real numbers satisfying $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0027$ as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0028$ . We may think of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0029$ as the average magnitude of the spacings between successive observation times: if the observation times were equally spaced (the regular observation scheme), then $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0030$ would be proportional to that spacing. The difference between the regular observation scheme and the general scheme is characterized by two semimartingale intensity processes α, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0031$ . Conditional upon an appropriate σ-algebra, the expectations of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0032$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0033$ are approximately equal to $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0034$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0035$ , respectively. Specifically, we assume the following:

Assumption O. α, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0036$ are two Itô semimartingales defined on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0037$ satisfying Assumption H. We further assume there is a localizing sequence $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0038$ of stopping times and positive constants $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0039$ and κ such that:

1. For $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0040$ , we have $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0041$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0042$ , where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0043$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0044$ are the left limits of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0045$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0046$ .
2. Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0047$ be the smallest filtration satisfying
- (a) $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0048$ ,
- (b) $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0049$ is a $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0050$ stopping time for $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0051$ ,
- (c) $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0052$, conditional on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0053$, is independent of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0054$ for $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0055$.
3. With the restriction $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0056$ , and for all $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0057$ ,
$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0058$ (4)

A useful consequence of our setting is the following convergence in probability:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0059$ (5)

The observation times framework is very general and includes, inter alia, the regular sampling scheme, the time-changed regular sampling scheme, the modulated Poisson sampling scheme, and the predictably modulated random walk sampling scheme; see the discussion in Jacod, Li, and Zheng (2017).
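To make two of these schemes concrete, the following Python sketch draws observation times on [0, 1] under regular sampling and under a Poisson scheme whose intensity varies over the day. The U-shaped intensity `alpha`, the function names, and the thinning construction are assumptions made purely for illustration; they are not taken from Jacod, Li, and Zheng (2017).

```python
import numpy as np

def regular_times(n):
    """Equally spaced observation times i/n on [0, 1]."""
    return np.arange(n + 1) / n

def modulated_poisson_times(n, alpha, alpha_max, rng):
    """Times of a Poisson process with intensity n * alpha(t) on [0, 1].

    Thinning: draw a homogeneous Poisson process with rate n * alpha_max,
    then keep each point t with probability alpha(t) / alpha_max.
    """
    m = rng.poisson(n * alpha_max)
    candidates = np.sort(rng.uniform(0.0, 1.0, m))
    keep = rng.uniform(0.0, 1.0, m) < alpha(candidates) / alpha_max
    return candidates[keep]

def alpha(t):
    """Hypothetical U-shaped intensity (busier near open and close); integrates to 1."""
    return 0.5 + 6.0 * (t - 0.5) ** 2

rng = np.random.default_rng(0)
n = 20_000
t_reg = regular_times(n)
t_poi = modulated_poisson_times(n, alpha, alpha_max=2.0, rng=rng)
```

With this choice the expected number of Poisson observations is n, the same order as under regular sampling, but the observations cluster near the two ends of the interval.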

2.3 Microstructure Noise

We suppose that the microstructure noise has a multiplicative form that allows for serial dependence, stochastic scale, and dependence of the scale on the efficient price process.

Assumption N. Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0060$ be a stationary ρ-mixing random sequence with mixing coefficients $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0061$.⁸ We further assume that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0069$ is centered at 0 with variance 1 and finite moments of all orders, and is independent of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0070$. Moreover, there is some $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0071$, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0072$ such that

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0073$ (6)

At stage n, the noise at time $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0074$ is given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0075$ (7)

where γ is a nonnegative Itô semimartingale on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0076$ satisfying Assumption H and is not identically zero on any interval.

Remark 1. To obtain limit results, we shall suppose that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0077$ for consistency and that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0078$ to derive the limit distribution, which allows for quite strong dependence close to the long memory boundary. Jacod, Li, and Zheng (2017) required $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0079$ for consistency and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0080$ to establish the limit distribution.

2.4 The Observed Noisy Price

Finally, the observed noisy price $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0081$ is given by (for $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0082$ )

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0083$ (8)

Note that both X and ε are latent; only Y is observable. Our purpose is to estimate the moments of ε using Y only.
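The following Python sketch simulates a deliberately simple special case of this framework: a Brownian efficient price with constant volatility, regular sampling, and multiplicative noise built from a unit-variance AR(1) sequence and a deterministic U-shaped size process. All numerical values, the AR(1) choice for χ, and the shape of γ are illustrative assumptions; the setup in the text is far more general (jumps, stochastic volatility, random observation times, semimartingale γ).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 23_400                       # assumed: one observation per second over 6.5 hours
t = np.arange(1, n + 1) / n      # regular observation times on (0, 1]

# Efficient log-price X: driftless Brownian motion with constant volatility.
sigma = 0.01
x = np.cumsum(sigma * np.sqrt(1.0 / n) * rng.standard_normal(n))

# Stationary component chi: AR(1) with mean 0 and variance 1 (serially dependent).
phi = 0.5
innov = np.sqrt(1.0 - phi**2) * rng.standard_normal(n)
chi = np.empty(n)
chi[0] = rng.standard_normal()
for i in range(1, n):
    chi[i] = phi * chi[i - 1] + innov[i]

# Noise size gamma: deterministic U-shape here (an assumption for illustration).
gamma = 2e-4 * (1.0 + 2.0 * (t - 0.5) ** 2)

eps = gamma * chi                # multiplicative noise: size process times chi
y = x + eps                      # observed price: efficient price plus noise
```

Only `y` would be available to the econometrician; `x`, `chi`, `gamma`, and `eps` are latent. At this frequency the variance of the observed returns `np.diff(y)` is dominated by the noise rather than by the efficient returns `np.diff(x)`.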

3 The Design and the Intuition of the ReMeDI Estimators

3.1 The Estimator of the Autocovariance Function

The intuition of the ReMeDI design can best be seen in a simpler setting. Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0084$ be a stationary mixing sequence with mean zero and finite variance; we would like to estimate its autocovariance $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0085$ . The natural estimator is the sample analogue

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0086$ (9)

which is consistent and asymptotically normal under very mild conditions.

We consider instead an estimator that replaces the “observations” $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0087$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0088$ by the “long differences”, that is,

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0089$ (10)

where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0090$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0091$ are integers that grow at certain rates as the sample size increases. The estimator $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0092$ follows the ReMeDI design and it provides another consistent estimator of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0093$ , provided $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0094$ , and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0095$ . The intuition of the consistency becomes immediate if one rewrites $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0096$ as

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0097$ (11)

The first average is (asymptotically) equivalent to the sample analogue (9), thus it converges in probability to $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0098$ ; the remaining three averages are centered at $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0099$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0100$ , and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0101$ , which themselves converge to zero at a rate depending on (6) as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0102$ .

Taking differences seems redundant if the time series $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0103$ is observable. However, in our framework, ε is masked by the efficient price X, and we only observe $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0104$ . Taking time differences removes the effect of the efficient price. The intuition of such removal under the infill asymptotics is that the differences of the efficient prices, say, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0105$ , are much smaller than the differences of the noise as n increases.
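A minimal numerical sketch of this idea follows. It assumes that the long-difference estimator pairs a forward difference with a backward difference, that is, it averages (Y_{i+j} − Y_{i+j+k})(Y_i − Y_{i−k'}) over i; this form is our reading of the description above, and the function name, the MA(1) noise, and all parameter values are illustrative assumptions.

```python
import numpy as np

def remedi_autocov(y, j, k_fwd, k_bwd):
    """Average of (y[i+j] - y[i+j+k_fwd]) * (y[i] - y[i-k_bwd]) over admissible i (j >= 0)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    i = np.arange(k_bwd, n - j - k_fwd)      # all four indices stay inside the sample
    fwd = y[i + j] - y[i + j + k_fwd]        # forward long difference
    bwd = y[i] - y[i - k_bwd]                # backward long difference
    return float(np.mean(fwd * bwd))

rng = np.random.default_rng(2)
n = 100_000
a = 5e-4                                     # constant noise size (assumed)
theta = 0.5                                  # MA(1) coefficient of the noise (assumed)

e = rng.standard_normal(n + 1)
chi = (e[1:] + theta * e[:-1]) / np.sqrt(1.0 + theta**2)   # unit-variance MA(1)
x = np.cumsum(0.01 / np.sqrt(n) * rng.standard_normal(n))  # efficient price
y = x + a * chi                                            # observed price

# True autocovariances of the noise a * chi at lags 0, 1, and 5.
true_acov = {0: a**2, 1: a**2 * theta / (1.0 + theta**2), 5: 0.0}
est = {j: remedi_autocov(y, j, k_fwd=10, k_bwd=20) for j in true_acov}
```

In this design the estimates in `est` should be close to `true_acov` even though the noise is never observed directly: the increments of `x` over the two disjoint windows are small and uncorrelated, so they contribute little to the average.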

3.2 The General ReMeDI Design

We next formally define the ReMeDI estimator of a general class of parameters. First, we introduce some notation used below. Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0106$ be the set of all finite sequences of integers satisfying $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0107$. In the sequel, we will assume without loss of generality that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0108$ for any $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0109$. The j-moments of χ, the stationary component of microstructure noise, are given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0110$ (12)

This is our parameter of interest (up to the scaling by the γ heteroscedasticity process); it includes the autocovariance function of the noise process and many other examples as special cases.

Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0111$ be a q-tuple of integers. For any $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0112$ and any process V, let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0113$ be the set of observation indices on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0114$ for which the following multi-difference operator $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0115$ is well defined:⁹

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0119$ (13)

Then the ReMeDI estimator corresponding to $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0120$ based on data $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0121$ and tuning parameters k is defined by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0122$ (14)

Note that we do not normalize yet by $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0123$ .

Remark 2. The estimator (10) can be written in this general form with $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0124$, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0125$, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0126$, where the differencing operator $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0127$ is applied to the data $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0128$. The general ReMeDI approach shares two features with the special case estimator (10) regarding the choices of k: (1) the first entry of k will be negative whereas the remaining ones are positive, that is, the first difference is a forward difference and the remaining ones are backward differences; (2) $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0129$, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0130$ as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0131$, and we will often write $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0132$ in the sequel to reflect such dependence.

We discuss a little more why the general ReMeDI procedure works under infill asymptotics. For this purpose, suppose that the noise size process γ is constant equal to 1 so that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0133$ . Suppose that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0134$ satisfies the two properties in Remark 2. Next, we explain how to connect $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0135$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0136$ with $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0137$ . To see this, we first note that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0138$ are the “distant” indices of the intervals on which the backward and forward differences are taken. Figure 1 illustrates a simple example with $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0139$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0140$ for some $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0141$ . The forward difference starts at the $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0142$ th observation and ends at the $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0143$ th observation; for the remaining indices in j, the associated differences start from $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0144$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0145$ and end at $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0146$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0147$ , respectively. The intuition of the ReMeDI approach is that the “distant” noise terms are approximately independent of each other, and are also independent of the “clustered” noise $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0148$ (this is because the elements in $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0149$ are quite “sparse;” recall a special case outlined in (11)). 
Hence, any term that has one or more of the distant noise terms as a factor has approximately zero expectation. On the other hand, expanding $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0150$ yields

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0151$

Therefore, we have by taking expectations that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0152$ . If $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0153$ is still relatively small such that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0154$ , the differences/increments of the efficient price over the intervals are asymptotically negligible. That is, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0155$ . Thus, the averages of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0156$ will converge in probability to $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0157$ by the law of large numbers. This is the intuition of the identification.

**Figure 1.** Illustration of the ReMeDI estimator of j-moments with j = (j₁,j₂,j₃) and k_n = (−k_n,2k_n,4k_n).
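The general design can be sketched in Python as follows. The sketch assumes that the multi-difference operator at index i is the product over l of (V_{i+j_l} − V_{i+j_l−k_l}), so that a negative k_l gives a forward difference and a positive k_l a backward difference, as described in Remark 2. The function name `remedi_sum`, the skewed i.i.d. noise, and all numerical values are illustrative assumptions.

```python
import numpy as np

def remedi_sum(v, j, k):
    """Sum over admissible i of prod_l (v[i + j_l] - v[i + j_l - k_l]).

    Returns the unnormalized sum and the number of summands.
    """
    v = np.asarray(v, dtype=float)
    j = np.asarray(j, dtype=int)
    k = np.asarray(k, dtype=int)
    offsets = np.concatenate([j, j - k])          # every index offset that is touched
    lo = max(0, -int(offsets.min()))
    hi = v.size - max(0, int(offsets.max()))      # exclusive upper bound for i
    i = np.arange(lo, hi)
    prod = np.ones(i.size)
    for jl, kl in zip(j, k):
        prod *= v[i + jl] - v[i + jl - kl]
    return float(prod.sum()), i.size

rng = np.random.default_rng(3)
n = 200_000
a = 5e-4                                          # constant noise size (assumed)
chi = rng.exponential(1.0, n) - 1.0               # skewed noise: mean 0, var 1, E[chi^3] = 2
x = np.cumsum(0.01 / np.sqrt(n) * rng.standard_normal(n))
y = x + a * chi

kn = 5
total, count = remedi_sum(y, j=(0, 0, 0), k=(-kn, 2 * kn, 4 * kn))
third_hat = total / count                         # estimate of E[eps^3] = 2 * a**3
```

Here `j = (0, 0, 0)` targets the third moment of the noise, and the tuning vector mirrors the pattern (−k_n, 2k_n, 4k_n) of Figure 1. Using distinct window lengths keeps the three distant indices apart, so every term in the expanded product other than the clustered one has approximately zero mean. Passing `j = (j, 0)` with a two-entry `k` gives a two-difference estimator of the lag-j autocovariance.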

Remark 3. (The intuition of the LA method) The ReMeDI method is essentially based on differencing, while the local averaging (LA) method employs deviations from local averages. Specifically, a local average of the observable noisy prices provides a proxy of the efficient price since the noise is averaged out, that is, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0158$; consequently, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0159$. Therefore, the moments of noise can be estimated by the sample moments of the proxies $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0160$. This is the intuition of the LA method.
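The local-averaging idea in Remark 3 can be illustrated with a deliberately naive Python sketch. It is not the LA estimator of Jacod, Li, and Zheng (2017), only the intuition described in the remark: prices are grouped into non-overlapping blocks, each block mean serves as a proxy for the efficient price, and the deviations from it serve as noise proxies. The block length `m`, the i.i.d. noise, and all numerical values are assumptions for illustration.

```python
import numpy as np

def local_average_proxies(y, m):
    """Deviation of each price from the mean of its own block of m observations."""
    y = np.asarray(y, dtype=float)
    n_blocks = y.size // m
    blocks = y[: n_blocks * m].reshape(n_blocks, m)
    x_proxy = blocks.mean(axis=1, keepdims=True)   # local average: proxy for X
    return (blocks - x_proxy).ravel()              # deviations: proxies for the noise

rng = np.random.default_rng(4)
n = 100_000
a = 5e-4                                           # constant noise size (assumed)
x = np.cumsum(0.01 / np.sqrt(n) * rng.standard_normal(n))
y = x + a * rng.standard_normal(n)                 # i.i.d. noise for simplicity

eps_proxy = local_average_proxies(y, m=20)
var_hat = float(np.mean(eps_proxy**2))             # naive estimate of the noise variance
```

Even in this simple case the proxy variance carries a small-sample distortion: subtracting a mean of m observations removes a fraction 1/m of the noise variance, and the movement of the efficient price within each block adds a small positive term. This illustrates why the performance of averaging-based proxies can depend on the window length and on the noise-to-signal ratio.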

4 The Asymptotic Properties of the ReMeDI Estimators

4.1 Consistency

We next give the large sample properties of the ReMeDI estimator (for a given choice of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0161$) in our general setting. For a general γ process that satisfies Assumption N, the “average size” of the noise moments $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0162$ is $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0163$, and this scaling appears in the probability limit of the ReMeDI estimators. Recall also from (6) that v is the parameter that controls the degree of serial dependence in the noise.

Theorem 1. Let Assumptions H, O, and N hold. Assume $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0164$ and that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0165$ satisfies

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0166$ (15)

as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0167$ . For $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0168$ , we have the following convergence in probability:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0169$ (16)

where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0170$ is defined in (12) and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0171$ in (5).

This says that our estimator consistently estimates $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0172$ up to a scaling factor that varies with time t and depends on the average scale of the noise and on the stochastic process governing the observation times.

Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0173$ be a sequence of integers satisfying $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0174$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0175$ . Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0176$ be specified as follows: $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0177$ if $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0178$ , and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0179$ if $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0180$ . Then, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0181$ satisfies the conditions in (15).

4.2 Limit Distribution

We first restrict further the values of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0182$ in order to facilitate the limit theory.¹⁰ Among many possibilities, we propose the following specification of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0184$ , which is solely determined by a single integer $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0185$ :

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0186$ (17)

where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0187$ is related to v as follows: $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0188$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0189$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0190$ .

Remark 4. Note that (17) implies (15). In the sequel, we will omit $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0191$ and simply write $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0192$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0193$ instead of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0194$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0195$ when $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0196$ satisfies (17).

We establish the CLT for both the following centered stochastic processes:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0197$

The first process involves unknown but deterministic norming, whereas the second process is normed by the observed stochastic sample size. Thus, the second one is “feasible” in practice.

Theorem 2. Let Assumptions H, O, and N hold, and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0198$, v satisfy (17). For any $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0199$, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0200$, we have the following $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0201$ -stable convergence in law:

1. $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0202$ , where the limit is defined on an extension $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0203$ of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0204$ . Conditionally on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0205$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0206$ are centered Gaussian with (co)variances $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0207$ that are given by
$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0208$ (18)
where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0209$ is given by (26).
2. $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0210$ , where the limit is defined on an extension $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0211$ of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0212$ . Conditionally on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0213$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0214$ are centered Gaussian with (co)variances $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0215$ that are given by
$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0216$ (19)

Remark 5. The term $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0217$ is the asymptotic variance of the ReMeDI estimators contributed by the stationary part of the noise. The explicit form is given by (26) in Appendix A. If the sampling scheme is regular, for example, equally spaced at millisecond frequency, then $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0218$ and the asymptotic variances in (18) and (19) are greatly simplified, since terms other than the first one are zero. We provide further discussion of this in Appendix B.

Remark 6. (Asymptotic variances of ReMeDI and LA) Note that the ReMeDI and LA estimators have very similar asymptotic (co)variances. The only difference lies in the $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0219$ part, which represents the asymptotic variance contributed by the stationary part of the noise. The $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0220$ of the ReMeDI estimators includes the asymptotic (co)variances of the “distant” noise terms (recall the discussion in Section 3.2). It is therefore larger than the counterpart of the LA estimators. Hence, the LA estimators are asymptotically more efficient (although one can improve the efficiency of ReMeDI by taking averages of estimators computed using different $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0221$ ; see Section 4.4.1). However, simulation studies show that the ReMeDI class works better in finite samples with realistic sample sizes (or equivalently, data frequencies): it has smaller finite sample variance and is almost unbiased under various model specifications. Moreover, the ReMeDI approach has greater computational efficiency, which pays off when one is working with massive high-frequency data sets (recall Footnote 5).

Theorem 3. Suppose that all the conditions of Theorem 2 hold. For any $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0222$ , we have the following $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0223$ -stable convergence in law:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0224$ (20)

where Φ is a standard normal random variable that is defined on an extension of the space and is independent of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0225$ , and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0226$ is a consistent estimator of the asymptotic variance constructed in (27).

4.3 Estimating the Autocovariances of Microstructure Noise

In this section, we consider the special case concerning the estimation of the autocovariance function of the microstructure noise. Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0227$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0228$ , and let

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0229$ (21)

The following corollary provides the limit distribution.

Corollary 1. (ReMeDI Estimators of Autocovariances) Under the conditions of Theorem 2, we have $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0230$ , where

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0231$ (22)

Moreover, under the assumptions of Theorem 3, we have

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0232$ (23)

where Φ is a standard normal random variable as in Theorem 3 and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0233$ is provided in (27).

Remark 7. $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0234$ represents the variance of the ReMeDI estimators contributed by the stationary part of noise. It has two components: $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0235$ is in fact the asymptotic variance of the sample analogue (recall (9)); the second part $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0236$ is the asymptotic variance of the three additional terms that appear in (11) which arise in differencing.
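The decomposition in Remark 7 can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the autocovariance estimator in (21) averages the products (Y_{i+j} − Y_{i+j+k_n})(Y_i − Y_{i−2k_n}) over i, so that expanding the product of the noise terms gives one leading term and three additional terms from differencing. The simulation design (random-walk efficient price, unit-variance AR(1) noise with ρ = 0.5, noise standard deviation 10⁻³, k_n = 10) and all function names are illustrative assumptions.

```python
import math
import random

def simulate_noisy_prices(n, rho, noise_sd, daily_vol, seed=0):
    """Random-walk efficient log price plus scaled unit-variance AR(1) noise."""
    rng = random.Random(seed)
    step_sd = daily_vol / math.sqrt(n)      # per-tick volatility of the efficient price
    innov_sd = math.sqrt(1.0 - rho ** 2)    # keeps the variance of chi equal to 1
    x, chi = 0.0, rng.gauss(0.0, 1.0)
    prices = []
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)
        prices.append(x + noise_sd * chi)
        chi = rho * chi + rng.gauss(0.0, innov_sd)
    return prices

def remedi_autocov(y, lag, kn):
    """Average product of two disjoint price increments (assumed form of (21))."""
    n = len(y)
    total = 0.0
    for i in range(2 * kn, n - lag - kn):
        forward = y[i + lag] - y[i + lag + kn]   # spans ticks i+lag .. i+lag+kn
        backward = y[i] - y[i - 2 * kn]          # spans ticks i-2kn .. i
        total += forward * backward
    return total / n

rho, noise_sd = 0.5, 1e-3
y = simulate_noisy_prices(n=23_400, rho=rho, noise_sd=noise_sd, daily_vol=0.01, seed=1)
estimates = [remedi_autocov(y, lag, kn=10) for lag in range(4)]
truth = [noise_sd ** 2 * rho ** lag for lag in range(4)]
for lag, (est, true) in enumerate(zip(estimates, truth)):
    print(lag, est, true)
```

Because the two increments cover disjoint stretches of the sample, the efficient price contributes nothing in expectation in this martingale design, while the product of the two "near" noise terms has expectation equal to the lag-j autocovariance. The three additional noise products involve terms at least k_n ticks apart and add variance rather than bias when the dependence decays quickly.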

Remark 8. The last three terms of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0237$ that appear in (22) arise because of the stochastic sampling scheme; this term is nonnegative, and is zero whenever $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0238$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0239$ , or $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0240$ , where K is a constant.

We note that while the multiplicative structure of the microstructure noise (recall (7)) allows for a time-varying and stochastic size of the noise, the serial correlation of the noise is not affected by the size process. This structure allows us to estimate the autocorrelations of noise directly once we have an estimator of the autocovariances. Define the ReMeDI estimator of the noise autocorrelation, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0241$ , and its asymptotic variance estimator

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0242$

The following corollary spells out the limit distribution of the proposed estimators.

Corollary 2. (ReMeDI Estimators of Autocorrelations) Under the conditions of Theorem 3, we have the following $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0243$ -stable convergence: $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0244$ , where Φ is a standard normal random variable as in Theorem 3.
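The step from autocovariances to autocorrelations can be sketched as follows, assuming the autocorrelation estimator is the ratio of the lag-j autocovariance estimate to the lag-0 estimate. The input numbers and the function name are hypothetical and serve only to show that a common scale factor on the noise cancels in the ratio.

```python
def autocorrelations(autocov):
    """Divide each autocovariance by the lag-0 value (the noise variance)."""
    if autocov[0] <= 0:
        raise ValueError("lag-0 autocovariance must be positive")
    return [c / autocov[0] for c in autocov]

autocov = [1.0e-6, 5.0e-7, 2.5e-7]       # hypothetical autocovariance estimates
scaled = [4.0 * c for c in autocov]      # the same noise with its size doubled
print(autocorrelations(autocov))
print(autocorrelations(scaled))
```

Both calls print the same autocorrelations, which reflects the point made above: under the multiplicative structure, the size of the noise rescales every autocovariance by the same factor and leaves the serial correlation unchanged.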

4.4 Some Extended Discussions

Here, we comment on the efficiency issue and on the behavior of our procedure under the rounding model of discrete prices.

4.4.1 Variance Reduction

The efficiency of the ReMeDI estimator can be improved by combining estimators that use different $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0245$ sequences satisfying the regularity conditions. We explain the procedure for $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0246$ defined in (21). Rewrite $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0247$ as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0248$ to indicate its dependence on $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0249$ . Let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0250$ be a sequence of tuning parameters, and define the combined estimator $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0251$ . Since averaging is known to reduce variance in other contexts (see, e.g., Abadie and Imbens (2006)), we only sketch the argument here. It suffices to consider the noise part:¹¹

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0252$

The key observation is that sums over the last three terms will have negligible asymptotic variance provided $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0253$ is large. To see this, consider the sum $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0254$ , where

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0255$

The sequence $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0256$ has asymptotically negligible covariances if $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0257$ are sparse enough, for example, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0258$ . As a consequence, the asymptotic variance of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0259$ . The asymptotic distribution is entirely determined by $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0260$ , that is, the three additional terms that appear in (11) from differencing will not affect the limiting variance. Therefore, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0261$ (recall (22)) will be reduced since the asymptotic variance of the stationary noise becomes $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0262$ .
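The averaging idea can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: it assumes the combined estimator is an equal-weight average of estimators computed with different tuning parameters, reuses the assumed product form (Y_{i+j} − Y_{i+j+k_n})(Y_i − Y_{i−2k_n}), and takes a hypothetical sparse grid k_n ∈ {10, 20, 40}. The simulation design and the function names are likewise assumptions.

```python
import math
import random

def simulate_noisy_prices(n, rho, noise_sd, daily_vol, seed=0):
    """Random-walk efficient log price plus scaled unit-variance AR(1) noise."""
    rng = random.Random(seed)
    step_sd = daily_vol / math.sqrt(n)
    innov_sd = math.sqrt(1.0 - rho ** 2)
    x, chi = 0.0, rng.gauss(0.0, 1.0)
    prices = []
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)
        prices.append(x + noise_sd * chi)
        chi = rho * chi + rng.gauss(0.0, innov_sd)
    return prices

def remedi_autocov(y, lag, kn):
    """Average product of two disjoint price increments (assumed estimator form)."""
    n = len(y)
    total = 0.0
    for i in range(2 * kn, n - lag - kn):
        total += (y[i + lag] - y[i + lag + kn]) * (y[i] - y[i - 2 * kn])
    return total / n

def averaged_remedi_autocov(y, lag, kns):
    """Equal-weight average of the estimates over several tuning parameters."""
    return sum(remedi_autocov(y, lag, kn) for kn in kns) / len(kns)

y = simulate_noisy_prices(n=23_400, rho=0.5, noise_sd=1e-3, daily_vol=0.01, seed=2)
kns = [10, 20, 40]                       # hypothetical sparse grid of tuning parameters
single = remedi_autocov(y, lag=1, kn=10)
combined = averaged_remedi_autocov(y, lag=1, kns=kns)
print(single, combined)                  # the true lag-1 autocovariance here is 5e-7
```

Every estimator in the average shares the same leading product of "near" noise terms, whereas the three additional products involve different distant observations for each k_n. Averaging therefore leaves the leading term intact and damps the contribution of the additional ones, which is the mechanism described above.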

4.4.2 Rounding Errors

The additive noise model (1) is the main framework in the literature, but there is a small but growing literature on rounding models that capture the effect of price discreteness; see, for example, Delattre and Jacod (1997), Rosenbaum (2009), and Li and Mykland (2015). Rounding of a continuous-state efficient price process can induce negative first-order autocovariance in the observed returns similar to that induced by bid-ask bounce and infrequent trading (Schwartz and Whitcomb (1977)), and this effect may be particularly large when the nominal stock price level is low and when trading is frequent. Some applied papers in market microstructure have explicitly allowed for rounding errors; see, for example, Glosten and Harris (1988). Theoretically, the rounding model is difficult to work with in the very general semimartingale setting we have for the efficient price, and we have not so far managed to include this feature in our theoretical analysis. However, we do have simulation evidence suggesting that the ReMeDI estimator also works quite well in this case; see Section E.2 in Li and Linton (2022). Li and Mykland (2015) found that subsampling helps mitigate rounding errors. The ReMeDI approach shares something with subsampling methods in that it takes differences over long intervals. Perhaps this explains the superior performance of the ReMeDI estimators in the presence of rounding errors in the simulation experiments.
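The claim that rounding alone can induce negative first-order autocovariance in observed returns can be checked in a stylized simulation. The sketch below is an assumption-laden illustration and not the design used in the Supplemental Material: the price level (20), the tick size (one cent), and the per-observation price move (standard deviation of one cent) are hypothetical choices, and the efficient price is a plain Gaussian random walk.

```python
import math
import random

def first_order_autocov(returns):
    """Sample covariance between consecutive returns."""
    m = sum(returns) / len(returns)
    pairs = zip(returns[1:], returns[:-1])
    return sum((a - m) * (b - m) for a, b in pairs) / len(returns)

def log_returns(log_prices):
    """First differences of a list of log prices."""
    return [b - a for a, b in zip(log_prices, log_prices[1:])]

rng = random.Random(0)
tick = 0.01      # hypothetical one-cent tick
price = 20.0     # hypothetical low nominal price level
exact_log, rounded_log = [], []
for _ in range(23_400):
    price += rng.gauss(0.0, 0.01)                              # stylized efficient price move
    exact_log.append(math.log(price))
    rounded_log.append(math.log(round(price / tick) * tick))   # price on the tick grid

acov_exact = first_order_autocov(log_returns(exact_log))
acov_rounded = first_order_autocov(log_returns(rounded_log))
print(acov_exact, acov_rounded)
```

In this design the rounding error behaves roughly like an additive noise term that is differenced when returns are formed, so the rounded returns pick up a negative first-order autocovariance that the unrounded returns do not have.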

5 Simulation Study

5.1 Model Settings

We suppose that the efficient price process has stochastic volatility and jumps that appear in both the price and volatility processes:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0263$ (24)

We set $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0264$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0265$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0266$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0267$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0268$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0269$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0270$ ; $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0271$ . This setting is motivated by some empirical facts that jumps in price levels and volatility tend to occur together; see Todorov and Tauchen (2011).

We further suppose that the stationary component of the microstructure noise follows an AR(1) process with Gaussian innovations $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0272$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0273$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0274$ . Note that χ has unit variance. We set $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0275$ , motivated by the empirical studies in Aït-Sahalia, Mykland, and Zhang (2011) and Li, Laeven, and Vellekoop (2020).
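A noise process of this type can be simulated as in the following sketch. It assumes the AR(1) recursion has Gaussian innovations with variance 1 − ρ², which is what makes the stationary variance equal to one, and it scales the result by a constant noise size. The values ρ = 0.5 and a noise standard deviation of 10⁻³ are hypothetical and are not the parameter values of the paper.

```python
import math
import random

def simulate_ar1_noise(n, rho, noise_sd, seed=0):
    """Return n draws of noise_sd * chi, where chi is a stationary AR(1) with unit variance."""
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho ** 2)   # innovation scale that gives Var(chi) = 1
    chi = rng.gauss(0.0, 1.0)              # start from the stationary distribution
    noise = []
    for _ in range(n):
        noise.append(noise_sd * chi)
        chi = rho * chi + rng.gauss(0.0, innov_sd)
    return noise

noise = simulate_ar1_noise(n=23_400, rho=0.5, noise_sd=1e-3)
mean = sum(noise) / len(noise)
var = sum((e - mean) ** 2 for e in noise) / len(noise)
print(var)   # should be close to noise_sd ** 2 = 1e-6
```

With this parametrization the lag-j autocovariance of the simulated noise is noise_sd² ρ^j, which gives known true values against which estimates can be compared.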

5.2 LA versus ReMeDI

We estimate the autocovariances of microstructure noise using the ReMeDI estimator and the local averaging (LA) estimators (Jacod, Li, and Zheng (2017)). We assume that the noise is stationary so that we can compare the estimates to the true parameters. We also assume that the observation scheme is regular so that we know explicitly the data frequency, which is a key factor that affects the finite sample performance of many high-frequency estimators.

The top and middle panels of Figure 2 present the estimation of the first 20 autocovariances of the noise by ReMeDI and LA.¹² The solid lines are the mean estimates over 1000 replications; the shaded region represents the 95% simulated confidence intervals. We simulate 23,400 observations for each sample path, corresponding to the number of seconds in a business day (6.5 trading hours). The ReMeDI estimators perform well: the estimates are approximately unbiased with narrow confidence bands. Surprisingly, there is a significant average deviation of the LA estimates from the true parameters, and the confidence bands are much larger as well.
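The construction of the mean estimate and the simulated 95% band can be sketched as follows. This is a scaled-down illustration under stated assumptions, not a replication of Figure 2: it uses 100 replications of 5000 observations instead of 1000 replications of 23,400, a random-walk efficient price without stochastic volatility or jumps, hypothetical parameter values, and the assumed product form of the ReMeDI autocovariance estimator with k_n = 10.

```python
import math
import random
import statistics

def simulate_noisy_prices(n, rho, noise_sd, daily_vol, rng):
    """Random-walk efficient log price plus scaled unit-variance AR(1) noise."""
    step_sd = daily_vol / math.sqrt(n)
    innov_sd = math.sqrt(1.0 - rho ** 2)
    x, chi = 0.0, rng.gauss(0.0, 1.0)
    prices = []
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)
        prices.append(x + noise_sd * chi)
        chi = rho * chi + rng.gauss(0.0, innov_sd)
    return prices

def remedi_autocov(y, lag, kn):
    """Average product of two disjoint price increments (assumed estimator form)."""
    n = len(y)
    total = 0.0
    for i in range(2 * kn, n - lag - kn):
        total += (y[i + lag] - y[i + lag + kn]) * (y[i] - y[i - 2 * kn])
    return total / n

rng = random.Random(3)
rho, noise_sd, n, reps, lag = 0.5, 1e-3, 5_000, 100, 1
draws = []
for _ in range(reps):
    y = simulate_noisy_prices(n, rho, noise_sd, daily_vol=0.01, rng=rng)
    draws.append(remedi_autocov(y, lag, kn=10))

cuts = statistics.quantiles(draws, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
lower, upper = cuts[0], cuts[-1]           # simulated 95% band
mean_est = statistics.fmean(draws)
truth = noise_sd ** 2 * rho ** lag
print(mean_est, (lower, upper), truth)
```

Repeating the last block for each lag gives one mean and one band per lag, which is the information summarized by the solid lines and shaded regions described above.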

The deviation of the LA estimates is caused by a finite sample bias, which is known to be a fraction of the a priori unknown quadratic variation (QV) of the efficient price; see the discussion in Jacod, Li, and Zheng (2017). Thus, to correct the bias, we need an estimate of the QV. But the estimation of QV in the presence of dependent noise is not trivial; see the discussion in Li, Laeven, and Vellekoop (2020). In a simulation context, we can obtain the QV and can thus give the LA estimators the privilege of an exact bias correction, which is, of course, not feasible in practice. The bottom panel of Figure 2 displays the bias-corrected estimation of LA. Even with accurate bias correction, however, the ReMeDI estimators still outperform the LA estimators, with almost no bias and greater accuracy.

It is interesting to compare ReMeDI and LA when the data frequencies vary. However, increasing the data frequency in a fixed time span has two effects: both the number of observations and the noise-to-signal ratio of tick returns will increase. We design a simulation study to separate the two effects and examine how sensitive ReMeDI and LA are to these changes.

The left panel of Figure 3 presents the mean squared error (MSE) of the ReMeDI and LA estimators for the first 20 autocovariances of noise. The sample size varies from 23,400 (1 trading day) to 117,000 (1 trading week), and 468,000 (1 trading month). The MSE of the ReMeDI estimators remains low and drops slightly when the sample size increases. The LA estimators, however, have larger MSE in a larger sample! This is statistically counterintuitive. However, it does make sense if we recall that the integrated volatility contributes to the finite sample bias of the LA estimators. Hence, a longer time span induces larger integrated volatility (relative to the number of observations), which in turn leads to a larger finite sample bias. This is especially so if the sample covers a period of volatility burst, and the likelihood of such an event increases as the sampling period becomes longer; see our empirical studies with real transaction prices.

The right panel of Figure 3 compares ReMeDI and LA when noise variance varies from 10⁻⁸ (small noise) to 10⁻⁶ (large noise). We note that the advantage of ReMeDI over LA is more prominent when the scale of noise is smaller. Indeed, the size of noise in practice is closer to the small noise scenario; see an extensive empirical study by Christensen, Oomen, and Podolskij (2014). Thus, in an extreme case when the noise has identical statistical properties in two samples, LA may give very different estimates due to the differences in sample sizes or noise-to-signal ratios. The ReMeDI approach remains robust and accurate.

5.3 Random Noise Size and Observation Times

As the last robustness check, we now allow for stochastic observation times and random scales of noise. Following Jacod, Li, and Zheng (2017), we let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0277$ follow an inhomogeneous Poisson process with rate $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0278$ , where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0279$ and the process γ satisfies $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0280$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0281$ . We set $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0282$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0283$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0284$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0285$ . Figure 4 reports the estimation of the autocorrelation functions by the two estimators. We observe patterns similar to those in Figure 2: compared to the ReMeDI estimators, the LA estimators have large biases and wide confidence bands.
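Observation times from an inhomogeneous Poisson process can be generated by thinning, as in the sketch below. The rate used here is a hypothetical deterministic U-shaped intensity on the unit interval, chosen only for illustration; it is not the stochastic rate specified above, and the function names and the average of 23,400 observations are assumptions as well.

```python
import math
import random

def simulate_observation_times(rate, rate_max, horizon=1.0, seed=0):
    """Thinning: draw candidates at the constant rate rate_max and keep each
    candidate at time t with probability rate(t) / rate_max."""
    rng = random.Random(seed)
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_max)           # next candidate arrival
        if t > horizon:
            return times
        if rng.random() * rate_max <= rate(t):   # acceptance step
            times.append(t)

n_expected = 23_400   # hypothetical average number of observations per day

def u_shaped_rate(t):
    """Hypothetical intensity on [0, 1]: busier near the open and the close."""
    return n_expected * (1.0 + 0.5 * math.cos(2.0 * math.pi * t))

times = simulate_observation_times(u_shaped_rate, rate_max=1.5 * n_expected, seed=4)
durations = [b - a for a, b in zip(times, times[1:])]
print(len(times), min(durations), max(durations))
```

The method requires rate_max to bound the intensity from above over the whole horizon, which holds here because the cosine term is at most one. Sampling the efficient price and the noise at the resulting times yields an irregularly spaced sample with a random number of observations.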

The Supplemental Material (Li and Linton (2022)) provides additional simulation studies to examine the quality of the CLT approximation, the effect of rounding error due to the discreteness of price, and the sensitivity to the choice of tuning parameters.

6 Empirical Study

We obtain the transaction prices of Coca-Cola (trading symbol KO)¹³ from the TAQ database for January 2018 (21 trading days). We remove prices before 9:30 and after 16:00. We collect approximately 50,000 observations per day, that is, 2.1 transactions per second on average. The average price is $46.84, with a standard deviation of 0.85.

Figure 5 plots the estimated autocovariances of noise by the ReMeDI estimators (the blue plots) based on samples of different sizes. The autocorrelation pattern is nontrivial: noise exhibits positive autocorrelations up to 4 lags, and shortly thereafter the sign switches to negative for a few lags, and then reverts to positive autocorrelations before decaying to zero around 20 lags. The pointwise confidence interval¹⁴ includes zero or excludes positive values after lag 5, which is incompatible with simple long memory.

**Figure 5**

Estimation of autocovariances of noise for Coca-Cola (KO) in January 2018. In the top panel, we use the transaction prices of KO on 2 January 2018; in the middle panel, we use the transaction prices of KO in the second trading week (8 January 2018 to 12 January 2018); in the bottom panel, we use all transaction prices of KO in January 2018. The tuning parameters for ReMeDI and LA are 10 and 6, respectively. The shaded area in the top panel represents the 95% confidence interval, and we set i_n = 5, $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0286$ to compute the asymptotic variances of the ReMeDI estimators, where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0287$ is the number of observations.

The ReMeDI estimates of microstructure noise presented in Figure 5 are economically intuitive. The positive autocovariances at the first several lags may be a consequence of the order splitting strategies by high-frequency traders (Biais, Hillion, and Spatt (1995)), or the successive transactions executed by limit orders (Parlour (1998)).¹⁵ The negative autocovariances at the intermediate lags are consistent with the prediction of inventory models (Ho and Stoll (1981), Hendershott and Menkveld (2014)), in which the market makers induce negatively autocorrelated order flows to balance their inventories. However, the LA method gives very different estimates: it says that the noise is strongly autocorrelated without any sign of decay after 20 lags. This is economically counterintuitive—such a pattern, if it exists, would be exploited by high-frequency traders and we would expect it to disappear rapidly. Moreover, the serial dependence, according to the LA estimates, is even stronger when estimation is performed on a larger sample. Since we only estimate autocovariances of noise up to 20 ticks/lags, or a few seconds, it is statistically counterintuitive to obtain stronger autocovariance estimates using the prices of a week than using the prices in a single trading day. This is in line with our simulation study that the LA estimates are subject to a finite sample bias that depends on the noise-to-signal ratio and sample size. The ReMeDI approach retains its accuracy and robustness.

7 Concluding Remarks

We introduce a differencing method to separate the microstructure noise from the underlying semimartingale efficient price in a general setting. We demonstrate the robustness of the proposed method compared to the main existing approach. We have concentrated primarily on the infill setting and the univariate case. The method naturally extends to the multivariate case, although in that case, several issues arise. First, the nonsynchronous trading issue has to be faced. Second, even when the assets trade on a common clock, there are some remaining theoretical results that need to be established for the infill case. We discussed briefly in Section 4.4.1 how one can improve efficiency by combining the estimators associated with different choices of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0288$ . An alternative potential source of efficiency gain is the heteroscedasticity delivered by the γ process, which was not exploited by our method. Given a consistent estimator of $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0289$ , one may implement a kind of feasible GLS procedure. We leave these problems for future research.

1 By price we always mean the logarithmic price in this paper unless stated otherwise.

2 The differencing method has been used in high-frequency econometrics recently; see, for example, Todorov (2013), Hansen and Lunde (2014), Andersen, Li, Todorov, and Zhou (2020).

3 For example, Hasbrouck and Ho (1987), Choi, Salandro, and Shastri (1988), and Huang and Stoll (1997) modeled the probability of order reversal, and microstructure noise becomes an AR(1) process.

4 One can easily verify the following scenarios by simulation: (1) the LA estimator may report positive autocovariances when the true noise process is uncorrelated or even negatively autocorrelated; (2) the LA estimator has larger bias and variance if there are bursts of volatility in the efficient price process, for example, when the volatility process jumps; (3) the LA estimator gives very different estimates over two samples where the noise processes are identical but the efficient prices have different variances.

5 For example, the LA (ReMeDI) takes 99.77% (0.23%) of the CPU time to estimate the variance of the noise using noisy price data from a random walk plus AR(1) noise model, based on 1000 simulated samples of size 23,400. The ReMeDI estimator has been included in the R-package for high-frequency analysis; see https://CRAN.R-project.org/package=highfrequency. The Matlab code is also available on the authors' homepage; see https://sites.google.com/view/merrickli/research.

6 We have almost the same regularity conditions as Jacod, Li, and Zheng (2017). The only difference is that we have a slightly stronger restriction on the serial dependence of the stationary noise; see Remark 1.

7 This is a standard condition in high-frequency econometric analysis; see Aït-Sahalia and Jacod (2014) and Jacod and Protter (2011). In the sequel, by saying an Itô semimartingale satisfies Assumption H, we mean the components of its Grigelionis representation (recall (2)) satisfy Assumption H.

8 For any $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0062$ , the mixing coefficients for k are given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0063$

where $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0064$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0065$ . The sequence $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0066$ is ρ-mixing if $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0067$ as $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0068$ .

9 By convention, we set $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0116$ if $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0117$ and $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0118$ if j is a singleton.

10 In the Supplemental Material (Li and Linton (2022)), we discuss how to select $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0183$ in practice.

11 This is because the efficient returns and the cross terms of efficient returns and noise are of smaller order under infill asymptotics.

12 We select the same tuning parameter for the LA estimator as in Jacod, Li, and Zheng (2017); we also check other alternatives, and we find $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0276$ leads to smaller bias.

13 In the Supplemental Material (Li and Linton (2022)), we use the transaction prices of General Electric (GE) and Citigroup (Citi), and we obtain similar results.

14 Recall from Appendix B that the duration between successive observed prices is part of the asymptotic variance estimator. We do not plot the confidence intervals when we use transaction prices on different trading days since the prices will cover overnight non-trading hours.

15 Hasbrouck and Ho (1987) and Choi, Salandro, and Shastri (1988) modeled the continuation of order flows by an AR(1) process.

16 By convention we let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0298$ .

Appendix A: The Asymptotic (Co)Variance

This section introduces $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0290$ that appears in the asymptotic variance in Theorem 2. In the sequel, whenever we have two vectors $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0291$ , $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0292$ , we suppose without loss of generality that $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0293$ . We denote

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0294$

For each $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0295$ , there is an associated (unique) pair of subsets:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0296$ (25)

We denote for each $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0297$ the following moments:¹⁶

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0299$

Then $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0300$ is given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0301$ (26)

Appendix B: The Estimation of the Asymptotic (Co)Variance

First, we introduce some notation:

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0302$

where the indices appearing above are given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0303$

The asymptotic variance estimator is given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0304$ (27)

where

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0305$

The estimators seem quite complicated. However, the intuition will be clear in light of the following convergences, under the asymptotic conditions that

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0306$ (28)

The proofs of the convergences are in the Supplemental Material.

Now we consider some special cases where the asymptotic (co)variances are simpler. As a consequence, the asymptotic variance estimators are also much simplified.

First, we consider the scenario $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0307$ . The observation schemes that satisfy this condition include the regular sampling scheme and the time-changed regular sampling scheme. Next, let $urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0308$ . One can verify that the modulated Poisson sampling scheme satisfies this condition; see the discussion in Jacod, Li, and Zheng (2017). The asymptotic (co)variance becomes

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0309$

and a consistent estimator is given by

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0310$

where

$urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0311$


References

Abadie, A., and G. W. Imbens (2006): “Large Sample Properties of Matching Estimators for Average Treatment Effects,” Econometrica, 74, 235–267.
10.1111/j.1468-0262.2006.00655.x
Web of Science® Google Scholar
Aït-Sahalia, Y., and J. Jacod (2014): High-Frequency Financial Econometrics. Princeton University Press.
10.1515/9781400850327
Google Scholar
Aït-Sahalia, Y., P. A. Mykland, and L. Zhang (2005): “How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise,” Review of Financial Studies, 18, 351–416.
10.1093/rfs/hhi016
Web of Science® Google Scholar
Aït-Sahalia, Y., P. A. Mykland, and L. Zhang (2011): “Ultra High Frequency Volatility Estimation With Dependent Microstructure Noise,” Journal of Econometrics, 160, 160–175.
10.1016/j.jeconom.2010.03.028
Web of Science® Google Scholar
Andersen, T. G., Y. Li, V. Todorov, and B. Zhou (2020): “Volatility Measurement With Pockets of Extreme Return Persistence,” Journal of Econometrics (forthcoming).
Google Scholar
Angrist, J. D., and J.-S. Pischke (2008): Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press.
10.2307/j.ctvcm4j72
Google Scholar
Bandi, F. M., and J. R. Russell (2008): “Microstructure Noise, Realized Variance, and Optimal Sampling,” Review of Economic Studies, 75, 339–369.
10.1111/j.1467-937X.2008.00474.x
Web of Science® Google Scholar
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard (2008): “Designing Realized Kernels to Measure the ex post Variation of Equity Prices in the Presence of Noise,” Econometrica, 76, 1481–1536.
10.3982/ECTA6495
Web of Science® Google Scholar
Biais, B., P. Hillion, and C. Spatt (1995): “An Empirical Analysis of the Limit Order Book and the Order Flow in the Paris Bourse,” Journal of Finance, 50, 1655–1689.
10.1111/j.1540-6261.1995.tb05192.x
Web of Science® Google Scholar
Choi, J. Y., D. Salandro, and K. Shastri (1988): “On the Estimation of Bid-Ask Spreads: Theory and Evidence,” Journal of Financial and Quantitative Analysis, 23, 219–230.
10.2307/2330882
Web of Science® Google Scholar
Christensen, K., R. C. Oomen, and M. Podolskij (2014): “Fact or Friction: Jumps at Ultra High Frequency,” Journal of Financial Economics, 114, 576–599.
10.1016/j.jfineco.2014.07.007
Web of Science® Google Scholar
Da, R., and D. Xiu (2019): “ When Moving-Average Models Meet High-Frequency Data: Uniform Inference on Volatility,” Tech. rep.
Google Scholar
Delattre, S., and J. Jacod (1997): “A Central Limit Theorem for Normalized Functions of the Increments of a Diffusion Process, in the Presence of Round-off Errors,” Bernoulli, 1–28.
10.2307/3318650
Web of Science® Google Scholar
Delbaen, F., and W. Schachermayer (1994): “A General Version of the Fundamental Theorem of Asset Pricing,” Mathematische Annalen, 300, 463–520.
10.1007/BF01450498
Web of Science® Google Scholar
French, K. R., and R. Roll (1986): “Stock Return Variances: The Arrival of Information and the Reaction of Traders,” Journal of Financial Economics, 17, 5–26.
10.1016/0304-405X(86)90004-8
Web of Science® Google Scholar
Glosten, L. R., and L. E. Harris (1988): “Estimating the Components of the Bid/Ask Spread,” Journal of Financial Economics, 21, 123–142.
10.1016/0304-405X(88)90034-7
Web of Science® Google Scholar
Hansen, P. R., and A. Lunde (2014): “Estimating the Persistence and the Autocorrelation Function of a Time Series That Is Measured With Error,” Econometric Theory, 60–93.
10.1017/S0266466613000121
Web of Science® Google Scholar
Hasbrouck, J. (2007): Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press.
10.1093/oso/9780195301649.001.0001
Google Scholar
Hasbrouck, J., and T. S. Ho (1987): “Order Arrival, Quote Behavior, and the Return-Generating Process,” Journal of Finance, 42, 1035–1048.
10.1111/j.1540-6261.1987.tb03926.x
Web of Science® Google Scholar
Hendershott, T., and A. J. Menkveld (2014): “Price Pressures,” Journal of Financial Economics, 114, 405–423.
10.1016/j.jfineco.2014.08.001
Web of Science® Google Scholar
Ho, T., and H. R. Stoll (1981): “Optimal Dealer Pricing Under Transactions and Return Uncertainty,” Journal of Financial Economics, 9, 47–73.
10.1016/0304-405X(81)90020-9
Huang, R. D., and H. R. Stoll (1997): “The Components of the Bid-Ask Spread: A General Approach,” Review of Financial Studies, 10, 995–1034.
10.1093/rfs/10.4.995
Jacod, J., and P. E. Protter (2011): Discretization of Processes, Vol. 67. Springer Science & Business Media.
Jacod, J., and A. N. Shiryaev (2003): Limit Theorems for Stochastic Processes, Vol. 288. Berlin: Springer-Verlag.
10.1007/978-3-662-05265-5
Jacod, J., Y. Li, P. A. Mykland, M. Podolskij, and M. Vetter (2009): “Microstructure Noise in the Continuous Case: The Pre-Averaging Approach,” Stochastic Processes and their Applications, 119, 2249–2276.
10.1016/j.spa.2008.11.004
Jacod, J., Y. Li, and X. Zheng (2017): “Statistical Properties of Microstructure Noise,” Econometrica, 85, 1133–1174.
10.3982/ECTA13085
Lehmann, B. N. (1990): “Fads, Martingales, and Market Efficiency,” Quarterly Journal of Economics, 105, 1–28.
10.2307/2937816
Li, J. (2013): “Robust Estimation and Inference for Jumps in Noisy High Frequency Data: A Local-to-Continuity Theory for the Pre-Averaging Method,” Econometrica, 81, 1673–1693.
10.3982/ECTA10534
Li, Y., and P. A. Mykland (2015): “Rounding Errors and Volatility Estimation,” Journal of Financial Econometrics, 13, 478–504.
10.1093/jjfinec/nbu005
Li, Z. M., and O. Linton (2019): “A ReMeDI for Microstructure Noise,” https://ssrn.com/abstract=3423607.
Li, Z. M., and O. Linton (2022): “Supplement to ‘A ReMeDI for Microstructure Noise’,” Econometrica Supplemental Material, 90, https://doi.org/10.3982/ECTA17505.
Li, Z. M., R. J. Laeven, and M. H. Vellekoop (2020): “Dependent Microstructure Noise and Integrated Volatility Estimation From High-Frequency Data,” Journal of Econometrics, 215, 536–558.
10.1016/j.jeconom.2019.10.004
Linton, O., and S. Mahmoodzadeh (2018): “Implications of High-Frequency Trading for Security Markets,” Annual Review of Economics, 10, 237–259.
10.1146/annurev-economics-063016-104407
O'Hara, M. (1995): Market Microstructure Theory, Vol. 108. Cambridge, MA: Blackwell.
Park, A., and H. Sabourian (2011): “Herding and Contrarian Behavior in Financial Markets,” Econometrica, 79, 973–1026.
10.3982/ECTA8602
Parlour, C. A. (1998): “Price Dynamics in Limit Order Markets,” Review of Financial Studies, 11, 789–816.
10.1093/rfs/11.4.789
Rosenbaum, M. (2009): “Integrated Volatility and Round-off Error,” Bernoulli, 15, 687–720.
10.3150/08-BEJ170
Schwartz, R. A., and D. K. Whitcomb (1977): “The Time-Variance Relationship: Evidence on Autocorrelation in Common Stock Returns,” Journal of Finance, 32, 41–55.
10.1111/j.1540-6261.1977.tb03240.x
Todorov, V. (2013): “Power Variation From Second Order Differences for Pure Jump Semimartingales,” Stochastic Processes and their Applications, 123, 2829–2850.
10.1016/j.spa.2013.04.005
Todorov, V., and G. Tauchen (2011): “Volatility Jumps,” Journal of Business & Economic Statistics, 29, 356–371.
10.1198/jbes.2010.08342
Xiu, D. (2010): “Quasi-Maximum Likelihood Estimation of Volatility With High Frequency Data,” Journal of Econometrics, 159, 235–250.
10.1016/j.jeconom.2010.07.002
Zhang, L. (2006): “Efficient Estimation of Stochastic Volatility Using Noisy Observations: A Multi-Scale Approach,” Bernoulli, 12, 1019–1043.
10.3150/bj/1165269149
Zhang, L., P. A. Mykland, and Y. Aït-Sahalia (2005): “A Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data,” Journal of the American Statistical Association, 100, 1394–1411.
10.1198/016214505000000169


Volume 90, Issue 1

January 2022

Pages 367-389

Filename	Description
ecta200362-sup-0001-onlineappendix.pdf (276.8 KB)	Online Appendix
ecta200362-sup-0002-dataandprograms.zip (112.9 KB)	Data and Programs

A ReMeDI for Microstructure Noise

Abstract

1 Introduction

2 Continuous-Time Framework and Assumptions

2.1 Efficient Price Process

2.2 Observation Scheme

2.3 Microstructure Noise

2.4 The Observed Noisy Price

3 The Design and the Intuition of the ReMeDI Estimators

3.1 The Estimator of the Autocovariance Function

3.2 The General ReMeDI Design

4 The Asymptotic Properties of the ReMeDI Estimators

4.1 Consistency

4.2 Limit Distribution

4.3 Estimating the Autocovariances of Microstructure Noise

4.4 Some Extended Discussions

4.4.1 Variance Reduction

4.4.2 Rounding Errors

5 Simulation Study

5.1 Model Settings

5.2 LA versus ReMeDI

5.3 Random Noise Size and Observation Times

6 Empirical Study

7 Concluding Remarks

Appendix A: The Asymptotic (Co)Variance

Appendix B: The Estimation of the Asymptotic (Co)Variance

Supporting Information

References
