Volume 90, Issue 1 pp. 367-389
Original Articles

A ReMeDI for Microstructure Noise

Z. Merrick Li (Corresponding Author)

Department of Economics, The Chinese University of Hong Kong

Oliver Linton

Faculty of Economics, University of Cambridge

First published: 26 January 2022
We would like to thank three anonymous referees for their helpful comments. We also benefited from discussions with Yacine Aït-Sahalia, Torben Andersen, Peter Boswijk, Rob Engle, Christian Gouriéroux, Peter Reinhard Hansen, Jean Jacod, Ilze Kalnina, Frank Kleibergen, Eben Lazarus, Jia Li, Yingying Li, Albert Menkveld, Per Mykland, George Tauchen, Michel Vellekoop, Bas Werker, Dacheng Xiu, Xiye Yang, Xinghua Zheng, and seminar participants at the 2017 North American Summer Meeting (St. Louis, June 2017), 2017 European Meeting (Lisbon, August 2017) and 2018 China Meeting (Shanghai, June 2018) of the Econometric Society, International Conference on Quantitative Finance and Financial Econometrics (Marseille, June 2018), the 12th Annual SoFiE Conference (Shanghai, June 2019), the Econometric Society and Bocconi University Virtual World Congress (August, 2020), the Econometric Society European Winter Meeting (Virtual, 2020) and the North American Winter Meeting (Virtual, 2021), the Chinese University of Hong Kong, Hong Kong University of Science and Technology, NYU Shanghai, Peking University HSBC Business School, the University of Amsterdam, and the University of Cambridge. The project is partially sponsored by the Keynes Fund (JHUL).

Abstract

We introduce the Realized moMents of Disjoint Increments (ReMeDI) paradigm to measure microstructure noise (the deviation of the observed asset prices from the fundamental values caused by market imperfections). We propose consistent estimators of arbitrary moments of the microstructure noise process based on high-frequency data, where the noise process could be serially dependent, endogenous, and nonstationary. We characterize the limit distributions of the proposed estimators and construct confidence intervals under infill asymptotics. Our simulation and empirical studies show that the ReMeDI approach is very effective at measuring the scale and the serial dependence of microstructure noise. Moreover, the estimators are quite robust to model specifications, sample sizes, and data frequencies.

1 Introduction

Economic time series are often modeled as the sum of a latent process obtained from an underlying economic model and another term that reflects a variety of adjustments to or departures from the frictionless theoretical model, thus
Y = X + ε. (1)
The two processes X and ε are generated by different mechanisms, and can have quite distinct statistical properties and economic interpretations. Both quantities may be of interest, as they shed light on the underlying economic theory and its relevance for the observed data. However, since only the sum process Y is observable, estimation of and inference about the underlying signal X and the noise ε are challenging.

We are concerned with applications of this framework in financial markets where the observed asset price (Y) subsumes both the market microstructure noise (ε) and the efficient price (or fundamental value) (X). The fundamental theorem of asset pricing says that X should be a semimartingale process (Delbaen and Schachermayer (1994)). In practice, however, many market frictions, such as transaction costs, price discreteness, inventory holdings, information asymmetry, or measurement errors, may cause the observed prices to deviate from this ideal price. One may also want to allow for temporary mis-pricing (French and Roll (1986)) or fad effects (Lehmann (1990)); see also O'Hara (1995) and Hasbrouck (2007) for insightful reviews. A lot of early work proceeded on the basis that the microstructure noise process was i.i.d., but recently this assumption has been shown to be too strong; both theoretically and empirically, the microstructure noise may exhibit rich dynamics depending on its origin. If the microstructure effects are negligible, the observed price should be close to the efficient price and be unpredictable. Therefore, the dispersion and persistence of the microstructure noise serve as natural measures of market quality. Market quality is of concern to regulators and practitioners as well as academics; proxies for market quality are widely used in empirical analysis; see Linton and Mahmoodzadeh (2018).

We introduce a general econometric approach to measure microstructure noise in a nonparametric setting. Specifically, we propose a new estimator of the moments of a general dependent noise process based on the observed noisy high-frequency transaction prices; we call our estimator the Realized moMents of Disjoint Increments (ReMeDI). The estimation method is based on the differencing paradigm, which is widely used in microeconometrics to eliminate nuisance parameters; see, for example, Angrist and Pischke (2008). We build on the general setup introduced in the seminal work of Jacod, Li, and Zheng (2017). Specifically, we assume that the underlying efficient price follows a semimartingale, which may accommodate stochastic volatility, jumps, etc. We allow the microstructure noise to be weakly dependent and to have a serial correlation of an unknown form that may decay at an algebraic rate; this may capture, for instance, the effects of clustered (or hidden) order flows or herding (Park and Sabourian (2011)). The microstructure noise is allowed to have time-varying and stochastic heteroscedasticity, which allows for intraday variation in the scale of the noise. The general setting we consider allows for random and endogenous observation schemes. We develop estimators of arbitrary moments of the microstructure noise; this includes the autocovariance function of powers of the noise process as well as other quantities of interest. We derive the stable convergence in law of the estimated quantities as the sample size increases on a given domain. We provide a consistent estimator of the asymptotic variance that allows us to quantify the accuracy of our estimator.

We present some simulation studies comparing the ReMeDI approach with the method of Jacod, Li, and Zheng (2017). We find that the ReMeDI approach is relatively robust to the data frequency, the sample size, the tuning parameter, and the model specification. We provide an empirical study on an individual stock price, which reveals that the microstructure noise has nontrivial serial dependence, but that the dependence structure falls short of being long memory. This is consistent with leading microstructure models, and differs from the findings in Jacod, Li, and Zheng (2017).

The robustness of the ReMeDI approach as demonstrated in our simulation and empirical studies has an intuitive explanation. The differencing method works because the increments of X over disjoint intervals (the efficient returns) are small and/or uncorrelated, and what remains is attributed to ε. This property distinguishes the ReMeDI approach from alternative high-frequency estimators that rely structurally on the infill asymptotics.

There are a number of methods for estimation of the moments of noise and the parameters of the efficient price, specifically: the two-scale/multi-scale realized volatility by Zhang, Mykland, and Aït-Sahalia (2005), Zhang (2006), Aït-Sahalia, Mykland, and Zhang (2011); the optimal-sampling realized variance by Bandi and Russell (2008); the maximum likelihood estimators by Aït-Sahalia, Mykland, and Zhang (2005), Xiu (2010); the pre-averaging method developed in Jacod, Li, Mykland, Podolskij, and Vetter (2009), see also Li (2013); and the realized kernel by Barndorff-Nielsen, Hansen, Lunde, and Shephard (2008). Most of this literature only considers i.i.d. microstructure noise.

Several recent papers explore richer microstructure models by allowing for autocorrelated noise. The estimators of the second moments of noise in Da and Xiu (2019) and Li, Laeven, and Vellekoop (2020) are by-products of the integrated volatility estimators in the presence of autocorrelated noise. In a recent seminal paper, Jacod, Li, and Zheng (2017) introduced the first feasible procedure, called the local averaging (LA) method, to estimate arbitrary moments of microstructure noise using high-frequency data. They also introduced a general framework allowing for a stochastic observation scheme and a microstructure noise with a semimartingale “size process.” We follow their general setup and derive asymptotic properties of our estimators under this general framework. We differentiate our paper from Jacod, Li, and Zheng (2017) as follows. First, the ReMeDI method is based on differencing, while the LA method is based on deviations from local averages; both ideas are widely used in other contexts such as panel data and semiparametric estimation to eliminate nuisance parameters. Second, the ReMeDI approach works beyond the infill framework. Specifically, in the working paper version, Li and Linton (2019), we proved that the ReMeDI estimator is consistent and has an associated CLT in a long-span, non-infill setting. In this case, the method works provided the efficient price is a martingale, in which case its increments are uncorrelated at any horizon. The LA method, however, is inconsistent when applied to low-frequency data. Next, the finite sample performance of the LA estimators heavily depends on the sample size and the noise-to-signal ratio (the ratio of noise variance to the integrated volatility of the efficient price); see an analysis in Jacod, Li, and Zheng (2017). This may cause many issues in the implementations with real data. 
The bias of the ReMeDI estimators, by contrast, depends only on the slope of the autocovariance function of the microstructure noise, and in short-memory contexts this bias can be very small. Finally, the ReMeDI approach has two further advantages in practice: it is computationally very efficient, and it is robust to a wide range of tuning parameters.

2 Continuous-Time Framework and Assumptions

We follow the general framework of Jacod, Li, and Zheng (2017) to specify the continuous-time efficient price process, the observation scheme, and the microstructure noise.

2.1 Efficient Price Process

We assume that the efficient price process X is an Itô semimartingale defined on a filtered probability space urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0002 with the Grigelionis representation
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0003(2)
where W, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0004 are a Wiener process and a Poisson random measure on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0005 and E, respectively. Here, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0006 is a measurable Polish space on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0007 and the predictable compensator of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0008 is urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0009 for some given σ-finite measure on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0010; see Jacod and Shiryaev (2003) for a detailed introduction to the last two integrals. Moreover, X satisfies the following regularity condition:

Assumption H. The process b is locally bounded, the process σ is càdlàg, there is a localizing sequence urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0011 of stopping times, and, for each n, a deterministic nonnegative function urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0012 on E satisfying urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0013 such that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0014 for all urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0015 satisfying urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0016.

The efficient price process is very general; it allows for stochastic volatility and jumps in both the price and volatility processes.

2.2 Observation Scheme

For each n, let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0017 be a sequence of random finite observed times (usually when a transaction or quote occurs) with urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0018, where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0019 is the set of nonnegative integers. We denote
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0020(3)
Here, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0021 is the stochastic number of observations recorded on the interval urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0022 for urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0023, while urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0024 is the ith spacing of the observation times. For any process V, we denote urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0025.

Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0026 be a positive sequence of real numbers satisfying urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0027 as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0028. We may think of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0029 as the average magnitude of the spacings between successive observation times: if the observation times were equally spaced (the regular observation scheme), then urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0030 would be proportional to that spacing. The difference between the regular observation scheme and the general scheme is characterized by two semimartingale intensity processes α, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0031. Conditional upon an appropriate σ-algebra, the expectations of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0032 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0033 are approximately equal to urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0034 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0035, respectively. Specifically, we assume the following:

Assumption O. α, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0036 are two Itô semimartingales defined on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0037 satisfying Assumption H. We further assume there is a localizing sequence urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0038 of stopping times and positive constants urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0039 and κ such that:

  • 1. For urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0040, we have urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0041 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0042, where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0043 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0044 are the left limits of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0045 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0046.
  • 2. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0047 be the smallest filtration satisfying

    • (a) urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0048,
    • (b) urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0049 is a urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0050 stopping time for urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0051,
    • (c) urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0052, conditional on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0053, is independent of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0054 for urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0055.

  • 3. With the restriction urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0056, and for all urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0057,
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0058(4)

A useful consequence of our setting is the following convergence in probability:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0059(5)
The observation times framework is very general, and includes, inter alia: regular sampling scheme, time-changed regular sampling scheme, modulated Poisson sampling scheme, and predictably-modulated random walk sampling scheme; see the discussion in Jacod, Li, and Zheng (2017).
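As a rough illustration of two of the schemes listed above, the following sketch simulates regular sampling and a homogeneous Poisson sampling scheme and checks that the rescaled number of observations on [0, t] is close to t in both cases. The constant intensity, the function names, and the values of `delta_n` and `t` are assumptions made for this example only; the general framework allows for stochastic, time-varying intensities, which are not covered here.

```python
import random


def regular_observation_times(delta, horizon):
    """Equally spaced observation times delta, 2*delta, ... up to the horizon."""
    n = int(round(horizon / delta))
    return [delta * i for i in range(1, n + 1)]


def poisson_observation_times(delta, horizon, rng):
    """Observation times with i.i.d. exponential spacings of mean delta.

    This is a homogeneous Poisson scheme (a hypothetical special case with
    constant intensity); expovariate takes the rate 1/delta as its argument.
    """
    times = []
    t = rng.expovariate(1.0 / delta)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(1.0 / delta)
    return times


rng = random.Random(0)
delta_n, t = 1e-4, 1.0  # illustrative mean spacing and horizon

regular_times = regular_observation_times(delta_n, t)
poisson_times = poisson_observation_times(delta_n, t, rng)

# Rescaled observation counts: both should be close to t = 1.
rescaled_regular = delta_n * len(regular_times)
rescaled_poisson = delta_n * len(poisson_times)
print(rescaled_regular, rescaled_poisson)
```

In the Poisson case the spacings are random, so the rescaled count fluctuates around t with a standard deviation of order the square root of `delta_n`.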

2.3 Microstructure Noise

We suppose that the microstructure noise has a multiplicative form that allows for serial dependence, stochastic scale, and dependence of the scale on the efficient price process.

Assumption N. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0060 be a stationary ρ-mixing random sequence with mixing coefficients urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0061. We further assume that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0069 is centered at 0 with variance 1, has finite moments of all orders, and is independent of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0070. Moreover, there exist urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0071, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0072 such that

urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0073(6)
At stage n, the noise at time urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0074 is given by
ε_i^n = γ_{t_i^n} · χ_i, (7)
where γ is a nonnegative Itô semimartingale on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0076 satisfying Assumption H and is not identically zero on any interval.
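To make the multiplicative structure concrete, here is a minimal simulation sketch in which the stationary component is a Gaussian AR(1) sequence with unit variance and the scale is a smooth deterministic function of time. Both choices are assumptions for illustration: the AR(1) has geometrically decaying dependence, which is only one simple short-memory case, and a deterministic scale is a very special case of the Itô semimartingale allowed for γ. The names `noise_scale`, `phi`, and `n` are hypothetical.

```python
import math
import random


def simulate_ar1_unit_variance(n, phi, rng):
    """Stationary Gaussian AR(1) sequence with mean 0 and variance 1."""
    innov_sd = math.sqrt(1.0 - phi * phi)
    chi = [rng.gauss(0.0, 1.0)]  # draw the first value from the stationary law
    for _ in range(n - 1):
        chi.append(phi * chi[-1] + rng.gauss(0.0, innov_sd))
    return chi


def noise_scale(t):
    """Hypothetical deterministic intraday scale on [0, 1]; values in [0.5, 1.5]."""
    return 1.0 + 0.5 * math.cos(2.0 * math.pi * t)


rng = random.Random(1)
n, phi = 40000, 0.5
times = [(i + 1) / n for i in range(n)]  # regular observation times on (0, 1]

chi = simulate_ar1_unit_variance(n, phi, rng)
eps = [noise_scale(t) * c for t, c in zip(times, chi)]  # scale times stationary part

# The average squared noise tracks the average squared scale.
mean_sq_noise = sum(e * e for e in eps) / n
mean_sq_scale = sum(noise_scale(t) ** 2 for t in times) / n
print(mean_sq_noise, mean_sq_scale)
```

Because the stationary component has unit variance, the second moment of the noise is governed by the scale process alone; in this example the average squared scale is 1.125.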

Remark 1. To obtain limit results, we shall suppose that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0077 for consistency and that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0078 to derive the limit distribution, which allows for quite strong dependence close to the long memory boundary. Jacod, Li, and Zheng (2017) required urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0079 for consistency and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0080 to establish the limit distribution.

2.4 The Observed Noisy Price

Finally, the observed noisy price urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0081 is given by (for urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0082)
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0083(8)
Note that both X and ε are latent; only Y is observable. Our purpose is to estimate the moments of ε using Y only.

3 The Design and the Intuition of the ReMeDI Estimators

3.1 The Estimator of the Autocovariance Function

The intuition of the ReMeDI design can best be seen in a simpler setting. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0084 be a stationary mixing sequence with mean zero and finite variance; we would like to estimate its autocovariance urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0085. The natural estimator is the sample analogue
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0086(9)
which is consistent and asymptotically normal under very mild conditions.
We consider instead an estimator that replaces the “observations” urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0087, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0088 by the “long differences”, that is,
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0089(10)
where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0090, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0091 are integers that grow at certain rates as the sample size increases. The estimator urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0092 follows the ReMeDI design and it provides another consistent estimator of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0093, provided urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0094, and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0095. The intuition of the consistency becomes immediate if one rewrites urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0096 as
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0097(11)
The first average is (asymptotically) equivalent to the sample analogue (9), thus it converges in probability to urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0098; the remaining three averages are centered at urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0099, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0100, and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0101, which themselves converge to zero at a rate depending on (6) as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0102.
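The argument above can be checked numerically. The sketch below compares the sample analogue with a long-difference estimator on a simulated AR(1) sequence whose lag-1 autocovariance is 0.5. The product form (χ_{i+j} − χ_{i+j+k'})(χ_i − χ_{i−k}) used here is an assumption: it is chosen because its expansion gives one average centered at the lag-j autocovariance and three averages centered at autocovariances at lags j+k, j+k', and j+k+k', matching the description above. The index range, the normalization by the number of terms, and the values of `k` and `k_prime` are likewise illustrative choices.

```python
import math
import random


def simulate_ar1_unit_variance(n, phi, rng):
    """Stationary Gaussian AR(1) with mean 0, variance 1, autocovariance phi**j."""
    innov_sd = math.sqrt(1.0 - phi * phi)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, innov_sd))
    return x


def sample_autocov(x, j):
    """Sample analogue: average of x[i] * x[i + j] (the mean is known to be 0)."""
    m = len(x) - j
    return sum(x[i] * x[i + j] for i in range(m)) / m


def long_difference_autocov(x, j, k, k_prime):
    """Average of (x[i+j] - x[i+j+k_prime]) * (x[i] - x[i-k]) over all valid i."""
    first, last = k, len(x) - j - k_prime - 1
    total = 0.0
    for i in range(first, last + 1):
        total += (x[i + j] - x[i + j + k_prime]) * (x[i] - x[i - k])
    return total / (last - first + 1)


rng = random.Random(2)
phi, n = 0.5, 40000
chi = simulate_ar1_unit_variance(n, phi, rng)

direct = sample_autocov(chi, 1)
differenced = long_difference_autocov(chi, 1, k=20, k_prime=40)
print(direct, differenced)  # both should be close to phi = 0.5
```

With these settings the three distant terms are centered at autocovariances of order 0.5 to the power 21 or smaller, so the bias of the differenced estimator is negligible; its sampling variability is somewhat larger than that of the sample analogue.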

Taking differences seems redundant if the time series urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0103 is observable. However, in our framework, ε is masked by the efficient price X, and we only observe urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0104. Taking time differences removes the effect of the efficient price. The intuition of such removal under the infill asymptotics is that the differences of the efficient prices, say, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0105, are much smaller than the differences of the noise as n increases.
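The following sketch illustrates why differencing matters once the noise is masked by the efficient price. It simulates a Gaussian random walk as a stand-in for X, adds AR(1) noise with unit variance, and compares a long-difference estimator applied to the noisy observations with the naive demeaned sample autocovariance of the same observations. The random walk, the step size `step_sd`, the estimator form, and the tuning values are all assumptions for illustration and are not taken from the formal definitions.

```python
import math
import random


def simulate_ar1_unit_variance(n, phi, rng):
    """Stationary Gaussian AR(1) noise with mean 0 and variance 1."""
    innov_sd = math.sqrt(1.0 - phi * phi)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, innov_sd))
    return x


def simulate_random_walk(n, step_sd, rng):
    """Gaussian random walk used as a simple stand-in for the efficient price."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, step_sd)
        path.append(level)
    return path


def long_difference_autocov(y, j, k, k_prime):
    """Average of (y[i+j] - y[i+j+k_prime]) * (y[i] - y[i-k]) over all valid i."""
    first, last = k, len(y) - j - k_prime - 1
    total = 0.0
    for i in range(first, last + 1):
        total += (y[i + j] - y[i + j + k_prime]) * (y[i] - y[i - k])
    return total / (last - first + 1)


def demeaned_sample_autocov(y, j):
    """Naive sample autocovariance of the observed levels."""
    mean = sum(y) / len(y)
    m = len(y) - j
    return sum((y[i] - mean) * (y[i + j] - mean) for i in range(m)) / m


rng = random.Random(3)
n, phi, step_sd = 40000, 0.5, 0.05  # small price steps relative to the noise
x = simulate_random_walk(n, step_sd, rng)
eps = simulate_ar1_unit_variance(n, phi, rng)
y = [a + b for a, b in zip(x, eps)]  # only y would be observed in practice

remedi_lag1 = long_difference_autocov(y, 1, k=20, k_prime=40)
naive_lag1 = demeaned_sample_autocov(y, 1)
print(remedi_lag1, naive_lag1)  # target is the noise autocovariance 0.5
```

The two differences are taken over disjoint intervals, so the corresponding increments of the random walk are uncorrelated and small, and the differenced estimator stays close to the noise autocovariance. The naive estimator is dominated by the variation in the level of the random walk.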

3.2 The General ReMeDI Design

We next formally define the ReMeDI estimator of a general class of parameters. First, we introduce some notation that we will use below. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0106 be the set of all finite sequences of integers satisfying urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0107. In the sequel, we will assume without loss of generality that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0108 for any urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0109. The j-moments of χ, the stationary component of microstructure noise, are given by
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0110(12)
This is our parameter of interest (up to the scaling by the γ heteroscedasticity process); it includes the autocovariance function of the noise process and many other examples as special cases.
Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0111 be a q-tuple of integers. For any urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0112 and any process V, let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0113 be the set of observation indices on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0114 for which the following multi-difference operator urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0115 is well defined:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0119(13)
Then the ReMeDI estimator corresponding to urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0120 based on data urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0121 and tuning parameters k is defined by
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0122(14)
Note that we do not normalize yet by urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0123.

Remark 2. The estimator (10) can be written in this general form with urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0124, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0125, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0126, where the differencing operator urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0127 is applied to the data urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0128. The general ReMeDI approach shares two features with the special case estimator (10) regarding the choices of k: (1) the first entry of k will be negative whereas the remaining ones are positive, that is, the first difference is a forward difference and the remaining ones are backward differences; (2) urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0129, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0130 as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0131, and we will often write urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0132 in the sequel to reflect such dependence.

We discuss a little more why the general ReMeDI procedure works under infill asymptotics. For this purpose, suppose that the noise size process γ is constant equal to 1 so that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0133. Suppose that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0134 satisfies the two properties in Remark 2. Next, we explain how to connect urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0135 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0136 with urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0137. To see this, we first note that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0138 are the “distant” indices of the intervals on which the backward and forward differences are taken. Figure 1 illustrates a simple example with urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0139, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0140 for some urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0141. The forward difference starts at the urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0142th observation and ends at the urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0143th observation; for the remaining indices in j, the associated differences start from urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0144, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0145 and end at urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0146, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0147, respectively. The intuition of the ReMeDI approach is that the “distant” noise terms are approximately independent of each other, and are also independent of the “clustered” noise urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0148 (this is because the elements in urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0149 are quite “sparse;” recall a special case outlined in (11)). 
Hence any term that has one (or more) of the distant noise as a factor will have a zero expectation approximately. On the other hand, expanding urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0150 yields
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0151
Therefore, we have by taking expectations that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0152. If urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0153 is still relatively small such that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0154, the differences/increments of the efficient price over the intervals are asymptotically negligible. That is, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0155. Thus, the averages of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0156 will converge in probability to urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0157 by the law of large numbers. This is the intuition of the identification.

Figure 1. Illustration of the ReMeDI estimator of j-moments with j = (j_1, j_2, j_3) and k_n = (−k_n, 2k_n, 4k_n).

Remark 3. (The intuition of the LA method) The ReMeDI method is essentially based on differencing, while the local averaging (LA) method employs deviations from local averages. Specifically, a local average of the observable noisy prices provides a proxy of the efficient price since the noise is averaged out, that is, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0158; consequently, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0159. Therefore, the moments of noise can be estimated by the sample moments of the proxies urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0160. This is the intuition of the LA method.
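For comparison, the local-averaging idea in Remark 3 can be sketched in a few lines. The code below is a deliberately simplified illustration and makes no attempt to reproduce the exact construction in Jacod, Li, and Zheng (2017): the centered rolling window, the window length, and the simulated data are all assumptions. It uses the deviation of each observation from its local average as a proxy for the noise and then takes the second sample moment of the proxies.

```python
import math
import random


def local_average_residuals(y, half_window):
    """Return y[i] minus the mean of y over the window centred at i.

    Only interior points with a full window of 2 * half_window + 1
    observations are used.
    """
    width = 2 * half_window + 1
    window_sum = sum(y[:width])
    residuals = []
    for i in range(half_window, len(y) - half_window):
        residuals.append(y[i] - window_sum / width)
        nxt = i + half_window + 1
        if nxt < len(y):
            # Slide the window one step to the right.
            window_sum += y[nxt] - y[i - half_window]
    return residuals


rng = random.Random(4)
n, phi, step_sd = 40000, 0.5, 0.05  # illustrative values
innov_sd = math.sqrt(1.0 - phi * phi)

level, noise, y = 0.0, rng.gauss(0.0, 1.0), []
for _ in range(n):
    level += rng.gauss(0.0, step_sd)                 # random-walk price proxy
    noise = phi * noise + rng.gauss(0.0, innov_sd)   # AR(1) noise, variance 1
    y.append(level + noise)

proxies = local_average_residuals(y, half_window=25)
la_variance = sum(p * p for p in proxies) / len(proxies)
print(la_variance)  # roughly the noise variance 1, up to a finite-window bias
```

In this simplified form the proxy still contains the averaged noise and the deviation of the price from its local mean, so the second moment is close to, but not exactly, the noise variance for a finite window.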

4 The Asymptotic Properties of the ReMeDI Estimators

4.1 Consistency

We next give the large sample properties of the ReMeDI estimator (for a given choice of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0161) in our general setting. For a general γ process that satisfies Assumption N, the “average size” of the noise moments urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0162 is urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0163, and this scaling appears in the probability limit of the ReMeDI estimators. Also recall from (6) that v is the parameter that controls the degree of serial dependence in the noise.

Theorem 1. Let Assumptions H, O, and N hold, and assume that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0164 and that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0165 satisfies

urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0166(15)
as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0167. For urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0168, we have the following convergence in probability:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0169(16)
where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0170 is defined in (12) and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0171 in (5).

This says that our estimator consistently estimates urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0172 up to a scaling factor that varies with t and depends on the average scale of the noise and on the stochastic process governing the observation times.

Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0173 be a sequence of integers satisfying urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0174, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0175. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0176 be specified as follows: urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0177 if urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0178, and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0179 if urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0180. Then, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0181 satisfies the conditions in (15).

4.2 Limit Distribution

We first restrict further the values of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0182 in order to facilitate the limit theory. Among many possibilities, we propose the following specification of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0184, which is solely determined by a single integer urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0185:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0186(17)
where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0187 is related to v as follows: urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0188, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0189, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0190.

Remark 4. Note that (17) implies (15). In the sequel, we will omit urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0191 and simply write urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0192 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0193 instead of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0194 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0195 when urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0196 satisfies (17).

We establish a CLT for each of the following two centered stochastic processes:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0197
The first process involves unknown but deterministic norming, whereas the second process is normed by the observed stochastic sample size. Thus, the second one is “feasible” in practice.

Theorem 2. Let Assumptions H, O, and N hold, and let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0198, v satisfy (17). For any urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0199, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0200, we have the following urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0201-stable convergence in law:

  • 1. urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0202, where the limit is defined on an extension urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0203 of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0204. Conditionally on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0205, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0206 are centered Gaussian with (co)variances urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0207 that are given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0208(18)
    where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0209 is given by (26).
  • 2. urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0210, where the limit is defined on an extension urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0211 of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0212. Conditionally on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0213, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0214 are centered Gaussian with (co)variances urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0215 that are given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0216(19)

Remark 5. The term urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0217 is the asymptotic variance of the ReMeDI estimators contributed by the stationary part of the noise. The explicit form is given by (26) in Appendix A. If the sampling scheme is regular, for example, equally spaced at millisecond frequency, then urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0218 and the asymptotic variances in (18) and (19) are greatly simplified, since terms other than the first one are zero. We provide further discussion of this in Appendix B.

Remark 6. (Asymptotic variances of ReMeDI and LA) Note that the ReMeDI and LA estimators have very similar asymptotic (co)variances. The only difference lies in the urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0219 part, which represents the asymptotic variance contributed by the stationary part of the noise. The urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0220 of the ReMeDI estimators includes the asymptotic (co)variances of the “distant” noise terms (recall the discussion in Section 3.2). It is therefore larger than its counterpart for the LA estimators. Hence, the LA estimators are asymptotically more efficient (although one can improve the efficiency of ReMeDI by taking averages of estimators computed using different urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0221; see Section 4.4.1). However, simulation studies show that the ReMeDI class works better in finite samples with realistic sample sizes (or equivalently, data frequencies): it has smaller finite sample variance and is almost unbiased under various model specifications. Moreover, the ReMeDI approach has greater computational efficiency, which pays off when one is working with massive high-frequency data sets (recall Footnote ).

Theorem 3. Suppose that all the conditions of Theorem 2 hold. For any urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0222, we have the following urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0223-stable convergence in law:

urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0224(20)
where Φ is a standard normal random variable that is defined on an extension of the space and is independent of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0225, and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0226 is a consistent estimator of the asymptotic variance constructed in (27).

4.3 Estimating the Autocovariances of Microstructure Noise

In this section, we consider the special case concerning the estimation of the autocovariance function of the microstructure noise. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0227, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0228, and let
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0229(21)
The following corollary provides the limit distribution.

Corollary 1. (ReMeDI Estimators of Autocovariances) Under the conditions of Theorem 2, we have urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0230, where

urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0231(22)
Moreover, under the assumptions of Theorem 3, we have
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0232(23)
where Φ is a standard normal random variable as in Theorem 3 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0233 is provided in (27).

Remark 7. urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0234 represents the variance of the ReMeDI estimators contributed by the stationary part of the noise. It has two components: urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0235 is in fact the asymptotic variance of the sample analogue (recall (9)); the second part, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0236, is the asymptotic variance of the three additional terms in (11) that arise from differencing.

Remark 8. The last three terms of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0237 that appear in (22) arise because of the stochastic sampling scheme; this contribution is nonnegative, and is zero whenever urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0238, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0239, or urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0240, where K is a constant.
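
As a concrete illustration of the autocovariance estimator discussed in this section, the following Python sketch may be helpful. Because the display (21) is not reproduced here, the code assumes that the estimator takes the differencing form (1/n) Σ_i (Y_{i+ℓ} − Y_{i+ℓ+k})(Y_i − Y_{i−2k}); the function names `remedi_autocov` and `simulate_noisy_prices`, the random-walk efficient price, and all parameter values are illustrative assumptions and not the authors' implementation.

```python
import random


def remedi_autocov(y, lag, k):
    """Differencing-based estimate of the lag-`lag` noise autocovariance.

    Assumed form (a sketch, not a transcription of (21)):
        (1/n) * sum_i (y[i+lag] - y[i+lag+k]) * (y[i] - y[i-2k]).
    The two increments span disjoint index ranges, so the efficient-price
    contributions have mean zero when the price is a martingale.
    """
    n = len(y)
    total = 0.0
    for i in range(2 * k, n - k - lag):
        total += (y[i + lag] - y[i + lag + k]) * (y[i] - y[i - 2 * k])
    return total / n


def simulate_noisy_prices(n, rho, gamma, sigma, seed):
    """Random-walk log price plus gamma times a unit-variance AR(1) noise."""
    rng = random.Random(seed)
    innov_sd = (1.0 - rho ** 2) ** 0.5  # keeps the AR(1) variance equal to one
    x, chi = 0.0, rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        x += sigma * rng.gauss(0.0, 1.0)
        chi = rho * chi + innov_sd * rng.gauss(0.0, 1.0)
        y.append(x + gamma * chi)
    return y


# Hypothetical parameter values chosen for illustration only.
RHO, GAMMA, SIGMA, K_N = 0.5, 5e-4, 1e-4, 10
prices = simulate_noisy_prices(23_400, RHO, GAMMA, SIGMA, seed=1)
estimates = [remedi_autocov(prices, lag, K_N) for lag in range(4)]
truth = [GAMMA ** 2 * RHO ** lag for lag in range(4)]
for lag, (est, true_val) in enumerate(zip(estimates, truth)):
    print(f"lag {lag}: estimate {est:.3e}, true {true_val:.3e}")
```

Under these assumptions the true lag-ℓ autocovariance is γ²ρ^ℓ, so the printed pairs give a rough sense of the estimation error in a single simulated day.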

We note that while the multiplicative structure of the microstructure noise (recall (7)) allows for a time-varying and stochastic size of the noise, the serial correlation of the noise is not affected by the size process. This structure allows us to estimate the autocorrelations of noise directly once we have an estimator of the autocovariances. Define the ReMeDI estimator of the noise autocorrelation, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0241, and its asymptotic variance estimator
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0242

The following corollary spells out the limit distribution of the proposed estimators.

Corollary 2. (ReMeDI Estimators of Autocorrelations) Under the conditions of Theorem 3, we have the following urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0243-stable convergence: urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0244, where Φ is a standard normal random variable as in Theorem 3.
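
The step from autocovariances to autocorrelations can be sketched in a few lines. The snippet below assumes that the autocorrelation estimator is the ratio of the lag-ℓ autocovariance estimate to the lag-0 estimate (the displayed definition is not reproduced here); the helper name `autocorr_from_autocov` and the input numbers are hypothetical.

```python
def autocorr_from_autocov(autocovs):
    """Convert autocovariance estimates (lag 0 first) into autocorrelations."""
    variance = autocovs[0]
    if variance <= 0.0:
        raise ValueError("lag-0 autocovariance estimate must be positive")
    return [value / variance for value in autocovs]


# Hypothetical autocovariance estimates for lags 0..3 (illustrative numbers).
autocov_estimates = [2.5e-7, 1.25e-7, 6.0e-8, 3.0e-8]
autocorr_estimates = autocorr_from_autocov(autocov_estimates)
print(autocorr_estimates)
```

Because a common scale factor cancels in the ratio, this step does not require a separate estimate of the noise size.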

4.4 Some Extended Discussions

Here, we comment on the efficiency issue and on the behavior of our procedure under the rounding model of discrete prices.

4.4.1 Variance Reduction

The efficiency of the ReMeDI estimator can be improved by combining estimators that use different urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0245 sequences satisfying the regularity conditions. We explain the procedure for urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0246 defined in (21). Rewrite urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0247 as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0248 to indicate its dependence on urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0249. Let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0250 be a sequence of tuning parameters, and define the combined estimator urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0251. Since it is well known from other contexts (see, e.g., Abadie and Imbens (2006)) that averaging reduces variance, we only sketch the argument here. It suffices to consider the noise part:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0252
The key observation is that sums over the last three terms will have negligible asymptotic variance provided urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0253 is large. To see this, consider the sum urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0254, where
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0255
The sequence urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0256 has asymptotically negligible covariances if urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0257 are sparse enough, for example, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0258. As a consequence, the asymptotic variance of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0259. The asymptotic distribution is entirely determined by urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0260, that is, the three additional terms that appear in (11) from differencing will not affect the limiting variance. Therefore, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0261 (recall (22)) will be reduced since the asymptotic variance of the stationary noise becomes urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0262.
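
The mechanics of the combined estimator can be sketched as follows. This is a minimal illustration under stated assumptions: the single-k estimator is taken to have the differencing form (1/n) Σ_i (Y_{i+ℓ} − Y_{i+ℓ+k})(Y_i − Y_{i−2k}), the combination is an equal-weight average, and the tuning parameters `K_VALUES`, the toy data, and the function names are hypothetical choices rather than the specification used in the paper.

```python
import random


def remedi_autocov(y, lag, k):
    """Assumed form: (1/n) * sum (y[i+lag]-y[i+lag+k]) * (y[i]-y[i-2k])."""
    n = len(y)
    return sum(
        (y[i + lag] - y[i + lag + k]) * (y[i] - y[i - 2 * k])
        for i in range(2 * k, n - k - lag)
    ) / n


def combined_remedi(y, lag, k_values):
    """Equal-weight average of the estimates across several tuning parameters."""
    estimates = [remedi_autocov(y, lag, k) for k in k_values]
    return sum(estimates) / len(estimates)


# Toy data: random-walk log price plus white noise (hypothetical values).
rng = random.Random(3)
GAMMA, SIGMA = 5e-4, 1e-4
x, noisy_prices = 0.0, []
for _ in range(23_400):
    x += SIGMA * rng.gauss(0.0, 1.0)
    noisy_prices.append(x + GAMMA * rng.gauss(0.0, 1.0))

K_VALUES = (5, 10, 20)  # hypothetical, geometrically spaced tuning parameters
single = [remedi_autocov(noisy_prices, 0, k) for k in K_VALUES]
combined = combined_remedi(noisy_prices, 0, K_VALUES)
print(single, combined, GAMMA ** 2)
```

The sketch only shows how the averaging is carried out; whether a given set of tuning parameters is sparse enough for the variance reduction described above has to be checked against the conditions in the text.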

4.4.2 Rounding Errors

The additive noise model (1) is the main framework in the literature, but there is a small but growing literature on rounding models that capture the effect of price discreteness; see, for example, Delattre and Jacod (1997), Rosenbaum (2009), and Li and Mykland (2015). Rounding of a continuous-state efficient price process can induce negative first-order autocovariance in the observed returns similar to that induced by bid-ask bounce and infrequent trading (Schwartz and Whitcomb (1977)), and this effect may be particularly large when the nominal stock price level is low and when trading is frequent. Some applied papers in market microstructure have explicitly allowed for rounding errors; see, for example, Glosten and Harris (1988). Theoretically, the rounding model is difficult to work with in the very general semimartingale setting we have for the efficient price, and we have not so far managed to include this feature in our theoretical analysis. However, we do have simulation evidence suggesting that the ReMeDI estimator also works quite well in this case; see Section E.2 in Li and Linton (2022). Li and Mykland (2015) found that subsampling helps mitigate rounding errors. The ReMeDI approach shares something with subsampling methods in that it takes differences over long intervals. Perhaps this explains the superior performance of the ReMeDI estimators in the presence of rounding errors in the simulation experiments.
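
The claim that rounding alone can induce negative first-order autocovariance in observed returns can be checked with a small simulation. The sketch below is an illustration under assumed values (a driftless Gaussian random walk for the log price, a one-cent tick, a price level near 10, and the names `TICK`, `P0`, and `first_order_autocov` are all hypothetical); it is not the rounding experiment reported in the Supplemental Material.

```python
import math
import random


def first_order_autocov(values):
    """Sample covariance between consecutive elements of `values`."""
    n = len(values)
    mean = sum(values) / n
    cross = sum((values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1))
    return cross / (n - 1)


rng = random.Random(7)
N_OBS, SIGMA, TICK, P0 = 23_400, 1e-4, 0.01, 10.0  # hypothetical settings

log_price = math.log(P0)
rounded_log_prices = []
for _ in range(N_OBS):
    log_price += SIGMA * rng.gauss(0.0, 1.0)            # efficient log price
    rounded = round(math.exp(log_price) / TICK) * TICK  # nearest tick
    rounded_log_prices.append(math.log(rounded))

returns = [b - a for a, b in zip(rounded_log_prices, rounded_log_prices[1:])]
rounding_autocov = first_order_autocov(returns)
print(f"first-order autocovariance of rounded returns: {rounding_autocov:.3e}")
```

With a low price level relative to the tick, the rounded price flips back and forth whenever the efficient price hovers around a rounding threshold, which is what generates the negative sign.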

5 Simulation Study

5.1 Model Settings

We suppose that the efficient price process has stochastic volatility and jumps that appear in both the price and volatility processes:
urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0263(24)
We set urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0264; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0265; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0266; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0267; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0268; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0269; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0270; urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0271. This setting is motivated by the empirical finding that jumps in price levels and volatility tend to occur together; see Todorov and Tauchen (2011).

We further suppose that the stationary component of the microstructure noise follows an AR(1) process with Gaussian innovations urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0272, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0273, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0274. Note that χ has unit variance. We set urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0275, motivated by the empirical studies in Aït-Sahalia, Mykland, and Zhang (2011) and Li, Laeven, and Vellekoop (2020).
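
For reference, the unit-variance property follows from the stationarity of an AR(1) process. The derivation below uses generic notation: the symbols ρ (autoregressive coefficient) and σ_e (innovation standard deviation) are placeholders and are not taken from the displays above.

```latex
\chi_{i+1} = \rho\,\chi_i + e_{i+1}, \quad e_i \sim \mathcal{N}(0,\sigma_e^2), \quad |\rho| < 1
\;\Longrightarrow\;
\operatorname{Var}(\chi_i) = \frac{\sigma_e^2}{1-\rho^2},
\qquad
\operatorname{Cov}(\chi_i,\chi_{i+\ell}) = \rho^{|\ell|}\,\operatorname{Var}(\chi_i).
```

Unit variance therefore corresponds to σ_e² = 1 − ρ², and with a constant noise scale γ the lag-ℓ autocovariance of the noise γχ equals γ²ρ^{|ℓ|}.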

5.2 LA versus ReMeDI

We estimate the autocovariances of microstructure noise using the ReMeDI estimator and the local averaging (LA) estimators (Jacod, Li, and Zheng (2017)). We assume that the noise is stationary so that we can compare the estimates to the true parameters. We also assume that the observation scheme is regular so that we know explicitly the data frequency, which is a key factor that affects the finite sample performance of many high-frequency estimators.

The top and middle panels of Figure 2 present the estimation of the first 20 autocovariances of the noise by ReMeDI and LA. The solid lines are the mean estimates over 1000 replications; the shaded region represents the 95% simulated confidence intervals. We simulate 23,400 observations for each sample path, corresponding to the number of seconds in a business day (6.5 trading hours). The ReMeDI estimators perform well: the estimates are approximately unbiased with narrow confidence bands. Surprisingly, there is a significant average deviation of the LA estimates from the true parameters, and the confidence bands are much larger as well.
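
The way such mean estimates and simulated confidence bands are obtained can be sketched as follows. The code is a scaled-down illustration (40 replications of 5,000 observations instead of 1000 replications of 23,400), uses a random-walk efficient price instead of model (24), and assumes the differencing form (1/n) Σ_i (Y_{i+ℓ} − Y_{i+ℓ+k})(Y_i − Y_{i−2k}) for the estimator; all names and parameter values are hypothetical.

```python
import random


def remedi_autocov(y, lag, k):
    """Assumed form: (1/n) * sum (y[i+lag]-y[i+lag+k]) * (y[i]-y[i-2k])."""
    n = len(y)
    return sum(
        (y[i + lag] - y[i + lag + k]) * (y[i] - y[i - 2 * k])
        for i in range(2 * k, n - k - lag)
    ) / n


def simulate_noisy_prices(n, rho, gamma, sigma, rng):
    """Random-walk log price plus gamma times a unit-variance AR(1) noise."""
    innov_sd = (1.0 - rho ** 2) ** 0.5
    x, chi = 0.0, rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        x += sigma * rng.gauss(0.0, 1.0)
        chi = rho * chi + innov_sd * rng.gauss(0.0, 1.0)
        y.append(x + gamma * chi)
    return y


def simulated_band(replications, level=0.95):
    """Mean and nearest-rank empirical quantiles across replications."""
    ordered = sorted(replications)
    m = len(ordered)
    lower = ordered[int((1.0 - level) / 2.0 * (m - 1))]
    upper = ordered[int(round((1.0 + level) / 2.0 * (m - 1)))]
    return sum(ordered) / m, lower, upper


RHO, GAMMA, SIGMA, K_N = 0.5, 5e-4, 1e-4, 10  # hypothetical values
N_OBS, N_REPS, LAGS = 5_000, 40, range(3)
rng = random.Random(11)
draws = {lag: [] for lag in LAGS}
for _ in range(N_REPS):
    y = simulate_noisy_prices(N_OBS, RHO, GAMMA, SIGMA, rng)
    for lag in LAGS:
        draws[lag].append(remedi_autocov(y, lag, K_N))

bands = {lag: simulated_band(draws[lag]) for lag in LAGS}
for lag in LAGS:
    mean, lower, upper = bands[lag]
    print(f"lag {lag}: mean {mean:.3e}, band [{lower:.3e}, {upper:.3e}]")
```

With only 40 replications the quantiles are coarse; a larger number of replications, as in the study described above, gives smoother bands.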

Figure 2. Estimation of the autocovariances of noise by the ReMeDI method (top panel), the local averaging method (middle panel), and the bias-corrected local averaging method (bottom panel). The blue solid lines are the mean estimates of 1000 simulations by the three estimators. The tuning parameters of the ReMeDI and LA estimators are 10 and 6, respectively. The noise scale is fixed at γ ≡ 5 × 10⁻⁴.

The deviation of the LA estimates is caused by a finite sample bias, which is known to be a fraction of the a priori unknown quadratic variation (QV) of the efficient price; see the discussion in Jacod, Li, and Zheng (2017). Thus, to correct the bias, we need an estimate of the QV. But estimating the QV in the presence of dependent noise is not trivial; see the discussion in Li, Laeven, and Vellekoop (2020). In a simulation context, we can compute the QV and can thus grant the LA estimators a bias correction that is, of course, not feasible in practice. The bottom panel of Figure 2 displays the bias-corrected LA estimates. Even with an accurate bias correction, however, the ReMeDI estimators still outperform the LA estimators, with almost no bias and greater accuracy.

It is interesting to compare ReMeDI and LA when the data frequencies vary. However, increasing the data frequency in a fixed time span has two effects: both the number of observations and the noise-to-signal ratio of tick returns will increase. We design a simulation study to separate the two effects and examine how sensitive ReMeDI and LA are to these changes.

The left panel of Figure 3 presents the mean squared error (MSE) of the ReMeDI and LA estimators for the first 20 autocovariances of noise. The sample size varies from 23,400 (1 trading day) to 117,000 (1 trading week) and 468,000 (1 trading month). The MSE of the ReMeDI estimators remains low and drops slightly when the sample size increases. The LA estimators, however, have larger MSE in a larger sample! This is statistically counterintuitive. However, it does make sense if we recall that the integrated volatility contributes to the finite sample bias of the LA estimators. Hence, a longer time span induces larger integrated volatility (relative to the number of observations), which in turn leads to a larger finite sample bias. This is especially so if the sample covers a period of volatility burst, and the likelihood of such an event increases as the sampling period lengthens; see our empirical studies with real transaction prices.
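
The role of the bias can be made explicit with the standard decomposition of the mean squared error of a generic estimator (the notation θ̂ and θ is generic and not taken from the paper):

```latex
\operatorname{MSE}(\hat\theta)
= \mathbb{E}\bigl[(\hat\theta-\theta)^2\bigr]
= \operatorname{Var}(\hat\theta) + \bigl(\mathbb{E}[\hat\theta]-\theta\bigr)^2 .
```

If the finite sample bias grows with the integrated volatility accumulated over the sample, the squared-bias term can increase with the time span even while the variance term shrinks, which is consistent with the pattern described for the LA estimators.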

Figure 3. Mean squared error (MSE) of the ReMeDI and LA estimators for the first 20 autocovariances of noise based on 1000 simulations. In the left panel, the noise scale is fixed at γ = 5 × 10⁻⁴ and the sample size varies; in the right panel, the sample size is 23,400 while the noise scale parameter varies. The tuning parameters of the ReMeDI and LA estimators are 10 and 6, respectively.

The right panel of Figure 3 compares ReMeDI and LA when noise variance varies from 10⁻⁸ (small noise) to 10⁻⁶ (large noise). We note that the advantage of ReMeDI over LA is more prominent when the scale of noise is smaller. Indeed, the size of noise in practice is closer to the small noise scenario; see an extensive empirical study by Christensen, Oomen, and Podolskij (2014). Thus, in an extreme case when the noise has identical statistical properties in two samples, LA may give very different estimates due to the differences in sample sizes or noise-to-signal ratios. The ReMeDI approach remains robust and accurate.

5.3 Random Noise Size and Observation Times

As the last robustness check, we now allow for stochastic observation times and random scales of noise. Following Jacod, Li, and Zheng (2017), we let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0277 follow an inhomogeneous Poisson process with rate urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0278, where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0279 and the process γ satisfies urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0280, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0281. We set urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0282, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0283, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0284, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0285. Figure 4 reports the estimation of the autocorrelation functions by the two estimators. We observe similar patterns presented in Figure 2: compared to the ReMeDI estimators, the LA estimators have large biases with a wide confidence band.
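
Observation times from an inhomogeneous Poisson process can be generated by thinning. The sketch below illustrates the technique only: the intensity function `intensity`, its U-shaped intraday pattern, and the expected count `N_EXPECTED` are hypothetical stand-ins, since the rate specification used in the study is not reproduced here.

```python
import math
import random


def simulate_inhomogeneous_poisson(rate, rate_max, horizon, rng):
    """Thinning: arrival times on [0, horizon] for intensity rate(t) <= rate_max."""
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)         # candidate from a homogeneous process
        if t > horizon:
            return times
        if rng.random() * rate_max <= rate(t):  # accept with prob. rate(t) / rate_max
            times.append(t)


N_EXPECTED = 5_000  # hypothetical expected number of observations on [0, 1]


def intensity(t):
    """Hypothetical U-shaped intensity that integrates to N_EXPECTED on [0, 1]."""
    return N_EXPECTED * (1.0 + 0.5 * math.cos(2.0 * math.pi * t))


rng = random.Random(5)
obs_times = simulate_inhomogeneous_poisson(intensity, 1.5 * N_EXPECTED, 1.0, rng)
print(len(obs_times), obs_times[:3])
```

The bound `rate_max` must dominate the intensity everywhere on the horizon; here it equals the maximum of the assumed intensity, attained at the start and the end of the day.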

Figure 4. Estimation of the autocorrelations of noise by the ReMeDI method (left panel) and the local averaging method (right panel). The blue solid lines are the mean estimates of 1000 simulations by the two estimators. The tuning parameters of the ReMeDI and LA estimators are 10 and 6, respectively. The noise has stochastic scales and the observation times are random; see the specifications in Section 5.3.

The Supplemental Material (Li and Linton (2022)) provides additional simulation studies to examine the quality of the CLT approximation, the effect of rounding error due to the discreteness of price, and the sensitivity to the choice of tuning parameters.

6 Empirical Study

We obtain the transaction prices of Coca-Cola (trading symbol KO) from the TAQ database for January 2018 (21 trading days). We remove prices before 9:30 and after 16:00. We collect approximately 50,000 observations per day, that is, 2.1 transactions per second on average. The average price is $46.84, with a standard deviation of 0.85.
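
The trading-hours filter and the implied trading intensity can be sketched as below. The records in `sample_trades` are made-up values for illustration (they are not TAQ data), the function name `within_trading_hours` is hypothetical, and the filter keeps timestamps from 9:30 to 16:00 inclusive, which is one reading of the rule above.

```python
from datetime import datetime, time

MARKET_OPEN, MARKET_CLOSE = time(9, 30), time(16, 0)


def within_trading_hours(trades):
    """Keep (timestamp, price) pairs stamped between 9:30 and 16:00 inclusive."""
    return [(ts, p) for ts, p in trades if MARKET_OPEN <= ts.time() <= MARKET_CLOSE]


# Hypothetical records for illustration only.
sample_trades = [
    (datetime(2018, 1, 2, 9, 29, 59), 45.50),
    (datetime(2018, 1, 2, 9, 30, 0), 45.54),
    (datetime(2018, 1, 2, 12, 0, 0), 45.70),
    (datetime(2018, 1, 2, 16, 0, 1), 45.80),
]
kept = within_trading_hours(sample_trades)

seconds_per_session = 6.5 * 3600                    # 23,400 seconds
trades_per_second = 50_000 / seconds_per_session    # about 2.1 per second
print(len(kept), round(trades_per_second, 2))
```

The second calculation simply restates the figures in the text: roughly 50,000 observations over a 6.5-hour session correspond to about 2.1 transactions per second.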

Figure 5 plots the estimated autocovariances of noise by the ReMeDI estimators (the blue plots) based on samples of different sizes. The autocorrelation pattern is nontrivial: noise exhibits positive autocorrelations up to 4 lags, and shortly thereafter the sign switches to negative for a few lags, and then reverts to positive autocorrelations before decaying to zero around 20 lags. The pointwise confidence interval includes zero or excludes positive values after lag 5, which is incompatible with simple long memory.

Figure 5. Estimation of autocovariances of noise for Coca-Cola (KO) in January 2018. In the top panel, we use the transaction prices of KO on 2 January 2018; in the middle panel, we use the transaction prices of KO in the second trading week (8 January 2018 to 12 January 2018); in the bottom panel, we use all the transaction prices of KO in January 2018. The tuning parameters for ReMeDI and LA are 10 and 6, respectively. The shaded area in the top panel represents the 95% confidence interval, and we set i_n = 5, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0286 to compute the asymptotic variances of the ReMeDI estimators, where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0287 is the number of observations.

The ReMeDI estimates of microstructure noise presented in Figure 5 are economically intuitive. The positive autocovariances at the first several lags may be a consequence of the order splitting strategies by high-frequency traders (Biais, Hillion, and Spatt (1995)), or the successive transactions executed by limit orders (Parlour (1998)). The negative autocovariances at the intermediate lags are consistent with the prediction of inventory models (Ho and Stoll (1981), Hendershott and Menkveld (2014)), in which the market makers induce negatively autocorrelated order flows to balance their inventories. However, the LA method gives very different estimates: it says that the noise is strongly autocorrelated without any sign of decay after 20 lags. This is economically counterintuitive: such a pattern, if it existed, would be exploited by high-frequency traders and we would expect it to disappear rapidly. Moreover, the serial dependence, according to the LA estimates, is even stronger when estimation is performed on a larger sample. Since we only estimate autocovariances of noise up to 20 ticks/lags, or a few seconds, it is statistically counterintuitive to obtain stronger autocovariance estimates using the prices of a week than using the prices in a single trading day. This is in line with the finding of our simulation study that the LA estimates are subject to a finite sample bias that depends on the noise-to-signal ratio and sample size. The ReMeDI approach retains its accuracy and robustness.

7 Concluding Remarks

We introduce a differencing method to separate the microstructure noise from the underlying semimartingale efficient price in a general setting. We demonstrate the robustness of the proposed method compared to the main existing approach. We have concentrated on the infill setting primarily and the univariate case. The method naturally extends to the multivariate case, although in that case, several issues arise. First, the nonsynchronous trading issue has to be faced. Second, even when the assets trade on a common clock, there are some remaining theoretical results that need to be established for the infill case. We discussed briefly in Section 4.4.1 how one can improve efficiency by combining the estimators associated with different choices of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0288. An alternative potential source of efficiency gain is from the heteroscedasticity delivered by the γ process, which was not exploited by our method. Given a consistent estimator of urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0289, one may implement a kind of feasible GLS procedure. We leave these problems for future research.

  • 1 By price we always mean the logarithmic price in this paper unless stated otherwise.
  • 2 The differencing method has been used in high-frequency econometrics recently; see, for example, Todorov (2013), Hansen and Lunde (2014), Andersen, Li, Todorov, and Zhou (2020).
  • 3 For example, Hasbrouck and Ho (1987), Choi, Salandro, and Shastri (1988), and Huang and Stoll (1997) modeled the probability of order reversal, and microstructure noise becomes an AR(1) process.
  • 4 One can easily verify the following scenarios by simulation: (1) the LA estimator may report positive autocovariances when the true noise process is uncorrelated or even negatively autocorrelated; (2) the LA estimator has larger bias and variance if there are bursts of volatility in the efficient price process, for example, when the volatility process jumps; (3) the LA estimator gives very different estimates over two samples where the noise processes are identical but the efficient prices have different variances.
  • 5 For example, the LA (ReMeDI) takes 99.77% (0.23%) of the CPU time to estimate the variance of the noise using noisy price data from a random walk plus AR(1) noise model, based on 1000 simulated samples of size 23,400. The ReMeDI estimator has been included in the R-package for high-frequency analysis; see https://CRAN.R-project.org/package=highfrequency. The Matlab code is also available on the authors' homepage; see https://sites.google.com/view/merrickli/research.
  • 6 We have almost the same regularity conditions as Jacod, Li, and Zheng (2017). The only difference is that we have a slightly stronger restriction on the serial dependence of the stationary noise; see Remark 1.
  • 7 This is a standard condition in high-frequency econometric analysis; see Aït-Sahalia and Jacod (2014) and Jacod and Protter (2011). In the sequel, by saying an Itô semimartingale satisfies Assumption H, we mean the components of its Grigelionis representation (recall (2)) satisfy Assumption H.
  • 8 For any urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0062, the mixing coefficients for k are given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0063
    where urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0064, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0065. The sequence urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0066 is ρ-mixing if urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0067 as urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0068.
  • 9 By convention, we set urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0116 if urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0117 and urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0118 if j is a singleton.
  • 10 In the Supplemental Material (Li and Linton (2022)), we discuss how to select urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0183 in practice.
  • 11 This is because the efficient returns and the cross terms of efficient returns and noise are of smaller order under infill asymptotics.
  • 12 We select the same tuning parameter for the LA estimator as in Jacod, Li, and Zheng (2017); we also check other alternatives, and we find urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0276 leads to smaller bias.
  • 13 In the Supplemental Material (Li and Linton (2022)), we use the transaction prices of General Electric (GE) and Citigroup (Citi), and we obtain similar results.
  • 14 Recall from Appendix B that the duration between successive observed prices is part of the asymptotic variance estimator. We do not plot the confidence intervals when we use transaction prices on different trading days, since the prices would then span overnight non-trading hours.
  • 15 Hasbrouck and Ho (1987) and Choi, Salandro, and Shastri (1988) modeled the continuation of order flows by an AR(1) process.
  • 16 By convention we let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0298.
Appendix A: The Asymptotic (Co)Variance

    This section introduces urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0290 that appears in the asymptotic variance in Theorem 2. In the sequel, whenever we have two vectors urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0291, urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0292, we suppose without loss of generality that urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0293. We denote
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0294
    For each urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0295, there is an associated (unique) pair of subsets:
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0296(25)
    We denote for each urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0297 the following moments:
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0299
    Then urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0300 is given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0301(26)

    Appendix B: The Estimation of the Asymptotic (Co)Variance

    First, we introduce some notation:
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0302
    where the indices appearing above are given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0303
    The asymptotic variance estimator is given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0304(27)
    where
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0305
    The estimators seem quite complicated. However, the intuition will be clear in light of the following convergences, under the asymptotic conditions that
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0306(28)
    The proofs of the convergences are in the Supplemental Material.

    Now we consider some special cases where the asymptotic (co)variances are simpler. As a consequence, the asymptotic variance estimators are also much simplified.

    First, we consider the scenario urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0307. The observation schemes that satisfy this condition include the regular sampling scheme and the time-changed regular sampling scheme. Next, let urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0308. One can verify that the modulated Poisson sampling scheme satisfies this condition; see the discussion in Jacod, Li, and Zheng (2017). The asymptotic (co)variance becomes
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0309
    and a consistent estimator is given by
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0310
    where
    urn:x-wiley:00129682:media:ecta200362:ecta200362-math-0311
