1. Introduction
In the standard density estimation problem, we observe n random variables X1, …, Xn with common density function f. The goal is to estimate f from X1, …, Xn. However, in some applications, X1, …, Xn are not accessible; we only have n random variables Z1, …, Zn with the common density
(1.1)
where w denotes a known positive function and μ is the unknown normalization parameter: μ = ∫ w(y)f(y) dy. Our goal is to estimate the “biased density” f from Z1, …, Zn. Practical examples can be found in, for example, [1–3] and the survey [4].
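In the weighted-distribution framework of, for example, [1–4, 6], the common density of Z1, …, Zn described by (1.1) is the standard weighted (biased) version of f, commonly written as

\[
g(x) = \frac{w(x)\, f(x)}{\mu}, \qquad \mu = \int w(y)\, f(y)\, dy,
\]

so that g integrates to one as soon as w is positive and μ is finite.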
The standard i.i.d. case has been investigated in several papers. See, for example, [5–9]. To the best of our knowledge, the dependent case has only been examined in [10] for associated (positively or negatively) Z1, …, Zn. In this paper, we study another dependent (and realistic) structure which has not been addressed earlier: we suppose that Z1, …, Zn is a sample of a strictly stationary and exponentially strongly mixing process (Zi) i∈ℤ (to be defined in Section 2). Such a dependence condition arises for a wide class of GARCH-type time series models classically encountered in finance. See, for example, [11, 12] for an overview.
We focus our attention on wavelet methods because they provide a coherent set of procedures that are spatially adaptive and near optimal over a wide range of function spaces. See, for example, [13, 14] for detailed coverage of wavelet theory in statistics. We develop two new wavelet estimators: a linear nonadaptive one based on projections and a nonlinear adaptive one using the hard thresholding rule introduced in [15]. We measure their performance by determining upper bounds on the mean integrated squared error (MISE) over Besov balls (to be defined in Section 3). We prove that our adaptive estimator attains a sharp rate of convergence, close to the one attained by the linear wavelet estimator (constructed in a nonadaptive fashion to minimize the MISE).
The rest of the paper is organized as follows. Section 2 is devoted to the assumptions on the model. In Section 3, we present wavelets and Besov balls. The considered wavelet estimators are defined in Section 4. Section 5 is devoted to the results. The proofs are postponed to Section 6.
2. Assumptions on the Model
We assume that Z1, …, Zn come from a strictly stationary process (Zi)i∈ℤ. For any m ∈ ℤ, we define the mth strongly mixing coefficient of (Zi)i∈ℤ by
(2.1)
where, for any u ∈ ℤ, the two σ-algebras appearing in (2.1) are the one generated by the random variables …, Zu−1, Zu and the one generated by Zu, Zu+1, …, respectively.
We consider the exponentially strongly mixing case; that is, there exist three known constants, γ > 0, c > 0, and θ > 0, such that, for any m ∈ ℤ,
(2.2)
This assumption is satisfied by a large class of GARCH processes. See, for example, [11, 12, 16, 17].
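For reference, with the two σ-algebras described above, the mth strongly mixing coefficient in (2.1) is classically expressed as

\[
a_m = \sup_{u \in \mathbb{Z}}\;\sup_{A \in \sigma(\ldots, Z_{u-1}, Z_u),\; B \in \sigma(Z_{u+m}, Z_{u+m+1}, \ldots)} \bigl| \mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B) \bigr|,
\]

and the exponential decay condition (2.2) takes the form am ≤ γ exp(−c|m|^θ), as restated in Lemma 6.5 below.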
Note that, when θ → ∞, we are in the standard i.i.d. case.
Without loss of generality, the functions f and w are assumed to be supported on [0,1].
There exist two constants, c > 0 and C > 0, such that
(2.3)
There exists a (known) constant C > 0 such that
(2.4)
For any m ∈ {1, …, n}, the pair (Z0, Zm) admits a density, and there exists a constant C > 0 such that
(2.5)
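In the biased-density literature cited below, boundedness conditions of this type are usually formulated as: w bounded away from zero and from above, f bounded, and the joint density of (Z0, Zm) bounded uniformly in m; in symbols,

\[
c \le w(x) \le C \ \ \text{for all } x \in [0,1], \qquad \sup_{x \in [0,1]} f(x) \le C, \qquad \sup_{(x,y) \in [0,1]^2} g_{(Z_0, Z_m)}(x, y) \le C,
\]

where g_{(Z_0,Z_m)} denotes the density of (Z0, Zm).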
The first two boundedness assumptions are standard in the estimation of biased densities. See, for example, [6–8].
3. Wavelets and Besov Balls
Let N be an integer, and let ϕ and ψ be the initial wavelets of dbN (so supp(ϕ) = supp(ψ) = [1 − N, N]). Set
(3.1)
With an appropriate treatment at the boundaries, there exists an integer τ satisfying 2^τ ≥ 2N such that the collection ℬ = {ϕτ,k(·), k ∈ {0, …, 2^τ − 1}; ψj,k(·), j ∈ ℕ − {0, …, τ − 1}, k ∈ {0, …, 2^j − 1}} is an orthonormal basis of 𝕃2([0,1]) (the space of square-integrable functions on [0,1]). See [18].
For any integer ℓ ≥ τ, any h ∈ 𝕃2([0,1]) can be expanded on ℬ as
(3.2)
where αj,k and βj,k are the wavelet coefficients of h defined by
(3.3)
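In this notation, the expansion (3.2) and the coefficients (3.3) take the usual form

\[
h(x) = \sum_{k=0}^{2^{\ell}-1} \alpha_{\ell,k}\, \phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty} \sum_{k=0}^{2^{j}-1} \beta_{j,k}\, \psi_{j,k}(x),
\qquad
\alpha_{j,k} = \int_0^1 h(x)\, \phi_{j,k}(x)\, dx, \quad \beta_{j,k} = \int_0^1 h(x)\, \psi_{j,k}(x)\, dx.
\]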
Let M > 0, s > 0, p ≥ 1, and r ≥ 1. A function h belongs to the Besov ball B^s_{p,r}(M) if and only if there exists a constant M* > 0 (depending on M) such that the associated wavelet coefficients (3.3) satisfy
(3.4)
In this expression, s is a smoothness parameter, and p and r are norm parameters. For particular choices of s, p, and r, B^s_{p,r}(M) contains some classical sets of functions, such as the Hölder and Sobolev balls. See [19].
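In terms of the wavelet coefficients (3.3), one common sequence-space formulation of this condition (up to the treatment of the coarsest level) is

\[
\left( \sum_{j \ge \tau} \left( 2^{j(s + 1/2 - 1/p)} \Bigl( \sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^{p} \Bigr)^{1/p} \right)^{r} \right)^{1/r} \le M^{*}.
\]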
4. Estimators
Firstly, we consider the following estimator for μ:
(4.1)
It is obtained by the method of moments (see Proposition 6.2 below).
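The moment calculation behind this choice is simple: since Z1 has the density (1.1), which is proportional to wf,

\[
\mathbb{E}\!\left( \frac{1}{w(Z_1)} \right) = \int_0^1 \frac{1}{w(y)} \cdot \frac{w(y)\, f(y)}{\mu}\, dy = \frac{1}{\mu},
\]

so the reciprocal of the empirical mean of the 1/w(Zi), namely n / Σ_{i=1}^{n} 1/w(Zi), is a natural candidate for (4.1); note that (2.3) keeps the 1/w(Zi) bounded.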
Then, for any integer j ≥ τ and any k ∈ {0, …, 2^j − 1}, we estimate the unknown wavelet coefficients
- (i) αj,k by
(4.2)
- (ii) βj,k by
(4.3)
Note that these are the estimators considered in the i.i.d. case (see, e.g., [8, 9]). Their statistical properties, under our dependence structure, are investigated in Propositions 6.2, 6.3, and 6.4 below.
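In the i.i.d. biased-data literature [8, 9], the empirical coefficients are of weighted plug-in type; a natural form for (4.2) and (4.3) is

\[
\widehat{\alpha}_{j,k} = \frac{\widehat{\mu}}{n} \sum_{i=1}^{n} \frac{\phi_{j,k}(Z_i)}{w(Z_i)},
\qquad
\widehat{\beta}_{j,k} = \frac{\widehat{\mu}}{n} \sum_{i=1}^{n} \frac{\psi_{j,k}(Z_i)}{w(Z_i)},
\]

since, by the same moment calculation as above, 𝔼(ϕj,k(Z1)/w(Z1)) = αj,k/μ.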
Assuming that f belongs to B^s_{p,r}(M) with p ≥ 2, we define the linear estimator f̂L by
(4.4)
where α̂j,k is defined by (4.2) and j0 is the integer satisfying
(4.5)
For a survey on wavelet linear estimators for various density models, we refer the reader to [20]. For the consideration of strongly mixing sequences, see, for example, [21, 22].
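For orientation, a projection estimator of this kind has the generic form

\[
\widehat{f}_{L}(x) = \sum_{k=0}^{2^{j_0}-1} \widehat{\alpha}_{j_0,k}\, \phi_{j_0,k}(x), \qquad x \in [0,1],
\]

with a level j0 chosen nonadaptively from the smoothness s (in the i.i.d. case, 2^{j0} of order n^{1/(2s+1)} balances the bias and the variance); (4.5) specifies the choice used here.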
We define the hard thresholding estimator f̂H by
(4.6)
x ∈ [0,1], where α̂j,k is defined by (4.2) and β̂j,k by (4.3), for any random event 𝒜, 𝕀𝒜 is the indicator function of 𝒜, j1 is the integer satisfying
(4.7)
θ is the constant in (2.2), κ is a large enough constant (the one in Proposition 6.4 below), and λn is the threshold
(4.8)
The feature of the hard thresholding estimator is that it estimates only the “large” unknown wavelet coefficients of f, which contain its main characteristics. For the construction of hard thresholding wavelet estimators in the standard density model, see, for example, [15, 23].
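To make the construction concrete, the following Python sketch implements weighted empirical wavelet coefficients, the moment estimator of μ, and hard thresholding with a threshold proportional to λn from (4.8). It uses the Haar basis rather than the Daubechies dbN basis of Section 3 and simplified level choices, so it only illustrates the mechanism; the function names and the toy length-biased example are not taken from the paper.

```python
import numpy as np

def haar_phi(x, j, k):
    """Haar scaling function phi_{j,k}(x) = 2^{j/2} * 1_{[k/2^j, (k+1)/2^j)}(x)."""
    return 2.0 ** (j / 2) * ((x >= k / 2 ** j) & (x < (k + 1) / 2 ** j))

def haar_psi(x, j, k):
    """Haar wavelet psi_{j,k}: +2^{j/2} on the left half of the dyadic cell, -2^{j/2} on the right half."""
    left = (x >= k / 2 ** j) & (x < (k + 0.5) / 2 ** j)
    right = (x >= (k + 0.5) / 2 ** j) & (x < (k + 1) / 2 ** j)
    return 2.0 ** (j / 2) * (left.astype(float) - right.astype(float))

def hard_threshold_estimate(z, w, x_grid, j0=2, j1=6, kappa=1.0, theta=1.0):
    """Hard-thresholding wavelet estimate of f from biased observations z with known weight function w."""
    n = len(z)
    wz = w(z)
    mu_hat = n / np.sum(1.0 / wz)                                  # moment estimator of mu: E[1/w(Z)] = 1/mu
    lam = kappa * np.sqrt(np.log(n) ** (1.0 + 1.0 / theta) / n)    # threshold kappa * lambda_n, cf. (4.8)
    f_hat = np.zeros_like(x_grid, dtype=float)

    # Approximation part: keep all scaling coefficients at the coarse level j0.
    for k in range(2 ** j0):
        alpha = (mu_hat / n) * np.sum(haar_phi(z, j0, k) / wz)
        f_hat += alpha * haar_phi(x_grid, j0, k)

    # Detail part: keep only the empirical wavelet coefficients exceeding the threshold.
    for j in range(j0, j1 + 1):
        for k in range(2 ** j):
            beta = (mu_hat / n) * np.sum(haar_psi(z, j, k) / wz)
            if abs(beta) >= lam:
                f_hat += beta * haar_psi(x_grid, j, k)
    return f_hat

# Toy usage: length bias w(x) = x; the biased sample then follows Beta(3, 2)
# when the target density f is Beta(2, 2) on [0, 1].
rng = np.random.default_rng(0)
z = rng.beta(3, 2, size=2000)
f_hat = hard_threshold_estimate(z, w=lambda x: x, x_grid=np.linspace(0.0, 1.0, 512))
```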
5. Results
Theorem 5.1 (upper bound for f̂L). Consider (1.1) under the assumptions of Section 2. Suppose that f belongs to B^s_{p,r}(M) with s > 0, p ≥ 2, and r ≥ 1. Let f̂L be the estimator defined by (4.4). Then there exists a constant C > 0 such that
(5.1)
The proof of Theorem 5.1 uses a suitable decomposition of the MISE and a moment inequality on (4.2) (see Proposition 6.3 below).
Note that n^{−2s/(2s+1)} is the optimal rate of convergence (in the minimax sense) for the standard density model in the independent case (see, e.g., [14, 23]).
Theorem 5.2 (upper bound for f̂H). Consider (1.1) under the assumptions of Section 2. Let f̂H be the estimator defined by (4.6). Suppose that f belongs to B^s_{p,r}(M) with r ≥ 1 and either {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}. Then there exists a constant C > 0 such that
(5.2)
The proof of Theorem 5.2 uses a suitable decomposition of the MISE, some moment inequalities on (4.2) and (4.3) (see Proposition 6.3 below), and a concentration inequality on (4.3) (see Proposition 6.4 below).
Theorem 5.2 shows that, besides being adaptive, f̂H attains a rate of convergence close to that of f̂L. The only difference is the logarithmic factor (ln n)^{(1+1/θ)(2s/(2s+1))}.
Note that, if we restrict our study to the independent case, that is, θ → ∞, the rate of convergence attained by f̂H becomes the standard one: (ln n/n)^{2s/(2s+1)}. See, for example, [14, 15, 23].
6. Proofs
In this section, we consider (1.1) under the assumptions of Section 2. Moreover, C denotes any constant that does not depend on j, k, and n. Its value may change from one term to another and may depend on ϕ or ψ.
6.1. Auxiliary Results
Lemma 6.1. For any integer j ≥ τ and any k ∈ {0, …, 2^j − 1}, let α̂j,k be the estimator (4.2) and αj,k the coefficient (3.3). Then, under the assumptions of Section 2, there exists a constant C > 0 such that
(6.1)
This inequality holds with ψ instead of ϕ (and, a fortiori, with β̂j,k defined by (4.3) instead of α̂j,k and βj,k instead of αj,k).
Proof of Lemma 6.1. We have
(6.2)
Due to (2.3), w is bounded from above and away from zero on [0,1]. Therefore
(6.3)
Using (2.4) and the Cauchy-Schwarz inequality, we obtain
(6.4)
Hence
(6.5)
Lemma 6.1 is proved.
Proposition 6.2. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, …, 2^j − 1}, let μ̂ be the estimator (4.1). Then,
These results hold with ψ instead of ϕ (and, a fortiori, with βj,k instead of αj,k).
Proof of Proposition 6.2.
(1) We have
(6.9)
Since f is a density, we obtain
(6.10)
(2) We have
(6.11)
Using (2.3) and (2.4), we have sup_{x∈[0,1]} g(x) ≤ C. Hence,
(6.12)
It follows from the stationarity of (Zi)i∈ℤ and 2^j ≤ n that
(6.13)
where
(6.14)
Let us now bound T1 and T2.
Upper Bound for T1 Using (2.5) and (2.3), and making the change of variables y = 2^j x − k, we obtain
(6.15)
Therefore,
(6.16)
Upper Bound for T2 By the Davydov inequality for strongly mixing processes (see [24]), for any q ∈ (0,1), it holds that
(6.17)
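For completeness, the covariance inequality of Davydov [24] states that, for random variables X and Y measurable with respect to the past σ-algebra up to time u and the future σ-algebra from time u + m, respectively, and for exponents a, b, c ≥ 1 with 1/a + 1/b + 1/c = 1,

\[
\bigl| \operatorname{Cov}(X, Y) \bigr| \le C\, a_m^{1/a}\, \|X\|_{b}\, \|Y\|_{c}
\]

for an absolute constant C; taking b = c = 2/(1 − q) with q ∈ (0,1) and identically distributed X and Y gives a bound of order a_m^{q} (𝔼|X|^{2/(1−q)})^{1−q}, which is the type of bound used here to control T2.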
By (2.3), we have
(6.18)
and, by (6.12),
(6.19)
Therefore,
(6.20)
Since the mixing coefficients satisfy (2.2), we have
(6.21)
It follows from (6.13), (6.16), and (6.21) that
(6.22)
Combining (6.11), (6.12), and (6.22), we obtain
(6.23)
(3) Proceeding in a fashion similar to (2), we obtain
(6.24)
Using (2.3) (which implies sup_{x∈[0,1]}(1/w(x)) ≤ C) and applying the Davydov inequality, we obtain
(6.25)
The proof of Proposition 6.2 is complete.
Proposition 6.3. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, …, 2^j − 1}, let α̂j,k be the estimator (4.2). Then,
These inequalities hold with β̂j,k defined by (4.3) instead of α̂j,k, and with βj,k instead of αj,k.
Proof of Proposition 6.3.
(1) Applying Lemma 6.1 and Proposition 6.2, we have
(6.28)
(2) We have
(6.29)
By (2.3), w is bounded from above and away from zero on [0,1]; in particular, sup_{x∈[0,1]}(1/w(x)) ≤ C. So,
(6.30)
By (6.4), we have |αj,k| ≤ C. Therefore
(6.31)
It follows from (6.31) and (6.28) that
(6.32)
The proof of Proposition 6.3 is complete.
Proposition 6.4. For any j ∈ {τ, …, j1} and any k ∈ {0, …, 2^j − 1}, let β̂j,k be the estimator (4.3) and λn be the threshold (4.8). Then there exist two constants, κ > 0 and C > 0, such that
(6.33)
Proof of Proposition 6.4. It follows from Lemma 6.1 that
(6.34)
where
(6.35)
In order to bound P1 and P2, let us present a Bernstein inequality for exponentially strongly mixing processes. We refer to [25, 26].
Lemma 6.5 (see [25, 26]). Let γ > 0, c > 0, θ > 1, and let (Zi)i∈ℤ be a stationary process such that, for any m ∈ ℤ, the associated mth strongly mixing coefficient satisfies (2.2), that is, am ≤ γ exp(−c|m|^θ). Let n ∈ ℕ*, let h : ℝ → ℝ be a measurable function, and, for any i ∈ ℤ, set Ui = h(Zi). One assumes that 𝔼(U1) = 0 and that there exists a constant M > 0 satisfying |U1| ≤ M < ∞. Then, for any m ∈ {1, …, n} and any λ > 4mM/n, one has
(6.36)
Upper Bound for P1 For any i ∈ {1, …, n}, set
(6.37)
Then U1, …, Un are identically distributed and depend on the stationary strongly mixing process (Zi)i∈ℤ, which satisfies (2.2). Proposition 6.2 gives
(6.38)
and, by (2.3) and (6.4),
(6.39)
It follows from Lemma 6.5 applied with U1, …, Un, λ = κCλn, λn = ((ln n)^{1+1/θ}/n)^{1/2}, m = (u ln n)^{1/θ} with u > 0 (chosen later), and M = C2^{j/2}, that
(6.40)
Therefore, for large enough κ and u, we have
(6.41)
Upper Bound for P2 For any i ∈ {1, …, n}, set
(6.42)
Then U1, …, Un are identically distributed and depend on the stationary strongly mixing process (Zi)i∈ℤ, which satisfies (2.2). Proposition 6.2 gives
(6.43)
By (2.3), we have
(6.44)
It follows from Lemma 6.5 applied with U1, …, Un, λ = κCλn, λn = ((ln n)^{1+1/θ}/n)^{1/2}, m = (u ln n)^{1/θ} with u > 0 (chosen later), and M = C that
(6.45)
Therefore, for large enough κ and u, we have
(6.46)
Putting (6.34), (6.41), and (6.46) together ends the proof of Proposition 6.4.
6.2. Proofs of the Main Results
Proof of Theorem 5.1. We expand the function f on ℬ as
(6.47)
where αj,k and βj,k are the wavelet coefficients of f given by (3.3).
We have, for any x ∈ [0,1],
(6.48)
Since ℬ is an orthonormal basis of 𝕃2([0,1]), we have
(6.49)
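If f̂L has the projection form sketched in Section 4, this orthonormality argument yields the standard bias–variance decomposition

\[
\mathbb{E}\!\left( \int_0^1 \bigl( \widehat{f}_{L}(x) - f(x) \bigr)^2 dx \right)
= \sum_{k=0}^{2^{j_0}-1} \mathbb{E}\Bigl( \bigl( \widehat{\alpha}_{j_0,k} - \alpha_{j_0,k} \bigr)^2 \Bigr)
+ \sum_{j \ge j_0} \sum_{k=0}^{2^{j}-1} \beta_{j,k}^2,
\]

whose first sum is controlled by Proposition 6.3 and whose second sum is controlled by the Besov condition on f.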
Using Proposition 6.3, we obtain
(6.50)
Since p ≥ 2, we have B^s_{p,r}(M) ⊆ B^s_{2,∞}(M). Hence
(6.51)
Therefore,
(6.52)
The proof of Theorem 5.1 is complete.
Proof of Theorem 5.2. We expand the function f on ℬ as
(6.53)
where αj,k and βj,k are the wavelet coefficients of f given by (3.3).
We have, for any x ∈ [0,1],
(6.54)
Since ℬ is an orthonormal basis of 𝕃2([0,1]), we have
(6.55)
where
(6.56)
Let us bound R, T, and S in turn.
Upper Bound for R Using Proposition 6.3 and 2s/(2s + 1) < 1, we obtain
(6.57)
Upper Bound for T For r ≥ 1 and p ≥ 2, we have B^s_{p,r}(M) ⊆ B^s_{2,∞}(M). Since 2s/(2s + 1) < 2s, we have
(6.58)
For r ≥ 1 and p ∈ [1,2), we have B^s_{p,r}(M) ⊆ B^{s+1/2−1/p}_{2,∞}(M). Since s > 1/p, we have s + 1/2 − 1/p > s/(2s + 1). So
(6.59)
Hence, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.60)
Upper Bound for S Note that we can write the term S as
(6.61)
where
(6.62)
Let us investigate the bounds of S1, S2, S3, and S4 in turn.
Upper Bounds for S1 and S3 We have
(6.63)
So,
(6.64)
It follows from the Cauchy-Schwarz inequality and Propositions 6.3 and 6.4 that
(6.65)
Since 2s/(2s + 1) < 1, we have
(6.66)
Upper Bound for S2 Using Proposition 6.3 again, we obtain
(6.67)
Hence,
(6.68)
Let j2 be the integer defined by
(6.69)
We have
(6.70)
where
(6.71)
We have
(6.72)
For r ≥ 1 and p ≥ 2, since B^s_{p,r}(M) ⊆ B^s_{2,∞}(M),
(6.73)
For r ≥ 1, p ∈ [1,2), and s > 1/p, using the Besov condition on the wavelet coefficients together with the identity (2s + 1)(2 − p)/2 + (s + 1/2 − 1/p)p = 2s, we have
(6.74)
So, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.75)
Upper Bound for S4 We have
(6.76)
Let j2 be the integer defined by (6.69). Then
(6.77)
where
(6.78)
We have
(6.79)
For r ≥ 1 and p ≥ 2, since B^s_{p,r}(M) ⊆ B^s_{2,∞}(M), we have
(6.80)
For r ≥ 1, p ∈ [1,2), and s > 1/p, using the Besov condition on the wavelet coefficients together with the identity (2s + 1)(2 − p)/2 + (s + 1/2 − 1/p)p = 2s, we have
(6.81)
So, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.82)
It follows from (6.61), (6.66), (6.75), and (6.82) that
(6.83)
Combining (6.55), (6.57), (6.60), and (6.83), we have, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p},
(6.84)
The proof of Theorem 5.2 is complete.
Acknowledgment
This work was supported by the ANR grant NatImages (ANR-08-EMER-009).
References
- 1 Buckland S. T., Anderson D. R., Burnham K. P., and Laake J. L., Distance Sampling: Estimating Abundance of Biological Populations, 1993, Chapman & Hall, London, UK, 1263023.
- 2 Cox D., Some sampling problems in technology, in New Developments in Survey Sampling (N. L. Johnson and H. Smith, eds.), 1969, John Wiley & Sons, New York, NY, USA, 506–527.
- 3 Heckman J., Selection bias and self-selection, in The New Palgrave: A Dictionary of Economics, 1985, MacMillan Press, New York, NY, USA, 287–296.
- 4 Patil G. P. and Rao C. R., The weighted distributions: a survey of their applications, in Applications of Statistics (P. R. Krishnaiah, ed.), 1977, North-Holland, Amsterdam, The Netherlands, 383–405.
- 5 El Barmi H. and Simonoff J. S., Transformation-based density estimation for weighted distributions, Journal of Nonparametric Statistics. (2000) 12, no. 6, 861–878, 1802580, https://doi.org/10.1080/10485250008832838, ZBL0971.62016.
- 6 Efromovich S., Density estimation for biased data, The Annals of Statistics. (2004) 32, no. 3, 1137–1161, 2065200, https://doi.org/10.1214/009053604000000300, ZBL1091.62022.
- 7 Brunel E., Comte F., and Guilloux A., Nonparametric density estimation in presence of bias and censoring, Test. (2009) 18, no. 1, 166–194, https://doi.org/10.1007/s11749-007-0075-5, 2495970, ZBL1203.62052.
- 8 Chesneau C., Wavelet block thresholding for density estimation in the presence of bias, Journal of the Korean Statistical Society. (2010) 39, no. 1, 43–53, 2655811, https://doi.org/10.1016/j.jkss.2009.03.004.
- 9 Ramírez P. and Vidakovic B., Wavelet density estimation for stratified size-biased sample, Journal of Statistical Planning and Inference. (2010) 140, no. 2, 419–432, 2558374, https://doi.org/10.1016/j.jspi.2009.07.021, ZBL1177.62046.
- 10 Doosti H. and Dewan I., Wavelet linear density estimation for associated stratified size-biased sample, Statistics & Mathematics Unit. In press.
- 11 Doukhan P., Mixing: Properties and Examples, 1994, 85, Springer, New York, NY, USA, Lecture Notes in Statistics, 1312160.
- 12 Carrasco M. and Chen X., Mixing and moment properties of various GARCH and stochastic volatility models, Econometric Theory. (2002) 18, no. 1, 17–39, 1885348, https://doi.org/10.1017/S0266466602181023, ZBL1181.62125.
- 13 Antoniadis A., Wavelets in statistics: a review (with discussion), Journal of the Italian Statistical Society, Series B. (1997) 6, 97–144.
- 14 Härdle W., Kerkyacharian G., Picard D., and Tsybakov A., Wavelets, Approximation, and Statistical Applications, 1998, 129, Springer, New York, NY, USA, Lecture Notes in Statistics, 1618204.
- 15 Donoho D. L., Johnstone I. M., Kerkyacharian G., and Picard D., Density estimation by wavelet thresholding, The Annals of Statistics. (1996) 24, no. 2, 508–539, 1394974, https://doi.org/10.1214/aos/1032894451, ZBL0860.62032.
- 16 Withers C. S., Conditions for linear processes to be strong-mixing, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. (1981) 57, no. 4, 477–480, https://doi.org/10.1007/BF01025869, 631371, ZBL0465.60032.
- 17 Modha D. S. and Masry E., Minimum complexity regression estimation with weakly dependent observations, IEEE Transactions on Information Theory. (1996) 42, no. 6, part 2, 2133–2145, https://doi.org/10.1109/18.556602, 1447519, ZBL0868.62015.
- 18 Cohen A., Daubechies I., and Vial P., Wavelets on the interval and fast wavelet transforms, Applied and Computational Harmonic Analysis. (1993) 1, no. 1, 54–81, https://doi.org/10.1006/acha.1993.1005, 1256527, ZBL0795.42018.
- 19 Meyer Y., Wavelets and Operators, 1992, 37, Cambridge University Press, Cambridge, UK, Cambridge Studies in Advanced Mathematics, 1228209.
- 20 Chaubey Y. P., Chesneau C., and Doosti H., On linear wavelet density estimation: some recent developments, Journal of the Indian Society of Agricultural Statistics. In press.
- 21 Leblanc F., Wavelet linear density estimator for a discrete-time stochastic process: Lp-losses, Statistics & Probability Letters. (1996) 27, no. 1, 71–84, 1394179, https://doi.org/10.1016/0167-7152(95)00046-1, ZBL0845.62033.
- 22 Masry E., Probability density estimation from dependent observations using wavelet orthonormal bases, Statistics & Probability Letters. (1994) 21, no. 3, 181–194, 1310095, https://doi.org/10.1016/0167-7152(94)90114-7, ZBL0814.62021.
- 23 Delyon B. and Juditsky A., On minimax wavelet estimators, Applied and Computational Harmonic Analysis. (1996) 3, no. 3, 215–228, https://doi.org/10.1006/acha.1996.0017, 1400080, ZBL0865.62023.
- 24 Davydov J. A., The invariance principle for stationary processes, Theory of Probability and Its Applications. (1970) 15, 498–509, 0283872.
- 25 Rio E., The functional law of the iterated logarithm for stationary strongly mixing sequences, The Annals of Probability. (1995) 23, no. 3, 1188–1203, 1349167, https://doi.org/10.1214/aop/1176988179, ZBL0833.60024.
- 26 Liebscher E., Strong convergence of sums of α-mixing random variables with applications to density estimation, Stochastic Processes and their Applications. (1996) 65, no. 1, 69–80, 1422880, https://doi.org/10.1016/S0304-4149(96)00096-8.