1. Introduction
In the standard density estimation problem, we observe n random variables X1, …, Xn with common density function f. The goal is to estimate f from X1, …, Xn. However, in some applications, X1, …, Xn are not accessible; we only have n random variables Z1, …, Zn with the common density
(1.1)
where w denotes a known positive function and μ is the unknown normalization parameter: μ = ∫ w(y)f(y) dy. Our goal is to estimate the “biased density” f from Z1, …, Zn. Practical examples can be found in, for example, [1–3] and the survey [4].
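In the weighted-distribution framework of, for example, [1–4, 6], the common density of Z1, …, Zn described by (1.1) is the standard weighted (biased) version of f, commonly written as

\[
g(x) = \frac{w(x)\, f(x)}{\mu}, \qquad \mu = \int w(y)\, f(y)\, dy,
\]

so that g integrates to one as soon as w is positive and μ is finite.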
The standard i.i.d. case has been investigated in several papers. See, for example, [5–9]. To the best of our knowledge, the dependent case has only been examined in [10] for associated (positively or negatively) Z1, …, Zn. In this paper, we study another dependent (and realistic) structure which has not been addressed earlier: we suppose that Z1, …, Zn is a sample of a strictly stationary and exponentially strongly mixing process (Zi) i∈ℤ (to be defined in Section 2). Such a dependence condition arises for a wide class of GARCH-type time series models classically encountered in finance. See, for example, [11, 12] for an overview.
We focus our attention on wavelet methods because they provide a coherent set of procedures that are spatially adaptive and near optimal over a wide range of function spaces. See, for example, [13, 14] for detailed coverage of wavelet theory in statistics. We develop two new wavelet estimators: a linear nonadaptive one based on projections and a nonlinear adaptive one using the hard thresholding rule introduced in [15]. We measure their performance by determining upper bounds on the mean integrated squared error (MISE) over Besov balls (to be defined in Section 3). We prove that our adaptive estimator attains a sharp rate of convergence, close to the one attained by the linear wavelet estimator (constructed in a nonadaptive fashion to minimize the MISE).
The rest of the paper is organized as follows. Section 2 is devoted to the assumptions on the model. In Section 3, we present wavelets and Besov balls. The considered wavelet estimators are defined in Section 4. Section 5 is devoted to the results. The proofs are postponed to Section 6.
2. Assumptions on the Model
We assume that Z1, …, Zn come from a strictly stationary process (Zi)i∈ℤ. For any m ∈ ℤ, we define the mth strongly mixing coefficient of (Zi)i∈ℤ by
(2.1)
where, for any u ∈ ℤ, the two σ-algebras appearing in (2.1) are the one generated by the random variables …, Zu−1, Zu and the one generated by Zu, Zu+1, …, respectively.
We consider the exponentially strongly mixing case; that is, there exist three known constants, γ > 0, c > 0, and θ > 0, such that, for any m ∈ ℤ,
(2.2)
This assumption is satisfied by a large class of GARCH processes. See, for example, [11, 12, 16, 17].
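For reference, with the two σ-algebras described above, the mth strongly mixing coefficient in (2.1) is classically expressed as

\[
a_m = \sup_{u \in \mathbb{Z}}\;\sup_{A \in \sigma(\ldots, Z_{u-1}, Z_u),\; B \in \sigma(Z_{u+m}, Z_{u+m+1}, \ldots)} \bigl| \mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B) \bigr|,
\]

and the exponential decay condition (2.2) takes the form am ≤ γ exp(−c|m|^θ), as restated in Lemma 6.5 below.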
Note that, when θ → ∞, we are in the standard i.i.d. case.
Without loss of generality, the functions f and w are assumed to be supported on [0,1].
There exist two constants, c > 0 and C > 0, such that
(2.3)
There exists a (known) constant C > 0 such that
(2.4)
For any m ∈ {1, …, n}, the pair (Z0, Zm) admits a density, and there exists a constant C > 0 such that
(2.5)
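In the biased-density literature cited below, boundedness conditions of this type are usually formulated as: w bounded away from zero and from above, f bounded, and the joint density of (Z0, Zm) bounded uniformly in m; in symbols,

\[
c \le w(x) \le C \ \ \text{for all } x \in [0,1], \qquad \sup_{x \in [0,1]} f(x) \le C, \qquad \sup_{(x,y) \in [0,1]^2} g_{(Z_0, Z_m)}(x, y) \le C,
\]

where g_{(Z_0,Z_m)} denotes the density of (Z0, Zm).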
The first two boundedness assumptions are standard in the estimation of biased densities. See, for example, [6–8].
3. Wavelets and Besov Balls
Let N be an integer, and let ϕ and ψ be the initial wavelets of dbN (so supp(ϕ) = supp(ψ) = [1 − N, N]). Set
(3.1)
With an appropriate treatment at the boundaries, there exists an integer τ satisfying 2^τ ≥ 2N such that the collection ℬ = {ϕτ,k(·), k ∈ {0, …, 2^τ − 1}; ψj,k(·), j ∈ ℕ − {0, …, τ − 1}, k ∈ {0, …, 2^j − 1}} is an orthonormal basis of 𝕃2([0,1]) (the space of square-integrable functions on [0,1]). See [18].
For any integer ℓ ≥ τ, any h ∈ 𝕃2([0,1]) can be expanded on ℬ as
(3.2)
where αj,k and βj,k are the wavelet coefficients of h defined by
(3.3)
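In this notation, the expansion (3.2) and the coefficients (3.3) take the usual form

\[
h(x) = \sum_{k=0}^{2^{\ell}-1} \alpha_{\ell,k}\, \phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty} \sum_{k=0}^{2^{j}-1} \beta_{j,k}\, \psi_{j,k}(x),
\qquad
\alpha_{j,k} = \int_0^1 h(x)\, \phi_{j,k}(x)\, dx, \quad \beta_{j,k} = \int_0^1 h(x)\, \psi_{j,k}(x)\, dx.
\]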
Let M > 0, s > 0, p ≥ 1, and r ≥ 1. A function h belongs to the Besov ball B^s_{p,r}(M) if and only if there exists a constant M* > 0 (depending on M) such that the associated wavelet coefficients (3.3) satisfy
(3.4)
In this expression, s is a smoothness parameter, and p and r are norm parameters. For particular choices of s, p, and r, B^s_{p,r}(M) contains some classical sets of functions, such as the Hölder and Sobolev balls. See [19].
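In terms of the wavelet coefficients (3.3), one common sequence-space formulation of this condition (up to the treatment of the coarsest level) is

\[
\left( \sum_{j \ge \tau} \left( 2^{j(s + 1/2 - 1/p)} \Bigl( \sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^{p} \Bigr)^{1/p} \right)^{r} \right)^{1/r} \le M^{*}.
\]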
4. Estimators
Firstly, we consider the following estimator for μ:
(4.1)
It is obtained by the method of moments (see Proposition 6.2 below).
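The moment calculation behind this choice is simple: since Z1 has the density (1.1), which is proportional to wf,

\[
\mathbb{E}\!\left( \frac{1}{w(Z_1)} \right) = \int_0^1 \frac{1}{w(y)} \cdot \frac{w(y)\, f(y)}{\mu}\, dy = \frac{1}{\mu},
\]

so the reciprocal of the empirical mean of the 1/w(Zi), namely n / Σ_{i=1}^{n} 1/w(Zi), is a natural candidate for (4.1); note that (2.3) keeps the 1/w(Zi) bounded.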
Then, for any integer j ≥ τ and any k ∈ {0, …, 2^j − 1}, we estimate the unknown wavelet coefficients
- (i) αj,k by
(4.2)
- (ii) βj,k by
(4.3)
Note that these are the estimators considered in the i.i.d. case (see, e.g., [8, 9]). Their statistical properties, under our dependence structure, are investigated in Propositions 6.2, 6.3, and 6.4 below.
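In the i.i.d. biased-data literature [8, 9], the empirical coefficients are of weighted plug-in type; a natural form for (4.2) and (4.3) is

\[
\widehat{\alpha}_{j,k} = \frac{\widehat{\mu}}{n} \sum_{i=1}^{n} \frac{\phi_{j,k}(Z_i)}{w(Z_i)},
\qquad
\widehat{\beta}_{j,k} = \frac{\widehat{\mu}}{n} \sum_{i=1}^{n} \frac{\psi_{j,k}(Z_i)}{w(Z_i)},
\]

since, by the same moment calculation as above, 𝔼(ϕj,k(Z1)/w(Z1)) = αj,k/μ.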
Assuming that f belongs to B^s_{p,r}(M) with p ≥ 2, we define the linear estimator f̂L by
(4.4)
where α̂j,k is defined by (4.2) and j0 is the integer satisfying
(4.5)
For a survey on wavelet linear estimators for various density models, we refer the reader to [20]. For the consideration of strongly mixing sequences, see, for example, [21, 22].
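For orientation, a projection estimator of this kind has the generic form

\[
\widehat{f}_{L}(x) = \sum_{k=0}^{2^{j_0}-1} \widehat{\alpha}_{j_0,k}\, \phi_{j_0,k}(x), \qquad x \in [0,1],
\]

with a level j0 chosen nonadaptively from the smoothness s (in the i.i.d. case, 2^{j0} of order n^{1/(2s+1)} balances the bias and the variance); (4.5) specifies the choice used here.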
We define the hard thresholding estimator f̂H by
(4.6)
x ∈ [0,1], where α̂j,k is defined by (4.2) and β̂j,k by (4.3), for any random event 𝒜, 𝕀𝒜 is the indicator function of 𝒜, j1 is the integer satisfying
(4.7)
θ is the constant in (2.2), κ is a large enough constant (the one in Proposition 6.4 below), and λn is the threshold
(4.8)
The feature of the hard thresholding estimator is that it estimates only the “large” unknown wavelet coefficients of f, which contain its main characteristics. For the construction of hard thresholding wavelet estimators in the standard density model, see, for example, [15, 23].
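To make the construction concrete, the following Python sketch implements weighted empirical wavelet coefficients, the moment estimator of μ, and hard thresholding with a threshold proportional to λn from (4.8). It uses the Haar basis rather than the Daubechies dbN basis of Section 3 and simplified level choices, so it only illustrates the mechanism; the function names and the toy length-biased example are not taken from the paper.

```python
import numpy as np

def haar_phi(x, j, k):
    """Haar scaling function phi_{j,k}(x) = 2^{j/2} * 1_{[k/2^j, (k+1)/2^j)}(x)."""
    return 2.0 ** (j / 2) * ((x >= k / 2 ** j) & (x < (k + 1) / 2 ** j))

def haar_psi(x, j, k):
    """Haar wavelet psi_{j,k}: +2^{j/2} on the left half of the dyadic cell, -2^{j/2} on the right half."""
    left = (x >= k / 2 ** j) & (x < (k + 0.5) / 2 ** j)
    right = (x >= (k + 0.5) / 2 ** j) & (x < (k + 1) / 2 ** j)
    return 2.0 ** (j / 2) * (left.astype(float) - right.astype(float))

def hard_threshold_estimate(z, w, x_grid, j0=2, j1=6, kappa=1.0, theta=1.0):
    """Hard-thresholding wavelet estimate of f from biased observations z with known weight function w."""
    n = len(z)
    wz = w(z)
    mu_hat = n / np.sum(1.0 / wz)                                  # moment estimator of mu: E[1/w(Z)] = 1/mu
    lam = kappa * np.sqrt(np.log(n) ** (1.0 + 1.0 / theta) / n)    # threshold kappa * lambda_n, cf. (4.8)
    f_hat = np.zeros_like(x_grid, dtype=float)

    # Approximation part: keep all scaling coefficients at the coarse level j0.
    for k in range(2 ** j0):
        alpha = (mu_hat / n) * np.sum(haar_phi(z, j0, k) / wz)
        f_hat += alpha * haar_phi(x_grid, j0, k)

    # Detail part: keep only the empirical wavelet coefficients exceeding the threshold.
    for j in range(j0, j1 + 1):
        for k in range(2 ** j):
            beta = (mu_hat / n) * np.sum(haar_psi(z, j, k) / wz)
            if abs(beta) >= lam:
                f_hat += beta * haar_psi(x_grid, j, k)
    return f_hat

# Toy usage: length bias w(x) = x; the biased sample then follows Beta(3, 2)
# when the target density f is Beta(2, 2) on [0, 1].
rng = np.random.default_rng(0)
z = rng.beta(3, 2, size=2000)
f_hat = hard_threshold_estimate(z, w=lambda x: x, x_grid=np.linspace(0.0, 1.0, 512))
```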
5. Results
Theorem 5.1 (upper bound for f̂L). Consider (1.1) under the assumptions of Section 2. Suppose that f belongs to B^s_{p,r}(M) with s > 0, p ≥ 2, and r ≥ 1. Let f̂L be the estimator defined by (4.4). Then there exists a constant C > 0 such that
(5.1)
The proof of Theorem 5.1 uses a suitable decomposition of the MISE and a moment inequality on (4.2) (see Proposition 6.3 below).
Note that n^{−2s/(2s+1)} is the optimal rate of convergence (in the minimax sense) for the standard density model in the independent case (see, e.g., [14, 23]).
Theorem 5.2 (upper bound for f̂H). Consider (1.1) under the assumptions of Section 2. Let f̂H be the estimator defined by (4.6). Suppose that f belongs to B^s_{p,r}(M) with r ≥ 1 and either {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}. Then there exists a constant C > 0 such that
(5.2)
The proof of Theorem 5.2 uses a suitable decomposition of the MISE, some moment inequalities on (4.2) and (4.3) (see Proposition 6.3 below), and a concentration inequality on (4.3) (see Proposition 6.4 below).
Theorem 5.2 shows that, besides being adaptive, f̂H attains a rate of convergence close to that of f̂L. The only difference is the logarithmic factor (ln n)^{(1+1/θ)(2s/(2s+1))}.
Note that, if we restrict our study to the independent case, that is, θ → ∞, the rate of convergence attained by f̂H becomes the standard one: (ln n/n)^{2s/(2s+1)}. See, for example, [14, 15, 23].
6. Proofs
In this section, we consider (1.1) under the assumptions of Section 2. Moreover, C denotes any constant that does not depend on j, k, and n. Its value may change from one term to another and may depend on ϕ or ψ.
6.1. Auxiliary Results
Lemma 6.1. For any integer j ≥ τ and any k ∈ {0, …, 2^j − 1}, let α̂j,k be the estimator (4.2) and αj,k the coefficient (3.3). Then, under the assumptions of Section 2, there exists a constant C > 0 such that
(6.1)
This inequality holds with ψ instead of ϕ (and, a fortiori, with β̂j,k defined by (4.3) instead of α̂j,k and βj,k instead of αj,k).
Proof of Lemma 6.1. We have
(6.2)
Due to (2.3), w is bounded from above and away from zero on [0,1]. Therefore
(6.3)
Using (2.4) and the Cauchy-Schwarz inequality, we obtain
(6.4)
Hence
(6.5)
Lemma 6.1 is proved.
Proposition 6.2. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, …, 2^j − 1}, let μ̂ be the estimator (4.1). Then,
These results hold with ψ instead of ϕ (and, a fortiori, with βj,k instead of αj,k).
Proof of Proposition 6.2.
(1) We have
(6.9)
Since f is a density, we obtain
(6.10)
(2) We have
(6.11)
Using (2.3) and (2.4), we have sup_{x∈[0,1]} g(x) ≤ C. Hence,
(6.12)
It follows from the stationarity of (Zi)i∈ℤ and 2^j ≤ n that
(6.13)
where
(6.14)
Let us now bound T1 and T2.
Upper Bound for T1 Using (2.5) and (2.3), and making the change of variables y = 2^j x − k, we obtain
(6.15)
Therefore,
(6.16)
Upper Bound for T2 By the Davydov inequality for strongly mixing processes (see [24]), for any q ∈ (0,1), it holds that
(6.17)
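For completeness, the covariance inequality of Davydov [24] states that, for random variables X and Y measurable with respect to the past σ-algebra up to time u and the future σ-algebra from time u + m, respectively, and for exponents a, b, c ≥ 1 with 1/a + 1/b + 1/c = 1,

\[
\bigl| \operatorname{Cov}(X, Y) \bigr| \le C\, a_m^{1/a}\, \|X\|_{b}\, \|Y\|_{c}
\]

for an absolute constant C; taking b = c = 2/(1 − q) with q ∈ (0,1) and identically distributed X and Y gives a bound of order a_m^{q} (𝔼|X|^{2/(1−q)})^{1−q}, which is the type of bound used here to control T2.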
By (2.3), we have
(6.18)
and, by (6.12),
(6.19)
Therefore,
(6.20)
Since the mixing coefficients satisfy (2.2), we have
(6.21)
It follows from (6.13), (6.16), and (6.21) that
(6.22)
Combining (6.11), (6.12), and (6.22), we obtain
(6.23)
(3) Proceeding in a fashion similar to (2), we obtain
(6.24)
Using (2.3) (which implies sup_{x∈[0,1]}(1/w(x)) ≤ C) and applying the Davydov inequality, we obtain
(6.25)
The proof of Proposition 6.2 is complete.
Proposition 6.3. For any integer j ≥ τ such that 2^j ≤ n and any k ∈ {0, …, 2^j − 1}, let α̂j,k be the estimator (4.2). Then,
These inequalities hold with β̂j,k defined by (4.3) instead of α̂j,k, and with βj,k instead of αj,k.
Proof of Proposition 6.3.
(1) Applying Lemma 6.1 and Proposition 6.2, we have
(6.28)
(2) We have
(6.29)
By (2.3), w is bounded from above and away from zero on [0,1]; in particular, sup_{x∈[0,1]}(1/w(x)) ≤ C. So,
(6.30)
By (6.4), we have |αj,k| ≤ C. Therefore
(6.31)
It follows from (6.31) and (6.28) that
(6.32)
The proof of Proposition 6.3 is complete.
Proposition 6.4. For any j ∈ {τ, …, j1} and any k ∈ {0, …, 2^j − 1}, let β̂j,k be the estimator (4.3) and λn be the threshold (4.8). Then there exist two constants, κ > 0 and C > 0, such that
(6.33)
Proof of Proposition 6.4. It follows from Lemma 6.1 that
(6.34)
where
(6.35)
In order to bound P1 and P2, let us present a Bernstein inequality for exponentially strongly mixing processes. We refer to [25, 26].
Lemma 6.5 (see [25, 26]). Let γ > 0, c > 0, θ > 1, and let (Zi)i∈ℤ be a stationary process such that, for any m ∈ ℤ, the associated mth strongly mixing coefficient satisfies (2.2), that is, am ≤ γ exp(−c|m|^θ). Let n ∈ ℕ*, let h : ℝ → ℝ be a measurable function, and, for any i ∈ ℤ, set Ui = h(Zi). One assumes that 𝔼(U1) = 0 and that there exists a constant M > 0 satisfying |U1| ≤ M < ∞. Then, for any m ∈ {1, …, n} and any λ > 4mM/n, one has
(6.36)
Upper Bound for P1 For any i ∈ {1, …, n}, set
(6.37)
Then U1, …, Un are identically distributed and depend on the stationary strongly mixing process (Zi)i∈ℤ, which satisfies (2.2). Proposition 6.2 gives
(6.38)
and, by (2.3) and (6.4),
(6.39)
It follows from Lemma 6.5 applied with U1, …, Un, λ = κCλn, λn = ((ln n)^{1+1/θ}/n)^{1/2}, m = (u ln n)^{1/θ} with u > 0 (chosen later), and M = C2^{j/2}, that
(6.40)
Therefore, for large enough κ and u, we have
(6.41)
Upper Bound for P2 For any i ∈ {1, …, n}, set
(6.42)
Then U1, …, Un are identically distributed and depend on the stationary strongly mixing process (Zi)i∈ℤ, which satisfies (2.2). Proposition 6.2 gives
(6.43)
By (2.3), we have
(6.44)
It follows from Lemma 6.5 applied with U1, …, Un, λ = κCλn, λn = ((ln n)^{1+1/θ}/n)^{1/2}, m = (u ln n)^{1/θ} with u > 0 (chosen later), and M = C that
(6.45)
Therefore, for large enough κ and u, we have
(6.46)
Putting (6.34), (6.41), and (6.46) together ends the proof of Proposition 6.4.
6.2. Proofs of the Main Results
Proof of Theorem 5.1. We expand the function f on ℬ as
(6.47)
where αj,k and βj,k are the wavelet coefficients of f given by (3.3).
We have, for any x ∈ [0,1],
(6.48)
Since ℬ is an orthonormal basis of 𝕃2([0,1]), we have
(6.49)
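If f̂L has the projection form sketched in Section 4, this orthonormality argument yields the standard bias–variance decomposition

\[
\mathbb{E}\!\left( \int_0^1 \bigl( \widehat{f}_{L}(x) - f(x) \bigr)^2 dx \right)
= \sum_{k=0}^{2^{j_0}-1} \mathbb{E}\Bigl( \bigl( \widehat{\alpha}_{j_0,k} - \alpha_{j_0,k} \bigr)^2 \Bigr)
+ \sum_{j \ge j_0} \sum_{k=0}^{2^{j}-1} \beta_{j,k}^2,
\]

whose first sum is controlled by Proposition 6.3 and whose second sum is controlled by the Besov condition on f.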
Using Proposition 6.3, we obtain
(6.50)
Since p ≥ 2, we have B^s_{p,r}(M) ⊆ B^s_{2,∞}(M). Hence
(6.51)
Therefore,
(6.52)
The proof of Theorem 5.1 is complete.
Proof of Theorem 5.2. We expand the function f on ℬ as
(6.53)
where αj,k and βj,k are the wavelet coefficients of f given by (3.3).
We have, for any x ∈ [0,1],
(6.54)
Since ℬ is an orthonormal basis of 𝕃2([0,1]), we have
(6.55)
where
(6.56)
Let us bound R, T, and S in turn.
Upper Bound for R Using Proposition 6.3 and 2s/(2s + 1) < 1, we obtain
(6.57)
Upper Bound for T For r ≥ 1 and p ≥ 2, we have B^s_{p,r}(M) ⊆ B^s_{2,∞}(M). Since 2s/(2s + 1) < 2s, we have
(6.58)
For r ≥ 1 and p ∈ [1,2), we have B^s_{p,r}(M) ⊆ B^{s+1/2−1/p}_{2,∞}(M). Since s > 1/p, we have s + 1/2 − 1/p > s/(2s + 1). So
(6.59)
Hence, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.60)
Upper Bound for S Note that we can write the term S as
(6.61)
where
(6.62)
Let us investigate the bounds of S1, S2, S3, and S4 in turn.
Upper Bounds for S1 and S3 We have
(6.63)
So,
(6.64)
It follows from the Cauchy-Schwarz inequality and Propositions 6.3 and 6.4 that
(6.65)
Since 2s/(2s + 1) < 1, we have
(6.66)
Upper Bound for S2 Using Proposition 6.3 again, we obtain
(6.67)
Hence,
(6.68)
Let j2 be the integer defined by
(6.69)
We have
(6.70)
where
(6.71)
We have
(6.72)
For r ≥ 1 and p ≥ 2, since B^s_{p,r}(M) ⊆ B^s_{2,∞}(M),
(6.73)
For r ≥ 1, p ∈ [1,2), and s > 1/p, using the Besov condition on the wavelet coefficients together with the identity (2s + 1)(2 − p)/2 + (s + 1/2 − 1/p)p = 2s, we have
(6.74)
So, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.75)
Upper Bound for S4 We have
(6.76)
Let j2 be the integer defined by (6.69). Then
(6.77)
where
(6.78)
We have
(6.79)
For r ≥ 1 and p ≥ 2, since B^s_{p,r}(M) ⊆ B^s_{2,∞}(M), we have
(6.80)
For r ≥ 1, p ∈ [1,2), and s > 1/p, using the Besov condition on the wavelet coefficients together with the identity (2s + 1)(2 − p)/2 + (s + 1/2 − 1/p)p = 2s, we have
(6.81)
So, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p}, we have
(6.82)
It follows from (6.61), (6.66), (6.75), and (6.82) that
(6.83)
Combining (6.55), (6.57), (6.60), and (6.83), we have, for r ≥ 1, {p ≥ 2 and s > 0} or {p ∈ [1,2) and s > 1/p},
(6.84)
The proof of Theorem 5.2 is complete.
Acknowledgment
This work was supported by the ANR grant NatImages (ANR-08-EMER-009).
References
- 1 Buckland S. T., Anderson D. R., Burnham K. P., and Laake J. L., Distance Sampling: Estimating Abundance of Biological Populations, 1993, Chapman & Hall, London, UK, 1263023.
- 2 Cox D., Some sampling problems in technology, in New Developments in Survey Sampling (N. L. Johnson and H. Smith, eds.), 1969, John Wiley & Sons, New York, NY, USA, 506–527.
- 3 Heckman J., Selection bias and self-selection, in The New Palgrave: A Dictionary of Economics, 1985, MacMillan Press, New York, NY, USA, 287–296.
- 4 Patil G. P. and Rao C. R., The weighted distributions: a survey of their applications, in Applications of Statistics (P. R. Krishnaiah, ed.), 1977, North-Holland, Amsterdam, The Netherlands, 383–405.
- 5 El Barmi H. and Simonoff J. S., Transformation-based density estimation for weighted distributions, Journal of Nonparametric Statistics. (2000) 12, no. 6, 861–878, 1802580, https://doi.org/10.1080/10485250008832838, ZBL0971.62016.
- 6 Efromovich S., Density estimation for biased data, The Annals of Statistics. (2004) 32, no. 3, 1137–1161, 2065200, https://doi.org/10.1214/009053604000000300, ZBL1091.62022.
- 7 Brunel E., Comte F., and Guilloux A., Nonparametric density estimation in presence of bias and censoring, Test. (2009) 18, no. 1, 166–194, https://doi.org/10.1007/s11749-007-0075-5, 2495970, ZBL1203.62052.
- 8 Chesneau C., Wavelet block thresholding for density estimation in the presence of bias, Journal of the Korean Statistical Society. (2010) 39, no. 1, 43–53, 2655811, https://doi.org/10.1016/j.jkss.2009.03.004.
- 9 Ramírez P. and Vidakovic B., Wavelet density estimation for stratified size-biased sample, Journal of Statistical Planning and Inference. (2010) 140, no. 2, 419–432, 2558374, https://doi.org/10.1016/j.jspi.2009.07.021, ZBL1177.62046.
- 10 Doosti H. and Dewan I., Wavelet linear density estimation for associated stratified size-biased sample, Statistics & Mathematics Unit. In press.
- 11 Doukhan P., Mixing: Properties and Examples, 1994, 85, Springer, New York, NY, USA, Lecture Notes in Statistics, 1312160.
- 12 Carrasco M. and Chen X., Mixing and moment properties of various GARCH and stochastic volatility models, Econometric Theory. (2002) 18, no. 1, 17–39, 1885348, https://doi.org/10.1017/S0266466602181023, ZBL1181.62125.
- 13 Antoniadis A., Wavelets in statistics: a review (with discussion), Journal of the Italian Statistical Society, Series B. (1997) 6, 97–144.
- 14 Härdle W., Kerkyacharian G., Picard D., and Tsybakov A., Wavelets, Approximation, and Statistical Applications, 1998, 129, Springer, New York, NY, USA, Lecture Notes in Statistics, 1618204.
- 15 Donoho D. L., Johnstone I. M., Kerkyacharian G., and Picard D., Density estimation by wavelet thresholding, The Annals of Statistics. (1996) 24, no. 2, 508–539, 1394974, https://doi.org/10.1214/aos/1032894451, ZBL0860.62032.
- 16 Withers C. S., Conditions for linear processes to be strong-mixing, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. (1981) 57, no. 4, 477–480, https://doi.org/10.1007/BF01025869, 631371, ZBL0465.60032.
- 17 Modha D. S. and Masry E., Minimum complexity regression estimation with weakly dependent observations, IEEE Transactions on Information Theory. (1996) 42, no. 6, part 2, 2133–2145, https://doi.org/10.1109/18.556602, 1447519, ZBL0868.62015.
- 18 Cohen A., Daubechies I., and Vial P., Wavelets on the interval and fast wavelet transforms, Applied and Computational Harmonic Analysis. (1993) 1, no. 1, 54–81, https://doi.org/10.1006/acha.1993.1005, 1256527, ZBL0795.42018.
- 19 Meyer Y., Wavelets and Operators, 1992, 37, Cambridge University Press, Cambridge, UK, Cambridge Studies in Advanced Mathematics, 1228209.
- 20 Chaubey Y. P., Chesneau C., and Doosti H., On linear wavelet density estimation: some recent developments, Journal of the Indian Society of Agricultural Statistics. In press.
- 21 Leblanc F., Wavelet linear density estimator for a discrete-time stochastic process: Lp-losses, Statistics & Probability Letters. (1996) 27, no. 1, 71–84, 1394179, https://doi.org/10.1016/0167-7152(95)00046-1, ZBL0845.62033.
- 22 Masry E., Probability density estimation from dependent observations using wavelet orthonormal bases, Statistics & Probability Letters. (1994) 21, no. 3, 181–194, 1310095, https://doi.org/10.1016/0167-7152(94)90114-7, ZBL0814.62021.
- 23 Delyon B. and Juditsky A., On minimax wavelet estimators, Applied and Computational Harmonic Analysis. (1996) 3, no. 3, 215–228, https://doi.org/10.1006/acha.1996.0017, 1400080, ZBL0865.62023.
- 24 Davydov J. A., The invariance principle for stationary processes, Theory of Probability and Its Applications. (1970) 15, 498–509, 0283872.
- 25 Rio E., The functional law of the iterated logarithm for stationary strongly mixing sequences, The Annals of Probability. (1995) 23, no. 3, 1188–1203, 1349167, https://doi.org/10.1214/aop/1176988179, ZBL0833.60024.
- 26 Liebscher E., Strong convergence of sums of α-mixing random variables with applications to density estimation, Stochastic Processes and their Applications. (1996) 65, no. 1, 69–80, 1422880, https://doi.org/10.1016/S0304-4149(96)00096-8.