Wavelet Optimal Estimations for Density Functions under Severely Ill-Posed Noises
Abstract
Motivated by the work of Lounici and Nickl (2011), this paper considers the problem of estimating a density f from an independent and identically distributed sample Y1, …, Yn drawn from g = f*φ. We establish an optimal wavelet estimation for a density over a Besov ball under Lp risk (1 ≤ p < ∞) in the presence of severely ill-posed noise. A linear wavelet estimator is presented first. We then prove a lower bound, which shows that our wavelet estimator is optimal; in other words, nonlinear wavelet estimations are not needed in this case. It turns out that our results extend some theorems of Pensky and Vidakovic (1999), as well as Fan and Koo (2002).
1. Introduction and Preliminary
Wavelets have made great achievements in the study of the statistical model Y = X + ϵ, where X stands for a real-valued random variable with unknown probability density f, and ϵ denotes an independent random noise (error) with density φ.
In 1999, Pensky and Vidakovic [1] investigated Meyer wavelet estimation over Sobolev spaces and L2 risk under moderately and severely ill-posed noises. Three years later, Fan and Koo [2] extended those works from Sobolev spaces to Besov spaces. It should be pointed out that, using a different method, Lounici and Nickl [3] studied wavelet optimal estimation over Besov spaces and L∞ risk under both types of noise. In [4], we provided a wavelet optimal estimation over Besov spaces and Lp risk (1 ≤ p < ∞, r, q ∈ [1, ∞]) under moderately ill-posed noise. The current paper deals with the same problem under severely ill-posed noises. It turns out that our result contains some theorems of [1, 2] as special cases. Our discussion also shows that nonlinear wavelet estimations are not needed for severely ill-posed noise, which is totally different from the moderately ill-posed case.
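To fix ideas, the deconvolution model can be simulated numerically. The following sketch is our own illustration (not the Meyer wavelet estimator analyzed in this paper): it draws an i.i.d. sample from g = f*φ with Gaussian noise, which is severely ill-posed since the noise characteristic function decays like exp(−c|t|²), and inverts the empirical characteristic function of Y with a frequency cutoff T. All parameter values below are hypothetical.

```python
import cmath
import math
import random

# Deconvolution model Y = X + eps: recover the density f of X from a sample
# of g = f * phi by dividing the empirical characteristic function of Y by
# the known characteristic function of the noise, with a frequency cutoff T.
random.seed(1)
n = 2000
T = 2.5            # frequency cutoff (plays the role of a resolution level)
noise_sd = 0.5     # Gaussian noise => severely ill-posed (alpha = 2)
sample = [random.gauss(0.0, 1.0) + random.gauss(0.0, noise_sd)
          for _ in range(n)]

def emp_cf(t):
    # empirical characteristic function of Y_1, ..., Y_n
    return sum(cmath.exp(1j * t * y) for y in sample) / n

def noise_cf(t):
    # characteristic function of the N(0, noise_sd^2) noise density phi
    return math.exp(-0.5 * (noise_sd * t) ** 2)

m = 100
dt = 2.0 * T / m
tgrid = [-T + i * dt for i in range(m + 1)]
ratio = [emp_cf(t) / noise_cf(t) for t in tgrid]   # computed once

def f_hat(x):
    # Fourier inversion of emp_cf/noise_cf over [-T, T] (trapezoid rule)
    total = 0.0
    for i, t in enumerate(tgrid):
        w = 0.5 if i in (0, m) else 1.0
        total += w * (cmath.exp(-1j * t * x) * ratio[i]).real
    return total * dt / (2.0 * math.pi)

# the estimate should behave like the N(0, 1) density of X
mass = sum(f_hat(-8.0 + 0.1 * k) for k in range(161)) * 0.1
```

Dividing by noise_cf inflates high-frequency sampling error by exp(c|t|²), which is exactly why the cutoff (and hence the resolution level of a wavelet estimator) can grow only logarithmically in n under severely ill-posed noise.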
Lemma 1 (see [6]). Let h be a scaling or a wavelet function with sup x∈ℝ ∑k∈ℤ|h(x − k)| < ∞ and hjk(x) := 2^{j/2}h(2^j x − k). Then there exist C2 ≥ C1 > 0 such that, for λ = {λk} ∈ lp(ℤ) with 1 ≤ p ≤ ∞,
C1 2^{j(1/2−1/p)}∥λ∥p ≤ ∥∑k∈ℤ λk hjk∥p ≤ C2 2^{j(1/2−1/p)}∥λ∥p.
Lemma 2 (see [6]). Let ϕ be a Meyer scaling function and ψ be the corresponding wavelet. If f ∈ Lr(ℝ), 1 ≤ r ≤ ∞, α0k = ∫ f(x)ϕ0k(x)dx, and βjk = ∫ f(x)ψjk(x)dx, then for s > 0 the following assertions are equivalent:
- (i)
f ∈ B^s_{r,q}(ℝ);
- (ii)
{2^{js}∥Pjf − f∥r}j≥0 ∈ lq, where Pjf := ∑k∈ℤαjkϕjk;
- (iii)
∥α0·∥lr + ∥{2^{j(s+1/2−1/r)}∥βj·∥lr}j≥0∥lq < ∞.
- (C1)
;
- (C2)
;
- (C3)
.
In the next section, we define a linear wavelet estimator and provide an upper bound estimation over Besov spaces and Lp risk under condition (C3); the third part gives a lower bound estimation, which shows that the result of Section 2 is optimal; some concluding remarks are given in the last part.
2. Upper Bound
Theorem 3. Let φ satisfy (C3) and ϕ be the Meyer scaling function. If p ∈ [1, ∞), q, r ∈ [1, ∞], , then, with , (3/(8π))((ln n)/(4c))^{1/α} < 2^j ≤ (3/(4π))((ln n)/(4c))^{1/α}, s′ := s − (1/r − 1/p)_+, and x_+ := max{x, 0},
Proof. When r ≤ p, s′ := s − (1/r − 1/p)_+ = s − 1/r + 1/p. Since lr is continuously embedded in lp, Lemma 2 implies . Hence,
Recall that and . Then
Now, it remains to consider : Using and Lemma 1, one knows
Because the Xi,k are i.i.d., Rosenthal's inequality tells us that
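For reference, the form of Rosenthal's inequality invoked here is the standard one for independent, mean-zero random variables ξ1, …, ξn (our transcription; the constant C_p depends only on p):

```latex
% Rosenthal's inequality: for p >= 2 there is a constant C_p > 0 with
\mathbb{E}\Bigl|\sum_{i=1}^{n}\xi_i\Bigr|^{p}
  \le C_p\Bigl(\sum_{i=1}^{n}\mathbb{E}|\xi_i|^{p}
  +\Bigl(\sum_{i=1}^{n}\mathbb{E}\xi_i^{2}\Bigr)^{p/2}\Bigr),
% while for 1 <= p <= 2, Jensen's inequality alone gives
\mathbb{E}\Bigl|\sum_{i=1}^{n}\xi_i\Bigr|^{p}
  \le \Bigl(\sum_{i=1}^{n}\mathbb{E}\xi_i^{2}\Bigr)^{p/2}.
```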
Remark 4. Note that the choices of j and Kn do not depend on the unknown parameters s, r, and q. Hence our linear wavelet estimator over Besov spaces is adaptive (implementable). The same conclusion holds for the L∞ and L2 estimations; see Theorem 2 in [3] and Corollary 1 in [1]. On the other hand, when p = 2 and 1 ≤ r ≤ 2, our Theorem 3 reduces to Theorem 4 in [2]; from the proof of Theorem 3, we find that, for p > 1, the assumption can be replaced by ∥xf(x)∥∞ ≤ A, which is the same as in [1]. Therefore, for p = r = q = 2, Theorem 3 of [1] follows directly from our Theorem 3.
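As a small illustration of the adaptivity just noted, the resolution level of Theorem 3 can be computed without knowledge of s, r, or q. The following sketch (the function name and parameter values are ours, and we assume the noise parameters c and α are known) picks the unique integer j in the prescribed dyadic window:

```python
import math

# Data-independent resolution level of Theorem 3: the unique integer j with
#   (3/(8*pi)) * ((ln n)/(4*c))**(1/alpha) < 2**j
#                                          <= (3/(4*pi)) * ((ln n)/(4*c))**(1/alpha).
# It exists because the upper endpoint is exactly twice the lower one, and it
# involves none of the unknown smoothness parameters s, r, q.
def resolution_level(n, c, alpha):
    lower = (3.0 / (8.0 * math.pi)) * (math.log(n) / (4.0 * c)) ** (1.0 / alpha)
    return math.floor(math.log2(lower)) + 1  # unique j with lower < 2**j <= 2*lower

levels = [resolution_level(n, c=0.5, alpha=2.0) for n in (10**3, 10**6, 10**9)]
```

Note how slowly the level grows with n; this reflects the logarithmic convergence rates typical of severely ill-posed problems.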
3. Lower Bound
In this part, we provide a lower bound estimation, which shows that Theorem 3 is the best possible in some sense. The following lemmas are needed in the proof of the main theorem of this section.
Lemma 5. Let hη(x) := ηp(ηx) with p(x) = 1/(π(1 + x^2)), η > 0, and r, q ≥ 1. Then for L > 0 (L > 2 when r = 1), there exists η0 > 0 such that . If ψ is the Meyer wavelet function and |λk| ≤ d·2^{−j/2} (k = 1, 2, …, 2^j), then, for some small d > 0,
Proof. It is easy to see that (for r ≥ 1) and by the definition of Besov space. Since
If |λk| ≤ d·2^{−j/2} (k = 1, 2, …, 2^j), then because ψ is the Meyer wavelet function. Note that (1 + |x − 2^{−j}k|^2)^{−1} ≤ . Then, for some small d > 0 and |x| ≥ 2,
The next lemma extends an estimate in the proof of Theorem 1 in [3].
Lemma 6. Let ψ be the Meyer wavelet function and h0(x) be defined as in Lemma 5. If φ satisfies (C1), (C2), and ωk ∈ {0, 1}, then
Proof. As shown in the proof of Theorem 1 of [3], one easily finds that (h0*φ)(y) ≳ (1 + y^2)^{−1} and therefore
To estimate , one proves an inequality:
Define . Then q, q′ ∈ L^1(ℝ) and q is locally absolutely continuous. Therefore, and
Two more classical theorems play important roles in our discussion. We list the first one as Lemma 7, which can be found in [7].
Lemma 7 (Varshamov-Gilbert). Let Ω = {ω = (ω1, …, ωm), ωk ∈ {0,1}} with m ≥ 8. Then there exists a subset {ω(0), …, ω(M)} of Ω such that M ≥ 2^{m/8}, ω(0) = (0, …, 0), and, for j, l = 0, 1, …, M with j ≠ l, the Hamming distance between ω(j) and ω(l) is at least m/8.
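The combinatorial content of Lemma 7 can be checked numerically for small m: a greedy search already exhibits 2^{m/8} binary vectors, including the all-zero one, at pairwise Hamming distance at least m/8. The sketch below is an illustration only (not the proof of the lemma), with function names of our choosing:

```python
from itertools import product

# Greedy construction illustrating the Varshamov-Gilbert bound (Lemma 7):
# among the 2**m binary vectors, collect at least 2**(m//8) of them, starting
# from the all-zero vector, whose pairwise Hamming distance is >= m/8.
def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def greedy_vg(m):
    target = 2 ** (m // 8)
    kept = [(0,) * m]               # omega^(0) = (0, ..., 0)
    for cand in product((0, 1), repeat=m):
        if all(hamming(cand, w) >= m / 8 for w in kept):
            kept.append(cand)
        if len(kept) >= target:     # stop once the lemma's count is reached
            break
    return kept

subset = greedy_vg(16)
```

The lemma is used below to build an exponentially large family of well-separated densities, which feeds into Fano's lemma.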
Lemma 8 (Fano). Let (𝕏, ℱ, Pk), k = 0, 1, …, m, be probability spaces and Ak ∈ ℱ. If Ak∩Av = ∅ for k ≠ v, then
Now we are in a position to state the main theorem of this section.
Theorem 9. Let φ satisfy (C1) and (C2), and let fn(·) := fn(Y1, Y2, …, Yn, ·) be an estimator of f. Then for s > 0, p ∈ [1, ∞), q, r ∈ [1, ∞], and s ≥ 1/r, there exists C > 0 independent of fn such that, with s′ := s − (1/r − 1/p)_+,
Proof. Assume that ψ is the Meyer wavelet function; then . By Lemma 2,
By Lemma 7, one finds with and such that for ω ≠ ω′ and , . It is easy to see that
According to Lemma 5, hω(x) ≲ h0(x) and . Moreover, = . Since hω*φ/(h0*φ) > 0, . Combining this with ln(1 + x) ≤ x (x > −1), one knows
Note that and take j such that
When r < p, s′ = s − (1/r − 1/p)_+ = s − 1/r + 1/p, and it remains to show
Define . Then Ak∩Av = ∅ (k ≠ v) and
Remark 10. By Theorems 3 and 9, the linear wavelet estimator is optimal for a density in Besov spaces with severely ill-posed noise. Therefore, we do not need to consider nonlinear wavelet estimations in this case. This contrasts sharply with the moderately ill-posed case, in which nonlinear wavelet estimation improves on the linear one [2, 4].
4. Concluding Remarks
Compared with the proof of Theorem 1 in [3], the arguments for Theorem 9 are more complicated in the sense that we use the Varshamov-Gilbert lemma (Lemma 7). This is reasonable because we deal with the unmatched estimation (p and r may not be equal), while they treat the matched case .
Although the Shannon scaling function ϕS(t) = sin(πt)/(πt) is much simpler than Meyer's, it cannot be used in our discussion, because the Shannon function does not belong to L^1(ℝ), while our theorems cover the case p = 1.
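The non-integrability of the Shannon scaling function is easy to see numerically: the integral of |sin(πt)/(πt)| over [0, N] grows like a constant times ln N, so ϕS ∉ L^1(ℝ). A quick check (illustration only; the function name is ours):

```python
import math

# |sin(pi t)/(pi t)| has integral over [k, k+1] comparable to 1/k, so the
# partial L^1 "norms" of the Shannon scaling function grow logarithmically
# in N and phi_S is not absolutely integrable.
def partial_l1_norm(N, steps_per_unit=200):
    """Trapezoid approximation of the integral of |sinc| over [0, N]."""
    h = 1.0 / steps_per_unit
    total = 0.0
    last = N * steps_per_unit
    for i in range(last + 1):
        t = i * h
        val = 1.0 if t == 0 else abs(math.sin(math.pi * t) / (math.pi * t))
        w = 0.5 if i in (0, last) else 1.0
        total += w * val
    return total * h

growth = [partial_l1_norm(N) for N in (10, 100, 1000)]
```

Each tenfold increase of N adds roughly the same amount (about (2/π²)·ln 10) to the partial integral, confirming the logarithmic divergence.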
Finally, it should be pointed out that we assume the independence of the observations Y1, Y2, …, Yn in this paper. However, dependent data are more important (and, of course, more complicated) in practice. We will investigate that case in the future.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (no. 11271038).