Motivated by Lounici and Nickl′s work (2011), this paper considers the problem of estimation of a density f based on an independent and identically distributed sample Y₁, …, Y_n from g = f*φ. We show a wavelet optimal estimation for a density (function) over Besov ball and L^p risk (1 ≤ p < ∞) in the presence of severely ill-posed noises. A wavelet linear estimation is firstly presented. Then, we prove a lower bound, which shows our wavelet estimator optimal. In other words, nonlinear wavelet estimations are not needed in that case. It turns out that our results extend some theorems of Pensky and Vidakovic (1999), as well as Fan and Koo (2002).

1. Introduction and Preliminary

Wavelets have made great achievements in studying the statistical model Y = X + ϵ, where X stands for real-valued random variable with unknown probability density f, and ϵ denotes an independent random noise (error) with density φ.

In 1999, Pensky and Vidakovic [1] investigate Meyer wavelet estimation over Sobolev spaces and L² risk under moderately and severely ill-posed noises. Three years later, Fan and Koo [2] extend those works from to Besov spaces . It should be pointed out that, by using different method, Lounici and Nickl [3] study wavelet optimal estimation over Besov spaces and L^∞ risk under both noises. In [4], we provide a wavelet optimal estimation over and L^p risk (1 ≤ p < ∞, r, q ∈ [1, ∞]) under moderately ill-posed noise. This current paper deals with the same problem under the severely ill-posed noises. It turns out that our result contains some theorems of [1, 2] as special cases. Our discussion also shows that nonlinear wavelet estimations are not needed for severely ill-posed noise, which is totally different with moderately ill-posed case.

Let ϕ and ψ ∈ L²(ℝ) be a scaling and mother wavelet function, respectively. Then each f ∈ L²(ℝ) has an expansion (in L² sense):

()

with α_jk : = 〈f, ϕ_jk〉 and β_jk : = 〈f, ψ_jk〉. Here and throughout, we use the standard notation h_jk(x): = 2^j/2h(2^jx − k) in wavelet analysis [5]. A class of important wavelets are Meyer′s, whose Fourier transforms are C^∞ and compactly supported on {t : 2π/3≤|t | ≤ 8π/3} [5]. It is easy to see that ∀a ≥ 0, ∃C_a > 0 such that |x|^a | ϕ(x)| ≤ C_a. In this paper, the Fourier transform

for f ∈ L(ℝ) is defined by

()

The classical method extends that definition to L²(ℝ) functions.

The following two lemmas are fundamental in our discussions. We use ∥f∥_p to denote L^p(ℝ) norm of f ∈ L^p(ℝ), and ∥λ∥_p do l^p(ℤ) norm of λ ∈ l^p(ℤ), where

()

Lemma 1 (see [6].)Let h be a scaling or a wavelet function with sup _x∈ℝ ∑_k∈ℤ|h(x − k)| < ∞. Then, there exist C₂ ≥ C₁ > 0 such that for λ = {λ_k} ∈ l^p(ℤ) with 1 ≤ p ≤ ∞,

()

One of the advantages of wavelet bases is that they can characterize Besov spaces. To introduce those spaces [6], we need the well-known Sobolev spaces with integer exponents

()

with the Sobolev norm

. Then L^p(ℝ) can be considered as

. For 1 ≤ p, q ≤ ∞ and s = m + α with α ∈ (0,1], a Besov space is defined by

()

with the norm

, where

denotes the smoothness modulus of f and

()

Lemma 2 (see [6].)Let ϕ be a Meyer scaling function and ψ be the corresponding wavelet. If f ∈ L^r(ℝ), 1 ≤ r ≤ ∞, α_0k = ∫ f(x)ϕ_0k(x)dx, and β_jk = ∫ f(x)ψ_jk(x)dx, then the following assertions are equivalent:

(i)
;
(ii)
, where P_jf : = ∑_k∈ℤα_jkϕ_jk;
(iii)
.

In each case,

()

Here and after, A≲B denotes A ≤ CB for some constant C > 0; A ≳ B means B≲A; A ~ B stands for both A≲B and B≲A, α_0· does for the sequence of {α_0k, k ∈ ℤ}.

At the end of this subsection, we make some assumptions on noise density φ, which will be dealt with in this current paper. For α > 0, c > 0, β ∈ ℝ,

(C1)
;
(C2)
;
(C3)
.

Clearly, the classical Cauchy densities satisfy (C1)–(C3) with α = c = 1 and β = 0, and the Gaussian density does satisfy (C1)–(C3) with α = 2, c = 1/2, and β = 0. It should be pointed out that those above conditions (C1)–(C3) are a little different with [2].

In the next section, we define a wavelet linear estimator and provide an upper bound estimation over Besov spaces and L^p risk under the condition (C3); the third part gives a lower bound estimation which shows the result of Section 2 optimal; some concluding remarks are discussed in the last part.

2. Upper Bound

To introduce the main theorem of this section, we assume that Y₁, Y₂, …, Y_n are independent and identically distributed (i.i.d) random variables of Y = X + ϵ, the density φ of random noise ϵ satisfies condition (C3), and ϕ stands for the Meyer scaling function. As in [1], define

()

as well as a linear wavelet estimator

()

(the positive integer K_n will be given later on). Then

is well defined, and

We use supp f to stand for the support of f and |supp f| to do its length. Moreover, for L > 0, denote

()

It is reasonable to assume L > 1 for r = 1, since ∥f∥_s1q ≥ ∥f∥₁ = 1 in that case.

Theorem 3. Let φ satisfy (C3) and ϕ be the Meyer scaling function. If p ∈ [1, ∞), q, r ∈ [1, ∞], , then, with , (3/8π)((ln n)/4c) ^1/α < 2^j ≤ (3/4π)((ln n)/4c) ^1/α, s^′ : = s − (1/r − 1/p) ₊, and x₊ = max {x, 0},

()

In particular,

can be replaced by

, when r ≤ p.

Proof. When r ≤ p, s^′ : = s − (1/r − 1/p) ₊ = s − 1/r + 1/p. Since l^r is continuously embedded to l^p, Lemma 2 implies . Hence,

()

When r > p, one obtains that, for some C > 0,

and

()

In fact,

and Hölder inequality imply that

due to |supp f| ≤ M. By the definition of Besov norm, ∥f∥_spq ≤ C∥f∥_srq. According to (13) and (14), it is sufficient to prove

()

for the conclusion of Theorem 3.

Recall that and . Then

()

and Lemma 2,

()

To estimate the middle term of (16), one observes that α_jk = ∫_ℝ ϕ_jk(x)f(x)dx, |k²α_jk| ≤ ∫_ℝ | 2^jx − k − 2^jx|² | ϕ_jk(x) | f(x)dx≲∫_ℝ | 2^jx − k|² | ϕ_jk(x) | f(x)dx + ∫_ℝ | 2^jx|² | ϕ_jk(x) | f(x)dx. Since ϕ is the Meyer scaling function, sup _x∈ℝ | x|² | ϕ(x)| < ∞ and

()

On the other hand,

with 1/μ + 1/μ^′ = 1. Therefore

and

≲

. This with Lemma 1 leads to

()

Now, it remains to consider : Using and Lemma 1, one knows

()

Clearly,

. Define X_i,k : = (K_jϕ) _jk(Y_i) − E(K_jϕ) _jk(Y_i). Then EX_i,k = 0 and

. To apply Rosenthal′s inequality (Proposition 10.2, [6]), one estimates |X_i,k| and E | X_i,k|^p: note that |(K_jϕ) _jk(Y_i)| = 2^j/2 | (1/2π) ×

due to (C3) and

. Then

()

Because X_i,k are i.i.d, the Rosenthal′s inequality tells that

()

This with (21) implies that, for p ≥ 2,

> 2} +

. Moveover, (20) reduces to

()

Then it follows from (16)–(19) and (23) that

()

By the choices

and (3/8π)((ln n)/4c) ^1/α < 2^j ≤ (3/4π)((ln n)/4c) ^1/α (stated in Theorem 3), one receives that 2^−jps≲(ln n) ^−(s/α)p,

()

Finally, the desired conclusion (15) follows.

Remark 4. Note that the choices of j and K_n do not depend on the unknown parameters s, r, and q. Then our linear wavelet estimator over Besov space is adaptive or implementable. The same conclusion holds for L^∞ and L² estimations; see Theorem 2 in [3] and Corollary 1 in [1]. On the other hand, when p = 2 and 1 ≤ r ≤ 2, our Theorem 3 reduces to Theorem 4 in [2]; from the proof of Theorem 3, we find that, for p > 1, the assumption can be replaced by ∥xf(x)∥_∞ ≤ A, which is the same as in [1]. Therefore, for p = r = q = 2, Theorem 3 of [1] follows directly from our Theorem 3.

3. Lower Bound

In this part, we will provide a lower bound estimation, which shows Theorem 3 to be the best possible in some sense. The following lemmas are needed in the proof of our main theorem of this section.

Lemma 5. Let h_η(x): = ηp(ηx) with p(x) = 1/π(1 + x²), η > 0, and r, q ≥ 1. Then for L > 0 (L > 2 when r = 1), there exists η₀ > 0 such that . If ψ is the Meyer wavelet function and |λ_k| ≤ d2^−j/2 (k = 1,2, …, 2^j), then, for some small d > 0,

()

Proof. It is easy to see that (for r ≥ 1) and by the definition of Besov space. Since

()

where ⌊s⌋ denotes the largest integer no more than

can be made small enough by choosing small η₀ > 0, when r > 1. Clearly, L > 2 is needed, when r = 1.

If |λ_k| ≤ d2^−j/2 (k = 1,2, …, 2^j), then because ψ is the Meyer function. Note that − 2^−jk|²) ⁻¹ ≤ . Then for some small d > 0 and |x | ≥ 2,

()

Hence, (26) holds for |x | ≥ 2. On the other hand, when |x | < 2, h₀(x) ≳ 1 and

≤

. Therefore, (26) is true, when d > 0 small enough. This completes the proof of Lemma 5.

The next lemma extends an estimate in the proof of Theorem 1 in [3].

Lemma 6. Let ψ be the Meyer wavelet function, h₀(x), defined as in Lemma 5. If φ satisfies (C1), (C2), and ω_k ∈ {0,1}, then

()

Proof. As shown in proof of Theorem 1 of [3], one finds easily that (h₀*φ)(y)≳(1 + y²) ⁻¹ and therefore

()

By Parseval identity, (C1) and

. Moreover, the orthonormality of

concludes that

()

To estimate , one proves an inequality:

()

Note that

, and

. Then

+ k) ²ψ(x)ψ[x − (k^′ − k)]dx and

()

Since 〈ψ(x), ψ(x − k)〉 = δ_k,0,

; On the other hand, the boundedness of ∫_ℝ x²ψ(x)ψ(x + l)dx and ∫_ℝ xψ(x)ψ(x + l)dx implies that

()

as well as

. Hence,

, which reaches (32).

Define . Then q, q^′ ∈ L(ℝ) and q is locally absolutely continuous. Therefore, and

()

Clearly,

and ∫_ℝ | q^′(t)|²dt ≤

thanks to (C1), (C2), and

()

Moreover,

because of (32) and the orthonormality of ψ_jk. This with (35), (31), and (30) leads to the desired conclusion of Lemma 6.

Two more classical theorems play important roles in our discussions. We list the first one as Lemma 7, which can be found in [7].

Lemma 7 (Varshamov-Gilbert). Let Ω = {ω = (ω₁, …, ω_m), ω_k ∈ {0,1}} with m ≥ 8. Then there exists a subset {ω⁽⁰⁾, …, ω^(M)} of Ω such that M ≥ 2^m/8, ω⁽⁰⁾ = (0, …, 0), and for j, l = 0,1, …, M, j ≠ l,

()

Given two probability measures P and Q on a measurable space (𝕏, ℱ), the Kullback divergence of P and Q is defined by

()

Here, P ≪ Q stands for P absolutely continuous with respect to Q. In that case, K(P, Q) = ∫ln (f_P(x)/f_Q(x))f_P(x)dx, where the function f_P(x) denotes the density function of P. The second classical theorem is taken from [8].

Lemma 8 (Fano). Let (𝕏, ℱ, P_k) be probability measurable spaces and A_k ∈ ℱ, k = 0,1, …, m. If A_k∩A_v = ∅ for k ≠ v, then

()

where

, and A^c denotes the complement of a set A.

Now, we are in the position to state the main theorem in this section.

Theorem 9. Let φ satisfy (C1) and (C2), and let f_n(·): = f_n(Y₁, Y₂, …, Y_n, ·) be an estimator of . Then for s > 0, p ∈ [1, ∞), q, r ∈ [1, ∞], and s ≥ 1/r, there exists C > 0 independent of f_n such that with s^′ : = s − (1/r − 1/p) ₊,

()

Proof. Assume that ψ is the Meyer wavelet function, then . By Lemma 2,

()

for ω_k ∈ {0,1}. Furthermore, with the function h₀ defined in Lemma 5, there exists c₁ > 0 such that

and

due to that Lemma. Define

()

Then ∫ h_ω(x)dx = 1 because ∫ ψ(x)dx = 0 and ∫ h₀(x)dx = 1.

By Lemma 7, one finds with and such that for ω ≠ ω^′ and , . It is easy to see that

()

This with Lemma 1 leads to

, and therefore

()

Define

for h_ω ∈ Λ^′. Then

, when ω ≠ ω^′. Clearly, h_ω*φ is a density function because both h_ω and φ are density functions. Let

be the probability measure on the Lebesgue space (ℝⁿ, ℒ) with the density

. Then Lemma 8 tells that

()

According to Lemma 5, h_ω(x)≲h₀(x) and . Moreover, = . Since h_ω*φ/h₀ * φ > 0, . Combining this with ln (1 + x) ≤ x(x > −1), one knows

()

Because

, the above inequality reduces to

thanks to Lemma 6. Hence,

()

Note that and take j such that

()

Then

(choose c₁ > 0 small enough). Furthermore, (45) reduces to

()

Hence,

. This with (44) and (48) leads to

()

which is the desired conclusion of Theorem 9, when r ≥ p (s^′ = s in that case).

When r < p, s^′ = s − (1/r − 1/p) ₊ = s − 1/r + 1/p, it remains to show

()

Similar to the proof of (50), one takes small c₂ > 0 such that

()

satisfies h_k(x) ≥ 0,

and ∫ h_k(x)dx = 1. Clearly,

and

for 1 ≤ k ≠ k^′ ≤ 2^j. Since ψ is the Meyer wavelet function, inf _k≠0∥ψ(·)−ψ(·−k)∥_p > 0 and

()

Define . Then A_k∩A_v = ∅ (k ≠ v) and

()

due to Lemma 8. Similar (even simpler) arguments to the estimation of

show

≲

. Taking j as in (48), one receives that

()

and

by choosing small c₂ > 0. Thus (54) reduces to

()

Moreover,

≥

. This with (53) and (48) leads to (51). This completes the proof of Theorem 9.

Remark 10. By Theorems 9 and 3, the linear wavelet estimator is optimal for a density in Besov spaces with severely ill-posed noise. Therefore, we do not need to consider nonlinear wavelet estimations in that case. This contrasts sharply with moderately ill-posed noise case under which nonlinear wavelet estimation improves the linear one [2, 4].

Remark 11. When p = 2 and 1 ≤ r ≤ 2, our Theorem 9 is better than Theorem 6 in [2], because . Moreover, Theorems 9 and 3 lead to Theorem 3 in that paper for p = 2 and 1 ≤ r ≤ 2. In addition, our conditions (C1) and (C2) are a little weaker than the assumptions in [2].

4. Concluding Remarks

This paper provides an L^p (1 ≤ p < ∞) risk upper bound for a linear wavelet estimator

(Theorem 3), which turns out to be optimal (Theorem 9). Therefore, nonlinear estimations are not needed under severely ill-posed noises. Although we assume p < ∞ in Theorem 9, the proof of that theorem shows that, for p = ∞,

()

In particular, when r = q = ∞, this above estimation reduces to partial result of Theorem 1 in [3].

Note that our model assumes the noise to be severely ill-posed; that is, the density φ of noise ϵ satisfies

. Then it is reasonable to choose the Meyer scaling function as ϕ because the compact supportness of

makes K_jϕ well defined, where

()

Compare with the proof of Theorem 1 in [3], the arguments of Theorem 9 are more complicated in the sense that we use Varshmov-Gilbert Lemma (Lemma 7). It is reasonable because we deal with unmatched estimation (p and r may not be equal), while they do the matched case .

Although the Shannon function ϕ^S(t) = sinπt/πt is much simpler than the Meyer′s, it cannot be used in our discussion because the Shannon’s does not belong to L(ℝ), while our theorems cover the case for p = 1.

Finally, it should be pointed out that we assume the independence of observations Y₁, Y₂, …, Y_n in this paper. However, some dependent data are more important (of course, more complicated) in practice. We will investigate that case in future.

Acknowledgment

This paper is supported by the National Natural Science Foundation of China (no. 11271038).

References

1 Pensky M. and Vidakovic B., Adaptive wavelet estimator for nonparametric density deconvolution, The Annals of Statistics. (1999) 27, no. 6, 2033–2053, https://doi.org/10.1214/aos/1017939249, MR1765627, ZBL0962.62030.
10.1214/aos/1017939249
Web of Science® Google Scholar
2 Fan J. and Koo J.-Y., Wavelet deconvolution, IEEE Transactions on Information Theory. (2002) 48, no. 3, 734–747, https://doi.org/10.1109/18.986021, MR1889978, ZBL1071.94511.
10.1109/18.986021
Web of Science® Google Scholar
3 Lounici K. and Nickl R., Global uniform risk bounds for wavelet deconvolution estimators, The Annals of Statistics. (2011) 39, no. 1, 201–231, https://doi.org/10.1214/10-AOS836, MR2797844, ZBL1209.62060.
10.1214/10-AOS836
Web of Science® Google Scholar
4 Li R. and Liu Y., Wavelet optimal estimations for a density with some additive noises, Applied and Computational Harmonic Analysis. (2013) https://doi.org/10.1016/j.acha.2013.07.002.
10.1016/j.acha.2013.07.002
PubMed Web of Science® Google Scholar
5 Daubechies I., Ten Lectures on Wavelets, 1992, Society for Industrial and Applied Mathematics, Philadelphia, Pa, USA, https://doi.org/10.1137/1.9781611970104, MR1162107.
10.1137/1.9781611970104
Google Scholar
6 Hardle W., Kerkyacharian G., Picard D., and Tsybakov A., Wavelets, Approximation and Statistical Applications, 1997, Springer, New York, NY, USA.
Google Scholar
7 Tsybakov A. B., Introduction to Nonparametric Estimation, 2009, Springer, Berlin, Germany, https://doi.org/10.1007/b13794, MR2724359.
10.1007/b13794
Web of Science® Google Scholar
8 DeVore R., Kerkyacharian G., Picard D., and Temlyakov V., Approximation methods for supervised learning, Foundations of Computational Mathematics. (2006) 6, no. 1, 3–58, https://doi.org/10.1007/s10208-004-0158-6, MR2198214, ZBL1146.62322.
10.1007/s10208-004-0158-6
Web of Science® Google Scholar

Citing Literature

All articles

Wavelet Optimal Estimations for Density Functions under Severely Ill-Posed Noises

Abstract

1. Introduction and Preliminary

2. Upper Bound

3. Lower Bound

4. Concluding Remarks

Acknowledgment

References

Citing Literature

References

Information

About Wiley Online Library

Help & Support

Opportunities

Connect with Wiley

Wavelet Optimal Estimations for Density Functions under Severely Ill-Posed Noises

Abstract

1. Introduction and Preliminary

2. Upper Bound

3. Lower Bound

4. Concluding Remarks

Acknowledgment

References

Citing Literature

References

Related

Information