Volume 2013, Issue 1 260573
Research Article
Open Access

Wavelet Optimal Estimations for Density Functions under Severely Ill-Posed Noises

Rui Li
Department of Applied Mathematics, Beijing University of Technology, Pingle Yuan 100, Beijing 100124, China

Youming Liu (Corresponding Author)
Department of Applied Mathematics, Beijing University of Technology, Pingle Yuan 100, Beijing 100124, China
First published: 12 December 2013
Academic Editor: Ding-Xuan Zhou

Abstract

Motivated by Lounici and Nickl's work (2011), this paper considers the problem of estimating a density f based on an independent and identically distributed sample Y_1, …, Y_n from g = f*φ. We establish an optimal wavelet estimation for a density (function) over Besov balls and L^p risk (1 ≤ p < ∞) in the presence of severely ill-posed noises. A linear wavelet estimator is presented first. Then we prove a lower bound, which shows that our wavelet estimator is optimal. In other words, nonlinear wavelet estimations are not needed in that case. It turns out that our results extend some theorems of Pensky and Vidakovic (1999), as well as Fan and Koo (2002).

1. Introduction and Preliminary

Wavelet methods have achieved a great deal in studying the statistical model Y = X + ε, where X stands for a real-valued random variable with unknown probability density f, and ε denotes an independent random noise (error) with density φ.

In 1999, Pensky and Vidakovic [1] investigated Meyer wavelet estimation over Sobolev spaces under L^2 risk for both moderately and severely ill-posed noises. Three years later, Fan and Koo [2] extended those works from Sobolev spaces to Besov spaces B_{r,q}^s(ℝ). It should be pointed out that, by a different method, Lounici and Nickl [3] studied optimal wavelet estimation over Besov spaces under L^∞ risk for both types of noise. In [4], we provided an optimal wavelet estimation over B_{r,q}^s(ℝ) under L^p risk (1 ≤ p < ∞, r, q ∈ [1, ∞]) for moderately ill-posed noise. The current paper deals with the same problem under severely ill-posed noises. It turns out that our result contains some theorems of [1, 2] as special cases. Our discussion also shows that nonlinear wavelet estimations are not needed for severely ill-posed noise, which is totally different from the moderately ill-posed case.

Let ϕ and ψ ∈ L^2(ℝ) be a scaling function and the corresponding wavelet, respectively. Then each f ∈ L^2(ℝ) has an expansion (in the L^2 sense)

    f = ∑_{k∈ℤ} α_{0k} ϕ_{0k} + ∑_{j=0}^∞ ∑_{k∈ℤ} β_{jk} ψ_{jk}    (1)

with α_{jk} := ⟨f, ϕ_{jk}⟩ and β_{jk} := ⟨f, ψ_{jk}⟩. Here and throughout, we use the standard notation h_{jk}(x) := 2^{j/2} h(2^j x − k) of wavelet analysis [5]. A class of important wavelets is Meyer's, whose Fourier transforms are C^∞ and compactly supported on {t : 2π/3 ≤ |t| ≤ 8π/3} [5]. It is easy to see that for each a ≥ 0 there exists C_a > 0 such that |x|^a |ϕ(x)| ≤ C_a. In this paper, the Fourier transform of f ∈ L^1(ℝ) is defined by

    f^{ft}(t) := ∫_ℝ f(x) e^{−itx} dx.    (2)

The classical method extends this definition to L^2(ℝ) functions.
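
The Meyer construction above is easy to make concrete numerically. The following Python sketch (our own illustration, not part of the original paper; meyer_phi_ft, meyer_phi, and nu are hypothetical names) evaluates the standard Meyer scaling function by inverting its compactly supported Fourier transform and checks the rapid decay |x|^a |ϕ(x)| ≤ C_a.

```python
import numpy as np

def nu(x):
    # standard Meyer auxiliary polynomial: nu(0) = 0, nu(1) = 1, smooth join
    return np.where(x < 0, 0.0, np.where(x > 1, 1.0,
                    x**4 * (35 - 84*x + 70*x**2 - 20*x**3)))

def meyer_phi_ft(t):
    # Fourier transform of the Meyer scaling function:
    # equals 1 on |t| <= 2pi/3, rolls off smoothly, vanishes for |t| >= 4pi/3
    a = np.abs(np.asarray(t, dtype=float))
    out = np.zeros_like(a)
    out[a <= 2*np.pi/3] = 1.0
    mid = (a > 2*np.pi/3) & (a < 4*np.pi/3)
    out[mid] = np.cos(np.pi/2 * nu(3*a[mid]/(2*np.pi) - 1))
    return out

def meyer_phi(x, n_grid=4096):
    # phi(x) = (1/pi) int_0^{4pi/3} phi^ft(t) cos(tx) dt  (phi^ft is real and even)
    t = np.linspace(0, 4*np.pi/3, n_grid)
    w = meyer_phi_ft(t)
    return np.trapz(w[None, :] * np.cos(np.outer(x, t)), t, axis=1) / np.pi

x = np.linspace(-10, 10, 201)
phi = meyer_phi(x)
# fast decay: |x|^a |phi(x)| stays bounded for every fixed a >= 0
print(np.max(np.abs(x)**3 * np.abs(phi)))
```
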
The following two lemmas are fundamental in our discussions. We use ∥f∥_p to denote the L^p(ℝ) norm of f ∈ L^p(ℝ) and ∥λ∥_p the l_p(ℤ) norm of λ ∈ l_p(ℤ), where

    ∥λ∥_p := (∑_{k∈ℤ} |λ_k|^p)^{1/p}  (1 ≤ p < ∞),  ∥λ∥_∞ := sup_{k∈ℤ} |λ_k|.    (3)

Lemma 1 (see [6]). Let h be a scaling or a wavelet function with sup_x ∑_k |h(x − k)| < ∞. Then there exist C_2 ≥ C_1 > 0 such that for λ = {λ_k} ∈ l_p(ℤ) with 1 ≤ p ≤ ∞,

    C_1 2^{j(1/2 − 1/p)} ∥λ∥_p ≤ ∥∑_{k∈ℤ} λ_k h_{jk}∥_p ≤ C_2 2^{j(1/2 − 1/p)} ∥λ∥_p.    (4)

One of the advantages of wavelet bases is that they can characterize Besov spaces. To introduce those spaces [6], we need the well-known Sobolev spaces with integer exponents

    W_p^m(ℝ) := {f ∈ L^p(ℝ) : f^{(m)} ∈ L^p(ℝ)}    (5)

with the Sobolev norm ∥f∥_{W_p^m} := ∥f∥_p + ∥f^{(m)}∥_p. Then L^p(ℝ) can be considered as W_p^0(ℝ). For 1 ≤ p, q ≤ ∞ and s = m + α with α ∈ (0, 1], a Besov space is defined by

    B_{p,q}^s(ℝ) := {f ∈ W_p^m(ℝ) : ∥t^{−α} ω_p^2(f^{(m)}, t)∥_{*q} < ∞}    (6)

with the norm ∥f∥_{spq} := ∥f∥_{W_p^m} + ∥t^{−α} ω_p^2(f^{(m)}, t)∥_{*q}, where ω_p^2(f, t) := sup_{|h|≤t} ∥f(· + 2h) − 2f(· + h) + f∥_p denotes the smoothness modulus of f and

    ∥g∥_{*q} := (∫_0^∞ |g(t)|^q dt/t)^{1/q}  (q < ∞),  ∥g∥_{*∞} := sup_{t>0} |g(t)|.    (7)

Lemma 2 (see [6]). Let ϕ be a Meyer scaling function and ψ be the corresponding wavelet. If f ∈ L^r(ℝ), 1 ≤ r ≤ ∞, α_{0k} = ∫ f(x)ϕ_{0k}(x)dx, and β_{jk} = ∫ f(x)ψ_{jk}(x)dx, then the following assertions are equivalent:

  • (i)

    f ∈ B_{r,q}^s(ℝ), with s > 0 and 1 ≤ q ≤ ∞;

  • (ii)

    {2^{js} ∥P_j f − f∥_r}_{j≥0} ∈ l_q, where P_j f := ∑_k α_{jk} ϕ_{jk};

  • (iii)

    {2^{j(s + 1/2 − 1/r)} ∥β_j∥_r}_{j≥0} ∈ l_q, where β_j := {β_{jk}}_{k∈ℤ}.

In each case,

    ∥f∥_{srq} ~ ∥α_0∥_r + ∥{2^{j(s + 1/2 − 1/r)} ∥β_j∥_r}_{j≥0}∥_q.    (8)
Here and after, A ≲ B denotes A ≤ CB for some constant C > 0; A ≳ B means B ≲ A; A ~ B stands for both A ≲ B and B ≲ A; and α_0 stands for the sequence {α_{0k}, k ∈ ℤ}.
At the end of this subsection, we make some assumptions on the noise density φ, which will be dealt with in this current paper. For α > 0, c > 0, and β ∈ ℝ,
  • (C1)

    |φ^{ft}(t)| ≲ (1 + |t|)^β e^{−c|t|^α};

  • (C2)

    |(φ^{ft})′(t)| ≲ (1 + |t|)^{β+1} e^{−c|t|^α};

  • (C3)

    |φ^{ft}(t)| ≳ (1 + |t|)^β e^{−c|t|^α}.

Clearly, the classical Cauchy density satisfies (C1)–(C3) with α = c = 1 and β = 0, and the Gaussian density satisfies (C1)–(C3) with α = 2, c = 1/2, and β = 0. It should be pointed out that the above conditions (C1)–(C3) are a little different from those of [2].
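
As a quick numerical sanity check (our own illustration, not from the paper; char_fn_abs is a hypothetical helper), the sketch below computes |φ^{ft}| by quadrature for the Cauchy and Gaussian densities and compares it with the envelope e^{−c|t|^α} for the stated parameters.

```python
import numpy as np
from scipy.integrate import quad

def char_fn_abs(density, t):
    # density assumed even, so phi^ft(t) = 2 * int_0^inf density(x) cos(tx) dx;
    # quad's weight='cos' handles the oscillatory Fourier integral
    val, _ = quad(density, 0, np.inf, weight='cos', wvar=t)
    return abs(2*val)

cauchy = lambda x: 1.0/(np.pi*(1.0 + x**2))
gauss  = lambda x: np.exp(-x**2/2)/np.sqrt(2*np.pi)

for t in (0.5, 1.0, 3.0):
    print(char_fn_abs(cauchy, t), np.exp(-t))        # Cauchy: alpha = c = 1, beta = 0
    print(char_fn_abs(gauss, t),  np.exp(-t**2/2))   # Gaussian: alpha = 2, c = 1/2, beta = 0
```
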

In the next section, we define a linear wavelet estimator and provide an upper bound estimation over Besov spaces under L^p risk and condition (C3); the third part gives a lower bound estimation, which shows that the result of Section 2 is optimal; some concluding remarks are discussed in the last part.

2. Upper Bound

To introduce the main theorem of this section, we assume that Y_1, Y_2, …, Y_n are independent and identically distributed (i.i.d.) observations of Y = X + ε, that the density φ of the random noise ε satisfies condition (C3), and that ϕ stands for the Meyer scaling function. As in [1], define

    (K_jϕ)^{ft}(t) := ϕ^{ft}(t) / \overline{φ^{ft}(2^j t)}    (9)

as well as a linear wavelet estimator

    f̂_n(x) := ∑_{|k| ≤ K_n} α̂_{jk} ϕ_{jk}(x),  α̂_{jk} := (1/n) ∑_{i=1}^n (K_jϕ)_{jk}(Y_i)    (10)

(the positive integer K_n will be given later on). Then E α̂_{jk} = α_{jk}, and f̂_n is well defined.
We use supp f to stand for the support of f and |supp f| for its length. Moreover, for L > 0, denote

    B_{s,r,q}(L, M) := {f : f is a density function, |supp f| ≤ M, ∥f∥_{srq} ≤ L}.    (11)

It is reasonable to assume L > 1 for r = 1, since ∥f∥_{s1q} ≥ ∥f∥_1 = 1 in that case.

Theorem 3. Let φ satisfy (C3) and let ϕ be the Meyer scaling function. If p ∈ [1, ∞), q, r ∈ [1, ∞], and s > 1/r, then, with K_n ~ n^{1/8}, (3/(8π))((ln n)/(4c))^{1/α} < 2^j ≤ (3/(4π))((ln n)/(4c))^{1/α}, s′ := s − (1/r − 1/p)_+, and x_+ := max{x, 0},

    sup_{f ∈ B_{s,r,q}(L,M)} E ∥f̂_n − f∥_p^p ≲ (ln n)^{−s′p/α}.    (12)

In particular, s′ can be replaced by s when r ≥ p.

Proof. When r ≤ p, s′ := s − (1/r − 1/p)_+ = s − 1/r + 1/p. Since l_r is continuously embedded in l_p, Lemma 2 implies B_{r,q}^s(ℝ) ⊆ B_{p,q}^{s′}(ℝ). Hence, for some C > 0,

    B_{s,r,q}(L, M) ⊆ B_{s′,p,q}(CL, M).    (13)

When r > p, one obtains that, for some C > 0, s′ = s and

    B_{s,r,q}(L, M) ⊆ B_{s′,p,q}(CL, M).    (14)

In fact, 1/p = 1/r + (1/p − 1/r) and the Hölder inequality imply ∥f∥_p ≤ |supp f|^{1/p − 1/r} ∥f∥_r ≲ ∥f∥_r due to |supp f| ≤ M. By the definition of the Besov norm, ∥f∥_{spq} ≤ C∥f∥_{srq}. According to (13) and (14), it is sufficient to prove

    sup_{f ∈ B_{s′,p,q}(L,M)} E ∥f̂_n − f∥_p^p ≲ (ln n)^{−s′p/α}    (15)

for the conclusion of Theorem 3.

Recall that f̂_n = ∑_{|k|≤K_n} α̂_{jk} ϕ_{jk} and P_j f = ∑_{k∈ℤ} α_{jk} ϕ_{jk}. Then

    E∥f̂_n − f∥_p^p ≲ E∥∑_{|k|≤K_n} (α̂_{jk} − α_{jk}) ϕ_{jk}∥_p^p + ∥∑_{|k|>K_n} α_{jk} ϕ_{jk}∥_p^p + ∥P_j f − f∥_p^p.    (16)

By f ∈ B_{p,q}^{s′}(ℝ) and Lemma 2,

    ∥P_j f − f∥_p^p ≲ 2^{−js′p} ∥f∥_{s′pq}^p ≲ 2^{−js′p}.    (17)

To estimate the middle term of (16), one observes that α_{jk} = ∫ ϕ_{jk}(x) f(x) dx and |k^2 α_{jk}| ≤ ∫ |2^j x − k − 2^j x|^2 |ϕ_{jk}(x)| f(x) dx ≲ ∫ |2^j x − k|^2 |ϕ_{jk}(x)| f(x) dx + ∫ |2^j x|^2 |ϕ_{jk}(x)| f(x) dx. Since ϕ is the Meyer scaling function, sup_x |x|^2 |ϕ(x)| < ∞ and

    ∫ |2^j x − k|^2 |ϕ_{jk}(x)| f(x) dx ≲ 2^{j/2}.    (18)

On the other hand, ∫ |2^j x|^2 |ϕ_{jk}(x)| f(x) dx ≲ 2^{5j/2} ∫ x^2 f(x) dx ≲ 2^{5j/2} by the Hölder inequality with 1/μ + 1/μ′ = 1 and |supp f| ≤ M. Therefore |α_{jk}| ≲ 2^{5j/2} k^{−2}. This with Lemma 1 leads to

    ∥∑_{|k|>K_n} α_{jk} ϕ_{jk}∥_p^p ≲ 2^{j(p/2 − 1)} ∑_{|k|>K_n} |α_{jk}|^p ≲ 2^{3jp} K_n^{1−2p}.    (19)

Now, it remains to consider E∥∑_{|k|≤K_n} (α̂_{jk} − α_{jk}) ϕ_{jk}∥_p^p. Using Lemma 1, one knows

    E∥∑_{|k|≤K_n} (α̂_{jk} − α_{jk}) ϕ_{jk}∥_p^p ≲ 2^{j(p/2 − 1)} ∑_{|k|≤K_n} E|α̂_{jk} − α_{jk}|^p.    (20)

Clearly, α̂_{jk} − α_{jk} = (1/n) ∑_{i=1}^n [(K_jϕ)_{jk}(Y_i) − E(K_jϕ)_{jk}(Y_i)]. Define X_{i,k} := (K_jϕ)_{jk}(Y_i) − E(K_jϕ)_{jk}(Y_i). Then EX_{i,k} = 0 and E|α̂_{jk} − α_{jk}|^p = n^{−p} E|∑_{i=1}^n X_{i,k}|^p. To apply Rosenthal's inequality (Proposition 10.2, [6]), one estimates |X_{i,k}| and E|X_{i,k}|^2: note that |(K_jϕ)_{jk}(Y_i)| = 2^{j/2} |(1/2π) ∫ ϕ^{ft}(t) e^{it(2^j Y_i − k)} / \overline{φ^{ft}(2^j t)} dt| ≲ 2^{j/2} e^{c(4π/3)^α 2^{jα}} (up to a harmless polynomial factor in 2^j) due to (C3) and supp ϕ^{ft} ⊆ {|t| ≤ 4π/3}. Then

    |X_{i,k}| ≲ 2^{j/2} e^{c(4π/3)^α 2^{jα}},  E|X_{i,k}|^2 ≲ 2^j e^{2c(4π/3)^α 2^{jα}}.    (21)

Because the X_{i,k} (i = 1, …, n) are i.i.d., Rosenthal's inequality tells us that, for p ≥ 2,

    E|∑_{i=1}^n X_{i,k}|^p ≲ ∑_{i=1}^n E|X_{i,k}|^p + (∑_{i=1}^n EX_{i,k}^2)^{p/2}.    (22)

This with (21) implies that, for p ≥ 2, E|α̂_{jk} − α_{jk}|^p ≲ n^{−p/2} 2^{jp/2} e^{pc(4π/3)^α 2^{jα}}; for 1 ≤ p < 2, the same bound follows from E|α̂_{jk} − α_{jk}|^p ≤ (E|α̂_{jk} − α_{jk}|^2)^{p/2}. Moreover, (20) reduces to

    E∥∑_{|k|≤K_n} (α̂_{jk} − α_{jk}) ϕ_{jk}∥_p^p ≲ 2^{j(p/2 − 1)} K_n 2^{jp/2} e^{pc(4π/3)^α 2^{jα}} n^{−p/2}.    (23)

Then it follows from (16)–(19) and (23) that

    E∥f̂_n − f∥_p^p ≲ 2^{jp} K_n e^{pc(4π/3)^α 2^{jα}} n^{−p/2} + 2^{3jp} K_n^{1−2p} + 2^{−js′p}.    (24)

By the choices K_n ~ n^{1/8} and (3/(8π))((ln n)/(4c))^{1/α} < 2^j ≤ (3/(4π))((ln n)/(4c))^{1/α} (stated in Theorem 3), one receives 2^{−js′p} ≲ (ln n)^{−(s′/α)p}, e^{pc(4π/3)^α 2^{jα}} ≤ n^{p/4}, and

    E∥f̂_n − f∥_p^p ≲ (ln n)^{p/α} n^{1/8 − p/4} + (ln n)^{3p/α} n^{(1−2p)/8} + (ln n)^{−(s′/α)p} ≲ (ln n)^{−(s′/α)p}.    (25)

Finally, the desired conclusion (15) follows.

Remark 4. Note that the choices of j and K_n do not depend on the unknown parameters s, r, and q. Hence our linear wavelet estimator over Besov spaces is adaptive, or implementable. The same conclusion holds for L^∞ and L^2 estimations; see Theorem 2 in [3] and Corollary 1 in [1]. On the other hand, when p = 2 and 1 ≤ r ≤ 2, our Theorem 3 reduces to Theorem 4 in [2]. From the proof of Theorem 3, we find that, for p > 1, the support assumption in (11) can be replaced by ∥xf(x)∥_∞ ≤ A, which is the same as in [1]. Therefore, for p = r = q = 2, Theorem 3 of [1] follows directly from our Theorem 3.

3. Lower Bound

In this part, we provide a lower bound estimation, which shows that Theorem 3 is the best possible in some sense. The following lemmas are needed in the proof of the main theorem of this section.

Lemma 5. Let h_η(x) := ηp(ηx) with p(x) = 1/[π(1 + x^2)] and η > 0, and let r, q ≥ 1. Then for L > 0 (L > 2 when r = 1), there exists η_0 > 0 such that h_0 := h_{η_0} satisfies ∥h_0∥_{srq} ≤ L/2. If ψ is the Meyer wavelet function and |λ_k| ≤ d 2^{−j/2} (k = 1, 2, …, 2^j), then, for some small d > 0,

    h_0(x) + ∑_{k=1}^{2^j} λ_k ψ_{jk}(x) ≳ h_0(x).    (26)

Proof. It is easy to see that ∥h_η^{(m)}∥_r = η^{m + 1 − 1/r} ∥p^{(m)}∥_r (for r ≥ 1) and ∥h_η∥_{srq} ≲ ∑_{m=0}^{⌊s⌋+1} ∥h_η^{(m)}∥_r by the definition of Besov spaces. Since

    ∥h_η∥_{srq} ≲ ∑_{m=0}^{⌊s⌋+1} η^{m + 1 − 1/r} ∥p^{(m)}∥_r,    (27)

where ⌊s⌋ denotes the largest integer no more than s, ∥h_η∥_{srq} can be made smaller than L/2 by choosing a small η_0 > 0, when r > 1. Clearly, L > 2 is needed when r = 1, because the term with m = 0 is then ∥h_η∥_1 = 1.

If |λ_k| ≤ d 2^{−j/2} (k = 1, 2, …, 2^j), then |∑_{k=1}^{2^j} λ_k ψ_{jk}(x)| ≲ d ∑_{k=1}^{2^j} (1 + |2^j x − k|^2)^{−1}, because ψ is the Meyer function and hence |ψ(x)| ≲ (1 + x^2)^{−1}. Note that |2^j x − k| ≥ 2^j |x|/2 for |x| ≥ 2 and 1 ≤ k ≤ 2^j. Then for some small d > 0 and |x| ≥ 2,

    |∑_{k=1}^{2^j} λ_k ψ_{jk}(x)| ≲ d 2^j (1 + 2^{2j} x^2/4)^{−1} ≲ d x^{−2} ≲ d h_0(x).    (28)

Hence, (26) holds for |x| ≥ 2. On the other hand, when |x| < 2, h_0(x) ≳ 1 and |∑_{k=1}^{2^j} λ_k ψ_{jk}(x)| ≲ d. Therefore, (26) is true when d > 0 is small enough. This completes the proof of Lemma 5.

The next lemma extends an estimate in the proof of Theorem 1 in [3].

Lemma 6. Let ψ be the Meyer wavelet function and let h_0 be defined as in Lemma 5. If φ satisfies (C1) and (C2), and ω_k ∈ {0, 1}, then

    ∫ [((∑_{k=1}^{2^j} ω_k λ_k ψ_{jk}) * φ)(x)]^2 / (h_0 * φ)(x) dx ≲ 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}} ∑_{k=1}^{2^j} ω_k λ_k^2.    (29)

Proof. As shown in the proof of Theorem 1 of [3], one finds easily that (h_0 * φ)(y) ≳ (1 + y^2)^{−1} and therefore

    ∫ [((∑_k ω_k λ_k ψ_{jk}) * φ)(x)]^2 / (h_0 * φ)(x) dx ≲ ∫ (1 + x^2) [((∑_k ω_k λ_k ψ_{jk}) * φ)(x)]^2 dx.    (30)

By the Parseval identity, (C1), and supp ψ^{ft} ⊆ {t : 2π/3 ≤ |t| ≤ 8π/3}, ∥ψ_{jk} * φ∥_2^2 ≲ 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}}. Moreover, the orthonormality of {ψ_{jk}} concludes that

    ∫ [((∑_k ω_k λ_k ψ_{jk}) * φ)(x)]^2 dx ≲ 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}} ∑_k ω_k λ_k^2.    (31)

To estimate ∫ x^2 [((∑_k ω_k λ_k ψ_{jk}) * φ)(x)]^2 dx, one proves an inequality:

    ∑_{k,k′} ω_k ω_{k′} |λ_k λ_{k′}| |∫ x^2 ψ_{jk}(x) ψ_{jk′}(x) dx| ≲ ∑_k ω_k λ_k^2.    (32)

Note that 1 ≤ k, k′ ≤ 2^j, ∫ x^2 ψ_{jk}(x) ψ_{jk′}(x) dx = 2^{−2j} ∫ (x + k)^2 ψ(x) ψ[x − (k′ − k)] dx, and

    ∫ (x + k)^2 ψ(x) ψ[x − (k′ − k)] dx = ∫ x^2 ψ(x) ψ[x − (k′ − k)] dx + 2k ∫ x ψ(x) ψ[x − (k′ − k)] dx + k^2 ⟨ψ(x), ψ[x − (k′ − k)]⟩.    (33)

Since ⟨ψ(x), ψ(x − k)⟩ = δ_{k,0}, the last term contributes only to the diagonal k = k′; on the other hand, the boundedness (with rapid decay in |l|) of ∫ x^2 ψ(x) ψ(x + l) dx and ∫ x ψ(x) ψ(x + l) dx implies that

    |∫ (x + k)^2 ψ(x) ψ[x − (k′ − k)] dx| ≲ (1 + |k|)(1 + |k′ − k|)^{−2} + k^2 δ_{k,k′}    (34)

as well as |k| ≤ 2^j. Hence, ∑_{k,k′} ω_k ω_{k′} |λ_k λ_{k′}| |∫ x^2 ψ_{jk} ψ_{jk′} dx| ≲ 2^{−2j} · 2^{2j} ∑_k ω_k λ_k^2, which reaches (32).

Define q := (∑_k ω_k λ_k ψ_{jk}) * φ. Then q^{ft}, (q^{ft})′ ∈ L^2(ℝ) and q^{ft} is locally absolutely continuous. Therefore, xq(x) corresponds to i(q^{ft})′(t) under the Fourier transform and

    ∫ x^2 |q(x)|^2 dx = (1/2π) ∫ |(q^{ft})′(t)|^2 dt.    (35)

Clearly, (q^{ft})′ = (∑_k ω_k λ_k ψ_{jk}^{ft})′ φ^{ft} + (∑_k ω_k λ_k ψ_{jk}^{ft}) (φ^{ft})′ and ∫ |(q^{ft})′(t)|^2 dt ≲ 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}} ∑_k ω_k λ_k^2 thanks to (C1), (C2), and

    (ψ_{jk}^{ft})′(t) = 2^{−j/2} e^{−itk2^{−j}} [−ik2^{−j} ψ^{ft}(2^{−j}t) + 2^{−j} (ψ^{ft})′(2^{−j}t)].    (36)

Moreover, the cross terms are controlled because of (32) and the orthonormality of ψ_{jk}. This with (35), (31), and (30) leads to the desired conclusion of Lemma 6.

Two more classical theorems play important roles in our discussions. We list the first one as Lemma 7, which can be found in [7].

Lemma 7 (Varshamov-Gilbert). Let Ω = {ω = (ω_1, …, ω_m) : ω_k ∈ {0, 1}} with m ≥ 8. Then there exists a subset {ω^{(0)}, …, ω^{(M)}} of Ω such that M ≥ 2^{m/8}, ω^{(0)} = (0, …, 0), and for j, l = 0, 1, …, M, j ≠ l,

    ∑_{k=1}^m |ω_k^{(j)} − ω_k^{(l)}| ≥ m/8.    (37)
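
Lemma 7 is purely combinatorial, and a valid family is easy to exhibit by brute force. The sketch below (our own illustration; varshamov_gilbert is a hypothetical name) greedily collects binary words at pairwise Hamming distance at least m/8 and checks the guaranteed cardinality 2^{m/8} for a small m.

```python
import numpy as np

def varshamov_gilbert(m, rng=np.random.default_rng(1)):
    # Greedy search for binary words with pairwise Hamming distance >= m/8.
    # The lemma guarantees at least 2^{m/8} such words exist, including the zero word;
    # this sketch just exhibits a valid family for a small m.
    words = [np.zeros(m, dtype=int)]                 # omega^(0) = (0, ..., 0)
    target = int(np.ceil(2**(m/8)))
    trials = 0
    while len(words) < target and trials < 100000:
        cand = rng.integers(0, 2, m)
        if all(np.sum(cand != w) >= m/8 for w in words):
            words.append(cand)
        trials += 1
    return words

W = varshamov_gilbert(32)
print(len(W), ">=", int(np.ceil(2**(32/8))))         # 16 words suffice for m = 32
dmin = min(np.sum(a != b) for i, a in enumerate(W) for b in W[i+1:])
print("min pairwise Hamming distance:", dmin, ">=", 32/8)
```
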

Given two probability measures P and Q on a measurable space (𝕏, 𝒜), the Kullback divergence of P and Q is defined by

    K(P, Q) := ∫ ln(dP/dQ) dP, if P ≪ Q;  K(P, Q) := +∞, otherwise.    (38)

Here, P ≪ Q stands for P being absolutely continuous with respect to Q. In that case, K(P, Q) = ∫ ln(f_P(x)/f_Q(x)) f_P(x) dx, where the function f_P(x) denotes the density function of P. The second classical theorem is taken from [8].
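
Before stating it, here is a quick numerical illustration of (38) (our own sketch; kullback is a hypothetical helper). It approximates the divergence on a grid and recovers the classical value K(N(0,1), N(1,1)) = 1/2.

```python
import numpy as np

def kullback(fP, fQ, x):
    # K(P, Q) = int ln(fP/fQ) fP dx on a grid, assuming P << Q (numerical sketch)
    mask = fP > 0
    ratio = np.where(mask, fP, 1.0) / fQ
    return np.trapz(np.where(mask, fP * np.log(ratio), 0.0), x)

x = np.linspace(-10, 10, 20001)
p = np.exp(-x**2/2) / np.sqrt(2*np.pi)            # density of N(0, 1)
q = np.exp(-(x - 1)**2/2) / np.sqrt(2*np.pi)      # density of N(1, 1)
print(kullback(p, q, x))                          # analytic value: (mu1 - mu2)^2 / 2 = 0.5
```
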

Lemma 8 (Fano). Let (𝕏, 𝒜, P_k), k = 0, 1, …, m, be probability measure spaces and A_k ∈ 𝒜. If A_k ∩ A_v = ∅ for k ≠ v, then

    sup_{0≤k≤m} P_k(A_k^c) ≥ min{1/2, √m exp(−3e^{−1} − 𝒦_m)},    (39)

where 𝒦_m := inf_{0≤v≤m} m^{−1} ∑_{k≠v} K(P_k, P_v) and A^c denotes the complement of a set A.

Now, we are in a position to state the main theorem of this section.

Theorem 9. Let φ satisfy (C1) and (C2), and let f̂_n(·) := f̂_n(Y_1, Y_2, …, Y_n, ·) be an estimator of f ∈ B_{s,r,q}(L, M). Then for s > 0, p ∈ [1, ∞), q, r ∈ [1, ∞], and s ≥ 1/r, there exists C > 0 independent of f̂_n such that, with s′ := s − (1/r − 1/p)_+,

    sup_{f ∈ B_{s,r,q}(L,M)} E ∥f̂_n − f∥_p^p ≥ C (ln n)^{−s′p/α}.    (40)

Proof. Assume that ψ is the Meyer wavelet function; then ψ_{jk} ∈ B_{r,q}^s(ℝ). By Lemma 2,

    ∥∑_{k=1}^{2^j} ω_k λ_k ψ_{jk}∥_{srq} ≲ 2^{j(s + 1/2 − 1/r)} (∑_{k=1}^{2^j} ω_k |λ_k|^r)^{1/r}    (41)

for ω_k ∈ {0, 1}. Furthermore, with the function h_0 defined in Lemma 5, there exists c_1 > 0 such that ∥h_0∥_{srq} ≤ L/2 and ∥h_ω − h_0∥_{srq} ≤ L/2 due to that lemma. Define

    h_ω := h_0 + c_1 2^{−j(s + 1/2)} ∑_{k=1}^{2^j} ω_k ψ_{jk}.    (42)

Then ∫ h_ω(x) dx = 1 because ∫ ψ(x) dx = 0 and ∫ h_0(x) dx = 1.

By Lemma 7, one finds {ω^{(0)}, …, ω^{(M)}} ⊆ Ω with M ≥ 2^{2^j/8} and ω^{(0)} = (0, …, 0) such that, for ω ≠ ω′ in that subset, ∑_{k=1}^{2^j} |ω_k − ω_k′| ≥ 2^j/8. It is easy to see that

    ∥h_ω − h_{ω′}∥_p = c_1 2^{−j(s + 1/2)} ∥∑_{k=1}^{2^j} (ω_k − ω_k′) ψ_{jk}∥_p.    (43)

This with Lemma 1 leads to ∥h_ω − h_{ω′}∥_p ≳ 2^{−j(s + 1/2)} 2^{j(1/2 − 1/p)} (2^j/8)^{1/p}, and therefore

    ∥h_ω − h_{ω′}∥_p ≳ 2^{−js} =: 2δ.    (44)

Define A_ω := {∥f̂_n − h_ω∥_p < δ} for h_ω ∈ Λ := {h_{ω^{(0)}}, …, h_{ω^{(M)}}}. Then A_ω ∩ A_{ω′} = ∅ when ω ≠ ω′. Clearly, h_ω * φ is a density function because both h_ω and φ are density functions. Let P_ω be the probability measure on the Lebesgue space (ℝ^n, ℬ^n) with the density ∏_{i=1}^n (h_ω * φ)(y_i). Then Lemma 8 tells that

    sup_ω E ∥f̂_n − h_ω∥_p^p ≳ δ^p min{1/2, √M exp(−3e^{−1} − 𝒦_M)}.    (45)

According to Lemma 5, h_0(x) ≲ h_ω(x) and hence (h_0 * φ) ≲ (h_ω * φ). Moreover, K(P_ω, P_{ω^{(0)}}) = n K(h_ω * φ, h_0 * φ). Since h_ω * φ / h_0 * φ > 0, combining this with ln(1 + x) ≤ x (x > −1), one knows

    K(h_ω * φ, h_0 * φ) ≤ ∫ [(h_ω * φ)(x) − (h_0 * φ)(x)]^2 / (h_0 * φ)(x) dx.    (46)

Because h_ω − h_0 = c_1 2^{−j(s + 1/2)} ∑_{k=1}^{2^j} ω_k ψ_{jk}, the above inequality reduces to K(h_ω * φ, h_0 * φ) ≲ c_1^2 2^{−2js} 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}} thanks to Lemma 6. Hence,

    𝒦_M ≲ n c_1^2 2^{−2js} 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}}.    (47)

Note that ln √M ≳ 2^j and take j such that

    (3/(2π)) ((ln n)/(2c))^{1/α} ≤ 2^j < (3/π) ((ln n)/(2c))^{1/α}.    (48)

Then e^{−2c(2π/3)^α 2^{jα}} ≤ n^{−1}, so 𝒦_M ≲ c_1^2 2^{j(2β − 2s)} and √M exp(−3e^{−1} − 𝒦_M) ≥ 1/2 (choose c_1 > 0 small enough). Furthermore, (45) reduces to

    sup_ω E ∥f̂_n − h_ω∥_p^p ≳ δ^p ≳ 2^{−jsp}.    (49)

Hence, 2^{−js} ≳ (ln n)^{−s/α}. This with (44) and (48) leads to

    sup_{f ∈ B_{s,r,q}(L,M)} E ∥f̂_n − f∥_p^p ≳ (ln n)^{−sp/α},    (50)

which is the desired conclusion of Theorem 9, when r ≥ p (s′ = s in that case).

When r < p, s′ = s − (1/r − 1/p)_+ = s − 1/r + 1/p, and it remains to show

    sup_{f ∈ B_{s,r,q}(L,M)} E ∥f̂_n − f∥_p^p ≳ (ln n)^{−s′p/α}.    (51)

Similar to the proof of (50), one takes a small c_2 > 0 such that

    h_k := h_0 + c_2 2^{−j(s + 1/2 − 1/r)} ψ_{jk}  (k = 1, 2, …, 2^j)    (52)

satisfies h_k(x) ≥ 0 and ∫ h_k(x) dx = 1. Clearly, ∥h_k∥_{srq} ≤ L and ∥h_k − h_{k′}∥_p = c_2 2^{−j(s + 1/2 − 1/r)} ∥ψ_{jk} − ψ_{jk′}∥_p for 1 ≤ k ≠ k′ ≤ 2^j. Since ψ is the Meyer wavelet function, inf_{k≠0} ∥ψ(·) − ψ(· − k)∥_p > 0 and

    ∥h_k − h_{k′}∥_p ≳ 2^{−j(s + 1/2 − 1/r)} 2^{j(1/2 − 1/p)} = 2^{−js′} =: 2δ′.    (53)

Define A_k := {∥f̂_n − h_k∥_p < δ′}. Then A_k ∩ A_v = ∅ (k ≠ v) and

    sup_{1≤k≤2^j} E ∥f̂_n − h_k∥_p^p ≳ (δ′)^p min{1/2, 2^{j/2} exp(−3e^{−1} − 𝒦_{2^j})}    (54)

due to Lemma 8. Similar (even simpler) arguments to the estimation of 𝒦_M show

    𝒦_{2^j} ≲ n c_2^2 2^{−j(2s + 1 − 2/r)} 2^{2jβ} e^{−2c(2π/3)^α 2^{jα}}.    (55)

Taking j as in (48), one receives that 𝒦_{2^j} ≲ c_2^2 2^{j(2β + 2/r − 2s − 1)} and 2^{j/2} exp(−3e^{−1} − 𝒦_{2^j}) ≥ 1/2 by choosing a small c_2 > 0. Thus, (54) reduces to

    sup_{1≤k≤2^j} E ∥f̂_n − h_k∥_p^p ≳ (δ′)^p ≳ 2^{−js′p}.    (56)

Moreover, 2^{−js′} ≳ (ln n)^{−s′/α} by (48). This with (53) and (48) leads to (51). This completes the proof of Theorem 9.

Remark 10. By Theorems 9 and 3, the linear wavelet estimator is optimal for a density in Besov spaces under severely ill-posed noise. Therefore, we do not need to consider nonlinear wavelet estimations in that case. This contrasts sharply with the moderately ill-posed noise case, in which nonlinear wavelet estimation improves the linear one [2, 4].

Remark 11. When p = 2 and 1 ≤ r ≤ 2, our Theorem 9 is better than Theorem 6 in [2]. Moreover, Theorems 9 and 3 lead to Theorem 3 in that paper for p = 2 and 1 ≤ r ≤ 2. In addition, our conditions (C1) and (C2) are a little weaker than the assumptions in [2].

4. Concluding Remarks

This paper provides an L^p (1 ≤ p < ∞) risk upper bound for a linear wavelet estimator (Theorem 3), which turns out to be optimal (Theorem 9). Therefore, nonlinear estimations are not needed under severely ill-posed noises. Although we assume p < ∞ in Theorem 9, the proof of that theorem shows that, for p = ∞,

    sup_{f ∈ B_{s,r,q}(L,M)} E ∥f̂_n − f∥_∞ ≳ (ln n)^{−(s − 1/r)/α}.    (57)

In particular, when r = q = ∞, this above estimation reduces to a partial result of Theorem 1 in [3].
Note that our model assumes the noise to be severely ill-posed; that is, the density φ of the noise ε satisfies |φ^{ft}(t)| ~ (1 + |t|)^β e^{−c|t|^α}. Then it is reasonable to choose the Meyer scaling function as ϕ, because the compact support of ϕ^{ft} makes K_jϕ well defined, where

    (K_jϕ)^{ft}(t) := ϕ^{ft}(t) / \overline{φ^{ft}(2^j t)}.    (58)

Compared with the proof of Theorem 1 in [3], the arguments of Theorem 9 are more complicated in the sense that we use the Varshamov-Gilbert lemma (Lemma 7). This is reasonable because we deal with the unmatched estimation (p and r may not be equal), while they treat the matched case.

Although the Shannon function ϕ_S(t) = sin(πt)/(πt) is much simpler than Meyer's, it cannot be used in our discussion, because the Shannon function does not belong to L^1(ℝ), while our theorems cover the case p = 1.
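
The non-integrability is easy to see numerically: since the average of |sin| over a period is 2/π, the partial integrals of |ϕ_S| grow like (2/π^2) ln T. A small check (our own illustration, not from the paper):

```python
import numpy as np
from scipy.integrate import quad

# partial integrals int_0^T |sin(pi t)/(pi t)| dt, accumulated over unit intervals
# (np.sinc(t) = sin(pi t)/(pi t)); they diverge logarithmically, so phi_S is not in L^1
for T in (10, 100, 1000):
    val = sum(quad(lambda t: abs(np.sinc(t)), a, a + 1, limit=200)[0] for a in range(T))
    print(T, round(val, 3), "~ const + (2/pi^2) ln T =", round(2/np.pi**2 * np.log(T), 3))
```
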

Finally, it should be pointed out that we assume the independence of the observations Y_1, Y_2, …, Y_n in this paper. However, dependent data are more important (and, of course, more complicated) in practice. We will investigate that case in the future.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant no. 11271038).
