Volume 2013, Issue 1 694181
Research Article
Open Access

The Learning Rates of Regularized Regression Based on Reproducing Kernel Banach Spaces

Baohuai Sheng (Corresponding Author)
Department of Mathematics, Shaoxing University, Shaoxing 312000, China

Peixin Ye
School of Mathematics and LPMC, Nankai University, Tianjin 300071, China
First published: 23 December 2013
Academic Editor: Qiang Wu

Abstract

We study the convergence behavior of regularized regression based on reproducing kernel Banach spaces (RKBSs). A convexity inequality for uniformly convex Banach spaces is used to show that the optimal solution is robust with respect to the underlying distribution. The learning rates are derived in terms of the covering number and a K-functional.

1. Introduction

Recently, there has been increasing research interest in learning with abstract function spaces, and considerable work has been done in [1–3] and elsewhere.

Let (ℬ, ∥·∥_ℬ) be a normed vector space consisting of real functions on a compact distance space (X, d(·, ·)), let M > 0 be a given positive number, and set Z = X × [−M, M]. Let z = {(x_i, y_i)}_{i=1}^m ∈ Z^m be a finite set of samples drawn independently and identically (i.i.d.) according to a distribution ρ(x, y) on Z. Then, the regularized learning scheme associated with the hypothesis space ℬ and the least square loss is

f_z = arg min_{f∈ℬ} { (1/m) ∑_{i=1}^m (f(x_i) − y_i)² + λ∥f∥_ℬ^q },   λ > 0,

where q ≥ 1 is a given real number. The unknown Borel probability distribution ρ(x, y) can be decomposed into ρ(y|x) and ρ_X(x), where ρ(y|x) is the conditional probability of ρ at x ∈ X and ρ_X(x) is the marginal probability on X.

The regression function corresponding to the least square loss is
f_ρ(x) = ∫_{[−M, M]} y dρ(y|x),   x ∈ X,
which satisfies
ℰ_ρ(f_ρ) = min_{f ∈ L²(ρ_X)} ℰ_ρ(f),   where ℰ_ρ(f) = ∫_Z (f(x) − y)² dρ(x, y).
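
To make the scheme concrete, the following minimal sketch (not from the paper) implements a finite-dimensional instance of (1): the hypothesis space is the span of a few cosine features, the ℓ^q norm of the coefficient vector stands in for ∥f∥_ℬ, and a general-purpose optimizer replaces an exact solver; the regression function, noise level, q, and λ are all arbitrary illustrative choices.

```python
# Minimal numerical sketch of scheme (1) on a toy hypothesis space.
# Assumptions (not from the paper): the hypothesis space is spanned by a few
# cosine features, its norm is the l^q norm of the coefficient vector, the
# regression function is f_rho(x) = cos(2 pi x), and |y| <= M after clipping.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def features(x, n_basis=8):
    # phi_0(x) = 1, phi_k(x) = sqrt(2) cos(k pi x): orthonormal in L^2[0, 1]
    k = np.arange(n_basis)
    return np.where(k == 0, 1.0, np.sqrt(2.0) * np.cos(np.outer(x, k) * np.pi))

def f_rho(x):
    return np.cos(2.0 * np.pi * x)

# sample z = {(x_i, y_i)}_{i=1}^m i.i.d. from rho, with |y| <= M
m, M = 200, 2.0
x = rng.uniform(0.0, 1.0, m)
y = np.clip(f_rho(x) + 0.3 * rng.standard_normal(m), -M, M)

q, lam = 2.0, 1e-3
Phi = features(x)

def objective(c):
    # empirical least-square error + lambda * ||f||^q (coefficient-norm stand-in)
    resid = Phi @ c - y
    return np.mean(resid ** 2) + lam * np.sum(np.abs(c) ** q)

c_z = minimize(objective, np.zeros(Phi.shape[1]), method="L-BFGS-B").x

# compare the empirical solution f_z with the regression function f_rho on a grid
grid = np.linspace(0.0, 1.0, 200)
err = np.sqrt(np.mean((features(grid) @ c_z - f_rho(grid)) ** 2))
print("approximate L2(rho_X) error of f_z:", err)
```

The decay of this error as m grows and λ is tuned is what Theorems 3 and 4 below quantify.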

When the hypothesis space ℬ in (1) is a reproducing kernel Banach space, we call (1) RKBS-based regularized regression learning; RKBSs were defined in [4, 5] recently. The representer theorem, which is closely related to regularized learning, was studied there for the case that ℬ is an RKBS, and the discussion was extended to generalized semi-inner-product RKBSs in [6].

In the present paper, we investigate the learning rates of scheme (1) when ℬ is an RKBS with uniform convexity. The paper is organized as follows. In Section 2, we state the main results of the present paper. The robustness is studied in Section 3, and the sample error is bounded in Section 4. The approximation error boils down to a K-functional. The learning rates are derived in Section 5.

For a given real number p ≥ 1, we denote by L^p(ρ_X) the class of ρ_X-measurable functions f satisfying ∥f∥_{L^p(ρ_X)} := (∫_X |f(x)|^p dρ_X(x))^{1/p} < +∞.

We say A = O(B) if there is a constant C > 0 such that A/B ≤ C. We say A ~ B if both A = O(B) and B = O(A).

2. Notions and Results

To state the results of the present paper, we first introduce some notions as follows.

2.1. The RKBSs

We denote by ℬ a Banach space with dual space ℬ* and norm ∥·∥_ℬ. For f ∈ ℬ and f* ∈ ℬ*, we write 〈f, f*〉 for the value of f* at f.

A reproducing kernel Banach space (RKBS) on X is a reflexive Banach space ℬ of real functions on X whose dual space ℬ* is isometric to a Banach space ℬ^# of functions on X and for which the point evaluations are continuous linear functionals on both ℬ and ℬ^#. It was shown in Theorem 2 of [4] that if ℬ is an RKBS on X, then there exists a unique function K : X × X → ℝ, called the reproducing kernel of ℬ, satisfying the following:
  • (i)

    K(x, ·) ∈ ℬ, K(·, x) ∈ ℬ*, x ∈ X;

  • (ii)

    f(x) = 〈f, K(·, x)〉 for f ∈ ℬ, and f*(x) = 〈K(x, ·), f*〉 for f* ∈ ℬ*, x ∈ X.

  • (iii)

    The linear span of {K(x, ·) : x ∈ X} is dense in ℬ, namely,

    closure(span{K(x, ·) : x ∈ X}) = ℬ.

  • (iv)

    The linear span of {K(·, x) : x ∈ X} is dense in ℬ*, namely,

closure(span{K(·, x) : x ∈ X}) = ℬ*.
  • (v)

    For all x, y ∈ X there holds K(x, y) = 〈K(x, ·), K(·, y)〉.

When ℬ is an RKHS, K is indeed the reproducing kernel in the usual sense (see [7]).
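
For orientation, here is a brief standard check (not specific to this paper) of how property (ii) reduces to the usual reproducing property when ℬ is a Hilbert space:

```latex
% Standard observation, not taken from the paper.
% If B = H is a Hilbert space, the Riesz representation theorem identifies H^*
% with H itself, and the dual pairing becomes the inner product, so (ii) reads
f(x) \;=\; (f,\,K(\cdot,x))_{\mathcal H}, \qquad f\in\mathcal H,\ x\in X,
% which is the classical reproducing property.  Property (v) then gives
K(x,y) \;=\; (K(\cdot,x),\,K(\cdot,y))_{\mathcal H},
% so K is symmetric and positive semidefinite, as expected for an RKHS kernel.
```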

Since ℬ is a reflexive Banach space, we have
()
()

A way of producing reproducing kernel spaces in L^p spaces by idempotent integral operators was provided in [8]. In the present paper, we provide a method to construct RKBSs from orthogonal function series.

Example 1. Let X = [a, b] be a given closed interval and let {φ_k(x)}_{k=0}^∞ be a sequence of continuous functions on [a, b] satisfying the following:

  • (i)

    φ_k ∈ L^p(ρ) for k = 0, 1, 2, …;

  • (ii)

    φ_k and φ_l are orthonormal in L²(ρ); that is, ∫_{[a,b]} φ_k(y)φ_l(y) dρ(y) = δ_{kl};

  • (iii)

    span{φ_k : k = 0, 1, 2, …} is dense in L^p(ρ) for 1 < p < +∞.

Let {λ_k}_{k=0}^∞ be a given sequence of positive real numbers satisfying ∑_{k=0}^∞ λ_k < +∞. Define

K(x, y) = ∑_{k=0}^∞ λ_k φ_k(x) φ_k(y),   x, y ∈ [a, b],

and the function class ℬ on X by

()

where a_k(f) = ∫_{[a,b]} f(y)φ_k(y) dρ(y), k ∈ ℕ. We define the space ℬ^# for p′ = p/(p − 1) in an analogous way.

We have the following proposition.

Proposition 2. Define a bivariate operation 〈·, ·〉 on ℬ and ℬ^# by

()

Then, ℬ is a reproducing kernel Banach space with reproducing kernel K(x, y).

Proof. Let the corresponding sequence spaces be defined in an analogous way. Then both of them are Banach spaces, and by (9) the coefficient maps between ℬ, ℬ^# and these sequence spaces are isometric isomorphisms. Therefore, ℬ and ℬ^# are Banach spaces.

Since a_l(K(·, y)) = λ_l φ_l(y), we have for any f ∈ ℬ that
()

In the same way, we have for any g ∈ ℬ^# that 〈K(x, ·), g〉 = g(x); that is, the reproducing property holds.
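
The identity a_l(K(·, y)) = λ_l φ_l(y) used above can be checked numerically. The sketch below uses illustrative choices that are not from the paper: ρ is the Lebesgue measure on [0, 1], φ_0 = 1, φ_k(x) = √2 cos(kπx), and λ_k = (1 + k)^{−3}; it builds a truncated kernel K(x, y) = ∑_k λ_k φ_k(x)φ_k(y) and compares a_l(K(·, x_0)) = ∫ K(y, x_0)φ_l(y) dy with λ_l φ_l(x_0).

```python
# Numerical check of a_l(K(., x0)) = lambda_l * phi_l(x0) for a truncated series
# kernel.  Illustrative choices, not from the paper: rho = Lebesgue measure on
# [0, 1], phi_0 = 1, phi_k(x) = sqrt(2) cos(k pi x), lambda_k = (1 + k)^(-3).
import numpy as np

n_basis = 12
lam = (1.0 + np.arange(n_basis)) ** -3.0

def phi(k, x):
    return np.ones_like(x) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * x)

def K(x, y):
    # truncated kernel K(x, y) = sum_k lambda_k phi_k(x) phi_k(y)
    return sum(lam[k] * phi(k, x) * phi(k, y) for k in range(n_basis))

# a_l(K(., x0)) = int_0^1 K(y, x0) phi_l(y) dy, approximated by the midpoint rule
n_quad = 4000
y_grid = (np.arange(n_quad) + 0.5) / n_quad
x0 = 0.37                                     # arbitrary evaluation point

for l in range(4):
    a_l = np.mean(K(y_grid, x0) * phi(l, y_grid))   # the interval has length 1
    print(f"l={l}  a_l(K(.,x0))={a_l:.6f}  lambda_l*phi_l(x0)={lam[l] * phi(l, x0):.6f}")
```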

2.2. The Uniform Convexity

In this subsection, we focus on some notions in convex analysis and Banach geometry theory.

Let F : ℬ → ℝ be a convex function. Then, define

∂F(f) = {ξ ∈ ℬ* : F(g) − F(f) ≥ 〈g − f, ξ〉 for all g ∈ ℬ}.

We call ∂F(f) the subdifferential of F at f. If ξ ∈ ∂F(f), then we call ξ a subgradient of F at f.

A well-known result is that f_0 is a minimizer of a convex function F on ℬ if and only if 0 ∈ ∂F(f_0) (see [9]).

A Banach space ℬ is called q-uniformly convex (q > 0) if there is a constant c > 0 such that the modulus of convexity

δ(ε) = inf{1 − ∥(f + g)/2∥_ℬ : f, g ∈ ℬ, ∥f∥_ℬ = ∥g∥_ℬ = 1, ∥f − g∥_ℬ ≥ ε},   0 < ε ≤ 2,

satisfies δ(ε) ≥ cε^q. In particular, every Hilbert space is a 2-uniformly convex Banach space.
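
For a Hilbert space this can be verified directly from the parallelogram law; the following standard computation (not taken from the paper) produces the exponent q = 2:

```latex
% Standard computation, not from the paper: every Hilbert space is 2-uniformly convex.
% For \|f\| = \|g\| = 1 and \|f - g\| \ge \varepsilon, the parallelogram law gives
\Bigl\|\tfrac{f+g}{2}\Bigr\|^{2}
  = \tfrac{1}{2}\|f\|^{2} + \tfrac{1}{2}\|g\|^{2} - \Bigl\|\tfrac{f-g}{2}\Bigr\|^{2}
  \le 1 - \tfrac{\varepsilon^{2}}{4},
% and therefore
\delta(\varepsilon) \;\ge\; 1 - \sqrt{1 - \tfrac{\varepsilon^{2}}{4}} \;\ge\; \tfrac{\varepsilon^{2}}{8},
\qquad 0 < \varepsilon \le 2,
% so \delta(\varepsilon) \ge c\,\varepsilon^{2} with c = 1/8.
```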

Define the generalized duality mapping J_q(f) = {j ∈ ℬ* : 〈f, j〉 = ∥f∥_ℬ^q, ∥j∥_{ℬ*} = ∥f∥_ℬ^{q−1}}. Then, by (28) in Corollary 1 of [10] we know ℬ is q-uniformly convex if and only if there is a positive constant c_q > 0 such that for all f, g ∈ ℬ and all j_q(f) ∈ J_q(f) there holds
∥f + g∥_ℬ^q ≥ ∥f∥_ℬ^q + q〈g, j_q(f)〉 + c_q ∥g∥_ℬ^q.

From [11–14] we know that, for a given 1 < p < +∞, the sequence space ℓ^p, the Lebesgue space L^p, and the Sobolev spaces built on L^p are max{2, p}-uniformly convex. Also, let ℬ and ℬ^# be defined as in Section 2.1. Then, by the isometric isomorphisms with the corresponding sequence spaces used in the proof of Proposition 2, we know that ℬ is 2-uniformly convex if p > 2 and p/(p − 1)-uniformly convex if 1 < p ≤ 2. Therefore, ℬ is a q-uniformly convex Banach space, where q = 2 if p > 2 and q = p/(p − 1) if 1 < p ≤ 2.
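
The exponent max{2, p} can also be seen numerically. The sketch below (illustrative, not from the paper) estimates the modulus of convexity of the two-dimensional ℓ^p space by a brute-force search over pairs of unit vectors and compares its decay in ε with ε^{max{2,p}}.

```python
# Brute-force estimate of the modulus of convexity of two-dimensional l^p
# (illustrative, not from the paper).  For small eps, delta(eps) should scale
# roughly like eps ** max(2, p), matching max{2, p}-uniform convexity of l^p.
import numpy as np

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p, axis=-1) ** (1.0 / p)

def modulus(p, eps, n_grid=1000):
    # sample the unit sphere of l^p in R^2
    t = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    pts = np.stack([np.cos(t), np.sin(t)], axis=1)
    pts /= lp_norm(pts, p)[:, None]
    best = np.inf
    for i in range(n_grid):
        d = lp_norm(pts - pts[i], p)                       # ||f - g||_p
        far = d >= eps
        if far.any():
            mid = lp_norm((pts[far] + pts[i]) / 2.0, p)    # ||(f + g)/2||_p
            best = min(best, 1.0 - mid.max())
    return best

for p in (1.5, 2.0, 3.0):
    q = max(2.0, p)
    for eps in (0.5, 0.25, 0.125):
        print(f"p={p}  eps={eps:5.3f}  delta≈{modulus(p, eps):.2e}  eps^q={eps ** q:.2e}")
```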

2.3. Main Results

Let S be a distance space and η > 0. The covering number 𝒩(S, η) is defined to be the minimal positive integer l such that there exist l disks in S with radius η covering S.

We say a compact subset E of a normed space (ℬ, ∥·∥_ℬ) has logarithmic complexity exponent s ≥ 0 if there is a constant c_s > 0 such that the closed ball of radius R centered at the origin, that is, ℬ_R = {f ∈ E : ∥f∥_ℬ ≤ R}, satisfies

ln 𝒩(ℬ_R, η) ≤ c_s (R/η)^s,   η > 0.
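
The covering number of a concrete set can be estimated with a simple greedy procedure. The sketch below (illustrative, not from the paper) covers a finite sample of points from a set S with η-balls centered at sample points and reports the resulting count, a rough proxy for 𝒩(S, η); here S is the unit disk in the Euclidean plane.

```python
# Greedy eta-cover of a finite point sample: a rough, illustrative proxy for the
# covering number N(S, eta) in the definition above (not from the paper).
import numpy as np

rng = np.random.default_rng(1)

def greedy_cover_size(points, eta):
    # Repeatedly pick an uncovered point as a ball center until all points are covered.
    centers = 0
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]          # first still-uncovered point
        centers += 1
        dist = np.linalg.norm(points - c, axis=1)
        uncovered &= dist > eta                   # points inside the ball are now covered
    return centers

# Sample S = unit disk in the Euclidean plane.
n = 5000
pts = rng.standard_normal((n, 2))
pts *= (rng.uniform(0.0, 1.0, n) ** 0.5 / np.linalg.norm(pts, axis=1))[:, None]

for eta in (0.4, 0.2, 0.1, 0.05):
    print(f"eta={eta:4.2f}  greedy cover size = {greedy_cover_size(pts, eta)}")
# For a d-dimensional ball the covering number grows like (1/eta)^d, so the
# count should roughly quadruple each time eta is halved (here d = 2).
```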

Now we are in a position to present the main results of this paper.

Theorem 3. Let ℬ be a q-uniformly convex RKBS whose reproducing kernel K is such that the map x ↦ K(·, x) is uniformly continuous on X with respect to the ℬ*-norm, and suppose there is a constant k > 0 such that ∥K(·, x)∥_{ℬ*} ≤ k holds for all x ∈ X. Let f_z be the unique minimizer of scheme (1). If f_ρ ∈ L²(ρ_X), then for any ϵ > 0 there holds

()

where

()

is a K-functional, and

()

The covering number involved in (16) has been studied widely (see [15–19]). In this paper, we assume that 𝒩(ℬ_R, η) has logarithmic complexity.

Theorem 4. Under the conditions of Theorem 3, if f_ρ ∈ L²(ρ_X) and (ℬ, ∥·∥_ℬ) has logarithmic complexity with exponent s ≥ 0, then for any δ ∈ (0, 1), with confidence 1 − δ, there holds

()

where c_s is defined in (15).

We now give some remarks on Theorems 3 and 4.
  • (i)

    In Theorem 3, we require that the kernel K(x, y) be uniformly continuous and uniformly bounded. In fact, a large class of real bivariate functions satisfies these conditions. For example, if the function sequence {φ_l} defined in Example 1 is uniformly bounded, that is, |φ_l(x)| ≤ 1 holds for all l and all x ∈ [a, b], then the kernel K(x, y) is continuous on [a, b] × [a, b] and hence uniformly continuous there. Therefore, K(x, y) satisfies the uniform continuity and boundedness conditions required in Theorem 3.

  • (ii)

    By the definition of γ_q, we know that if D_q(f_ρ, λ) = O(λ^β), 0 < β ≤ 1, then γ_q can be bounded accordingly; in particular, it is bounded if β = 1.

  • (iii)

    If ℬ is a reproducing kernel Hilbert space, then q = 2, c_q = 1. Moreover, if D_q(f_ρ, λ) = O(λ^β), 0 < β ≤ 1, then we have by (19) that

    ()

  • (iv)

    We can show a way of bounding the decay rate of D_q(f_ρ, λ) for 1 < p ≤ 2. Let f ∈ L^p(ρ). Then, we have the following Fourier expansion:

f = ∑_{l=0}^∞ a_l(f) φ_l.
  • Define a sequence of operators {V_n} by

    ()

  • Then, for a given positive integer n we have a_l(V_n(f)) = λ_l a_l(f) and

    ()

  • where we have used the generalized Bessel inequality (see [20]):

    ()

  • Also,

    ()

  • By (25) and (23) we know that V_n(f) ∈ ℬ holds for all positive integers n and, in this case,

    ()

  • One can choose a suitable n depending on the sample number m and obtain the decay rates as m → +∞. There are many choices for operators of type (22); for example, the Bernstein-Durrmeyer operators (see, e.g., [21–23]) and the de la Vallée Poussin sum operators are of this type (see [24]). This method was first provided by [25] and was extended in [26, 27]. A numerical sketch of this truncation idea is given after these remarks.

  • (v)

    We know from [19] that RKHSs with logarithmic complexity exponent s ≥ 0 exist. By Corollary 4.1 and Theorem 2.1 of [16] we know that if the λ_l satisfy λ_l ~ 1/(1 + l)^α, α > 1, then the covering number of ℬ_R may attain this complexity exponent. In a recent paper (see [28]), Guntuboyina and Sen showed that the set of all convex functions defined on [a, b]^d that are uniformly bounded has logarithmic complexity exponent d/2 in the L^p-metric.
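
As mentioned in remark (iv), one way to bound the K-functional is to plug a truncated expansion of f_ρ into it and balance the two terms. The sketch below is purely illustrative and not from the paper: it assumes the standard form D_q(f_ρ, λ) = inf_{f∈ℬ} {∥f − f_ρ∥²_{L²(ρ_X)} + λ∥f∥_ℬ^q}, uses made-up coefficients a_k(f_ρ) = (1 + k)^{−2} in an orthonormal system, and uses a weighted ℓ^q norm of the coefficient sequence as a stand-in for ∥·∥_ℬ.

```python
# Upper bound on the K-functional D_q(f_rho, lambda) by truncating the expansion
# of f_rho, in the spirit of remark (iv).  All concrete choices are illustrative
# stand-ins, not from the paper:
#   D_q(f_rho, lam) = inf_f { ||f - f_rho||_{L2}^2 + lam * ||f||_B^q },
#   a_k(f_rho) = (1 + k)^(-2),  ||f||_B^q = sum_k |a_k(f)|^q / w_k,  w_k = (1 + k)^(-6).
import numpy as np

q = 2.0
a = (1.0 + np.arange(200)) ** -2.0            # coefficients a_k(f_rho)
w = (1.0 + np.arange(200)) ** -6.0            # stand-in kernel weights

def dq_upper_bound(reg_lam):
    best = np.inf
    for n in range(1, len(a)):
        # f_n = truncation of f_rho at level n
        tail = np.sum(a[n:] ** 2)                                # ||f_n - f_rho||_{L2}^2
        penalty = reg_lam * np.sum(np.abs(a[:n]) ** q / w[:n])   # lam * ||f_n||_B^q
        best = min(best, tail + penalty)                         # best truncation so far
    return best

for reg_lam in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"lambda={reg_lam:.0e}  D_q upper bound <= {dq_upper_bound(reg_lam):.3e}")
# For these choices the bound decays like a power of lambda (roughly lambda^(1/2)),
# which is the behavior D_q(f_rho, lambda) = O(lambda^beta) discussed in remarks
# (ii) and (iii).
```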

3. Robustness

Robustness is a quantitative description of the dependence of the solutions on the distributions.

Define the  ρ-control integral regularized model corresponding to (1) by
f(ρ) = arg min_{f∈ℬ} { ℰ_ρ(f) + λ∥f∥_ℬ^q },
where ℰ_ρ(f) is defined in (3). Then, f(ρ) is influenced by the distribution ρ. For any bounded ρ-measurable function f(x, y) on Z, we define the empirical measure γ_z by
γ_z(f) = (1/m) ∑_{i=1}^m f(x_i, y_i).
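
Since z is drawn i.i.d. from ρ, the quantity γ_z(f) is a plain sample average and approaches ρ(f) = ∫_Z f dρ as m grows; the toy check below (illustrative, not from the paper) makes this explicit, and the sample error analysis of Section 4 applies the robustness result below with γ = γ_z.

```python
# Toy check, not from the paper: the empirical functional gamma_z(f) is a sample
# average and approaches rho(f) = E[f(x, y)] as the sample size m grows.
import numpy as np

rng = np.random.default_rng(2)

def f(x, y):                       # an arbitrary bounded test function on Z
    return (x - y) ** 2

def rho_sample(m):                 # draw z = {(x_i, y_i)} i.i.d. from a toy rho
    x = rng.uniform(0.0, 1.0, m)
    y = np.clip(np.cos(2.0 * np.pi * x) + 0.3 * rng.standard_normal(m), -2.0, 2.0)
    return x, y

x_big, y_big = rho_sample(2_000_000)
rho_f = np.mean(f(x_big, y_big))   # high-accuracy Monte Carlo stand-in for rho(f)

for m in (10, 100, 1000, 10000):
    x, y = rho_sample(m)
    gamma_z_f = np.mean(f(x, y))   # gamma_z(f) = (1/m) sum_i f(x_i, y_i)
    print(f"m={m:6d}  |gamma_z(f) - rho(f)| ≈ {abs(gamma_z_f - rho_f):.4f}")
```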

Then, scheme (1) is exactly scheme (27) with ρ replaced by the empirical measure γ_z. We give the following theorem.

Theorem 5. Let ℬ be a q-uniformly convex RKBS with reproducing kernel K(x, y), and let f(ρ) and f(γ) be the solutions of scheme (27) with respect to the distributions ρ and γ, respectively. Then,

()

where c_q is the constant defined in (14).

Theorem 5 shows how  ρ  influences the unique solution  f(ρ).

To prove Theorem 5, we need the following lemmas.

Lemma 6. Under the conditions of Theorem 5, there holds

∇ℰ_ρ(f) = 2 ∫_Z (f(x) − y) K(·, x) dρ(x, y),   f ∈ ℬ,

where the point · in K(·, x) means K(·, x) ∈ ℬ* for any x ∈ X.

Proof. We first recall the following facts.

Let (ℬ, ∥·∥_ℬ) be a Banach space and F : ℬ → ℝ ∪ {∓∞} a function. We say F is Gateaux differentiable at f_0 if there is a ξ ∈ ℬ* such that for any g ∈ ℬ there holds

lim_{t→0} (F(f_0 + tg) − F(f_0))/t = 〈g, ξ〉,

and we write ∇F(f_0) = ξ. By [29] we know that if F is convex on ℬ and is Gateaux differentiable at f_0, then ∂F(f_0) = {∇F(f_0)}.

By equality

ℰ_ρ(f + tg) − ℰ_ρ(f) = ∫_Z [2t g(x)(f(x) − y) + t² g(x)²] dρ(x, y)

we have for any g ∈ ℬ, written as g(x) = 〈g, K(·, x)〉, that

lim_{t→0} (ℰ_ρ(f + tg) − ℰ_ρ(f))/t = 2 ∫_Z (f(x) − y) g(x) dρ(x, y) = 〈g, 2 ∫_Z (f(x) − y) K(·, x) dρ(x, y)〉.

Since ℰ_ρ(f) is a convex function on ℬ, we know (30) holds.

Lemma 7. Under the conditions of Theorem 5, the following hold.

  • (i)

    There exists a unique minimizer f(ρ) of the problem (27) and

    ()

  • (ii)

    There is a j_q(f(ρ)) ∈ J_q(f(ρ)) such that

    ()

Proof. The uniqueness of the minimizer follows from the fact that (27) is a strictly convex optimization problem. By the definition of f(ρ), we have

()

We then have (34).

Proof of (35). Since  f(ρ)  is the unique solution of (27), we have

()

Notice that both ℰ_ρ(f) and λ∥f∥_ℬ^q are convex functions of f on ℬ. We have

()

By (30), we know that (37) leads to

()

Therefore, there is a j_q(f(ρ)) ∈ J_q(f(ρ)) such that (35) holds.

Lemma 8. Let ℬ be an RKBS satisfying the conditions of Theorem 3. Then,

∥f∥_{C(X)} ≤ k ∥f∥_ℬ,   f ∈ ℬ.

Proof. The reproducing property and the duality between ℬ and ℬ* give

|f(x)| = |〈f, K(·, x)〉| ≤ ∥f∥_ℬ ∥K(·, x)∥_{ℬ*},   x ∈ X.

Then the fact that ∥K(·, x)∥_{ℬ*} ≤ k for all x ∈ X gives (40).

Lemma 9. Let K(x, y) be the reproducing kernel of ℬ, assume K(·, x) is uniformly continuous in x on X with respect to the ℬ*-norm, and let R > 0 be a given real number. Then the ball ℬ_R = {f ∈ ℬ : ∥f∥_ℬ ≤ R} is a compact subset of C(X).

Proof. Since X is a compact distance space, so is X × X. Since K(·, x) is uniformly continuous in x with respect to the ℬ*-norm, we know that for any ϵ > 0 there is a δ > 0 such that for all x, x′ ∈ X with d(x, x′) < δ, we have

∥K(·, x) − K(·, x′)∥_{ℬ*} < ϵ,

and for any f ∈ ℬ_R there holds

|f(x) − f(x′)| = |〈f, K(·, x) − K(·, x′)〉| ≤ ∥f∥_ℬ ∥K(·, x) − K(·, x′)∥_{ℬ*} ≤ Rϵ.

By (43), we know that ℬ_R is a closed, bounded, and equicontinuous subset of C(X). Therefore, by the Arzelà-Ascoli theorem, ℬ_R is a compact subset of C(X).

Proof of Theorem 5. By the definition of ℰ_γ(f(ρ)) and (30), we know

()

Also, by (44) and the definitions of f(ρ) and f(γ) we have

()

Since ℬ is q-uniformly convex, we have by (14) and the definition of j_q(f(ρ)) that

()

Combining (46) with (45), we have

()

It follows that

()

We then have (29).

4. Sample Error

We give the following sample error bounds.

Theorem 10. Let ℬ be an RKBS satisfying the conditions of Theorem 3, let f(ρ) be the solution of scheme (27) with respect to ρ, and let f_z be the solution of (1). Then, for all ϵ > 0 there holds

()

where

()

To show Theorem 10, we first give a lemma.

Lemma 11 (see [15]). Let ℱ be a family of functions from a probability space Z to ℝ and let d(·, ·) be a distance on ℱ. Let 𝒰 ⊂ Z be of full measure and let B, L > 0 be constants such that

  • (i)

    |ξ(z)| ≤ B for all ξ ∈ ℱ and all z ∈ 𝒰,

  • (ii)

    |L_z(ξ_1) − L_z(ξ_2)| ≤ L d(ξ_1, ξ_2) for all ξ_1, ξ_2 ∈ ℱ and all z ∈ 𝒰^m, where

    ()

Then, for all  ϵ > 0,

()

Proof of Theorem 10. Take γ = γ_z in (29). Then,

()

By (7) and the reproducing property, we have

()

Since

()

and (40), we have

()

Define

()

Then,

()

By (52), we have for all  ϵ > 0  that

()

By (53), (56), and (59), we know

()

which gives

()

It follows that

()

That is,

()

We then have (49).

5. Learning Rates

Proof of Theorem 3. We know from [30] that for any f ∈ L²(ρ_X) there holds

()

Since X is a compact set, we have by (40) that f_z ∈ C(X) ⊂ L²(ρ_X). Therefore,

()

By (65) we have

()

which gives for any h > 0   that

()

By (49) and the above inequality we have

()

or

()

Since q ≥ 1, we know

()

By (69) and the above inequality we have (16).

To show Theorem 4, we need two lemmas.

Lemma 12 (see [31]). Let c_1 > 0, c_2 > 0, and u > t > 0. Then, the equation

()

has a unique positive zero  x*. In addition,

()

Proof of Theorem 4. Since (ℬ, ∥·∥_ℬ) has logarithmic complexity exponent s ≥ 0, we have by (15) a constant c_s > 0 such that

()

Then, by (16) we have

()

Take

()

Then,

()

By Lemma 12, we know that the unique solution  ϵ*  of (75) satisfies

()
By (74) and (77), we have (19).

Acknowledgments

This work was supported partially by the National Natural Science Foundation of China under Grants nos. 10871226, 61179041, and 11271199. The authors thank the reviewers for their valuable suggestions and comments, which improved the presentation of the paper.
