The Learning Rates of Regularized Regression Based on Reproducing Kernel Banach Spaces
Abstract
We study the convergence behavior of regularized regression based on reproducing kernel Banach spaces (RKBSs). The convex inequality of uniform convex Banach spaces is used to show the robustness of the optimal solution with respect to the distributions. The learning rates are derived in terms of the covering number and K-functional.
1. Introduction
Recently, there has been increasing research interest in learning with abstract functional spaces, and considerable work has been done in [1–3] and the references therein. Let X be a compact distance space, and let z = {(x_i, y_i)}_{i=1}^{m} be samples drawn independently according to a Borel probability distribution ρ on Z = X × ℝ. We consider the regularized regression scheme

fz = arg min_{f∈ℬ} { (1/m) ∑_{i=1}^{m} (f(x_i) − y_i)² + λ∥f∥ℬ^q },  (1)

where q ≥ 1 is a given real number, λ > 0 is a regularization parameter, and ℬ is the hypothesis space. The unknown Borel probability distribution ρ(x, y) can be decomposed into ρ(y∣x) and ρX(x), where ρ(y∣x) is the conditional probability of ρ at x ∈ X and ρX(x) is the marginal probability on X. The regression function is defined by fρ(x) = ∫ y dρ(y∣x), x ∈ X.
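To make scheme (1) concrete, the following minimal sketch, an illustration rather than anything used in this paper, solves the Hilbert-space case q = 2, where the regularized minimizer over an RKHS has the closed form c = (K + mλI)^{-1}y; the Gaussian kernel, the sample size, and the target function are all assumed choices.

```python
import numpy as np

def gaussian_kernel(X1, X2, width=0.5):
    # Gram matrix of a Gaussian kernel (an illustrative, assumed choice).
    return np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2.0 * width**2))

def regularized_regression(x, y, lam, width=0.5):
    # Scheme (1) with q = 2 (the RKHS case): the minimizer has the closed
    # form f_z = sum_i c_i K(x_i, .) with c = (K + m*lam*I)^{-1} y.
    m = len(x)
    K = gaussian_kernel(x, x, width)
    c = np.linalg.solve(K + m * lam * np.eye(m), y)
    return lambda t: gaussian_kernel(np.atleast_1d(t), x, width) @ c

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(50)

f_small = regularized_regression(x, y, lam=1e-6)   # near-interpolation
f_large = regularized_regression(x, y, lam=10.0)   # heavy regularization
err_small = float(np.mean((f_small(x) - y) ** 2))
err_large = float(np.mean((f_large(x) - y) ** 2))
```

Shrinking λ drives the empirical error toward zero, while a large λ over-smooths; the learning-rate analysis below quantifies how λ should balance these two effects.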
When the hypothesis space ℬ in (1) is a reproducing kernel Banach space, we call (1) regularized regression learning based on RKBSs, which were introduced recently in [4, 5]. The representer theorem, which is closely related to regularized learning, has been studied in the case that ℬ is an RKBS, and the discussion has been extended to the generalized semi-inner-product RKBSs in [6].
In the present paper, we investigate the learning rates of scheme (1) when ℬ is an RKBS with uniform convexity. The paper is organized as follows. In Section 2, we state the main results of the present paper. The robustness is studied in Section 3, and the sample errors are bounded in Section 4. The approximation error boils down to a K-functional. The learning rates are derived in Section 5.
For a given real number p ≥ 1, we denote by Lp(ρX) the class of ρX-measurable functions f satisfying ∥f∥_{Lp(ρX)} = (∫_X |f(x)|^p dρX)^{1/p} < +∞.
We say A = O(B) if there is a constant C > 0 such that A/B ≤ C. We say A ~ B if both A = O(B) and B = O(A).
2. Notions and Results
To state the results of the present paper, we first introduce some notions as follows.
2.1. The RKBSs
We denote by ℬ the Banach space with dual space ℬ* and norm ∥·∥ℬ. For f ∈ ℬ and f* ∈ ℬ*, we write 〈f, f*〉ℬ = f*(f). Following [4, 5], we call ℬ an RKBS on X with reproducing kernel K : X × X → ℝ if the following hold:
- (i)
K(x, ·) ∈ ℬ, K(·, x) ∈ ℬ*, x ∈ X;
- (ii)
f(x) = 〈f, K(·, x)〉 ℬ, f*(x) = 〈K(x, ·), f*〉 ℬ, f* ∈ ℬ*, x ∈ X.
- (iii)
The linear span of {K(x, ·) : x ∈ X} is dense in ℬ.
- (iv)
The linear span of {K(·, x) : x ∈ X} is dense in ℬ*.
- (v)
For all x, y ∈ X there holds K(x, y) = 〈K(x, ·), K(·, y)〉 ℬ.
When ℬ is an RKHS, K is indeed the reproducing kernel in the usual sense (see [7]).
A way of producing reproducing kernel spaces in Lp spaces by idempotent integral operators was provided in [8]. In the present paper, we provide a method to construct RKBSs by orthogonal function series.
Example 1. Let X = [a, b] be a given closed interval and let {φk}_{k≥0} be a sequence of continuous functions on [a, b] satisfying the following:
- (i)
φk ∈ Lp(ρ) for k = 0,1, 2, …;
- (ii)
φk and φl are orthonormal (in L2(ρ)) when l ≠ k;
- (iii)
span{φk : k ≥ 0} is dense in Lp(ρ) for 1 < p < +∞.
Let {λk}_{k≥0} be a given positive real number sequence satisfying ∑_{k=0}^{∞} λk < +∞. Define the kernel

K(x, y) = ∑_{k=0}^{∞} λk φk(x)φk(y), x, y ∈ X,

and the function class ℬp on X by

ℬp = { f = ∑_{k=0}^{∞} ak(f)φk : ∥f∥_{ℬp} = (∑_{k=0}^{∞} |ak(f)/λk|^{p′})^{1/p′} < +∞ },

where ak(f) = ∫_{[a,b]} f(y)φk(y)dρ(y), k ∈ ℕ. We define the space ℬ_{p′} for p′ = p/(p − 1) in an analogous way.
We have the following proposition.
Proposition 2. Define a bivariate operation on ℬp and ℬ_{p′} by

〈f, g〉 = ∑_{k=0}^{∞} ak(f)ak(g)/λk, f ∈ ℬp, g ∈ ℬ_{p′}.

Then, ℬp is a reproducing kernel Banach space with reproducing kernel K(x, y).
Proof. Let ℬ_{p′} be defined in the analogous way. Then, both the sequence spaces l^{p′} and l^p are Banach spaces, and the coefficient maps f ↦ (ak(f)/λk)_k take ℬp into l^{p′} and ℬ_{p′} into l^p. By (9) we know these maps are isometric isomorphisms. Therefore, ℬp and ℬ_{p′} are Banach spaces.
Since ak(K(·, x)) = λkφk(x), we have for any f ∈ ℬp that 〈f, K(·, x)〉 = ∑_k ak(f)φk(x) = f(x). In the same way, we have for any g ∈ ℬ_{p′} that 〈K(x, ·), g〉 = g(x); that is, the reproducing property holds.
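The construction in Example 1 can be checked numerically. The sketch below assumes a concrete instance: the cosine system φ0 = 1, φk(t) = √2 cos(kπt), which is orthonormal in L2([0, 1], dt), and λk = (1 + k)^{-2}. It verifies by quadrature that the Fourier coefficients of K(·, x0) are λkφk(x0), which is what drives the reproducing property.

```python
import numpy as np

# Assumed concrete instance of Example 1: cosine system on [0, 1] and a
# summable positive sequence lam_k = (1 + k)^(-2).
t = np.linspace(0.0, 1.0, 4001)
n_terms = 25
lam = 1.0 / (1.0 + np.arange(n_terms)) ** 2

def phi(k, s):
    s = np.asarray(s, dtype=float)
    return np.ones_like(s) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * s)

def trapezoid(f, s):
    # Composite trapezoid rule on the grid s.
    return float(np.sum((f[1:] + f[:-1]) * np.diff(s)) / 2.0)

# K(., x0) = sum_k lam_k phi_k(.) phi_k(x0), truncated at n_terms.
x0 = 0.3
K_col = sum(lam[k] * phi(k, t) * float(phi(k, x0)) for k in range(n_terms))

# The coefficients a_k(K(., x0)) should equal lam_k * phi_k(x0).
a = np.array([trapezoid(K_col * phi(k, t), t) for k in range(n_terms)])
expected = np.array([lam[k] * float(phi(k, x0)) for k in range(n_terms)])
max_err = float(np.max(np.abs(a - expected)))
```

With a_k(K(·, x0)) = λkφk(x0), the pairing 〈f, K(·, x0)〉 = ∑_k ak(f)φk(x0) recovers f(x0) for any f expanded in the system.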
2.2. The Uniform Convexity
In this subsection, we focus on some notions in convex analysis and Banach geometry theory.
For a convex function F : ℬ → ℜ ∪ {+∞} and f ∈ ℬ, set ∂F(f) = {ξ ∈ ℬ* : F(g) ≥ F(f) + 〈g − f, ξ〉 for all g ∈ ℬ}. We call ∂F(f) the subdifferential of F at f ∈ ℬ. If ξ ∈ ∂F(f), then we call ξ a subgradient of F at f.
A well-known result is that f0 minimizes a convex function F on ℬ if and only if 0 ∈ ∂F(f0) (see [9]).
We say ℬ is a q-uniform convex Banach space (q ≥ 2) if there is a constant c > 0 such that its modulus of convexity, δℬ(ɛ) = inf{1 − ∥(f + g)/2∥ℬ : ∥f∥ℬ = ∥g∥ℬ = 1, ∥f − g∥ℬ ≥ ɛ}, satisfies δℬ(ɛ) ≥ cɛ^q. In particular, any Hilbert space is a 2-uniform convex Banach space.
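The inequality δℬ(ɛ) ≥ cɛ^q can be probed numerically in a small case. The sketch below, purely illustrative, estimates the modulus of convexity of the two-dimensional l^p space by random sampling of unit-vector pairs; l^{1.5} is 2-uniformly convex and l^{4} is 4-uniformly convex, so both estimates should be strictly positive.

```python
import numpy as np

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p, axis=-1) ** (1.0 / p)

def modulus_of_convexity(p, eps, n_samples=20000, seed=1):
    # Estimate delta(eps) = inf{ 1 - ||(f + g)/2|| : ||f|| = ||g|| = 1,
    # ||f - g|| >= eps } over random unit-vector pairs in 2-dimensional l^p.
    rng = np.random.default_rng(seed)
    f = rng.standard_normal((n_samples, 2))
    g = rng.standard_normal((n_samples, 2))
    f /= lp_norm(f, p)[:, None]
    g /= lp_norm(g, p)[:, None]
    far = lp_norm(f - g, p) >= eps          # keep pairs at distance >= eps
    return float(np.min(1.0 - lp_norm((f[far] + g[far]) / 2.0, p)))

eps = 0.5
delta_p15 = modulus_of_convexity(1.5, eps)  # l^{1.5}: 2-uniformly convex
delta_p4 = modulus_of_convexity(4.0, eps)   # l^{4}: 4-uniformly convex
```

The sampled minimum always upper-approximates the true infimum from above, so a strictly positive estimate is consistent with, though not a proof of, uniform convexity.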
From [11–14] we know that, for a given 1 < p < +∞, the space lp, the Lebesgue space Lp, and the Sobolev space W^{k,p} are max{2, p}-uniform convex. Also, let ℬp and ℬ_{p′} be defined as in Section 2.1. Then, by the fact that the coefficient maps between these spaces and the sequence Lebesgue spaces are isometric isomorphisms, we know ℬp is 2-uniform convex if p > 2 and p′-uniform convex if 1 < p ≤ 2. Therefore, we know ℬp is a q-uniform convex Banach space, where q is 2 if p > 2 and its value is p/(p − 1) if 1 < p ≤ 2.
2.3. Main Results
Let S be a distance space and η > 0. The covering number 𝒩(S, η) is defined to be the minimal positive integer l such that there exist l balls in S with radius η covering S.
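As an illustration of the definition, the greedy procedure below upper-bounds the covering number of the interval [0, 1] (discretized to a fine grid, an assumption of this sketch); for an interval, the optimal cover by balls of radius η uses ⌈1/(2η)⌉ balls, and the greedy cover uses at most about twice as many.

```python
import math
import numpy as np

def greedy_covering_number(points, eta):
    # Greedy upper bound on the covering number N(S, eta): repeatedly put a
    # ball of radius eta on the first still-uncovered point.
    uncovered = np.ones(len(points), dtype=bool)
    n_balls = 0
    while uncovered.any():
        center = points[np.argmax(uncovered)]   # first uncovered point
        n_balls += 1
        uncovered &= np.abs(points - center) > eta
    return n_balls

# S = [0, 1] discretized finely; optimal covers use ceil(1/(2*eta)) balls.
S = np.linspace(0.0, 1.0, 10001)
counts = {eta: greedy_covering_number(S, eta) for eta in (0.2, 0.1, 0.05)}
```

The count grows as η shrinks; for richer sets such as function balls ℬR, it is the growth rate of log 𝒩 in 1/η that enters the learning-rate analysis.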
Now we are in a position to present the main results of this paper.
Theorem 3. Let ℬ be an RKBS with q-uniform convexity whose reproducing kernel K is such that x ↦ K(·, x) is uniformly continuous on X in terms of the norm, and suppose there is a constant k > 0 such that ∥K(·, x)∥ ≤ k holds for all x ∈ X. Let fz be the unique minimizer of scheme (1). If fρ ∈ L2(ρX), then for any ϵ > 0 there holds
where

Dq(fρ, λ) = inf_{f∈ℬ} { ∥f − fρ∥²_{L2(ρX)} + λ∥f∥ℬ^q }

is a K-functional.
The covering number involved in (16) has been studied widely (see [15–19]). In this paper, we assume 𝒩(ℬR, η) has logarithmic complexity with exponent s ≥ 0; that is, there is a constant cs > 0 such that log 𝒩(ℬR, η) ≤ cs(R/η)^s.
Theorem 4. Under the conditions of Theorem 3, if fρ ∈ L2(ρX) and (ℬ, ∥·∥ℬ) has logarithmic complexity with exponent s ≥ 0, then for any δ ∈ (0,1), with confidence 1 − δ, there holds
where cs is defined in (15).
- (i)
In Theorem 3, we require that the kernel K(x, y) be uniformly continuous and uniformly bounded on X. In fact, a large class of real bivariate functions satisfies these conditions. For example, if the function sequence {φl} defined in Example 1 is uniformly bounded, that is, |φl(x)| ≤ 1 holds for all l and all x ∈ [a, b], then the kernel K(x, y) is continuous on [a, b] × [a, b], and hence K(x, y) is uniformly continuous on [a, b] × [a, b]. Therefore, |K(x, y)| ≤ ∑_{l=0}^{∞} λl shows that K(x, y) is uniformly continuous and bounded.
- (ii)
By the definition of γq, we know that if Dq(fρ, λ) = O(λ^β), 0 < β ≤ 1, then γq = O(λ^{β−1}); in particular, γq is bounded if β = 1.
- (iii)
If ℬ is a reproducing kernel Hilbert space, then q = 2 and cq = 1. Moreover, if Dq(fρ, λ) = O(λ^β), 0 < β ≤ 1, then (19) yields an explicit learning rate in m.
- (iv)
We can show a way of bounding the decay rate of Dq(fρ, λ) for 1 < p ≤ 2. Let f ∈ Lp(ρ). Then, we have the following Fourier expansion:

f ~ ∑_{l=0}^{∞} al(f)φl.
Define an operator sequence {Vn} by

Vn(f) = ∑_{l=0}^{n} λl al(f)φl.  (22)
Then, for a given positive integer n, we have al(Vn(f)) = λl al(f) for 0 ≤ l ≤ n and

∥Vn(f)∥ℬ ≤ ∥f∥_{Lp(ρ)},  (23)
where we have used the generalized Bessel inequality (see [20]):

(∑_{l=0}^{∞} |al(f)|^{p′})^{1/p′} ≤ ∥f∥_{Lp(ρ)}, 1 < p ≤ 2.
Also, for f ∈ L2(ρ),

∥f − Vn(f)∥²_{L2(ρ)} = ∑_{l=0}^{n} (1 − λl)² al(f)² + ∑_{l=n+1}^{∞} al(f)².  (25)
By (25) and (23) we know Vn(f) ∈ ℬ holds for all positive integers n and, in this case,

Dq(fρ, λ) ≤ ∥fρ − Vn(fρ)∥²_{L2(ρ)} + λ∥Vn(fρ)∥ℬ^q.
One can choose a suitable n depending on the sample number m and obtain the decay rates as m → +∞. There are many choices for operators of type (22); for example, the Bernstein-Durrmeyer operators (see, e.g., [21–23]) and the de la Vallée-Poussin sum operators are of this type (see [24]). This method was first provided in [25] and was extended in [26, 27].
- (v)
We know from [19] that RKHSs with logarithmic complexity with exponent s ≥ 0 exist. By Corollary 4.1 and Theorem 2.1 of [16] we know that if the λl satisfy λl ~ 1/(1+l)^α, α > 1, then the covering number of the corresponding ball ℬR may attain the stated complexity exponent. In a recent paper (see [28]), Guntuboyina and Sen showed that the set of all uniformly bounded convex functions defined on [a, b]^d has logarithmic complexity exponent d/2 in the Lp-metric.
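The truncation idea in remark (iv) can be simulated on coefficient sequences in the Hilbert special case q = 2. All choices below are illustrative assumptions: an eigenvalue-type sequence λl = (1 + l)^{-2}, target coefficients al = (1 + l)^{-1.2} (so the target is in L² but not in ℬ), and a plain truncation Pn(f) = ∑_{l≤n} al φl in place of the operator (22), whose squared RKHS norm is ∑_{l≤n} al²/λl. The resulting upper bound on D2(fρ, λ) decreases as λ → 0.

```python
import numpy as np

N = 200000
l = np.arange(N, dtype=float)
lam_l = (1.0 + l) ** -2.0     # assumed kernel eigenvalue-type sequence
a = (1.0 + l) ** -1.2         # assumed coefficients of the target function

sq = a ** 2
norm_B = np.cumsum(sq / lam_l)        # ||P_n f||_B^2 for truncation at l <= n
tail = np.cumsum(sq[::-1])[::-1]      # tail[n] = sum_{l >= n} a_l^2

def D2_upper(reg):
    # K-functional bound: min over n of ||f - P_n f||^2 + reg * ||P_n f||_B^2.
    vals = tail[1:] + reg * norm_B[:-1]
    return float(vals.min())

regs = (1e-1, 1e-2, 1e-3, 1e-4)
bounds = [D2_upper(r) for r in regs]
```

Balancing the coefficient tail against the penalty term picks an optimal truncation level for each λ, which is exactly the mechanism behind choosing n as a function of the sample number m.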
3. Robustness
Robustness is a quantitative description of the dependence of the solutions on the distributions. For a Borel probability distribution ρ on Z = X × ℝ, define

f(ρ) = arg min_{f∈ℬ} { ∫_Z (f(x) − y)² dρ + λ∥f∥ℬ^q }.  (27)

We give the following theorem.
Theorem 5. Let ℬ be an RKBS with q-uniform convexity and the reproducing kernel K(x, y), and let f(ρ) and f(γ) be the solutions of scheme (27) with respect to distributions ρ and γ, respectively. Then,
where cq is the constant defined in (14).
Theorem 5 shows how ρ influences the unique solution f(ρ).
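Theorem 5's stability can be illustrated in the Hilbert case q = 2 with empirical distributions: perturb the outputs by noise of size τ and compare the regularized solutions. The Gaussian kernel and all numbers below are assumptions of this sketch; since the solution map is affine in the outputs, with a fixed noise direction the RKHS distance between solutions scales exactly linearly in τ.

```python
import numpy as np

def gram(x, width=0.5):
    return np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * width**2))

def solve(x, y, lam):
    # Minimizer of (1/m) sum (f(x_i) - y_i)^2 + lam ||f||^2 over the RKHS:
    # f = sum_i c_i K(., x_i) with c = (K + m*lam*I)^{-1} y.
    m = len(x)
    return np.linalg.solve(gram(x) + m * lam * np.eye(m), y)

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, 80)
y = np.sin(2.0 * np.pi * x)
eps = rng.standard_normal(80)          # fixed perturbation direction

K = gram(x)
c0 = solve(x, y, lam=0.01)
dists = []
for tau in (0.01, 0.1, 0.5):
    c_tau = solve(x, y + tau * eps, lam=0.01)
    d = c_tau - c0
    dists.append(float(np.sqrt(d @ K @ d)))   # RKHS distance between solutions
```

Smaller perturbations of the distribution move the solution less, which is the qualitative content of Theorem 5; the exponent q of the uniform convexity controls the quantitative rate in the Banach-space case.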
To prove Theorem 5, we need the following lemmas.
Lemma 6. Under the conditions of Theorem 5, there holds

ℰρ′(f) = 2∫_Z (f(x) − y) K(·, x) dρ,  (30)

where the point · in K(·, x) means K(·, x) ∈ ℬ for any x ∈ X.
Proof. We first restate the following facts.
Let (ℬ, ∥·∥ℬ) be a Banach space and let F : ℬ → ℜ ∪ {±∞} be a real function. We say F is Gateaux differentiable at f0 ∈ ℬ if there is a ξ ∈ ℬ* such that for any g ∈ ℬ there holds

lim_{t→0} (F(f0 + tg) − F(f0))/t = 〈g, ξ〉,

and we write F′(f0) = ξ. By [29] we know that if F is convex on ℬ and is Gateaux differentiable at f0 ∈ ℬ, then ∂F(f0) = {F′(f0)}.
By equality
we have for any g(x) = 〈g, K(·, x)〉 ℬ ∈ ℬ that
Since ℰρ(f) is a convex function on ℬ, we know (30) holds.
Lemma 7. Under the conditions of Theorem 5, the following statements hold.
- (i)
There exists a unique minimizer f(ρ) of problem (27), and (34) holds.
- (ii)
There is a jq(f(ρ)) ∈ Jq(f(ρ)) such that (35) holds.
Proof. The uniqueness of the minimizer follows from the fact that (27) is a strictly convex optimization problem. By the definition of f(ρ), we have

λ∥f(ρ)∥ℬ^q ≤ ℰρ(f(ρ)) + λ∥f(ρ)∥ℬ^q ≤ ℰρ(0) = ∫_Z y² dρ.

We then have (34).
Proof of (35). Since f(ρ) is the unique solution of (27), we have
Notice that both ℰρ(f) and λ∥f∥ℬ^q are convex functions of f on ℬ. We have
By (30), we know that (37) leads to
Therefore, there is jq(f(ρ)) ∈ Jq(f(ρ)) such that (35) holds.
Lemma 8. Let ℬ be an RKBS satisfying the conditions of Theorem 3. Then, for all f ∈ ℬ,

∥f∥_{C(X)} ≤ k∥f∥ℬ.  (40)
Lemma 9. Let K(x, y) be the reproducing kernel of ℬ such that K(·, x) is uniformly continuous in x on X in norm, and let R > 0 be a given real number. Then, the ball ℬR = {f ∈ ℬ : ∥f∥ℬ ≤ R} is a compact subset of C(X).
Proof. Since X is a compact distance space, so is X × X. Since K(·, x) is uniformly continuous in x in norm, we know that for any ϵ > 0 there is a δ > 0 such that for all x, x′ ∈ X with d(x, x′) < δ, we have

∥K(·, x) − K(·, x′)∥ < ϵ,

and hence for any f ∈ ℬR there holds

|f(x) − f(x′)| = |〈f, K(·, x) − K(·, x′)〉ℬ| ≤ ∥f∥ℬ ∥K(·, x) − K(·, x′)∥ ≤ Rϵ.  (43)

By (43), we know that ℬR is a closed, bounded, and equicontinuous set. Therefore, by the Arzelà–Ascoli theorem, ℬR is a compact subset of C(X).
4. Sample Error
We give the following sample error bounds.
Theorem 10. Let ℬ be an RKBS satisfying the conditions of Theorem 3, let f(ρ) be the solution of scheme (27) with respect to ρ, and let fz be the solution of (1). Then, for all ϵ > 0 there hold
where
To show Theorem 10, we first give a lemma.
Lemma 11 (see [15]). Let ℱ be a family of functions from a probability space Z to ℜ and d(·, ·) a distance on ℱ. Let 𝒰 ⊂ Z be of full measure, and let B, L > 0 be constants such that
- (i)
|ξ(z)| ≤ B for all ξ ∈ ℱ and all z ∈ 𝒰,
- (ii)
|Lz(ξ1) − Lz(ξ2)| ≤ Ld(ξ1, ξ2) for all ξ1, ξ2 ∈ ℱ and all z ∈ 𝒰^m, where

Lz(ξ) = ∫_Z ξ(z) dρ(z) − (1/m) ∑_{i=1}^{m} ξ(z_i).
Then, for all ϵ > 0,
5. Learning Rates
Proof of Theorem 3. We know from [30] that for any f ∈ L2(ρX) there holds

ℰρ(f) − ℰρ(fρ) = ∥f − fρ∥²_{L2(ρX)}.
Since X is a compact set, we have by (40) that ∥f∥_{C(X)} ≤ k∥f∥ℬ for all f ∈ ℬ. Therefore,
By (65) we have
which gives for any h > 0 that
By (49) and the above inequality, we have
or
Since ℱ ⊂ ℬ1, we know
To show Theorem 4, we need two lemmas.
Lemma 12 (see [31]). Let c1 > 0, c2 > 0, and u > t > 0. Then, the equation

x^u − c1x^t − c2 = 0

has a unique positive zero x*. In addition,

x* ≤ max{(2c1)^{1/(u−t)}, (2c2)^{1/u}}.
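Lemma 12 can be illustrated numerically; the sketch below assumes the standard form g(x) = x^u − c1x^t − c2 with c1, c2 > 0 and u > t > 0, for which g(0) = −c2 < 0 and g(x) → +∞, and locates the unique positive zero by bisection.

```python
def positive_zero(c1, c2, u, t, tol=1e-12):
    # Bisection for the unique positive zero of g(x) = x**u - c1*x**t - c2.
    # For c1, c2 > 0 and u > t > 0, g(0) = -c2 < 0 and g(x) -> +inf, so a
    # sign change is guaranteed once the zero is bracketed.
    g = lambda x: x**u - c1 * x**t - c2
    hi = 1.0
    while g(hi) < 0.0:      # expand until the zero is bracketed
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# x**2 - 2*x - 3 = (x - 3)(x + 1): the unique positive zero is x = 3.
x_star = positive_zero(2.0, 3.0, u=2.0, t=1.0)
```

In the learning-rate proofs such zeros arise when solving the sample-error inequality for ϵ, and the closed-form upper bound on x* is what turns the implicit inequality into an explicit rate.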
Acknowledgments
This work was supported partially by the National Natural Science Foundation of China under Grant nos. 10871226, 61179041, and 11271199. The authors thank the reviewers for their valuable suggestions and comments, which helped improve the presentation of the paper.