Coefficient-Based Regression with Non-Identical Unbounded Sampling
Abstract
We investigate a coefficient-based least squares regression problem with indefinite kernels from non-identical unbounded sampling processes. Here non-identical unbounded sampling means the samples are drawn independently but not identically from unbounded sampling processes. The kernel is not necessarily symmetric or positive semi-definite. This leads to additional difficulty in the error analysis. By introducing a suitable reproducing kernel Hilbert space (RKHS) and a suitable intermediate integral operator, elaborate analysis is presented by means of a novel technique for the sample error. This leads to satisfactory results.
1. Introduction and Preliminary
Definition 1. Let 0 ≤ s ≤ 1; we say that the sequence converges exponentially fast in (Cs(X)) * to a probability measure ρX on X or converges exponentially in short if there exist C1 > 0 and 0 < α< 1 such that
Then we have . By using the integral operator technique from [4], in [9], Sun and Wu gave the capacity independent estimate for the convergence rate of , where ∥f∥ρ = (∫X |f(x)|2dρX(x)) 1/2. Shi investigated the error analysis in a data dependent hypothesis space for general kernels [10]. Sun and Guo conducted error analysis for the Mercer kernels by uniform bounded non-i.i.d. sampling [11]. In this paper, we study learning algorithm (8) by non-identical unbounded sampling processes with indefinite kernels.
If K is a Mercer kernel, from [3] we know that ℋK is in the range of . For an indefinite kernel K, recall . Based on the polar decomposition of compact operators ([12]).
Lemma 2. Let H be a separable Hilbert space and T a compact operator on H; then T can be factored as
We immediately have the following proposition [10].
Proposition 3. Consider as a subspace of ; then and , where U is a partial isometry on with U*U being the orthogonal projection onto .
We will conduct the error analysis in several steps. The major contribution we make is on the sample error estimate; the main difficulty is the non-identical unbounded sampling of the samples; we overcome this difficulty by introducing a suitable intermediate operator.
2. Key Analysis and Main Results
In order to give the error analysis, we assume that the kernel satisfies the following kernel condition [1, 11].
Definition 4. We say that the Mercer kernel satisfies the kernel condition of order s if, for some constant , and for all u, v ∈ X,
Since sample z is drawn from unbounded sampling processes, we will assume the following moment hypothesis condition [13].
There is a large literature on error analysis for learning algorithm (6); see, for example, [4, 5, 14–17]. But most obtained results are presented under the standard assumption that |y | ≤ M almost surely (M is a constant). This excludes the case of Gaussian noise. The moment hypothesis condition is a natural generalization of the condition |y | ≤ M. Wang and Zhou considered error analysis for algorithm (6) under condition (15). Our main results are about learning rates of algorithm (8) under conditions (3), (14) and the approximation ability of in terms of fρ.
Now we can state our general results on learning rates for algorithm (8).
Theorem 5. Assume moment hypothesis condition (15); satisfies condition (3) and satisfies condition (14); , for 1/2 < r ≤ 3/2; take λ = T−θ with 0 < θ < 1/3; then
Remark 6. If we take λ = T−1/2(r+1), then our rate is O(T−(2r−1)/4(r+1)). The proof of Theorem 5 will be conducted in Section 3, where the error term is decomposed into three parts. In [11], the authors consider the coefficient-based regression with the Mercer kernels by uniform bounded non-i.i.d sampling; the best rate of order O(T−2r/(1+2r)) was obtained.
When the samples are drawn i.i.d from measure ρ, we have the following result.
Theorem 7. Assume moment hypothesis condition (15); satisfies condition (14); ; then if 0 < r ≤ 1, take λ = T−1/(2r+3); one see that
Here we get the same learning rate as the one in [9]. But our rate is derived under a relaxation condition of the sampling output.
3. Error Analysis
In this section, we will state the error analysis in several steps.
3.1. Regularization Error Estimation
In this subsection, we address a bound for the regularization error . The error estimate for regularization error has been investigated in lots of the literature in learning theory ([4, 18] and the references therein); we will omit the proof and quote it directly.
Proposition 8. Assume for some and r > 0; the following bound for approximation error holds:
3.2. Estimate for the Measure Error
This subsection is devoted to the analysis of the term caused by the difference of measures, which we called measure error. The ideas of proof are from [1]. Before giving the result, let us state a crucial lemma first.
Lemma 9. Assume satisfies condition (14); then
Proof. For any h ∈ Cs(X), we see that
Proposition 10. Assume for some 1/2 < r ≤ 3/2; satisfies condition (14); the following bound for measure error holds:
3.3. Sample Error Estimation
Now we state our estimation for the sample error. The estimates are more involved since the sample is drawn by non-identical unbounded sampling processes. We overcome the difficulty by introducing a stepping integral operator which plays an intermediate role in the estimates, and the definition of it will be given later.
Theorem 11. Let fz,λ be given by (8), assume moment hypothesis condition (15), and the marginal distribution sequence , satisfies condition (3); then
Proof. We will estimate I and II, respectively:
This together with (45) yields
The term II is more involved; recall that
If we define ηtj(x) = ytK(xt, xj)K(x, xj) and
Applying the conclusion as shown in [9] and together with the above bound, we can see that
This yields
Now we are in a position to give the proofs of Theorems 5 and 7.
Proof of Theorem 5. Theorem 11 ensures that
For 1/2 < r ≤ 3/2, Proposition 10 tells that
Combining all the bounds together and noting that λ = T−θ with 0 < θ < 1/3, we can get the conclusion of Theorem 5 by taking .
Proof of Theorem 7. When the samples are drawn i.i.d. from measure ρ, then . Hence
The conclusion follows by discussing the relationship between r and 1.
Acknowledgments
The author would like to thank Professor Hongwei Sun for useful discussions which have helped to improve the presentation of the paper. The work described in this paper is supported partially by National Natural Science Foundation of China (Grant no. 11001247) and Doctor Grants of Guangdong University of Business Studies (Grant no. 11BS11001).