Properties and Iterative Methods for the Q-Lasso
Abstract
We introduce the Q-lasso, which generalizes the well-known lasso of Tibshirani (1996), with Q a closed convex subset of a Euclidean m-space for some integer m ≥ 1. The set Q can be interpreted as the set of errors within a given tolerance level when linear measurements are taken to recover a signal/image via the lasso. Solutions of the Q-lasso depend on a tuning parameter γ. In this paper, we obtain basic properties of the solutions as a function of γ. Because of ill-posedness, we also apply ℓ1-ℓ2 regularization to the Q-lasso. In addition, we discuss iterative methods for solving the Q-lasso, which include the proximal-gradient algorithm and the projection-gradient algorithm.
1. Introduction
The purpose of this paper is to study the behavior, in terms of γ > 0, of solutions to the regularized problem (7). (We leave the more general problem (11) to future work, since the involvement of another closed convex set C brings technical difficulties that are not easy to overcome.) We discuss iterative methods for solving the Q-lasso, including the proximal-gradient method and the projection-gradient method, the latter being derived via a duality technique. Due to ill-posedness, we also apply ℓ1-ℓ2 regularization to the Q-lasso.
2. Preliminaries
Proposition 1. One has that PK is firmly nonexpansive in the sense that
∥PKx − PKy∥² ≤ ⟨x − y, PKx − PKy⟩ for all x and y.
Proposition 2. Let f be everywhere finite-valued on ℝn.
Proposition 3. Let f be everywhere finite-valued convex on ℝn and z ∈ ℝn. Suppose f is bounded below (i.e., inf {f(x) : x ∈ ℝn} > −∞). Then z is a solution to the minimization problem (20) if and only if it satisfies the first-order optimality condition 0 ∈ ∂f(z).
3. Properties of the Q-Lasso
Observe that the assumption that S ≠ ∅ actually implies that Sγ is uniformly bounded in γ > 0, as shown by the lemma below.
Lemma 4. Assume that (24) is consistent (i.e., S ≠ ∅). Then, for γ > 0 and xγ ∈ Sγ, one has ∥xγ∥1 ≤ min_{x∈S}∥x∥1.
Proof. Let xγ ∈ Sγ. In the relation
(1/2)∥(I − PQ)Axγ∥² + γ∥xγ∥1 ≤ (1/2)∥(I − PQ)Ax∥² + γ∥x∥1, x ∈ ℝn,
take x = x̂ ∈ S. Since x̂ minimizes ∥(I − PQ)Ax∥ over ℝn, we have ∥(I − PQ)Ax̂∥ ≤ ∥(I − PQ)Axγ∥, so the quadratic terms can be cancelled, which yields ∥xγ∥1 ≤ ∥x̂∥1. Taking the infimum over x̂ ∈ S completes the proof.
Proposition 5. One has the following.
- (i)
The functions
ρ(γ) := ∥xγ∥1 and η(γ) := ∥(I − PQ)Axγ∥, xγ ∈ Sγ,
are well defined for γ > 0. That is, they do not depend upon the particular choice of xγ ∈ Sγ.
- (ii)
The function ρ(γ) is decreasing in γ > 0.
- (iii)
The function η(γ) is increasing in γ > 0.
- (iv)
(I − PQ)Axγ is continuous in γ > 0.
Proof. For xγ ∈ Sγ, we have the optimality condition:
(1/2)∥(I − PQ)Axγ∥² + γ∥xγ∥1 ≤ (1/2)∥(I − PQ)Ax∥² + γ∥x∥1 for all x ∈ ℝn.
Now substituting xβ ∈ Sβ for x in (31), we get
To see that η(γ) is increasing, we use the inequality (as xγ ∈ Sγ)
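For the reader's convenience, the standard exchange argument behind (ii) and (iii) can be sketched as follows (this worked derivation is supplied here for illustration and uses only the defining property of Sγ and Sβ). For γ, β > 0, xγ ∈ Sγ, and xβ ∈ Sβ,
(1/2)∥(I − PQ)Axγ∥² + γ∥xγ∥1 ≤ (1/2)∥(I − PQ)Axβ∥² + γ∥xβ∥1,
(1/2)∥(I − PQ)Axβ∥² + β∥xβ∥1 ≤ (1/2)∥(I − PQ)Axγ∥² + β∥xγ∥1.
Adding the two inequalities gives (γ − β)(∥xγ∥1 − ∥xβ∥1) ≤ 0, so ρ is decreasing. If β < γ, then ∥xγ∥1 ≤ ∥xβ∥1, and the second inequality yields (1/2)∥(I − PQ)Axβ∥² − (1/2)∥(I − PQ)Axγ∥² ≤ β(∥xγ∥1 − ∥xβ∥1) ≤ 0, so η is increasing.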
Proposition 6. One has the following.
- (i)
.
- (ii)
lim_{γ→0} ρ(γ) = min_{x∈S}∥x∥1.
Proof. (i) Taking the limit as γ → 0 in the inequality (and using the boundedness of (xγ))
As for (ii), we have, by (27), ∥xγ∥1 ≤ ∥x∥1 for any x ∈ S. In particular, ∥xγ∥1 ≤ ∥x†∥1, where x† is an ℓ1 minimum-norm element of S; that is, ∥x†∥1 = min_{x∈S}∥x∥1.
Assume γk → 0 is such that . Then for any x,
How to select the tuning (i.e., regularization) parameter γ in lasso (1) and Q-lasso (7) is a challenging problem. There is no universal rule for selecting γ; it should instead be chosen on a case-by-case basis. The following result, however, points out that γ cannot be too large.
Proposition 7. Let Q be a nonempty closed convex subset of ℝm and assume that Q-lasso (7) is consistent (i.e., solvable). If (note that this condition reduces to for lasso (1), for which Q = {b}), then xγ = 0. (Here S is, as before, the solution set of the Q-least-squares problem (24).)
Proof. Let xγ ∈ Sγ. The optimality condition reads
0 ∈ Aᵀ(I − PQ)Axγ + γ∂(∥xγ∥1).
Now by Lemma 4, we have ∥xγ∥1 ≤ min_{x∈S}∥x∥1. Hence, from (49) it follows that if xγ ≠ 0, we must have . This completes the proof.
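For orientation, in the classical lasso case Q = {b} the threshold can be made explicit by a direct check (a standard computation given here for illustration): x = 0 minimizes (1/2)∥Ax − b∥² + γ∥x∥1 exactly when the optimality condition 0 ∈ Aᵀ(A·0 − b) + γ∂(∥·∥1)(0) holds, that is, when Aᵀb ∈ γ[−1, 1]^n, which is the familiar condition ∥Aᵀb∥∞ ≤ γ.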
Notice that (48) shows that can be determined by Axγ. Hence, we arrive at the following characterization of solutions of Q-lasso (23).
Proposition 8. Let Q be a nonempty closed convex subset of ℝm and let γ > 0 and xγ ∈ Sγ. Then is a solution of the Q-lasso (23) if and only if and . It turns out that
Proof. If , then from the relations
4. Iterative Methods
In this section we discuss proximal iterative methods for solving Q-lasso (7). The basic tools are Moreau's proximal operators and their fundamental properties, which are briefly reviewed below. (For our purposes, we confine ourselves to the finite-dimensional setting.)
4.1. Proximal Operators
Let Γ0(ℝn) be the space of functions on ℝn that are proper, lower semicontinuous, and convex.
Definition 9 (see [10], [11]). The proximal operator of φ ∈ Γ0(ℝn) is defined, for λ > 0, by
proxλφ(x) := argmin_{v∈ℝn} {φ(v) + (1/(2λ))∥v − x∥²}, x ∈ ℝn.
For fundamental properties of proximal operators, the reader is referred to [12, 13] for details. Here we only mention the fact that the proximal operator proxλφ can have a closed-form expression in some important cases as shown in the examples below [12].
- (a)
If we take φ to be any norm ∥·∥ on ℝn, then
proxλφ(x) = x − PλB(x), x ∈ ℝn,
where B = ∂∥·∥(0) is the closed unit ball of the dual norm.
In particular, if we take φ to be the absolute value function on the real line ℝ, we get
proxλφ(x) = sgn (x)max {|x| − λ, 0}, x ∈ ℝ,
which is also known as the scalar soft-thresholding operator.
- (b)
Let (ek)_{k=1,…,n} be an orthonormal basis of ℝn and let (ωk)_{k=1,…,n} be positive real numbers. Define φ by
φ(x) = ∑_{k=1}^{n} ωk|⟨x, ek⟩|, x ∈ ℝn.
Then proxλφ(x) = ∑_{k=1}^{n} βk ek, where
βk = sgn (⟨x, ek⟩)max {|⟨x, ek⟩| − λωk, 0}, k = 1, …, n.
In particular, if φ(x) = ∥x∥1 for x ∈ ℝn, then
proxλφ(x) = (α1, …, αn),
where αk = sgn (xk)max {|xk| − λ, 0} for k = 1, …, n.
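As a quick numerical sanity check of the soft-thresholding formulas in (a) and (b), here is a minimal Python sketch (our own illustration using NumPy; the function name soft_threshold is not from the paper) that compares the closed form against a brute-force minimization of the objective in Definition 9.

```python
import numpy as np

def soft_threshold(x, lam):
    """Componentwise soft-thresholding: the proximal operator of lam * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Brute-force check of prox_{lam*|.|}(x) = argmin_v { |v| + (1/(2*lam)) * (v - x)^2 }.
x, lam = 1.7, 0.5
grid = np.linspace(-5.0, 5.0, 200001)
objective = np.abs(grid) + (grid - x) ** 2 / (2.0 * lam)
print(soft_threshold(np.array([x]), lam)[0])   # closed form: 1.2
print(grid[np.argmin(objective)])              # brute force: approximately 1.2
```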
4.2. Proximal-Gradient Algorithm
Proposition 10 (see [12], [14]). Let f, g ∈ Γ0(ℝn). Let x* ∈ ℝn and λ > 0. Assume f is finite valued and differentiable on ℝn. Then x* is a solution to (59) if and only if x* solves the fixed point equation
x* = proxλg(x* − λ∇f(x*)).
Fixed point equation (60) immediately yields the following fixed point algorithm which is also known as the proximal-gradient algorithm for solving (59).
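As a concrete illustration of this iteration, the following Python sketch (our own illustrative code, with generic arguments grad_f and prox_g) runs xk+1 = proxλg(xk − λ∇f(xk)) for a user-supplied gradient and proximal operator; a constant step λ ∈ (0, 2/L) matches condition (ii) of Theorem 11 below.

```python
import numpy as np

def proximal_gradient(grad_f, prox_g, x0, step, max_iter=500, tol=1e-8):
    """Run x_{k+1} = prox_{step*g}(x_k - step*grad_f(x_k)).

    prox_g(v, t) is expected to return prox_{t*g}(v)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = prox_g(x - step * grad_f(x), step)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            return x_new
        x = x_new
    return x
```

For g = γ∥·∥1, prox_g is the soft-thresholding operator of Section 4.1.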
Theorem 11 (see [12], [14]). Let f, g ∈ Γ0(ℝn) and assume (59) is consistent. Assume in addition the following.
- (i)
∇f is Lipschitz continuous on ℝn: ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥ for all x, y ∈ ℝn.
- (ii)
0 < liminf_{n→∞} λn ≤ limsup_{n→∞} λn < 2/L.
4.3. The Relaxed Proximal-Gradient Algorithm
The relaxed proximal-gradient algorithm generates a sequence (xk) by the following iteration process.
Theorem 12 (see [14]). Let f, g ∈ Γ0(ℝn) and assume (59) is consistent. Assume in addition the following.
- (i)
∇f is Lipschitz continuous on ℝn: ∥∇f(x) − ∇f(y)∥ ≤ L∥x − y∥ for all x, y ∈ ℝn.
- (ii)
0 < liminf_{n→∞} λn ≤ limsup_{n→∞} λn < 2/L.
- (iii)
0 < liminf_{n→∞} αn ≤ limsup_{n→∞} αn < 4/(2 + L·limsup_{n→∞} λn).
If we take λn ≡ λ ∈ (0, 2/L), then the relaxation parameters αk can be chosen from a larger range; in particular, they are allowed to be close to zero. More precisely, we have the following theorem.
Theorem 13 (see [14]). Let f, g ∈ Γ0(ℝn) and assume (59) is consistent. Define the sequence (xk) by the following relaxed proximal algorithm:
- (a)
∇f satisfies the Lipschitz continuity condition (i) in Theorem 12;
- (b)
0 < λ < 2/L and 0 ≤ αk ≤ (2 + λL)/4 for all k;
- (c)
.
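For concreteness, a relaxed proximal-gradient step of the Krasnosel'skii–Mann type takes the form xk+1 = (1 − αk)xk + αk proxλg(xk − λ∇f(xk)). The short Python sketch below is our own illustration of one standard form of such a relaxation, under this assumed update rule, and is not meant as a verbatim transcription of algorithm (65).

```python
import numpy as np

def relaxed_proximal_gradient(grad_f, prox_g, x0, step, alphas):
    """Relaxed iteration x_{k+1} = (1 - a_k)*x_k + a_k * prox_{step*g}(x_k - step*grad_f(x_k))."""
    x = np.asarray(x0, dtype=float)
    for a_k in alphas:
        t = prox_g(x - step * grad_f(x), step)  # un-relaxed proximal-gradient step
        x = (1.0 - a_k) * x + a_k * t           # relaxation with parameter a_k
    return x
```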
4.4. Proximal-Gradient Algorithms Applied to the Q-Lasso
The convergence theorem for the general proximal-gradient algorithm (61), specialized to Q-lasso (7), reads as follows.
Theorem 14. Assume . Then the sequence (xk) generated by proximal-gradient algorithm (66) converges to a solution of Q-lasso (7).
Remark 15. Relaxed proximal-gradient algorithms (63) and (65) also apply to Q-lasso (7). We however do not elaborate on them in detail.
Remark 16. Proximal-gradient algorithm (61) can be reduced to a projection-gradient algorithm in the case where the convex function g is homogeneous (i.e., g(tx) = tg(x) for t ≥ 0 and x ∈ ℝn) because the homogeneity of g implies that the proximal operator of g can be expressed via a projection; more precisely, we have
proxλg = I − PλK, where K = ∂g(0).
Now we apply projection-gradient algorithm (68) to Q-lasso (7). In this case, we have f(x) = (1/2)∥(I − PQ)Ax∥² and g(x) = γ∥x∥1 (homogeneous). Thus, ∇f(x) = Aᵀ(I − PQ)Ax and the convex set K = ∂g(0) is given as K = γ∂(∥z∥1)|z=0 = γ[−1, 1]^n. We find that, for each λ > 0, PλK is the projection of the Euclidean space ℝn onto the ℓ∞ ball of radius λγ, that is, onto {x ∈ ℝn : ∥x∥∞ ≤ λγ}. It turns out that proximal-gradient algorithm (66) can be rewritten as the projection algorithm below:
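The following minimal Python sketch illustrates this projection form (our own illustration; we take Q to be a box so that PQ is a componentwise clip, and the data A, b, and all parameter values are synthetic placeholders).

```python
import numpy as np

def q_lasso_projection_gradient(A, proj_Q, gamma, step, x0, max_iter=1000):
    """Projection form of the proximal-gradient iteration for the Q-lasso:
    x_{k+1} = (I - P_{step*gamma*B_inf})(x_k - step * A^T (I - P_Q) A x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = A @ x
        grad = A.T @ (r - proj_Q(r))            # gradient of (1/2)*dist(Ax, Q)^2
        v = x - step * grad                     # forward (gradient) step
        x = v - np.clip(v, -step * gamma, step * gamma)  # subtract projection onto the l_inf ball
    return x

# Toy example: Q is the box [b - eps, b + eps] componentwise, so P_Q is a clip.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = A @ np.concatenate([rng.standard_normal(5), np.zeros(45)])
eps = 0.1
proj_Q = lambda y: np.clip(y, b - eps, b + eps)
step = 1.0 / np.linalg.norm(A, 2) ** 2          # a step below 2/L, with L = ||A||^2
x_hat = q_lasso_projection_gradient(A, proj_Q, gamma=0.1, step=step, x0=np.zeros(50))
```

Subtracting the projection onto the ℓ∞ ball of radius λγ (here step * gamma) is exactly componentwise soft-thresholding with that threshold, so this loop is the projection rewriting of proximal-gradient algorithm (66).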
5. An ℓ1-ℓ2 Regularization for the Q-Lasso
Proposition 17. Assume the Q-least-squares problem
- (i)
As δ → 0 (for each fixed γ > 0), which is the (ℓ2) minimum-norm solution to Q-lasso (7). Moreover, as γ → 0, every cluster point of is an (ℓ1) minimum-norm solution of Q-least-squares problem (73), that is, a point in the set argmin_{x∈S}∥x∥1.
- (ii)
As γ → 0 (for each fixed δ > 0), which is the unique solution to the ℓ2-regularized problem:
min_{x∈ℝn} (1/2)∥(I − PQ)Ax∥² + (δ/2)∥x∥².
Moreover, as δ → 0, which is the ℓ2 minimum-norm solution of (73); that is, .
Proof. We have that xγ,δ satisfies the optimality condition:
- (i)
For fixed γ > 0, we can use the theory of Tikhonov regularization to conclude that xγ,δ is continuous in δ > 0 and converges, as δ → 0, to which is the (ℓ2) minimum-norm solution to Q-lasso (7), that is, the unique element . By Proposition 6, we also find that every cluster point of , as γ → 0, lies in the set argmin_{x∈S}∥x∥1.
- (ii)
Fix δ > 0 and use Proposition 6 to see that as γ → 0. Now the standard property of Tikhonov's regularization ensures that as δ → 0.
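The iterative methods of Section 4 carry over to this regularized setting with only a one-line change. The Python sketch below (our own illustration, assuming the regularized objective (1/2)∥(I − PQ)Ax∥² + (δ/2)∥x∥² + γ∥x∥1 with our normalization of the δ-term) adds δx to the gradient of the smooth part and keeps the ℓ1 soft-thresholding step.

```python
import numpy as np

def l1_l2_q_lasso(A, proj_Q, gamma, delta, step, x0, max_iter=1000):
    """Proximal-gradient sketch for
    min_x (1/2)*||(I - P_Q)Ax||^2 + (delta/2)*||x||^2 + gamma*||x||_1."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = A @ x
        grad = A.T @ (r - proj_Q(r)) + delta * x   # smooth part now includes delta*x
        v = x - step * grad                        # forward (gradient) step
        x = np.sign(v) * np.maximum(np.abs(v) - step * gamma, 0.0)  # soft-thresholding
    return x
```

Since y ↦ y − PQ(y) is nonexpansive, the gradient of the smooth part is Lipschitz with constant at most ∥A∥² + δ, so a fixed step in (0, 2/(∥A∥² + δ)) plays the role of condition (ii) in Theorem 11.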
Theorem 18. Assume
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors are grateful to the anonymous referees for their helpful comments and suggestions, which improved the presentation of this paper. This work was funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, under Grant no. 2-363-1433-HiCi. The authors, therefore, acknowledge the technical and financial support of KAU.