Convergence of a Proximal Point Algorithm for Solving Minimization Problems
Abstract
We introduce and study a proximal point algorithm for solving minimization problems using the technique of Güler. This proximal point algorithm is obtained by substituting a class of convex nonquadratic distance-like functions for the usual quadratic proximal term, and it can be seen as an extragradient iterative scheme. We establish the convergence rate of this new proximal point method under mild assumptions and show that this rate estimate is better than the available ones.
1. Introduction
The purpose of this paper is twofold. First, it proposes an extension of the proximal point method introduced by Güler [1] in 1992, in which the usual quadratic proximal term is replaced by a class of strictly convex distance-like functions called Bregman functions. Second, it offers a general framework for the convergence analysis of Güler's proximal point method. This framework is general enough to accommodate different classes of Bregman functions and still yields simple convergence proofs. The methods analyzable in this context are called Güler's generalized proximal point algorithms and are closely related to the Bregman proximal methods [2–5]. The analysis we develop differs from the works in [4, 5], since our method is based on Güler's technique.
2. Preliminaries
Throughout this paper, ∥·∥ denotes the l2-norm and 〈·, ·〉 denotes the Euclidean inner product in ℝn. Let G be a continuous single-valued mapping from ℝn into ℝn. The mapping G is Lipschitz continuous with Lipschitz constant L if, for all x, y ∈ ℝn, ∥G(x) − G(y)∥ ≤ L∥x − y∥. We also denote by ρ(x, X) the distance from x to the set X, given by ρ(x, X) = min_{y∈X} ∥x − y∥. Further notation and definitions used in this paper are standard in convex analysis and may be found in Rockafellar's book [7].
This type of kernel was first introduced in [8] in 1967. Given such a kernel h, the induced distance-like function is Dh(x, y) = h(x) − h(y) − 〈∇h(y), x − y〉. The corresponding algorithm using these Bregman proximal mappings is called the Generalized Proximal Point Method (GPPM) and is also known under the terminology of Bregman proximal methods. These proximal methods solve (2.1) by considering a sequence of unconstrained minimization problems, which can be summarized as follows.
Algorithm 2.1. (1) Initialize x0 ∈ ℝn : f(x0) < ∞, c0 > 0.
(2) Compute the solution xk+1 by the iterative scheme:
xk+1 = argmin_{x∈ℝn} { f(x) + (1/ck) Dh(x, xk) }.
For Dh(x, y) = (1/2)∥x − y∥², Algorithm 2.1 coincides with the classical proximal point algorithm (PPA) introduced by Moreau [9] and Martinet [10].
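To see this concretely, note that for the energy kernel h(x) = (1/2)∥x∥² the induced distance collapses to the squared Euclidean distance:

$$D_h(x,y) = \tfrac{1}{2}\|x\|^{2} - \tfrac{1}{2}\|y\|^{2} - \langle y,\, x - y\rangle = \tfrac{1}{2}\|x - y\|^{2},$$

so the step of Algorithm 2.1 reduces to the classical proximal iteration xk+1 = argmin_x { f(x) + (1/(2ck))∥x − xk∥² }.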
Definition 2.2. Let S ⊆ ℝn be an open convex set with closure S̄. A function h is called a Bregman function with zone S, or a D-function, if:
- (a) h is continuously differentiable on S and continuous on S̄,
- (b) h is strictly convex on S̄,
- (c) for every λ ∈ ℝ, the partial level sets L1(y, λ) = {x ∈ S̄ : Dh(x, y) ≤ λ} and L2(x, λ) = {y ∈ S : Dh(x, y) ≤ λ} are bounded for every y ∈ S and x ∈ S̄, respectively,
- (d) if {yk} ⊂ S is a convergent sequence with limit y*, then Dh(y*, yk) → 0,
- (e) if {xk} ⊂ S̄ and {yk} ⊂ S are sequences such that yk → y* ∈ S̄, {xk} is bounded, and Dh(xk, yk) → 0, then xk → y*.
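A classical example satisfying Definition 2.2 is the negative entropy kernel with zone S = ℝn++, for which the induced distance is the Kullback–Leibler divergence:

$$h(x) = \sum_{j=1}^{n} x_j \log x_j, \qquad D_h(x,y) = \sum_{j=1}^{n}\Big(x_j\log\frac{x_j}{y_j} - x_j + y_j\Big).$$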
From the above definition, we extract the following properties (see, for instance, [6, 13]).
Lemma 2.3. Let h be a Bregman function with zone S. Then,
- (i) Dh(x, x) = 0 and Dh(x, y) ≥ 0 for x ∈ S̄ and y ∈ S,
- (ii) for all a, b ∈ S and c ∈ S̄,
Dh(c, a) + Dh(a, b) − Dh(c, b) = 〈∇h(b) − ∇h(a), c − a〉,
- (iii) for all a, b ∈ S,
Dh(a, b) + Dh(b, a) = 〈∇h(a) − ∇h(b), a − b〉,
- (iv) for all y ∈ S, Dh(·, y) is strictly convex on S̄,
- (v) if {xk} ⊂ S is such that xk → x* ∈ S, then Dh(x*, xk) → 0 and Dh(xk, x*) → 0.
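Identity (ii) can be checked by expanding each term from the definition of Dh; the h-values cancel and only the inner products remain:

$$D_h(c,a) + D_h(a,b) - D_h(c,b) = -\langle \nabla h(a),\, c-a\rangle - \langle \nabla h(b),\, a-b\rangle + \langle \nabla h(b),\, c-b\rangle = \langle \nabla h(b) - \nabla h(a),\, c-a\rangle.$$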
Lemma 2.4. (i) Let g : ℝn → ℝ be a strictly convex, continuously differentiable function whose partial level sets are bounded; then g is a Bregman function with zone ℝn.
(ii) If g is a Bregman function, then g(x) + c⊤x + d, for any c ∈ ℝn and d ∈ ℝ, is also a Bregman function.
Remark 2.5. Dh(·, ·) cannot be considered a distance because it lacks both symmetry and the triangle inequality; it is usually called an entropy distance.
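As a quick numerical illustration of this asymmetry (our own, not taken from the source), the following Python snippet evaluates the entropy distance from the example after Definition 2.2 in both argument orders:

```python
import numpy as np

def entropy_distance(x, y):
    """Bregman distance induced by h(x) = sum_j x_j log x_j
    (the Kullback-Leibler divergence); x and y must be positive."""
    return float(np.sum(x * np.log(x / y) - x + y))

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
print(entropy_distance(x, y))  # ~1.288
print(entropy_distance(y, x))  # ~1.603: D_h(x, y) != D_h(y, x)
```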
The paper is organized as follows. In Section 3, we briefly recall the proximal point method of Güler. Section 4 is devoted to the presentation and convergence analysis of the proposed algorithm, including its finite convergence. Section 5 establishes the convergence rate of the method, and Section 6 concludes and discusses the application of this method to variational inequality problems.
3. Extragradient Algorithm
In 1992, Güler [1] developed a new proximal point approach, similar to the classical one (PPA), based on an idea stated by Nesterov [14].
Güler's proximal point algorithm (GPPA) can be summarized as follows.
Algorithm 3.1. (i) Initialize x0 ∈ ℝn : f(x0) < ∞, c0 > 0, A > 0.
Define ν0 := x0, A0 := A, k = 0.
(ii) Compute αk ∈ (0, 1) satisfying αk² = (1 − αk)Akck, and set yk = (1 − αk)xk + αkνk.
(iii) Compute the solution xk+1 by the iterative scheme:
xk+1 = argmin_{x∈ℝn} { f(x) + (1/(2ck))∥x − yk∥² },
then update νk+1 = νk + (1/αk)(xk+1 − yk), Ak+1 = (1 − αk)Ak, and k := k + 1.
For the convergence analysis, see Güler [1].
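For intuition, here is a minimal Python sketch of a Güler-type accelerated proximal scheme on a strongly convex quadratic, for which the proximal step has a closed form. The αk/Ak update rule below follows the standard Nesterov-type estimate-sequence construction and is our assumption; the exact rules and analysis are in [1].

```python
import numpy as np

def prox_quadratic(Q, b, y, c):
    """Closed-form proximal step for f(x) = 0.5 x'Qx - b'x:
    argmin_x { f(x) + ||x - y||^2 / (2c) } solves (Q + I/c) x = b + y/c."""
    n = len(y)
    return np.linalg.solve(Q + np.eye(n) / c, b + y / c)

def guler_ppa(Q, b, x0, c=1.0, A=1.0, iters=50):
    """Sketch of a Guler-type accelerated proximal point scheme.
    The alpha_k / A_k updates use a standard Nesterov-style rule
    (an assumption here; see [1] for the exact form)."""
    x, v, A_k = x0.copy(), x0.copy(), A
    for _ in range(iters):
        ac = A_k * c
        # alpha_k in (0, 1): positive root of alpha^2 = (1 - alpha) * A_k * c_k
        alpha = (-ac + np.sqrt(ac * ac + 4.0 * ac)) / 2.0
        y = (1.0 - alpha) * x + alpha * v      # extragradient (extrapolated) point
        x_next = prox_quadratic(Q, b, y, c)    # proximal step taken at y, not at x
        v = v + (x_next - y) / alpha           # update of the auxiliary sequence
        A_k *= (1.0 - alpha)                   # A_{k+1} = (1 - alpha_k) A_k
        x = x_next
    return x

# Minimize f(x) = 0.5 x'Qx - b'x; the minimizer is Q^{-1} b = (0.6, -0.8).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
print(guler_ppa(Q, b, np.zeros(2)))
```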
Remark 3.2. The GPPA can be seen as a suitable conjugate-gradient-type modification of the PPA of Rockafellar applied to (2.1).
4. Main Result
4.1. Introduction
The method we propose is a modification of Güler's new proximal point approach (GPPA) discussed in Section 3 and can be considered a nonlinear (or nonquadratic) version of the GPPA with Bregman kernels. We show that this method, which we call the GGPPA, retains the strong convergence results obtained by Güler [1], and therefore this new scheme provides faster (global) convergence rates than the classical Bregman proximal point methods (cf. [2, 4–6, 11, 13, 15]). The algorithm, which generalizes Güler's proximal point algorithm, can be summarized as follows.
Algorithm 4.1. (i) Initialize: x0 ∈ ℝn : f(x0) < ∞, c0 > 0, A > 0.
Define ν0 := x0, A0 := A, k = 0.
(ii) Compute αk ∈ (0, 1) such that αk² = (1 − αk)Akck, and set yk = (1 − αk)xk + αkνk.
(iii) Compute the solution xk+1 by the iterative scheme:
xk+1 = argmin_{x∈ℝn} { f(x) + (1/ck) Dh(x, yk) },
then update νk+1 = νk + (1/αk)(xk+1 − yk), Ak+1 = (1 − αk)Ak, and k := k + 1.
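To illustrate the nonquadratic step (iii), consider the entropy kernel from Section 2 together with a hypothetical linear objective f(x) = 〈a, x〉 on the positive orthant; the Bregman proximal step then has a closed form, since setting the gradient a + (1/ck)(∇h(x) − ∇h(yk)) to zero gives xj = yj exp(−ck aj). A sketch under these assumptions:

```python
import numpy as np

def entropy_prox(a, y, c):
    """Closed-form Bregman proximal step for the linear objective
    f(x) = <a, x> with the kernel h(x) = sum_j x_j log x_j:
    argmin_x { <a, x> + D_h(x, y) / c } = y * exp(-c * a)."""
    return y * np.exp(-c * a)

y = np.array([0.5, 1.5, 1.0])   # current point (componentwise positive)
a = np.array([1.0, -0.5, 0.0])  # gradient of the linear objective
print(entropy_prox(a, y, c=0.1))
```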
In this section we develop convergence results for the generalized Güler proximal point algorithm (GGPPA) presented above. Our analysis is based on the following lemma.
Lemma 4.2 ([1, page 654]). Let {ϕk} denote the sequence of estimate functions generated by the algorithm. One has f(xk) ≤ min_x ϕk(x) and ϕk(x) ≤ f(x) + Ak(ϕ0(x) − f(x)) for all x.
Theorem 4.3. For all x ∈ ℝn such that f(x) < ∞, one has the following convergence rate estimate:
f(xk) − f(x) ≤ Ak(f(x0) − f(x) + A Dh(x, x0)).
Proof. Using the fact that ϕ0(x) := f(x0) + ADh(x, x0), x0 ∈ 𝒮, and Lemma 4.2, we obtain
f(xk) − f(x) ≤ ϕk(x) − f(x) ≤ Ak(ϕ0(x) − f(x)) = Ak(f(x0) − f(x) + A Dh(x, x0)).
Theorem 4.4. Consider the sequence {xk} generated by the GGPPA and let x* be a minimizer of f(x) on ℝn. Assume that
- (1) h is a Bregman function with zone 𝒮 whose closure contains x*,
- (2) Im(∇h) = ℝn or Im(∇h) is open,
- (3) ∇h is Lipschitz continuous with coefficient L.
Then,
- (a) for all x0 ∈ Dom(∇h), the sequence {xk} is well defined,
- (b) the GGPPA possesses the following convergence rate estimate:
f(xk) − f* ≤ Ak(f(x0) − f* + A Dh(x*, x0)),
- (c) f(xk) − f* → 0 when ∑k √ck = +∞,
- (d) f(xk) − f* = O(1/k²) if ck ≥ c > 0.
4.2. Finite Convergence
Note that the finite convergence property was established for the classical proximal point algorithm in the case of sharp minima; see, for example, [16]. Recently, Kiwiel [5] extended this property to his generalized Bregman proximal method (BPM). In the following theorem we prove that Algorithm 4.1 has this property. Our proof is based on Kiwiel's [5, Theorem 6.1, page 1151].
Definition 4.5. A closed proper convex function f : ℝn → ℝ ∪ {+∞} is said to have a sharp minimum on ℝn if and only if there exists τ > 0 such that
f(x) ≥ f* + τρ(x, X*) for all x ∈ ℝn,
where f* = min f and X* = argmin f.
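For instance, f(x) = ∥x∥ has a sharp minimum on ℝn with X* = {0}, f* = 0, and τ = 1, since f(x) = ∥x∥ = ρ(x, X*) for all x; by contrast, f(x) = ∥x∥² does not, because f(x)/ρ(x, X*) → 0 as x → 0.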
Theorem 4.6. Under the same hypotheses as in Theorem 4.4, if f has a sharp minimum on ℝn and {ck} is bounded, then the GGPPA terminates finitely: there exists k such that 0 ∈ ∂f(xk), that is, xk ∈ X*.
5. Convergence Rate of GGPPA
Theorem 5.1. GGPPA possesses the following convergence rate:
Proof. Let x* be a minimizer of f. For brevity, we denote Wk = f(xk) − f*, Vk = f(yk) − f*, and Δ = ∇h(yk) − ∇h(xk+1). At optimality in the unconstrained minimization step of the GGPPA, we can write
(1/ck)Δ = (1/ck)(∇h(yk) − ∇h(xk+1)) ∈ ∂f(xk+1).
Case 1. If Wk+1 ≤ Vk ≤ Wk, then the desired estimate follows directly from (5.18).
Case 2. If Wk+1 ≤ Wk ≤ Vk, then for n ≥ k the same bound follows from inequality (5.18).
Case 3. In the remaining case, the sequence {f(xk)} is increasing, which may imply divergence of the approach.
Since f is convex, the following convergence rate estimate can be derived directly.
Corollary 5.2. If one assumes that ck ≥ c > 0 for all k, then f(xk) − f* = O(1/k²).
6. Conclusion
We have introduced an extragradient method for minimizing convex functions. The algorithm is based on a generalization of the technique originally proposed by Nesterov [14] and readapted by Güler in [1, 17], in which the usual quadratic proximal term is replaced by a class of convex nonquadratic distance-like functions. The new algorithm has a better theoretical convergence rate than the available ones. This naturally motivates the study of the numerical efficiency of the new algorithm and its application to variational inequality problems [18, 19]. Further efforts are also needed to extend the present study to nonconvex situations and to apply it to nonconvex equilibrium problems [20].
Acknowledgments
The authors would like to thank Dr. Osman Güler for providing them with reference [12] and the English translation of the original paper of Nesterov [14].