1. Introduction
The finite-dimensional variational inequality (VI), denoted by VI(Ω, F), is to find a vector w* ∈ Ω such that

(w − w*)ᵀF(w*) ≥ 0, ∀ w ∈ Ω, (1)

where Ω is a nonempty closed convex set in ℜ^n and F is a monotone mapping from ℜ^n into itself. The solution set, denoted by Ω*, is assumed to be nonempty. We refer to [1–4] for the pivotal roles of VIs in various fields such as economics, transportation, and engineering.
As is well known, the proximal point algorithm (PPA), which was presented originally in [5] and mainly developed in [6, 7], is a well-developed approach to solving VI(Ω, F). Let w^k be the current approximation of a solution of (1); then PPA generates the new iterate w^{k+1} ∈ Ω by solving the following auxiliary VI:

(w − w^{k+1})ᵀ{F(w^{k+1}) + (1/β)(w^{k+1} − w^k)} ≥ 0, ∀ w ∈ Ω, (2)

where β is a positive constant. Compared to the monotone VI (1), (2) is easier to handle since it is a strongly monotone VI. In this paper, we focus on the relaxed proximal point algorithm (PPA) proposed by Gol’shtein and Tret’yakov in [8], which combines the PPA step (3a) with a relaxation step (3b) as follows: find w̃^k ∈ Ω such that

(w − w̃^k)ᵀ{F(w̃^k) + G(w̃^k − w^k)} ≥ 0, ∀ w ∈ Ω, (3a)

and then set

w^{k+1} = w^k − γ(w^k − w̃^k), (3b)

where γ ∈ (0, 2) is a relaxation factor and G is a symmetric positive semidefinite matrix. In particular, γ is called an under-relaxation factor when γ ∈ (0, 1) or an over-relaxation factor when γ ∈ (1, 2), and the relaxed PPA reduces to the original PPA (2) when γ = 1 and G = (1/β)I. For convenience, we still use the notation ‖w‖²_G to represent the nonnegative number wᵀGw in our analysis, even when G is only positive semidefinite.
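To make the scheme (3a) and (3b) concrete, the following minimal sketch (an assumed illustration, not from the paper) applies the relaxed PPA to an unconstrained affine monotone VI, where F(w) = Mw + q with positive semidefinite (symmetric part of) M, Ω = ℜ^n, and G = (1/β)I; in this special case the PPA step (3a) reduces to the linear system (M + G)w̃ = Gw^k − q. All concrete values (M, q, β, γ) are chosen only for illustration.

```python
import numpy as np

# Relaxed PPA (3a)-(3b) on an assumed toy VI: F(w) = M w + q, Omega = R^2.
# With Omega = R^n, step (3a) becomes the linear solve (M + G) w_tilde = G w^k - q.
M = np.array([[1.0, 2.0], [-2.0, 1.0]])  # monotone: its symmetric part is I
q = np.array([1.0, -1.0])
G = 0.5 * np.eye(2)                      # G = (1/beta) I with beta = 2
gamma = 1.5                              # an over-relaxation factor in (1, 2)

w = np.zeros(2)
for _ in range(200):
    w_tilde = np.linalg.solve(M + G, G @ w - q)  # PPA step (3a)
    w = w - gamma * (w - w_tilde)                # relaxation step (3b)

w_star = np.linalg.solve(M, -q)  # unconstrained VI solution: F(w*) = 0
residual = np.linalg.norm(w - w_star)
```

For this toy instance the iterates converge linearly to the unique solution of F(w) = 0, so the final residual is at the level of machine precision.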
The Douglas–Rachford alternating direction method of multipliers (ADMM) proposed by Glowinski and Marrocco in [9] (see also [10]) is a commonplace tool for solving the convex minimization problem with linear constraints and a separable objective function:

min {θ1(x) + θ2(y) | Ax + By = b, x ∈ 𝒳, y ∈ 𝒴}, (4)

where A ∈ ℜ^{m×n1}, B ∈ ℜ^{m×n2}, b ∈ ℜ^m, 𝒳 ⊆ ℜ^{n1} and 𝒴 ⊆ ℜ^{n2} are closed convex sets, and θ1: ℜ^{n1} → ℜ and θ2: ℜ^{n2} → ℜ are convex smooth functions. The iterative scheme of ADMM for solving (4) at the k-th iteration runs as

x^{k+1} = argmin {θ1(x) − (λ^k)ᵀ(Ax + By^k − b) + (1/2)‖Ax + By^k − b‖²_H | x ∈ 𝒳}, (5a)

y^{k+1} = argmin {θ2(y) − (λ^k)ᵀ(Ax^{k+1} + By − b) + (1/2)‖Ax^{k+1} + By − b‖²_H | y ∈ 𝒴}, (5b)

λ^{k+1} = λ^k − H(Ax^{k+1} + By^{k+1} − b), (5c)

where H ≔ hI and h is a positive constant. As shown in [11], ADMM can be regarded as an application of the relaxed PPA with γ = 1 (i.e., the original PPA (2)) and

G = ( 0      0      0
      0   BᵀHB   −Bᵀ
      0    −B    H⁻¹ ).  (6)

Without further assumptions on B, the matrix G defined previously is guaranteed to be symmetric and positive semidefinite. Recently, He and Yuan in [12] have shown a worst-case O(1/t) convergence rate of the ADMM scheme (5a), (5b), and (5c) in an ergodic sense. You et al. in [13] have proved the same ergodic convergence rate for Lagrangian PPA-based contraction methods with a nonsymmetric linear proximal term. The purpose of this paper is to establish the O(1/t) convergence rate of the relaxed PPA (3a) and (3b) in both the ergodic and nonergodic senses.
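As an illustration of the ADMM scheme (5a), (5b), and (5c), the following sketch runs it on an assumed toy instance of (4): θ1(x) = ‖x − a‖²/2, θ2(y) = ‖y − c‖²/2, A = I, B = −I, b = 0 (a consensus constraint x = y), 𝒳 = 𝒴 = ℜ^n, and H = hI. All concrete data (a, c, h) are hypothetical; for this instance both subproblems have closed-form solutions.

```python
import numpy as np

# ADMM (5a)-(5c) on an assumed consensus problem:
#   min ||x - a||^2/2 + ||y - c||^2/2  s.t.  x = y,
# whose solution is x = y = (a + c)/2.
a = np.array([4.0, 0.0])
c = np.array([0.0, 2.0])
h = 1.0                                   # H = h I
x = np.zeros(2); y = np.zeros(2); lam = np.zeros(2)
for _ in range(100):
    x = (a + lam + h * y) / (1.0 + h)     # (5a): closed-form x-minimization
    y = (c - lam + h * x) / (1.0 + h)     # (5b): closed-form y-minimization
    lam = lam - h * (x - y)               # (5c): multiplier update

err = np.linalg.norm(x - (a + c) / 2) + np.linalg.norm(y - (a + c) / 2)
```

On this strongly convex quadratic instance the iteration contracts linearly, so after 100 iterations the error is negligible.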
2. Preliminaries
In this section, we review some preliminaries that are useful for further discussions. More specifically, we recall a useful characterization of Ω*, the variational reformulation of (4), and the relationship between the ADMM in [9, 10] and the relaxed PPA in [8] for solving this variational reformulation.
First, we provide a useful characterization of Ω*, stated as Theorem 2.3.5 in [14] and Theorem 2.1 in [12].
Theorem 1. The solution set of VI (Ω, F) is convex, and it can be characterized as
Ω* = ⋂_{w ∈ Ω} {w̃ ∈ Ω | (w − w̃)ᵀF(w) ≥ 0}. (7)
Based on Theorem 1, w̄ ∈ Ω can be regarded as an ε-approximate solution of VI(Ω, F) if it satisfies

sup_{w ∈ 𝒟} (w̄ − w)ᵀF(w) ≤ ε, (8)

where 𝒟 ⊆ Ω is some compact set. As in Definition 1 of [15], we can take

𝒟 = {w ∈ Ω | ‖w − w̄‖_G ≤ 1}. (9)
In the following, we give a variational reformulation of (4). It is easy to see that the model (4) can be characterized by a variational inequality problem: find w* = (x*, y*, λ*) ∈ Ω ≔ 𝒳 × 𝒴 × ℜ^m such that

(w − w*)ᵀF(w*) ≥ 0, ∀ w ∈ Ω, (10a)

where

w = (x; y; λ) and F(w) = (∇θ1(x) − Aᵀλ; ∇θ2(y) − Bᵀλ; Ax + By − b). (10b)

Note that the mapping F is monotone since θ1 and θ2 are convex. As shown in [11], the ADMM scheme (5a), (5b), and (5c) is identical, in a cyclical sense, with the following iterative scheme:

x̃^k = argmin {θ1(x) − (λ^k)ᵀ(Ax + By^k − b) + (1/2)‖Ax + By^k − b‖²_H | x ∈ 𝒳}, (11a)

λ̃^k = λ^k − H(Ax̃^k + By^k − b), (11b)

ỹ^k = argmin {θ2(y) − (λ̃^k)ᵀ(Ax̃^k + By − b) + (1/2)‖Ax̃^k + By − b‖²_H | y ∈ 𝒴}, (11c)

w^{k+1} = w̃^k, where w̃^k ≔ (x̃^k; ỹ^k; λ̃^k). (12)

Based on the definition (6) of the matrix G, we can immediately rewrite (11a), (11b), (11c), and (12) as a special case of the relaxed PPA with γ = 1.
Lemma 2. For given w^k, let w̃^k be generated by the ADMM scheme (11a), (11b), and (11c). Then, one has w̃^k ∈ Ω and

(w − w̃^k)ᵀ{F(w̃^k) + G(w̃^k − w^k)} ≥ 0, ∀ w ∈ Ω, (13)

where F and G are defined by (10b) and (6), respectively.
3. The Contraction of the Relaxed Proximal Point Algorithm
In this section, we prove the contraction of the relaxed PPA. First, we give an important lemma.
Lemma 3. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, one has

(w − w̃^k)ᵀF(w̃^k) ≥ (1/(2γ))(‖w − w^{k+1}‖²_G − ‖w − w^k‖²_G) + ((2 − γ)/2)‖w^k − w̃^k‖²_G, ∀ w ∈ Ω. (14)
Proof. First, using (3a), we have

(w − w̃^k)ᵀF(w̃^k) ≥ (w − w̃^k)ᵀG(w^k − w̃^k), ∀ w ∈ Ω. (15)

Since w^k − w̃^k = (1/γ)(w^k − w^{k+1}) (see (3b)), we have

(w − w̃^k)ᵀF(w̃^k) ≥ (1/γ)(w − w̃^k)ᵀG(w^k − w^{k+1}), ∀ w ∈ Ω. (16)

Thus, it suffices to show that

(w − w̃^k)ᵀG(w^k − w^{k+1}) = (1/2)(‖w − w^{k+1}‖²_G − ‖w − w^k‖²_G) + (γ(2 − γ)/2)‖w^k − w̃^k‖²_G. (17)

By setting a = w, b = w̃^k, c = w^k, and d = w^{k+1} in the identity

(a − b)ᵀG(c − d) = (1/2)(‖a − d‖²_G − ‖a − c‖²_G) + (1/2)(‖c − b‖²_G − ‖d − b‖²_G), (18)

we derive that

(w − w̃^k)ᵀG(w^k − w^{k+1}) = (1/2)(‖w − w^{k+1}‖²_G − ‖w − w^k‖²_G) + (1/2)(‖w^k − w̃^k‖²_G − ‖w^{k+1} − w̃^k‖²_G). (19)

On the other hand, using (3b), we have

‖w^{k+1} − w̃^k‖²_G = ‖(1 − γ)(w^k − w̃^k)‖²_G = (1 − γ)²‖w^k − w̃^k‖²_G. (20)

Combining the last two equations and noting that (1/2)(1 − (1 − γ)²) = γ(2 − γ)/2, we obtain (17). The assertion (14) follows immediately. The proof is completed.
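The algebraic identity (18) holds for any vectors a, b, c, d and any symmetric matrix G; it can be sanity-checked numerically with random data, as in the following sketch (the random instance is, of course, only illustrative).

```python
import numpy as np

# Numerical sanity check of the identity (18):
# (a - b)^T G (c - d) = (||a-d||_G^2 - ||a-c||_G^2)/2 + (||c-b||_G^2 - ||d-b||_G^2)/2
rng = np.random.default_rng(1)
P = rng.standard_normal((3, 3))
G = P.T @ P                                   # a random symmetric PSD matrix
a, b, c, d = (rng.standard_normal(3) for _ in range(4))

def nsq(v):                                   # ||v||_G^2 = v^T G v
    return float(v @ G @ v)

lhs = float((a - b) @ G @ (c - d))
rhs = 0.5 * (nsq(a - d) - nsq(a - c)) + 0.5 * (nsq(c - b) - nsq(d - b))
gap = abs(lhs - rhs)
```

The two sides agree up to floating-point rounding.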
With the lemma just proved, we are now ready to show the contraction of the relaxed PPA (3a) and (3b).
Theorem 4. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, for any w* ∈ Ω* and any k ≥ 0, one has

‖w^{k+1} − w*‖²_G ≤ ‖w^k − w*‖²_G − γ(2 − γ)‖w^k − w̃^k‖²_G. (21)
Proof. Setting w = w* in (14), we get

(w* − w̃^k)ᵀF(w̃^k) ≥ (1/(2γ))(‖w* − w^{k+1}‖²_G − ‖w* − w^k‖²_G) + ((2 − γ)/2)‖w^k − w̃^k‖²_G. (22)

On the other hand, since F is monotone and w* ∈ Ω*, we have

(w̃^k − w*)ᵀF(w̃^k) ≥ (w̃^k − w*)ᵀF(w*) ≥ 0. (23)

It follows from the previous two inequalities that

0 ≥ (1/(2γ))(‖w* − w^{k+1}‖²_G − ‖w* − w^k‖²_G) + ((2 − γ)/2)‖w^k − w̃^k‖²_G, (24)

which is equivalent to (21). The proof is completed.
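The contraction inequality (21) can be verified numerically along the iterations; the following sketch does so on an assumed affine monotone VI with Ω = ℜ^n, where the PPA step (3a) is an exact linear solve (all concrete data are hypothetical).

```python
import numpy as np

# Numerical check of the contraction (21) on an assumed toy VI:
# F(w) = M w + q with monotone M, Omega = R^2, so (3a) reads
# (M + G) w_tilde = G w^k - q.
rng = np.random.default_rng(0)
M = np.array([[2.0, 1.0], [-1.0, 2.0]])  # symmetric part 2I: monotone
q = np.array([1.0, -2.0])
G = np.diag([1.0, 0.5])                  # symmetric positive definite
gamma = 0.8                              # an under-relaxation factor
w_star = np.linalg.solve(M, -q)          # the VI solution: F(w*) = 0

def G_norm_sq(v):
    return float(v @ G @ v)

w = rng.standard_normal(2)
ok = True
for _ in range(50):
    w_tilde = np.linalg.solve(M + G, G @ w - q)    # step (3a)
    w_next = w - gamma * (w - w_tilde)             # step (3b)
    lhs = G_norm_sq(w_next - w_star)
    rhs = G_norm_sq(w - w_star) - gamma * (2 - gamma) * G_norm_sq(w - w_tilde)
    ok = ok and (lhs <= rhs + 1e-10)               # inequality (21)
    w = w_next
```

As Theorem 4 predicts, the inequality holds at every iteration (up to rounding).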
4. Ergodic Worst-Case O(1/t) Convergence Rate
In this section, we establish an ergodic worst-case O(1/t) convergence rate for the relaxed PPA, in the sense that after t iterations of such an algorithm we can find w̃ ∈ Ω such that

sup_{w ∈ 𝒟} (w̃ − w)ᵀF(w) ≤ ε, (25)

with ε = O(1/t) and 𝒟 = {w ∈ Ω | ‖w − w̃‖_G ≤ 1}.
Theorem 5. Let {w^k} and {w̃^k} be the sequences generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. For any integer t > 0, let

w̃_t = (1/(t + 1)) Σ_{k=0}^{t} w̃^k. (26)

Then, one has w̃_t ∈ Ω and

(w̃_t − w)ᵀF(w) ≤ (1/(2γ(t + 1)))‖w − w^0‖²_G, ∀ w ∈ Ω. (27)
Proof. From (14) and the fact that ((2 − γ)/2)‖w^k − w̃^k‖²_G ≥ 0, we have

(w − w̃^k)ᵀF(w̃^k) + (1/(2γ))‖w − w^k‖²_G ≥ (1/(2γ))‖w − w^{k+1}‖²_G, ∀ w ∈ Ω. (28)

Since F is monotone, from the previous inequality, we have

(w − w̃^k)ᵀF(w) + (1/(2γ))‖w − w^k‖²_G ≥ (1/(2γ))‖w − w^{k+1}‖²_G, ∀ w ∈ Ω. (29)

Summing the inequality (29) over k = 0, 1, …, t, we obtain

((t + 1)w − Σ_{k=0}^{t} w̃^k)ᵀF(w) + (1/(2γ))‖w − w^0‖²_G ≥ 0, ∀ w ∈ Ω. (30)

Since w̃^k ∈ Ω for all k, w̃_t is a convex combination of w̃^0, w̃^1, …, w̃^t and thus w̃_t ∈ Ω. Using the notation of w̃_t, we derive

(w̃_t − w)ᵀF(w) ≤ (1/(2γ(t + 1)))‖w − w^0‖²_G, ∀ w ∈ Ω. (31)

The assertion (27) follows from the previous inequality immediately.
It follows from Theorem 4 that the sequence {w^k} is bounded. According to (21), the sequence {w̃^k} is also bounded. Therefore, there exists a constant D > 0 such that

‖w^k‖_G ≤ D and ‖w̃^k‖_G ≤ D, ∀ k ≥ 0. (32)

Recall that w̃_t is the average of w̃^0, w̃^1, …, w̃^t. Thus, we have ‖w̃_t‖_G ≤ D. For any w ∈ 𝒟 = {w ∈ Ω | ‖w − w̃_t‖_G ≤ 1}, we get

‖w − w^0‖_G ≤ ‖w − w̃_t‖_G + ‖w̃_t‖_G + ‖w^0‖_G ≤ 2D + 1. (33)

Thus, by (27), for any given ε > 0, after at most t ≔ ⌈((2D + 1)²/(2γε)) − 1⌉ iterations, we have

sup_{w ∈ 𝒟} (w̃_t − w)ᵀF(w) ≤ (2D + 1)²/(2γ(t + 1)) ≤ ε, (34)

which means that w̃_t is an approximate solution of VI(Ω, F) with an accuracy of O(1/t). That is, a worst-case O(1/t) convergence rate of the relaxed PPA in an ergodic sense is established.
Note that this convergence rate is in an ergodic sense, and w̃_t is a convex combination of the previous vectors w̃^0, w̃^1, …, w̃^t with equal weights. One may ask whether the same convergence rate can be established in a nonergodic sense directly for the sequence {w^k} generated by the relaxed PPA (3a) and (3b); this is the main purpose of the next section.
5. Nonergodic Worst-Case O(1/t) Convergence Rate
This section shows that the relaxed PPA has a worst-case O(1/t) convergence rate in a nonergodic sense. First, we establish two important inequalities in the following lemmas.
Lemma 6. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, one has

(w̃^k − w̃^{k+1})ᵀG{(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})} ≥ 0. (35)
Proof. Setting w = w̃^{k+1} in (3a), we have

(w̃^{k+1} − w̃^k)ᵀ{F(w̃^k) + G(w̃^k − w^k)} ≥ 0. (36)

Note that (3a) is also true for k ≔ k + 1, and thus we have

(w − w̃^{k+1})ᵀ{F(w̃^{k+1}) + G(w̃^{k+1} − w^{k+1})} ≥ 0, ∀ w ∈ Ω. (37)

Setting w = w̃^k in the previous inequality, we obtain

(w̃^k − w̃^{k+1})ᵀ{F(w̃^{k+1}) + G(w̃^{k+1} − w^{k+1})} ≥ 0. (38)

Adding (36) and (38) and using the monotonicity of F, we get (35) immediately.
Lemma 7. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, one has

(w^k − w̃^k)ᵀG{(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})} ≥ (1/γ)‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G. (39)
Proof. First, adding the term

‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G (40)

to both sides of (35), we get

{(w̃^k − w̃^{k+1}) + (w^k − w̃^k) − (w^{k+1} − w̃^{k+1})}ᵀG{(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})} ≥ ‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G. (41)

Reordering (w̃^k − w̃^{k+1}) + (w^k − w̃^k) − (w^{k+1} − w̃^{k+1}) in the previous inequality to w^k − w^{k+1}, we get

(w^k − w^{k+1})ᵀG{(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})} ≥ ‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G. (42)

Substituting the term w^k − w^{k+1} = γ(w^k − w̃^k) (see (3b)) into the left-hand side of the last inequality, we obtain (39). The proof is completed.
Next, we prove that the sequence {‖w^k − w̃^k‖²_G} is monotonically nonincreasing.
Theorem 8. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, one has

‖w^{k+1} − w̃^{k+1}‖²_G ≤ ‖w^k − w̃^k‖²_G. (43)
Proof. Setting a = w^k − w̃^k and b = w^{k+1} − w̃^{k+1} in the identity

‖a‖²_G − ‖b‖²_G = 2aᵀG(a − b) − ‖a − b‖²_G, (44)

we obtain

‖w^k − w̃^k‖²_G − ‖w^{k+1} − w̃^{k+1}‖²_G = 2(w^k − w̃^k)ᵀG{(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})} − ‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G. (45)

Inserting (39) into the first term of the right-hand side of the last equality and using γ ∈ (0, 2), we obtain

‖w^k − w̃^k‖²_G − ‖w^{k+1} − w̃^{k+1}‖²_G ≥ ((2 − γ)/γ)‖(w^k − w̃^k) − (w^{k+1} − w̃^{k+1})‖²_G ≥ 0. (46)

The assertion (43) follows immediately.
With Theorems 4 and 8, we can prove the worst-case O(1/t) convergence rate in a nonergodic sense for the relaxed PPA.
Theorem 9. Let the sequences {w^k} and {w̃^k} be generated by the relaxed PPA (3a) and (3b), and let G be a symmetric positive semidefinite matrix. Then, for any w* ∈ Ω* and any integer t ≥ 0, one has

‖w^t − w̃^t‖²_G ≤ (1/((t + 1)γ(2 − γ)))‖w^0 − w*‖²_G. (47)
Proof. Summing the inequality (21) over k = 0, 1, …, t, we obtain

γ(2 − γ) Σ_{k=0}^{t} ‖w^k − w̃^k‖²_G ≤ ‖w^0 − w*‖²_G. (48)

According to Theorem 8, the sequence {‖w^k − w̃^k‖²_G} is monotonically nonincreasing. Therefore, we have

(t + 1)‖w^t − w̃^t‖²_G ≤ Σ_{k=0}^{t} ‖w^k − w̃^k‖²_G. (49)

The assertion (47) follows from (48) and (49) immediately.
Note that Ω* is convex and closed (see Theorem 1). Let d ≔ inf{‖w^0 − w*‖_G | w* ∈ Ω*}. Then, for any given ε > 0, Theorem 9 shows that the relaxed PPA (3a) and (3b) needs at most ⌈(d²/(εγ(2 − γ))) − 1⌉ iterations to ensure that ‖w^t − w̃^t‖²_G ≤ ε. Recall that w̃^k is a solution of VI(Ω, F) if ‖w^k − w̃^k‖²_G = 0. In other words, if ‖w^k − w̃^k‖²_G = 0, we have G(w^k − w̃^k) = 0 since G is a positive semidefinite matrix, and thus from (3a) it follows that

(w − w̃^k)ᵀF(w̃^k) ≥ 0, ∀ w ∈ Ω, (50)

which means that w̃^k is a solution of VI(Ω, F) according to (1). A worst-case O(1/t) convergence rate in a nonergodic sense for the relaxed PPA (3a) and (3b) is thus established from Theorem 9.
6. Concluding Remarks
This paper established the worst-case O(1/t) convergence rate in both the ergodic and nonergodic senses for the relaxed PPA. Recall that ADMM is a primal application of the relaxed PPA with γ = 1; thus, ADMM also enjoys the same worst-case O(1/t) convergence rate in both senses.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant no. 11001053), Program for New Century Excellent Talents in University (Grant no. NCET-12-0111), and Natural Science Foundation of Jiangsu Province, China (Grant no. BK2012662).