Volume 2013, Issue 1 532041
Research Article
Open Access

Scaled Diagonal Gradient-Type Method with Extra Update for Large-Scale Unconstrained Optimization

Mahboubeh Farid (Corresponding Author)
Department of Mathematics, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia

Wah June Leong
Department of Mathematics, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia

Najmeh Malekmohammadi
Department of Mathematics, Islamic Azad University, South Tehran Branch, Tehran 1418765663, Iran

Mustafa Mamat
Department of Mathematics, Faculty of Science and Technology, University Malaysia Terengganu, 21030 Kuala Terengganu, Malaysia
First published: 27 March 2013
Academic Editor: Guanglu Zhou

Abstract

We present a new gradient method that uses scaling and extra updating within a diagonal updating scheme for solving unconstrained optimization problems. The new method follows the framework of the Barzilai and Borwein (BB) method, except that the Hessian matrix is approximated by a diagonal matrix rather than by the multiple of the identity matrix used in the BB method. The main idea is to design a new diagonal updating scheme that incorporates scaling to instantly reduce large eigenvalues of the diagonal approximation and otherwise employs extra updates to increase small eigenvalues. These approaches give rapid control over the eigenvalues of the updating matrix and thus improve stepwise convergence. We show that our method is globally convergent. The effectiveness of the method is evaluated by means of numerical comparison with the BB method and its variant.

1. Introduction

In this paper, we consider the unconstrained optimization problem
min_{x ∈ Rn} f(x), (1)
where f(x) is a continuously differentiable function from Rn to R. Given a starting point x0, using the notation gk = g(xk) = ∇f(xk) and Bk as an approximation to the Hessian Gk = ∇2f(xk), the quasi-Newton-based methods for solving (1) are defined by the iteration
x_{k+1} = x_k − α_k B_k^{−1} g_k, (2)
where the stepsize αk is determined through an appropriate selection. The updating matrix Bk is usually required to satisfy the quasi-Newton equation
B_k s_{k−1} = y_{k−1}, (3)
where sk−1 = xk − xk−1 and yk−1 = gk − gk−1. One of the most widely used quasi-Newton methods for solving general nonlinear minimization is the BFGS method, which uses the following updating formula:
B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k). (4)
Numerically, this method outperforms most other optimization methods; however, it requires O(n^2) storage, which makes it unsuitable for large-scale problems.
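For illustration only (this sketch is not part of the original paper), the standard BFGS update (4) can be written in a few lines of Python with NumPy; the dense n × n matrix it maintains is precisely what causes the O(n^2) storage cost mentioned above.

```python
import numpy as np

def bfgs_update(B, s, y):
    """One standard BFGS update of a dense Hessian approximation B.

    B : (n, n) symmetric positive definite approximation of the Hessian,
    s : step x_{k+1} - x_k,  y : gradient difference g_{k+1} - g_k.
    Requires s^T y > 0 to preserve positive definiteness.
    """
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
```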
On the other hand, an ingenious stepsize selection for the gradient method was proposed by Barzilai and Borwein [1], in which the updating scheme is defined by
x_{k+1} = x_k − D_k^{−1} g_k, (5)
where Dk = (1/αk)I and α_k = (s_{k−1}^T s_{k−1})/(s_{k−1}^T y_{k−1}).
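As a point of reference for the comparisons later in the paper, a minimal sketch of the BB gradient iteration follows (Python; the starting stepsize, tolerance, and iteration cap are illustrative choices, not values taken from the paper):

```python
import numpy as np

def bb_gradient(f_grad, x0, tol=1e-4, max_iter=1000):
    """Barzilai-Borwein gradient method: x_{k+1} = x_k - alpha_k * g_k,
    with alpha_k = (s^T s) / (s^T y) built from the previous step."""
    x = np.asarray(x0, dtype=float)
    g = f_grad(x)
    alpha = 1.0 / max(np.linalg.norm(g), 1.0)      # simple first stepsize
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        x_new = x - alpha * g
        g_new = f_grad(x_new)
        s, y = x_new - x, g_new - g
        sty = s @ y
        alpha = (s @ s) / sty if sty > 0 else 1.0  # fall back when curvature is nonpositive
        x, g = x_new, g_new
    return x
```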

Since then, the study of new effective methods within the framework of BB-like gradient methods has become an interesting research topic in a wide range of mathematical programming; see, for example, [2–10]. However, it is well known that the BB method cannot guarantee a descent in the objective function at each iteration, and the extent of the nonmonotonicity depends in some way on the size of the condition number of the objective function [11]. Therefore, the performance of the BB method is greatly influenced by the conditioning of the problem (particularly, the condition number of the Hessian matrix). Some new fixed-stepsize gradient-type methods of BB kind have been proposed in [12–16] to overcome these difficulties. In contrast with the BB approach, in which the stepsize is computed by means of a simple approximation of the Hessian in the form of a scalar multiple of the identity, these methods consider approximations of the Hessian and its inverse in diagonal matrix form, based on the weak secant equation and the quasi-Cauchy relation, respectively (for more details see [15, 16]). Though these diagonal updating methods are efficient, their performance can be greatly affected when solving ill-conditioned problems. Thus, there is room to improve the quality of the diagonal updating formulation. Since the methods described in [15, 16] have useful theoretical and numerical properties, it is desirable to derive a new and more efficient updating frame for general functions. Therefore, our aim is to improve the quality of the diagonal updating when it approximates the Hessian poorly.

This paper is organized as follows. In the next section, we describe our motivation and propose our new gradient-type method. The global convergence of the method under mild assumptions is established in Section 3. Numerical evidence of the improvements due to the new approach is given in Section 4. Finally, a conclusion is given in the last section.

2. Scaling and Extra Updating

Assume that Bk is positive definite, and let {yk} and {sk} be two sequences of n-vectors such that s_k^T y_k > 0 for all k. Because it is usually difficult to satisfy the quasi-Newton equation (3) (also called the secant equation) with a nonsingular Bk+1 of diagonal form, one can consider satisfying it in some directions. If we project the quasi-Newton equation (3) in a direction υ such that υ^T s_k ≠ 0, then it gives
υ^T B_{k+1} s_k = υ^T y_k. (6)
If υ = sk is chosen, it leads to the so-called weak-secant relation,
s_k^T B_{k+1} s_k = s_k^T y_k. (7)
Under this weak-secant equation, [15, 16] employ a variational technique to derive an updating matrix that approximates the Hessian matrix diagonally. The resulting update is derived as the solution of the following variational problem:
min_{B_{k+1}} (1/2)‖B_{k+1} − B_k‖_F^2  subject to  s_k^T B_{k+1} s_k = s_k^T y_k,  B_{k+1} diagonal, (8)
and gives the corresponding solution Bk+1 as follows:
B_{k+1} = B_k + ((s_k^T y_k − s_k^T B_k s_k)/tr(E_k^2)) E_k, (9)
where E_k = diag(s_{k,1}^2, …, s_{k,n}^2), sk,i is the ith component of the vector sk, and tr denotes the trace operator.
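To make the diagonal updating concrete, the following small Python sketch implements the weak-secant update (9) as reconstructed above; since the displayed formulas of the original are not available here, treat it as our reading of [15, 16] rather than a verbatim transcription.

```python
import numpy as np

def diagonal_weak_secant_update(B_diag, s, y):
    """Diagonal update enforcing the weak-secant relation s^T B_{k+1} s = s^T y.

    B_diag : 1-D array holding the diagonal of B_k,
    s, y   : step and gradient-difference vectors.
    """
    E = s ** 2                                    # diagonal of E_k = diag(s_{k,i}^2)
    residual = s @ y - s @ (B_diag * s)           # s_k^T y_k - s_k^T B_k s_k
    return B_diag + (residual / np.sum(E ** 2)) * E   # tr(E_k^2) = sum_i s_{k,i}^4
```

A quick check confirms that the returned diagonal satisfies s^T B_{k+1} s = s^T y exactly, which is the defining property of the update.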

Note that when s_k^T y_k < s_k^T B_k s_k, the resulting Bk+1 is not necessarily positive definite and is not appropriate for use within a quasi-Newton-based algorithm. Thus, it is desirable to propose a technique that measures the quality of Bk in terms of its Rayleigh quotient and to find a way to improve a "poor" quality Bk before calculating Bk+1. For this purpose, it is useful to first propose a criterion that distinguishes between poor and acceptable quality of Bk.

Let us begin by considering the curvature of the objective function f along the direction sk, which is represented by
s_k^T y_k = s_k^T Ḡ_k s_k, (10)
where Ḡ_k = ∫_0^1 ∇2f(x_k + τ s_k) dτ is the average Hessian matrix along sk. Since it is not practical to compute the eigenvalues of the Hessian matrix at each iteration, we can estimate their relative size on the basis of the scalar
ρ_k = (s_k^T y_k)/(s_k^T B_k s_k). (11)
If ρk > 1, it implies that the eigenvalues of Bk, as approximated by its Rayleigh quotient, are relatively small compared to those of the local Hessian matrix at xk. In this case, we find that the strategy of extra updating [17] seems to be useful for improving the quality of Bk by rapidly increasing its eigenvalues toward those of the actual Hessian. This is done by updating Bk twice:
()
()
and use it to obtain, finally, the updated Bk+1:
()
On the other hand, when ρk < 1, it implies that the eigenvalues of Bk, as represented by its Rayleigh quotient, are relatively large, and we have s_k^T B_k s_k > s_k^T y_k. In this case, we need a strategy to counter this drawback. As reviewed before, the updating scheme may generate a nonpositive definite Bk+1 when Bk has large eigenvalues relative to the curvature of the local Hessian, that is, when s_k^T B_k s_k > s_k^T y_k. On the contrary, this difficulty disappears when the eigenvalues of Bk are small (i.e., when s_k^T B_k s_k ≤ s_k^T y_k). This suggests that scaling should be used to scale down Bk, that is, choosing γk < 1 only when s_k^T B_k s_k > s_k^T y_k and taking γk = 1 whenever s_k^T B_k s_k ≤ s_k^T y_k. Combining these two arguments, we choose the scaling parameter γk such that
()
This scaling resembles the Al-Baali [18] scaling that is applied within the Broyden family. Because the value of γk never exceeds 1, incorporating the scaling into Bk decreases its large eigenvalues, and consequently the positive definiteness of Bk+1 can be preserved, which is an important property in descent methods. In this case, the following updating:
()
will be used. To this end, we have the following general updating scheme for Bk+1:
()
where the twice-updated matrix and γk are given by (13) and (15), respectively.

An advantage of using (17) is that the positive definiteness of Bk+1 can be guaranteed at all iterations. This property is not exhibited by the other diagonal updating formulas, such as those in [15, 16]. Note that no extra storage is required to impose our strategy, and the computational cost is not increased significantly throughout the entire iteration. Now we can state the steps of our new diagonal gradient-type algorithm, with the safeguarding strategy for monotonicity, as follows.

2.1. ESDG Algorithm

Step 1. Choose an initial point x0Rn and a positive definite matrix B0 = I. Let θ ∈ (1,2). Set k : = 0.

Step 2. Compute gk. If ∥gk∥ ≤ ϵ, stop.

Step 3. If k = 0, set x1 = x0 − g0/∥g0∥.

Step 4. Compute dk = −Bk^{−1}gk, and calculate αk > 0 such that the following nonmonotone condition holds: f(xk + αkdk) ≤ max_{0≤j≤m(k)} f(x_{k−j}) + σαk g_k^T dk, where m(k) is the nonmonotone memory chosen as in [19, 20] and σ ∈ (0,1) is a given constant. Set xk+1 = xk + αkdk.

Step 5. If k ≥ 1, let sk and yk be the most recent iterate and gradient differences, and compute ρk and γk by (11) and (15), respectively. If ρk < θ, then update Bk+1 by (16).

Step 6. If ρk ≥ θ, then compute the intermediate updates by (12) and (13), respectively, and then update Bk+1 as defined in (14).

Step 7. Set k : = k + 1, and return to Step 2.

In Step 4, we employ the nonmonotone line search of [19, 20] to ensure the convergence of the algorithm. However, some other line search strategies may also be used.
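To show how Steps 1–7 fit together, here is a skeleton of the ESDG iteration in Python. It is our own illustration: the helper functions scaled_update, extra_update, and nonmonotone_search are hypothetical stand-ins for the scaled update (15)–(16), the extra updates (12)–(14), and the line search of [19, 20], whose exact formulas are not reproduced here.

```python
import numpy as np

def esdg(f, f_grad, x0, extra_update, scaled_update, nonmonotone_search,
         theta=1.5, tol=1e-4, max_iter=1000):
    """Skeleton of the ESDG iteration (Steps 1-7).

    scaled_update(B_diag, s, y)  -> new diagonal  (stand-in for (15)-(16))
    extra_update(B_diag, s, y)   -> new diagonal  (stand-in for (12)-(14))
    nonmonotone_search(f, x, d, g) -> step length alpha_k  (line search of [19, 20])
    """
    x = np.asarray(x0, dtype=float)
    B_diag = np.ones_like(x)                     # Step 1: B_0 = I, stored as its diagonal
    g = f_grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:             # Step 2: stopping test
            break
        if k == 0:
            x_new = x - g / np.linalg.norm(g)    # Step 3: first iterate
        else:
            d = -g / B_diag                      # Step 4: d_k = -B_k^{-1} g_k
            alpha = nonmonotone_search(f, x, d, g)
            x_new = x + alpha * d
        g_new = f_grad(x_new)
        s, y = x_new - x, g_new - g
        rho = (s @ y) / (s @ (B_diag * s))       # curvature ratio (11)
        if rho < theta:
            B_diag = scaled_update(B_diag, s, y)     # Step 5: update by (16)
        else:
            B_diag = extra_update(B_diag, s, y)      # Step 6: update by (12)-(14)
        x, g = x_new, g_new                      # Step 7
    return x
```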

3. Convergence Analysis

This section is devoted to studying the convergence behavior of the ESDG method. We will establish the convergence of the ESDG algorithm when applied to the minimization of a strictly convex function. To begin, we give the convergence result, due to Grippo et al. [21], for the step generated by the nonmonotone line search algorithm. Here and elsewhere, ∥·∥ denotes the Euclidean norm.

Theorem 1. Assume that f is a strictly convex function and its gradient g satisfies the Lipschitz condition. Suppose that the nonmonotone line search algorithm is employed in such a way that the steplength αk satisfies

f(x_k + α_k d_k) ≤ max_{0≤j≤m(k)} f(x_{k−j}) + σ α_k g_k^T d_k, (18)
where m(k) is a nonnegative integer bounded above uniformly in k, σ ∈ (0,1), and the search direction dk is chosen to obey the following conditions. There exist positive constants c1 and c2 such that
−g_k^T d_k ≥ c_1 ∥g_k∥^2  and  ∥d_k∥ ≤ c_2 ∥g_k∥, (19)
for all sufficiently large k. Then the iterates xk generated by the nonmonotone line search algorithm have the property that
lim_{k→∞} ∥g_k∥ = 0. (20)

To prove that the ESDG algorithm is globally convergent, it is sufficient to show that the sequence {∥Bk∥} generated by (17) is bounded both above and below for all finite k, so that the associated search direction satisfies condition (19). Since Bk is diagonal, it is enough to show that each diagonal element of Bk, say Bk,i, i = 1, …, n, is bounded above and below by some positive constants. The following theorem gives the boundedness of {∥Bk∥}.

Theorem 2. Assume that f is a strictly convex function and that there exist positive constants m and M such that

m∥z∥^2 ≤ z^T ∇2f(x) z ≤ M∥z∥^2, (21)
for all x, z ∈ Rn. Let {∥Bk∥} be a sequence generated by the ESDG method. Then ∥Bk∥ is bounded above and below, for all finite k, by some positive constants.

Proof. Let Bk,i denote the ith diagonal element of Bk. Suppose that B0 is chosen such that ω1 ≤ B0,i ≤ ω2 for all i, where ω1, ω2 are some positive constants. It follows from (17) and the definition of γk in (15) that we have

()
where
()
Moreover, by (21) and (11), we obtain
()
Case  1. When ρ0 ≤ 1: by (24), one can obtain
()
Thus, it implies that .

Case  2. When ρ0 > 1: from (3), we have

()
Because ρ0 > 1 also implies that , using this fact and (24) give
()
Let s0,M be the largest component in magnitude of s0, that is, . Then it follows that , and (27) becomes
()
Using (28) and the same argument as previously mentioned, we can also show that
()
Hence, in both cases, B1,i is bounded above and below by some positive constants. Since these upper and lower bounds are independent of k, we can proceed by induction to show that Bk,i is bounded, for all finite k and each i.

4. Numerical Results

In this section we present the results of a numerical investigation of the ESDG method on different test problems. We also compare the performance of our new method with that of the BB method and that of the MDGRAD method, which is implemented using SMDQN of [22] with the same nonmonotone strategy as the ESDG method. Our experiments are performed on a set of 20 nonlinear unconstrained problems with dimensions ranging from 10 to 10^4 (Table 1).

Table 1. Test problems and their references (dimensions range from 10 to 10^4).

Problem | Reference
Extended Freudenstein and Roth, Extended Trigonometric, Broyden Tridiagonal, Extended Beale, Generalized Rosenbrock | Moré et al. [24]
Extended Tridiagonal 2, Extended Himmelblau, Raydan 2, EG2, Extended Three Exponential Terms, Raydan 1, Generalized PSC1, Quadratic QF2, Generalized Tridiagonal 1, Perturbed Quadratic, Diagonal 2, Diagonal 3, Diagonal 5, Almost Perturbed Quadratic, Hager, Diagonal 4 | Andrei [23]

These test problems are taken from [23, 24]. The codes are developed in Matlab 7.0. All runs are performed on a PC with a Core Duo CPU. For each test, the termination condition is ∥g(xk)∥ ≤ 10^−4. The maximum number of iterations is set to 1000.

Figures 1 and 2 show the efficiency of the ESDG method when compared with the MDGRAD and BB methods. Note that the ESDG method increases the quality of the Hessian approximation without increasing the storage requirement. Figure 2 compares the ESDG method with the BB and MDGRAD methods using CPU time as a measure. It shows that the ESDG method is again faster than the MDGRAD method on most problems and requires reasonable time to solve large-scale problems when compared with the BB method. Finally, our experimental comparisons indicate that the proposed extension is very beneficial to performance.

Figure 1. Performance profile based on iterations.

Figure 2. Performance profile based on CPU time per iteration.
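For readers who wish to reproduce comparisons of this kind, the sketch below computes performance-profile curves in the Dolan–Moré style that Figures 1 and 2 appear to use; it is our own illustration and assumes a table of per-solver costs (iteration counts or CPU times), with failed runs encoded as infinity and at least one successful solver per problem.

```python
import numpy as np

def performance_profile(costs):
    """costs : (num_problems, num_solvers) array of iteration counts or CPU
    times, with np.inf marking a failed run.  Returns (tau, rho), where
    rho[t, s] is the fraction of problems that solver s solves within a
    factor tau[t] of the best solver on each problem."""
    costs = np.asarray(costs, dtype=float)
    best = np.min(costs, axis=1, keepdims=True)   # best cost per problem
    ratios = costs / best                         # performance ratios r_{p,s}
    finite = ratios[np.isfinite(ratios)]
    tau = np.linspace(1.0, finite.max() if finite.size else 1.0, 200)
    rho = np.array([[np.mean(ratios[:, s] <= t) for s in range(costs.shape[1])]
                    for t in tau])
    return tau, rho
```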

5. Conclusion

We have presented a new diagonal gradient-type method for unconstrained optimization. A numerical study of the proposed method in comparison with the BB and MDGRAD methods has also been performed. Based on our numerical experiments, we can conclude that the ESDG method is significantly preferable to the BB and MDGRAD methods. In particular, the ESDG method proves to be a good option for large-scale problems, where low memory requirements are essential. In view of the remarkable performance of the ESDG method, which is globally convergent and requires only O(n) storage, we expect the proposed method to be useful for large-scale unconstrained optimization problems.
