Volume 2012, Issue 1 875494

Research Article

Open Access

Accumulative Approach in Multistep Diagonal Gradient-Type Method for Large-Scale Unconstrained Optimization

Mahboubeh Farid (Corresponding Author)

Department of Mathematics, University Putra Malaysia, Selangor, 43400 Serdang, Malaysia

Wah June Leong

Department of Mathematics, University Putra Malaysia, Selangor, 43400 Serdang, Malaysia

Lihong Zheng

School of Computing and Maths, Charles Sturt University, Mitchell, NSW 2795, Australia


First published: 10 July 2012

https://doi.org/10.1155/2012/875494

Citations: 2

Academic Editor: Vu Phat


Abstract

This paper develops diagonal gradient-type methods that use an accumulative approach in multistep diagonal updating to obtain a better Hessian approximation at each step. An interpolating curve is used to derive a generalization of the weak secant equation that carries information about the local Hessian. The new parameterization of the interpolating curve in the variable space is obtained with the accumulative approach, with distances measured in a weighted norm defined by either of two positive definite weighting matrices. The proposed method requires only O(n) storage. Numerical results show that the proposed algorithm is efficient and superior to some other gradient-type methods.

1. Introduction

Consider the unconstrained optimization problem:

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$

where f : Rⁿ → R is a twice continuously differentiable function. Gradient-type methods for solving (1.1) can be written as

$$x_{k+1} = x_k - B_k^{-1} g_k, \tag{1.2}$$

where g_k and B_k denote the gradient and the Hessian approximation of f at x_k, respectively. Taking B_k = α_k I, Barzilai and Borwein (BB) [1] obtain

$$\alpha_{k+1} = \frac{s_k^T y_k}{s_k^T s_k}, \tag{1.3}$$

which is derived by minimizing ∥α s_k − y_k∥₂ with respect to α, where s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k. Recently, several improved one-step gradient-type methods [2–5] have been proposed within the BB framework to solve (1.1). In these methods, B_k is a nonsingular diagonal approximation to the Hessian, and the new approximation B_{k+1} is constructed from the weak secant equation of Dennis and Wolkowicz [6]:

$$s_k^T B_{k+1} s_k = s_k^T y_k. \tag{1.4}$$
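As a quick check of (1.3) and (1.4), the following minimal sketch (plain Python, vectors as lists) computes the BB scalar α = s_kᵀy_k / s_kᵀs_k, the minimizer of ∥α s_k − y_k∥₂, for a hypothetical two-variable quadratic and verifies that B_{k+1} = αI satisfies the weak secant equation. The test function, the points, and all names are illustrative assumptions, not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bb_scalar(s, y):
    """BB scalar: the minimizer over alpha of ||alpha*s - y||_2, i.e. s^T y / s^T s."""
    return dot(s, y) / dot(s, s)


# Hypothetical test problem: f(x) = 0.5 * x^T A x with A = diag(1, 10), so g(x) = A x.
A_DIAG = [1.0, 10.0]


def grad(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]


x_k, x_k1 = [1.0, 1.0], [0.5, 0.2]
s = [b - a for a, b in zip(x_k, x_k1)]              # s_k = x_{k+1} - x_k
y = [b - a for a, b in zip(grad(x_k), grad(x_k1))]  # y_k = g_{k+1} - g_k
alpha = bb_scalar(s, y)

# B_{k+1} = alpha * I satisfies the weak secant equation s^T B_{k+1} s = s^T y.
weak_secant_residual = alpha * dot(s, s) - dot(s, y)
print(alpha, weak_secant_residual)
```

For a quadratic, α is a Rayleigh quotient of the Hessian and therefore lies between its extreme eigenvalues (here 1 and 10). A scalar matrix αI matches the curvature only along s_k, which motivates the diagonal approximations of [2–5].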

In one-step methods, only data from the most recent step is used to revise the current Hessian approximation. Later, Farid and Leong [7, 8] proposed multistep diagonal gradient methods inspired by the multistep quasi-Newton methods of Ford and coauthors [9, 10]. In this multistep framework, interpolating polynomials are constructed from data gathered over several previous iterations (not only the most recent step) and are parameterized by a fixed-point approach [7–10], in which the distance of every iterate in the variable space is measured from one selected iterate. In this paper, we develop a multistep diagonal updating scheme that instead uses the accumulative approach to define the parameter values of the interpolating curve: the distance is accumulated between consecutive iterates as they are traversed in their natural sequence. These distances are measured in a norm defined by a positive definite weighting matrix, say M, so the performance of the multistep method may be significantly improved by choosing the weighting matrix carefully. The rest of the paper is organized as follows. In Section 2, we present a new multistep diagonal updating scheme based on the accumulative approach. In Section 3, we establish the global convergence of the proposed method. Section 4 reports numerical results and comparisons with the BB method and the one-step diagonal gradient method. Conclusions are given in Section 5.

2. Derivation of the New Diagonal Updating via Accumulative Approach

In this section, we derive new implicit updates for diagonal gradient-type methods that use the accumulative approach to determine a better Hessian approximation at each iteration. In multistep diagonal updating methods, the weak secant equation (1.4) can be generalized by means of interpolating polynomials, instead of using data from only one previous iteration as in one-step methods. Our aim is to derive efficient strategies for choosing a suitable set of parameters to construct the interpolating curve, and to investigate suitable norms for measuring the distances required to parameterize the interpolating polynomials. The method follows the recursion

$$x_{k+1} = x_k - \alpha_k B_k^{-1} g_k, \tag{2.1}$$

where x_k is the kth iterate, α_k is a step length determined by a line search, B_k is a diagonal approximation to the Hessian, and g_k is the gradient of f at x_k. Consider a differentiable curve x(τ) in Rⁿ. By the chain rule, the derivative of g(x(τ)) at the point x(τ^*) is

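$$\left.\frac{d g(x(\tau))}{d\tau}\right|_{\tau=\tau^*} = G(x(\tau^*))\,\frac{d x(\tau^*)}{d\tau}, \tag{2.2}$$

where G denotes the Hessian of f.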

We want a relation to be satisfied by the diagonal Hessian approximation at x_{k+1}. If x(τ) passes through x_{k+1} and τ^* is chosen so that

$$x(\tau^*) = x_{k+1}, \tag{2.3}$$

then we have

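$$G(x_{k+1})\,\frac{d x(\tau^*)}{d\tau} = \left.\frac{d g(x(\tau))}{d\tau}\right|_{\tau=\tau^*}. \tag{2.4}$$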

Since this paper uses a two-step method, we use the information at the three most recent points x_{k−1}, x_k, x_{k+1} and their associated gradients. Let x(τ) be the vector polynomial of degree 2 that interpolates these points:

$$x(\tau_j) = x_{k-1+j}, \qquad j = 0, 1, 2. \tag{2.5}$$

The efficient selection of the distinct scalar values τ_j through the new approach is the main contribution of this paper and is discussed later in this section. Let h(τ) be the corresponding interpolating polynomial that approximates the gradient vector:

$$h(\tau_j) = g_{k-1+j}, \qquad j = 0, 1, 2. \tag{2.6}$$
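The vectors r_k and w_k used below are derivatives at τ₂ of the interpolants (2.5) and (2.6). A minimal sketch, assuming both interpolants are written in Lagrange form on the nodes τ₀, τ₁, τ₂ (the helper name is ours): it evaluates the derivatives of the three Lagrange basis polynomials at τ₂ and checks, on hypothetical data, that the derivative of a quadratic curve is recovered exactly and that, for a quadratic f with diagonal Hessian A (whose gradient is linear), h′(τ₂) = A x′(τ₂), so that relation (2.4) holds exactly.

```python
def lagrange_derivative_at_tau2(taus, values):
    """Derivative at tau_2 of the quadratic interpolating (taus[j], values[j]), j = 0, 1, 2.

    `values` holds three equal-length vectors (e.g. x_{k-1}, x_k, x_{k+1}).
    """
    t0, t1, t2 = taus
    # Derivatives of the Lagrange basis polynomials L_0, L_1, L_2 evaluated at tau_2.
    d0 = (t2 - t1) / ((t0 - t1) * (t0 - t2))
    d1 = (t2 - t0) / ((t1 - t0) * (t1 - t2))
    d2 = (2.0 * t2 - t0 - t1) / ((t2 - t0) * (t2 - t1))
    return [d0 * v0 + d1 * v1 + d2 * v2 for v0, v1, v2 in zip(*values)]


# Hypothetical data: iterates on the curve x(tau) = (1 + 2 tau + 3 tau^2, -tau^2).
taus = (-1.0, 0.0, 2.0)
points = [[1 + 2 * t + 3 * t * t, -t * t] for t in taus]
r = lagrange_derivative_at_tau2(taus, points)  # exact derivative at tau = 2: (14, -4)

# For f(x) = 0.5 x^T A x with A = diag(2, 5), g(x) = A x, so h'(tau_2) = A x'(tau_2).
A_DIAG = [2.0, 5.0]
grads = [[a * xi for a, xi in zip(A_DIAG, p)] for p in points]
w = lagrange_derivative_at_tau2(taus, grads)
print(r, w)
```

For a nonquadratic f, the gradient interpolant h only approximates g(x(τ)), which is why the resulting relation is imposed on the Hessian approximation rather than holding exactly.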

With x(τ₂) = x_{k+1} (that is, τ^* = τ₂) and the definitions

$$r_k = \frac{d x(\tau_2)}{d\tau}, \qquad w_k = \frac{d h(\tau_2)}{d\tau}, \tag{2.7}$$

we obtain the desired relation to be satisfied by the diagonal Hessian approximation at x_{k+1}. For this two-step approach, the weak secant equation generalizes to

$$r_k^T B_{k+1} r_k = r_k^T w_k. \tag{2.8}$$
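A diagonal B_{k+1} satisfying (2.8) can be computed in O(n) operations. The sketch below assumes the least-change form of the diagonal update of [3] applied to r_k and w_k, namely B_{k+1} = B_k + ((r_kᵀw_k − r_kᵀB_k r_k)/tr(E_k²)) E_k with E_k = diag((r_k^{(1)})², …, (r_k^{(n)})²). Only the diagonals are stored, as Python lists; the function name and the numbers in the example are hypothetical.

```python
def diagonal_weak_secant_update(b, r, w):
    """Return diag(B_{k+1}) from diag(B_k) = b so that r^T B_{k+1} r = r^T w.

    Assumed form: B_{k+1} = B_k + (r^T w - r^T B_k r) / tr(E^2) * E, E = diag(r_i^2).
    Work and storage are O(n).
    """
    r_b_r = sum(bi * ri * ri for bi, ri in zip(b, r))  # r^T B_k r
    r_w = sum(ri * wi for ri, wi in zip(r, w))          # r^T w
    tr_e2 = sum(ri ** 4 for ri in r)                    # tr(E^2) = sum_i r_i^4
    scale = (r_w - r_b_r) / tr_e2
    return [bi + scale * ri * ri for bi, ri in zip(b, r)]


b_new = diagonal_weak_secant_update([1.0, 1.0, 1.0], [1.0, 2.0, -1.0], [3.0, 1.0, 0.5])
print(b_new)
```

Substituting the update into r_kᵀB_{k+1}r_k gives r_kᵀB_k r_k + (r_kᵀw_k − r_kᵀB_k r_k) = r_kᵀw_k, because r_kᵀE_k r_k = tr(E_k²). When r_kᵀw_k < r_kᵀB_k r_k the correction is negative and can make some diagonal entries nonpositive, which is the issue the scaling strategy in (2.19) addresses.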

Then, B_{k+1} can be obtained from an appropriately modified version of the diagonal updating formula in [3]:

$$B_{k+1} = B_k + \frac{r_k^T w_k - r_k^T B_k r_k}{\operatorname{tr}(E_k^2)}\,E_k, \tag{2.9}$$

where E_k = diag((r_k^{(1)})², (r_k^{(2)})², …, (r_k^{(n)})²), r_k^{(i)} is the ith component of r_k, and tr denotes the trace. We now construct an algorithm for computing the vectors r_k and w_k that improve the Hessian approximation. First, we derive strategies for choosing a suitable set of values τ₀, τ₁, and τ₂. The values {τ_j}_{j=0}^{2} are chosen to reflect distances between the iterates in Rⁿ, measured by a metric of the general form

$$\phi_M(z_1, z_2) = \|z_1 - z_2\|_M = \left((z_1 - z_2)^T M (z_1 - z_2)\right)^{1/2}. \tag{2.10}$$

The τ_j can be set by the so-called accumulative approach, in which the distances between consecutive iterates (measured by the metric ϕ_M) are accumulated to approximate τ_j. Taking τ₁ as the origin for τ without loss of generality, this leads to the definitions

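$$\tau_1 = 0, \qquad \tau_2 = \tau_1 + \phi_M(x_{k+1}, x_k), \qquad \tau_0 = \tau_1 - \phi_M(x_k, x_{k-1}). \tag{2.11}$$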

Then, the set {τ_j}_{j=0}^{2} is constructed as follows:

$$\tau_0 = -\|s_{k-1}\|_M, \qquad \tau_1 = 0, \qquad \tau_2 = \|s_k\|_M, \tag{2.12}$$

where r_k and w_k depend on these values of τ. Because the values {τ_j} measure distances, the interpolating polynomials are parameterized through a norm defined by a positive definite matrix M. M must be chosen with some care, because it can strongly influence how much the Hessian approximation improves. Two choices of the weighting matrix M are considered in this paper. In the first choice, M = I, ∥·∥_M reduces to the Euclidean norm, and we obtain the following τ_j values:

$$\tau_0 = -\|s_{k-1}\|_2, \qquad \tau_1 = 0, \qquad \tau_2 = \|s_k\|_2. \tag{2.13}$$

The second choice is M = B_k, the current diagonal approximation to the Hessian. In this case, the relevant distances are measured according to the current quadratic approximation of the objective function (based on B_k):

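$$\tau_0 = -\left(s_{k-1}^T B_k s_{k-1}\right)^{1/2}, \qquad \tau_1 = 0, \qquad \tau_2 = \left(s_k^T B_k s_k\right)^{1/2}. \tag{2.14}$$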
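The τ values in (2.13) and (2.14) can be sketched as follows, assuming the accumulative definitions τ₀ = −∥s_{k−1}∥_M, τ₁ = 0, τ₂ = ∥s_k∥_M with s_{k−1} = x_k − x_{k−1}. A diagonal M is passed as the list of its (positive) diagonal entries, so both choices cost O(n); `m_diag=None` stands for M = I. The helper name and the example vectors are ours.

```python
import math


def accumulative_taus(s_prev, s_curr, m_diag=None):
    """(tau_0, tau_1, tau_2) with tau_1 = 0 at x_k and accumulated M-norm distances.

    m_diag=None gives M = I (Euclidean norm, as in (2.13)); otherwise
    M = diag(m_diag) with positive entries, e.g. the diagonal of B_k as in (2.14).
    """
    def m_norm(v):
        if m_diag is None:
            return math.sqrt(sum(vi * vi for vi in v))
        return math.sqrt(sum(mi * vi * vi for mi, vi in zip(m_diag, v)))

    return (-m_norm(s_prev), 0.0, m_norm(s_curr))


print(accumulative_taus([3.0, 4.0], [1.0, 0.0]))              # M = I
print(accumulative_taus([3.0, 4.0], [1.0, 0.0], [4.0, 1.0]))  # M = diag(4, 1)
```

With M = diag(4, 1), a step along the first coordinate counts twice as long as in the Euclidean norm; this is how the choice M = B_k lets the parameterization reflect the curvature of the current quadratic model.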

Since B_k is diagonal, the values {τ_j} are inexpensive to compute (O(n) operations) at each iteration. We now introduce the quantity δ_k, defined as

$$\delta_k = \frac{(\tau_2 - \tau_1)^2}{(\tau_1 - \tau_0)(2\tau_2 - \tau_1 - \tau_0)}, \tag{2.15}$$

and r_k and w_k are given by the following expressions:

$$r_k = s_k - \delta_k s_{k-1}, \tag{2.16}$$

$$w_k = y_k - \delta_k y_{k-1}. \tag{2.17}$$
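A sketch of (2.15)–(2.17), assuming r_k = s_k − δ_k s_{k−1} and w_k = y_k − δ_k y_{k−1}. These forms follow from the Lagrange derivative at τ₂: the derivatives of the three basis polynomials sum to zero, so dx(τ₂)/dτ = L₂′(τ₂) s_k − L₀′(τ₂) s_{k−1}, and dividing by L₂′(τ₂) (which rescales both sides of (2.8) by the same positive factor) gives δ_k = L₀′(τ₂)/L₂′(τ₂), which simplifies to (τ₂ − τ₁)²/((τ₁ − τ₀)(2τ₂ − τ₁ − τ₀)). The function name and the example iterates are hypothetical.

```python
def two_step_vectors(taus, s_prev, s_curr, y_prev, y_curr):
    """delta_k and the two-step vectors r_k = s_k - delta_k s_{k-1}, w_k = y_k - delta_k y_{k-1}."""
    t0, t1, t2 = taus
    delta = (t2 - t1) ** 2 / ((t1 - t0) * (2.0 * t2 - t1 - t0))
    r = [sc - delta * sp for sp, sc in zip(s_prev, s_curr)]
    w = [yc - delta * yp for yp, yc in zip(y_prev, y_curr)]
    return delta, r, w


# Hypothetical iterates x_{k-1} = (0, 0), x_k = (1, 0), x_{k+1} = (1, 2) with M = I,
# so tau = (-1, 0, 2); for f(x) = 0.5 * ||x||^2 the gradient differences equal the steps.
delta, r, w = two_step_vectors((-1.0, 0.0, 2.0), [1.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, 2.0])
print(delta, r, w)
```

For equally spaced parameters (τ₁ − τ₀ = τ₂ − τ₁), δ_k = 1/3, recovering the familiar second-order one-sided difference 3x_{k+1} − 4x_k + x_{k−1} up to scaling.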

To safeguard against very small or very large values of r_k^T w_k / r_k^T r_k, we require that the condition

$$\epsilon_1 \le \frac{r_k^T w_k}{r_k^T r_k} \le \epsilon_2 \tag{2.18}$$

is satisfied (we use ɛ₁ = 10⁻⁶ and ɛ₂ = 10⁶). If it is not, we set r_k = s_k and w_k = y_k. Moreover, the Hessian approximation B_{k+1} might not remain positive definite at every step. This matters here because the improved Hessian approximation is also used to compute the metric when M = B_k, and a weighting matrix that defines a norm must be positive definite. To ensure that the updates remain positive definite, we apply the scaling strategy proposed in [7]. The new updating formula that incorporates this scaling strategy is

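$$B_{k+1} = \eta_k B_k + \frac{r_k^T w_k - \eta_k\, r_k^T B_k r_k}{\operatorname{tr}(E_k^2)}\,E_k, \tag{2.19}$$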

where

$$\eta_k = \min\left\{1, \frac{r_k^T w_k}{r_k^T B_k r_k}\right\}. \tag{2.20}$$

This guarantees that the updated Hessian approximation remains positive definite. The resulting accumulative MD algorithm is outlined in Section 2.1.
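A sketch of the safeguarded, scaled update, assuming (2.18) reads ɛ₁ ≤ r_kᵀw_k/r_kᵀr_k ≤ ɛ₂ and that (2.19)–(2.20) take the form B_{k+1} = η_k B_k + ((r_kᵀw_k − η_k r_kᵀB_k r_k)/tr(E_k²)) E_k with η_k = min{1, r_kᵀw_k/r_kᵀB_k r_k}. Under these assumptions, the coefficient of E_k is nonnegative, so every diagonal entry satisfies B_{k+1}^{(i)} ≥ η_k B_k^{(i)} > 0 whenever r_kᵀw_k > 0. The function name and the example data are hypothetical.

```python
def scaled_safeguarded_update(b, s, y, r, w, eps1=1e-6, eps2=1e6):
    """diag(B_{k+1}) from diag(B_k) = b using the assumed forms of (2.18)-(2.20).

    Falls back to the one-step pair (s, y) when the safeguard fails.
    Positivity of the result needs r^T w > 0 (e.g. s^T y > 0 for strictly convex f).
    """
    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    r_r = dot(r, r)
    if r_r == 0.0 or not (eps1 <= dot(r, w) / r_r <= eps2):
        r, w = s, y                                        # safeguard violated
    r_b_r = sum(bi * ri * ri for bi, ri in zip(b, r))
    r_w = dot(r, w)
    eta = min(1.0, r_w / r_b_r)                            # scaling factor eta_k
    scale = (r_w - eta * r_b_r) / sum(ri ** 4 for ri in r)  # >= 0 by the choice of eta
    return [eta * bi + scale * ri * ri for bi, ri in zip(b, r)]


ones = [1.0, 1.0, 1.0]
print(scaled_safeguarded_update(ones, ones, [2.0, 2.0, 2.0], [1.0, 2.0, -1.0], [3.0, 1.0, 0.5]))
```

In the example r_kᵀw_k < r_kᵀB_k r_k, so η_k < 1 and the update reduces to B_{k+1} = η_k B_k: the whole diagonal is shrunk instead of letting individual entries become negative.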

2.1. Accumulative MD Algorithm

Step 1. Choose an initial point x₀ ∈ Rⁿ and the positive definite matrix B₀ = I.

Let k := 0.

Step 2. Compute g_k. If ∥g_k∥ ≤ ϵ, stop.

Step 3. If k = 0, set x₁ = x₀ − g₀/∥g₀∥. If k = 1, set r_k = s_k and w_k = y_k, and go to Step 5.

Step 4. If k ≥ 2 and M = I is used, compute {τ_j}_{j=0}^{2} from (2.13); if M = B_k is used, compute them from (2.14).

Compute δ_k, r_k, w_k, and η_k from (2.15), (2.16), (2.17), and (2.20), respectively.

If (2.18) is not satisfied, set r_k = s_k and w_k = y_k.

Step 5. Compute d_k = −B_k^{−1} g_k and find α_k > 0 such that the Armijo condition [11] holds:

f(x_k + α_k d_k) ≤ f(x_k) + σ α_k g_k^T d_k, where σ ∈ (0,1) is a given constant.

Step 6. Let x_{k+1} = x_k + α_k d_k, and update B_{k+1} by (2.19).

Step 7. Set k := k + 1, and return to Step 2.
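The algorithm above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it assumes r_k = s_k − δ_k s_{k−1} and w_k = y_k − δ_k y_{k−1}, the safeguard ɛ₁ ≤ r_kᵀw_k/r_kᵀr_k ≤ ɛ₂, the scaled update B_{k+1} = η_k B_k + ((r_kᵀw_k − η_k r_kᵀB_k r_k)/tr(E_k²)) E_k with η_k = min{1, r_kᵀw_k/r_kᵀB_k r_k}, and Armijo backtracking that halves α starting from 1 (the halving factor and the lower bound on α are our choices). It also adopts one reading of the index bookkeeping in Steps 3–6: after each step, the newest pair (s, y) and the previous pair are combined to produce B_{k+1}. `metric="I"` plays the role of AMD1 and `metric="B"` of AMD2; all names and the test problem are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def amd(f, grad, x0, metric="B", tol=1e-4, max_iter=1000, sigma=0.9,
        eps1=1e-6, eps2=1e6):
    """Accumulative MD sketch; returns (x, iterations). diag(B_k) is kept as a list."""
    n = len(x0)
    x, g = list(x0), grad(x0)
    b = [1.0] * n                                   # Step 1: B_0 = I
    prev = None                                     # (s_{k-1}, y_{k-1}) once available
    for k in range(max_iter):
        g_norm = math.sqrt(dot(g, g))
        if g_norm <= tol:                           # Step 2: stopping test
            return x, k
        if k == 0:                                  # Step 3: normalized steepest-descent step
            x_new = [xi - gi / g_norm for xi, gi in zip(x, g)]
        else:                                       # Step 5: d_k = -B_k^{-1} g_k, Armijo backtracking
            d = [-gi / bi for gi, bi in zip(g, b)]
            fx, slope, alpha = f(x), dot(g, d), 1.0
            while (f([xi + alpha * di for xi, di in zip(x, d)]) > fx + sigma * alpha * slope
                   and alpha > 1e-12):
                alpha *= 0.5
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - c for a, c in zip(x_new, x)]
        y = [a - c for a, c in zip(g_new, g)]
        r, w = s, y                                 # one-step data by default
        if prev is not None:                        # Step 4: accumulative two-step data
            s_p, y_p = prev
            m = b if metric == "B" else [1.0] * n   # M = B_k (AMD2) or M = I (AMD1)
            t0 = -math.sqrt(sum(mi * v * v for mi, v in zip(m, s_p)))
            t2 = math.sqrt(sum(mi * v * v for mi, v in zip(m, s)))
            delta = t2 * t2 / (-t0 * (2.0 * t2 - t0))   # delta_k with tau_1 = 0
            r = [a - delta * c for a, c in zip(s, s_p)]
            w = [a - delta * c for a, c in zip(y, y_p)]
            r_r = dot(r, r)
            if r_r == 0.0 or not (eps1 <= dot(r, w) / r_r <= eps2):
                r, w = s, y                         # safeguard violated
        r_b_r = sum(bi * ri * ri for bi, ri in zip(b, r))
        r_w = dot(r, w)                             # assumed > 0 (strictly convex f)
        eta = min(1.0, r_w / r_b_r)
        scale = (r_w - eta * r_b_r) / sum(ri ** 4 for ri in r)
        b = [eta * bi + scale * ri * ri for bi, ri in zip(b, r)]  # Step 6: scaled update
        prev, x, g = (s, y), x_new, g_new           # Step 7
    return x, max_iter


# Hypothetical test: strictly convex quadratic with minimizer (1, ..., 1).
A_DIAG = [1.0, 2.0, 3.0, 4.0, 5.0]


def f(x):
    return 0.5 * sum(a * (xi - 1.0) ** 2 for a, xi in zip(A_DIAG, x))


def grad(x):
    return [a * (xi - 1.0) for a, xi in zip(A_DIAG, x)]


for metric in ("I", "B"):
    x_star, iters = amd(f, grad, [0.0] * 5, metric=metric)
    print(metric, iters)
```

Only a handful of length-n lists are stored, in line with the O(n) storage claimed for the method.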

3. Convergence Analysis

This section studies the convergence of the accumulative MD algorithm when applied to the minimization of a convex function. We begin with the following result, due to Byrd and Nocedal [12], for steps generated by the Armijo line search. Here and elsewhere, ∥·∥ denotes the Euclidean norm.

Theorem 3.1. Assume that f is a strictly convex function. Suppose the Armijo line search is employed in such a way that, for any d_k with d_k^T g_k < 0, the step length α_k satisfies

$$f(x_k + \alpha_k d_k) \le f(x_k) + \sigma \alpha_k g_k^T d_k, \tag{3.1}$$

where α_k ∈ [τ, τ^′], 0 < τ < τ^′ and σ ∈ (0,1). Then, there exist positive constants ρ₁ and ρ₂ such that either

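$$f(x_k + \alpha_k d_k) - f(x_k) \le -\rho_1 \frac{(g_k^T d_k)^2}{\|d_k\|^2} \tag{3.2}$$

or

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho_2\, g_k^T d_k \tag{3.3}$$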

is satisfied.

We can apply Theorem 3.1 to establish the convergence of some Armijo-type line search methods.

Theorem 3.2. Assume that f is a strictly convex function. Suppose that the Armijo line search algorithm in Theorem 3.1 is employed with d_k chosen to obey the following conditions: there exist positive constants c₁ and c₂ such that

$$-g_k^T d_k \ge c_1 \|g_k\|^2, \qquad \|d_k\| \le c_2 \|g_k\| \tag{3.4}$$

for all sufficiently large k. Then, the iterates x_k generated by the line search algorithm have the property that

$$\lim_{k \to \infty} \|g_k\| = 0. \tag{3.5}$$

Proof. By (3.4), either (3.2) or (3.3) becomes

$$f(x_k + \alpha_k d_k) - f(x_k) \le -\rho \|g_k\|^2 \tag{3.6}$$

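for some positive constant ρ. Assume, as is standard, that f is bounded below (strict convexity alone does not guarantee this, but the uniform convexity condition (3.8) of Theorem 3.3 does). Since (3.1) makes {f(x_k)} monotonically decreasing, the sequence converges, so f(x_k + α_kd_k) − f(x_k) → 0 as k → ∞. By (3.6), this implies that ∥g_k∥ → 0 as k → ∞, or at least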

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{3.7}$$

To prove that the accumulative MD algorithm is globally convergent when applied to the minimization of a convex function, it suffices to show that the sequence {∥B_k∥} generated by (2.19)-(2.20) is bounded above and below for all finite k, so that the associated search direction satisfies condition (3.4). Since B_k is diagonal, it is enough to show that each diagonal element B_k^{(i)}, i = 1, …, n, is bounded above and below by positive constants. The following theorem establishes the boundedness of {∥B_k∥}.

Theorem 3.3. Assume that f is a strictly convex function for which there exist positive constants m and M such that

$$m\|z\|^2 \le z^T G(x)\, z \le M\|z\|^2 \tag{3.8}$$

for all x, z ∈ Rⁿ. Let {B_k} be the sequence generated by the accumulative MD method. Then, ∥B_k∥ is bounded above and below by positive constants for all finite k.

Proof. Let B_k^{(i)} be the ith diagonal element of B_k. Suppose B₀ is chosen such that ω₁ ≤ B₀^{(i)} ≤ ω₂ for all i, where ω₁ and ω₂ are some positive constants.

Case 1. If (2.18) is satisfied, we have

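$$B_1^{(i)} = \eta_0 B_0^{(i)} + \frac{r_0^T w_0 - \eta_0\, r_0^T B_0 r_0}{\operatorname{tr}(E_0^2)} \big(r_0^{(i)}\big)^2. \tag{3.9}$$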

By (2.18) and the definition of η_k, one can obtain

$$\min\left\{1, \frac{\epsilon_1}{\omega_2}\right\} \le \eta_0 \le 1. \tag{3.10}$$

Thus, if r₀^T w₀ < r₀^T B₀ r₀, then η₀ = r₀^T w₀/(r₀^T B₀ r₀), the second term in (3.9) vanishes, and B₁ = η₀B₀ satisfies

$$\frac{\epsilon_1 \omega_1}{\omega_2} \le B_1^{(i)} \le \omega_2. \tag{3.11}$$

On the other hand, if r₀^T w₀ ≥ r₀^T B₀ r₀, then η₀ = 1 and

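$$B_1^{(i)} = B_0^{(i)} + \frac{r_0^T w_0 - r_0^T B_0 r_0}{\sum_{j=1}^{n} \big(r_0^{(j)}\big)^4} \big(r_0^{(i)}\big)^2, \tag{3.12}$$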

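where r₀^{(i)} is the ith component of r₀. Let r₀^{(p)} be the component of r₀ that is largest in magnitude, that is, (r₀^{(i)})² ≤ (r₀^{(p)})² for all i. Then ∥r₀∥² ≤ n(r₀^{(p)})² and Σ_j (r₀^{(j)})⁴ ≥ (r₀^{(p)})⁴, and, using the property r₀^T w₀ ≤ ɛ₂∥r₀∥² from (2.18) together with r₀^T B₀ r₀ ≥ ω₁∥r₀∥², (3.12) becomes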

$$\omega_1 \le B_1^{(i)} \le \omega_2 + n(\epsilon_2 - \omega_1). \tag{3.13}$$

Hence, B₁^{(i)} is bounded above and below for all i in both subcases.

Case 2. If (2.18) is violated, then the updating formula for B₁ becomes

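$$B_1^{(i)} = \eta_0 B_0^{(i)} + \frac{s_0^T y_0 - \eta_0\, s_0^T B_0 s_0}{\operatorname{tr}(F_0^2)} \big(s_0^{(i)}\big)^2, \tag{3.14}$$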

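where s₀^{(i)} is the ith component of s₀, F₀ = diag((s₀^{(1)})², …, (s₀^{(n)})²), and η₀ = min{1, s₀^T y₀/(s₀^T B₀ s₀)}. By the definition of η₀, s₀^T y₀ − η₀ s₀^T B₀ s₀ ≥ 0. This fact, together with η₀ ≤ 1 and the convexity property (3.8), which gives m∥s₀∥² ≤ s₀^T y₀ ≤ M∥s₀∥², yields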

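$$\min\left\{1, \frac{m}{\omega_2}\right\}\omega_1 \le B_1^{(i)} \le \omega_2 + \frac{M\|s_0\|^2}{\sum_{j=1}^{n} \big(s_0^{(j)}\big)^4}\big(s_0^{(i)}\big)^2. \tag{3.15}$$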

Using a similar argument as above, that is, letting s₀^{(p)} be the component of s₀ that is largest in magnitude, it follows that

$$\min\left\{1, \frac{m}{\omega_2}\right\}\omega_1 \le B_1^{(i)} \le \omega_2 + nM. \tag{3.16}$$

Hence, in both cases, B₁^{(i)} is bounded above and below by positive constants. Since these upper and lower bounds for B₁^{(i)} do not depend on k, we can proceed by induction to show that B_k^{(i)} is bounded above and below for all finite k.
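To connect Theorem 3.3 with Theorem 3.2, the following short derivation (ours, not part of the original proof) shows why bounded diagonal entries suffice. Suppose 0 < β₁ ≤ B_k^{(i)} ≤ β₂ for all i at iteration k (β₁ and β₂ are our labels for the bounds) and d_k = −B_k^{−1} g_k. Then conditions of the form −g_kᵀd_k ≥ c₁∥g_k∥² and ∥d_k∥ ≤ c₂∥g_k∥ hold with c₁ = 1/β₂ and c₂ = 1/β₁, and the cosine of the angle between d_k and −g_k is at least β₁/β₂:

```latex
-g_k^{T} d_k = \sum_{i=1}^{n} \frac{\big(g_k^{(i)}\big)^2}{B_k^{(i)}} \ge \frac{1}{\beta_2}\,\|g_k\|^2,
\qquad
\|d_k\|^2 = \sum_{i=1}^{n} \frac{\big(g_k^{(i)}\big)^2}{\big(B_k^{(i)}\big)^2} \le \frac{1}{\beta_1^{2}}\,\|g_k\|^2,
\qquad
\frac{-g_k^{T} d_k}{\|g_k\|\,\|d_k\|} \ge \frac{\beta_1}{\beta_2}.
```

Because B_k is diagonal, each inequality is a term-by-term bound, so checking these conditions costs nothing beyond the O(n) bounds on the diagonal.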

4. Numerical Results

In this section, we examine the practical performance of the proposed algorithm in comparison with the BB method and the standard one-step diagonal gradient-type method (MD). The new algorithms are referred to as AMD1 when M = I is used and AMD2 when M = B_k is used. For all methods, we employ the Armijo line search [11] with σ = 0.9. All experiments were implemented in MATLAB 7.0 on a PC with a Core Duo CPU. Each run terminates when ∥g_k∥ ≤ 10⁻⁴, and every attempt to solve a test problem is limited to a maximum of 1000 iterations. The test problems are chosen from the collections of Andrei [13] and Moré et al. [14] and are summarized in Table 1. Our experiments are performed on a set of 36 nonlinear unconstrained problems, with sizes ranging from n = 10 to 10000 variables. Figures 1, 2, and 3 present the Dolan and Moré [15] performance profiles of all algorithms with respect to the number of iterations, the number of function calls, and the CPU time, respectively.
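The Dolan and Moré [15] performance profiles used in Figures 1–3 can be computed as sketched below: for each problem p and solver s, the ratio r_{p,s} = t_{p,s}/min_s t_{p,s} compares solver s with the best solver on p, and ρ_s(τ) is the fraction of problems with r_{p,s} ≤ τ (this τ is unrelated to the interpolation parameters of Section 2). This is a minimal sketch; the cost matrix in the example contains made-up toy numbers, not results from this paper, and encoding failures as infinity is our convention.

```python
import math


def performance_profile(costs, taus):
    """Dolan-More profile: rho[s][j] = fraction of problems with ratio r_{p,s} <= taus[j].

    costs[p][s] is the (positive) cost of solver s on problem p, e.g. iterations,
    function calls or CPU time; math.inf marks a failure.
    """
    n_problems, n_solvers = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        best = min(row)
        # If every solver failed on p, no solver gets credit for it.
        ratios.append([c / best if math.isfinite(best) else math.inf for c in row])
    return [[sum(1 for p in range(n_problems) if ratios[p][s] <= t) / n_problems
             for t in taus] for s in range(n_solvers)]


# Toy cost matrix (3 problems x 2 solvers); not data from the paper.
toy_costs = [[10, 20], [30, 15], [5, math.inf]]
print(performance_profile(toy_costs, [1.0, 2.0, 4.0]))
```

The value ρ_s(1) is the fraction of problems on which solver s is (one of) the best, and the height of the curve for large τ is the fraction of problems it solves at all, which is how curves such as those in Figures 1–3 are read.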

Table 1. Test problems and their dimensions.

Problem	Dimension	Reference
Extended Trigonometric, Penalty 1, Penalty 2	10, …, 10000	Moré et al. [14]
Quadratic QF2, Diagonal 4, Diagonal 5, Generalized Tridiagonal 1, Generalized Rosenbrock, Generalized PSC1, Extended Himmelblau, Extended Three Exponential Terms, Extended Block Diagonal BD1, Extended PSC1, Raydan 2, Extended Tridiagonal 2, Extended Powell, Extended Freudenstein and Roth, Extended Rosenbrock	10, …, 10000	Andrei [13]
Extended Beale, Broyden Tridiagonal, Quadratic Diagonal Perturbed	10, …, 1000	Moré et al. [14]
Perturbed Quadratic, Quadratic QF1, Diagonal 1, Diagonal 2, Hager, Diagonal 3, Generalized Tridiagonal 2, Almost Perturbed Quadratic, Tridiagonal Perturbed Quadratic, Full Hessian FH1, Full Hessian FH2, Raydan 1, EG2, Extended White and Holst	10, …, 1000	Andrei [13]

Figure 1. Performance profile based on the number of iterations for all problems.

Figure 1 shows that AMD2 is the top performer, being more successful than the other methods in terms of the number of iterations. Figure 2 shows that AMD2 requires the fewest function calls. From Figure 3, we observe that AMD2 is faster than MD and AMD1 and needs a reasonable time to solve large-scale problems compared with the BB method. At each iteration, the proposed method does not require more storage than classic diagonal updating methods. Moreover, its more accurate approximation of the Hessian of the objective function allows the AMD methods to need fewer iterations and fewer function evaluations. The numerical results in Figures 1, 2, and 3 clearly show that the new method AMD2 yields significant improvements over BB, MD, and AMD1. In general, M = B_k performs better than M = I, most probably because B_k is a better Hessian approximation than the identity matrix I.

5. Conclusion

In this paper, we propose a new two-step diagonal gradient method for unconstrained optimization based on the accumulative approach. The new parameterization for the multistep diagonal gradient-type method is developed by employing the accumulative approach, a technique devised for the interpolating curves on which the multistep approach is based. Numerical results show that the proposed method is suitable for solving large-scale unconstrained optimization problems and is more stable than other similar methods in practical computation. The improvement that our proposed methods bring comes at a cost of only O(n), whereas the multistep quasi-Newton methods of [9, 10] cost about O(n²).

References

1 Barzilai J. and Borwein J. M., Two-point step size gradient methods, IMA Journal of Numerical Analysis. (1988) 8, no. 1, 141–148, https://doi.org/10.1093/imanum/8.1.141, 967848, ZBL0638.65055.
2 Hassan M. A., Leong W. J., and Farid M., A new gradient method via quasi-Cauchy relation which guarantees descent, Journal of Computational and Applied Mathematics. (2009) 230, no. 1, 300–305, https://doi.org/10.1016/j.cam.2008.11.013, 2532311, ZBL1179.65067.
3 Leong W. J., Hassan M. A., and Farid M., A monotone gradient method via weak secant equation for unconstrained optimization, Taiwanese Journal of Mathematics. (2010) 14, no. 2, 413–423, https://doi.org/10.11650/twjm/1500405798, 2655778, ZBL1203.90148.
4 Leong W. J., Farid M., and Hassan M. A., Scaling on diagonal quasi-Newton update for large scale unconstrained optimization, Bulletin of the Malaysian Mathematical Sciences Society. (2012) 35, no. 2, 247–256.
5 Waziri M. Y., Leong W. J., Hassan M. A., and Monsi M., A new Newton's method with diagonal Jacobian approximation for systems of nonlinear equations, Journal of Mathematics and Statistics. (2010) 6, 246–252, https://doi.org/10.3844/jmssp.2010.246.252.
6 Dennis J. E., Jr. and Wolkowicz H., Sizing and least-change secant methods, SIAM Journal on Numerical Analysis. (1993) 30, no. 5, 1291–1314, https://doi.org/10.1137/0730067, 1239822, ZBL0802.65081.
7 Farid M., Leong W. J., and Hassan M. A., A new two-step gradient-type method for large-scale unconstrained optimization, Computers & Mathematics with Applications. (2010) 59, no. 10, 3301–3307, https://doi.org/10.1016/j.camwa.2010.03.014, 2651868, ZBL1198.90395.
8 Farid M. and Leong W. J., An improved multi-step gradient-type method for large scale optimization, Computers & Mathematics with Applications. (2011) 61, no. 11, 3312–3318, https://doi.org/10.1016/j.camwa.2011.04.030, 2801996, ZBL1222.90044.
9 Ford J. A. and Moghrabi I. A., Alternating multi-step quasi-Newton methods for unconstrained optimization, Journal of Computational and Applied Mathematics. (1997) 82, no. 1-2, 105–116, https://doi.org/10.1016/S0377-0427(97)00075-7, 1473534, ZBL0886.65064.
10 Ford J. A. and Tharmlikit S., New implicit updates in multi-step quasi-Newton methods for unconstrained optimisation, Journal of Computational and Applied Mathematics. (2003) 152, no. 1-2, 133–146, https://doi.org/10.1016/S0377-0427(02)00701-X, 1991286.
11 Armijo L., Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics. (1966) 16, 1–3, https://doi.org/10.2140/pjm.1966.16.1, 0191071, ZBL0202.46105.
12 Byrd R. H. and Nocedal J., A tool for the analysis of quasi-Newton methods with application to unconstrained minimization, SIAM Journal on Numerical Analysis. (1989) 26, no. 3, 727–739, https://doi.org/10.1137/0726042, 997665, ZBL0676.65061.
13 Andrei N., An unconstrained optimization test functions collection, Advanced Modeling and Optimization. (2008) 10, no. 1, 147–161, 2424936, ZBL1161.90486.
14 Moré J. J., Garbow B. S., and Hillstrom K. E., Testing unconstrained optimization software, ACM Transactions on Mathematical Software. (1981) 7, no. 1, 17–41, https://doi.org/10.1145/355934.355936, 607350, ZBL0454.65049.
15 Dolan E. D. and Moré J. J., Benchmarking optimization software with performance profiles, Mathematical Programming A. (2002) 91, no. 2, 201–213, https://doi.org/10.1007/s101070100263, 1875515, ZBL1049.90004.
