## On-line version ISSN 1807-0302

### Comput. Appl. Math. vol.31 no.1 São Carlos  2012

#### http://dx.doi.org/10.1590/S1807-03022012000100004

The global convergence of a descent PRP conjugate gradient method

Min Li (I); Heying Feng (II); Jianguo Liu (I)

(I) Department of Mathematics and Applied Mathematics, Huaihua University, Huaihua, Hunan 418008, P.R. China. E-mail: liminmathmatic@hotmail.com
(II) School of Energy and Power Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P.R. China

ABSTRACT

Recently, Yu and Guan proposed a modified PRP method (called the DPRP method) which generates sufficient descent directions for the objective function. They established the global convergence of the DPRP method under the assumption that the stepsize is bounded away from zero. In this paper, without requiring a positive lower bound on the stepsize, we prove that the DPRP method is globally convergent with a modified strong Wolfe line search. Moreover, we establish the global convergence of the DPRP method with an Armijo-type line search. The numerical results show that the proposed algorithms are efficient.

Mathematical subject classification: Primary: 90C30; Secondary: 65K05.

Key words: PRP method, sufficient descent property, modified strong Wolfe line search, Armijo-type line search, global convergence.

1 Introduction

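In this paper, we consider the unconstrained problem

$$\min f(x), \quad x \in \mathbb{R}^n, \tag{1.1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Nonlinear conjugate gradient methods are efficient for problem (1.1). They generate iterates by letting

$$x_{k+1} = x_k + \alpha_k d_k, \tag{1.2}$$

with

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k > 0, \end{cases} \tag{1.3}$$

where $\alpha_k$ is the steplength, $g_k = g(x_k)$ denotes the gradient of $f$ at $x_k$, and $\beta_k$ is a suitable scalar.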

Well-known nonlinear conjugate gradient methods include the Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), Conjugate-Descent (CD), Liu-Storey (LS) and Dai-Yuan (DY) methods [1-7]. We are particularly interested in the PRP method, in which the parameter $\beta_k$ is defined by

$$\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}.$$

Here and throughout, $\|\cdot\|$ stands for the Euclidean norm of a vector and $y_{k-1} = g_k - g_{k-1}$. The PRP method has been regarded as one of the most efficient conjugate gradient methods in practical computation and has been studied extensively.
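As a small illustration of the update (1.3) with the PRP parameter, the following sketch computes $\beta_k^{PRP}$ and the next direction for plain Python lists. The function names and the sample vectors are hypothetical and chosen only for this example.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))

def prp_beta(g_new, g_old):
    """beta_k^PRP = g_k^T y_{k-1} / ||g_{k-1}||^2 with y_{k-1} = g_k - g_{k-1}."""
    y = [a - b for a, b in zip(g_new, g_old)]
    return dot(g_new, y) / dot(g_old, g_old)

def cg_direction(g_new, d_old, beta):
    """d_k = -g_k + beta_k * d_{k-1}, the k > 0 branch of (1.3)."""
    return [-g + beta * d for g, d in zip(g_new, d_old)]

# Illustrative data (not from the paper).
g_old = [1.0, 2.0]
d_old = [-1.0, -2.0]          # d_0 = -g_0
g_new = [0.5, -1.0]

beta = prp_beta(g_new, g_old)
d_new = cg_direction(g_new, d_old, beta)
```

Here `beta` equals 2.75 / 5 = 0.55, since $g_k^T y_{k-1} = 2.75$ and $\|g_{k-1}\|^2 = 5$.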

We now briefly review some results on the PRP method. The global convergence of the PRP method with exact line search was proved by Polak and Ribière [3] under a strong convexity assumption on $f$. However, Powell constructed an example showing that the PRP method with exact line searches can cycle infinitely without approaching a solution point [8]. Powell also suggested the restriction $\beta_k = \max\{\beta_k^{PRP}, 0\}$ to ensure convergence. Based on Powell's work, Gilbert and Nocedal [9] conducted an elegant analysis and showed that the PRP method is globally convergent if $\beta_k^{PRP}$ is restricted to be nonnegative and the steplength is chosen so that the sufficient descent condition $g_k^T d_k \le -c\|g_k\|^2$ holds in each iteration. In [10], Zhang, Zhou and Li proposed a three-term PRP method. They calculate $d_k$ by

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}, & k > 0, \end{cases} \tag{1.4}$$

where $\theta_k = g_k^T d_{k-1} / \|g_{k-1}\|^2$. It is easy to see from (1.4) that $g_k^T d_k = -\|g_k\|^2$. Zhang, Zhou and Li proved that this three-term PRP method converges globally with an Armijo-type line search.

In [11], Yu and Guan proposed a modified PRP conjugate gradient formula:

$$\beta_k^{DPRP} = \beta_k^{PRP} - t\,\frac{\|y_{k-1}\|^2}{\|g_{k-1}\|^4}\, g_k^T d_{k-1}, \tag{1.5}$$

where $t > 1/4$ is a constant. In this paper we simply call it the DPRP method. This modified formula is similar to the well-known CG_DESCENT method proposed by Hager and Zhang in [13]. Yu and Guan proved that the directions $d_k$ generated by the DPRP method always satisfy the following sufficient descent condition

$$g_k^T d_k \le -\left(1 - \frac{1}{4t}\right)\|g_k\|^2. \tag{1.6}$$

In order to obtain the global convergence of the algorithm, Yu and Guan [11] considered the following assumption: there exists a positive constant $\alpha^*$ such that $\alpha_k > \alpha^*$ holds for all indices $k$. In fact, under this assumption, if the sufficient descent condition $g_k^T d_k \le -c\|g_k\|^2$ is satisfied, conjugate gradient methods with the Wolfe line search are globally convergent.
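The sufficient descent property can be checked numerically. The sketch below assumes the DPRP formula $\beta_k^{DPRP} = \beta_k^{PRP} - t\,\|y_{k-1}\|^2 g_k^T d_{k-1} / \|g_{k-1}\|^4$ and evaluates the margin $g_k^T d_k + (1 - 1/(4t))\|g_k\|^2$, which should never be positive, on random vectors. The helper names, the random test data and the choice $t = 1.3$ (the value reported as best in Section 5) are illustrative assumptions.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dprp_beta(g_new, g_old, d_old, t=1.3):
    """beta_k^DPRP = beta_k^PRP - t * ||y||^2 * (g_k^T d_{k-1}) / ||g_{k-1}||^4."""
    y = [a - b for a, b in zip(g_new, g_old)]
    gg_old = dot(g_old, g_old)
    beta_prp = dot(g_new, y) / gg_old
    return beta_prp - t * dot(y, y) * dot(g_new, d_old) / gg_old ** 2

def descent_margin(g_new, g_old, d_old, t=1.3):
    """Return g_k^T d_k + (1 - 1/(4t)) * ||g_k||^2; sufficient descent means <= 0."""
    beta = dprp_beta(g_new, g_old, d_old, t)
    d_new = [-g + beta * d for g, d in zip(g_new, d_old)]
    return dot(g_new, d_new) + (1.0 - 1.0 / (4.0 * t)) * dot(g_new, g_new)

random.seed(0)
margins = []
for _ in range(1000):
    g_old = [random.uniform(1.0, 2.0) for _ in range(3)]   # kept away from zero
    g_new = [random.uniform(-1.0, 1.0) for _ in range(3)]
    d_old = [random.uniform(-1.0, 1.0) for _ in range(3)]
    margins.append(descent_margin(g_new, g_old, d_old))
worst = max(margins)   # largest margin observed; expected to be <= 0 up to rounding
```

Note that the check does not depend on how $d_{k-1}$ was produced, which matches the statement that the property is independent of the line search.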

In this paper, we focus on the global convergence of the DPRP method. In Section 2, we describe the algorithm. The convergence properties of the algorithm are analyzed in Sections 3 and 4. In Section 5, we report some numerical results obtained on the test problems in the CUTEr [23] library.

2 Algorithm

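From Theorem 1 in [11] we directly obtain the following useful lemma.

Lemma 2.1. Let $\{d_k\}$ be generated by

$$d_k = -g_k + \tau d_{k-1}, \quad d_0 = -g_0. \tag{2.1}$$

If $g_k^T d_{k-1} \ne 0$, then (1.6) holds for all $\tau \in [\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}]$.

Proof. The inequality (1.6) holds for $\tau = \beta_k^{DPRP}$ by Theorem 1 in [11]. For any $\tau \in (\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}]$, it is clear that $\beta_k^{DPRP} < \tau \le 0$. If $g_k^T d_{k-1} > 0$, then (1.6) follows from (2.1). If $g_k^T d_{k-1} < 0$, then we get from (2.1) that

$$g_k^T d_k = -\|g_k\|^2 + \tau g_k^T d_{k-1} \le -\|g_k\|^2 + \beta_k^{DPRP} g_k^T d_{k-1} \le -\left(1 - \frac{1}{4t}\right)\|g_k\|^2.$$

The proof is completed.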
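For readers without access to [11], the base case $\tau = \beta_k^{DPRP}$ can be sketched as follows. This is the standard argument used for CG_DESCENT in [13], applied here under the assumption that $\beta_k^{DPRP} = \beta_k^{PRP} - t\,\|y_{k-1}\|^2 g_k^T d_{k-1}/\|g_{k-1}\|^4$; it is not quoted from [11]. It relies only on the inequality $u^T v \le \tfrac12(\|u\|^2 + \|v\|^2)$.

```latex
\begin{aligned}
g_k^T d_k &= -\|g_k\|^2 + \beta_k^{DPRP}\, g_k^T d_{k-1} \\
          &= -\|g_k\|^2 + \frac{(g_k^T y_{k-1})(g_k^T d_{k-1})}{\|g_{k-1}\|^2}
             - t\,\frac{\|y_{k-1}\|^2 (g_k^T d_{k-1})^2}{\|g_{k-1}\|^4}, \\[4pt]
\text{with}\quad u &= \frac{g_k}{\sqrt{2t}}, \qquad
v = \sqrt{2t}\,\frac{(g_k^T d_{k-1})\, y_{k-1}}{\|g_{k-1}\|^2}: \\[4pt]
\frac{(g_k^T y_{k-1})(g_k^T d_{k-1})}{\|g_{k-1}\|^2}
          &= u^T v \le \tfrac12\left(\|u\|^2 + \|v\|^2\right)
           = \frac{\|g_k\|^2}{4t} + t\,\frac{\|y_{k-1}\|^2 (g_k^T d_{k-1})^2}{\|g_{k-1}\|^4}, \\[4pt]
\Longrightarrow\quad g_k^T d_k &\le -\left(1 - \frac{1}{4t}\right)\|g_k\|^2 .
\end{aligned}
```

The last term of the second line cancels exactly against the $\|v\|^2$ contribution, which is why the bound requires $t > 1/4$ to remain a descent condition.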

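Lemma 2.1 shows that the directions generated by (2.1) are sufficient descent directions and that this feature is independent of the line search used. In particular, if we set

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k^{DPRP+} d_{k-1}, & k > 0, \end{cases} \qquad \beta_k^{DPRP+} = \max\{\beta_k^{DPRP}, \eta_k\}, \tag{2.2}$$

$$\eta_k = \frac{-1}{\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\}}, \tag{2.3}$$

where $\eta > 0$ is a constant, it follows from $\eta_k < 0$ that

$$\beta_k^{DPRP+} \in [\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}].$$

So the directions generated by (2.2) and (2.3) are descent directions by Lemma 2.1. The update rule (2.3), which adjusts the lower bound on $\beta_k^{DPRP}$, originated in [13].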
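The truncation can be sketched in a few lines. The code assumes the lower bound has the CG_DESCENT form $\eta_k = -1/(\|d_{k-1}\| \min\{\eta, \|g_{k-1}\|\})$ from [13] and that the truncated parameter is $\max\{\beta_k^{DPRP}, \eta_k\}$. The value `eta=0.01` and the function names are assumptions made for this illustration only.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def eta_k(d_old, g_old, eta=0.01):
    """Negative lower bound -1 / (||d_{k-1}|| * min(eta, ||g_{k-1}||))."""
    return -1.0 / (norm(d_old) * min(eta, norm(g_old)))

def truncate_beta(beta_dprp, d_old, g_old, eta=0.01):
    """Truncated parameter max(beta_k^DPRP, eta_k)."""
    return max(beta_dprp, eta_k(d_old, g_old, eta))

d_old = [3.0, 4.0]        # ||d_{k-1}|| = 5
g_old = [0.6, 0.8]        # ||g_{k-1}|| is about 1, larger than eta
lower = eta_k(d_old, g_old)                    # about -20
clipped = truncate_beta(-50.0, d_old, g_old)   # a very negative beta is raised to eta_k
kept = truncate_beta(0.3, d_old, g_old)        # a nonnegative beta is left unchanged
```

Because $\eta_k$ is always negative, a nonnegative $\beta_k^{DPRP}$ is never altered; only strongly negative values are clipped.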

We now present the concrete algorithm.

Algorithm 2.1. (The DPRP method) [11]

Step 0. Given a constant $\varepsilon > 0$ and $x_0 \in \mathbb{R}^n$. Set $k := 0$;

Step 1. Stop if $\|g_k\| < \varepsilon$;

Step 2. Compute $d_k$ by (2.2);

Step 3. Determine the steplength $\alpha_k$ by a line search;

Step 4. Let $x_{k+1} = x_k + \alpha_k d_k$; if $\|g_{k+1}\| < \varepsilon$, then stop;

Step 5. Set $k := k + 1$ and go to Step 2.
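The steps above can be sketched as a short pure-Python routine. This is only an illustration under stated assumptions, not the authors' Fortran code: the direction uses $\max\{\beta_k^{DPRP}, \eta_k\}$ with $\eta_k = -1/(\|d_{k-1}\|\min\{\eta, \|g_{k-1}\|\})$, Step 3 is filled with a backtracking rule that accepts $\alpha$ when $f(x_k + \alpha d_k) \le f(x_k) - \delta_1 \alpha^2 \|d_k\|^2$ (an assumed form of the Armijo-type search of Section 4), and the parameter values and the quadratic test function are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def dprp_plus_beta(g_new, g_old, d_old, t, eta):
    """max(beta_k^DPRP, eta_k); both formulas are assumptions stated in the lead-in."""
    y = [a - b for a, b in zip(g_new, g_old)]
    gg_old = dot(g_old, g_old)
    beta = dot(g_new, y) / gg_old - t * dot(y, y) * dot(g_new, d_old) / gg_old ** 2
    eta_k = -1.0 / (norm(d_old) * min(eta, norm(g_old)))
    return max(beta, eta_k)

def armijo_type_step(f, x, d, delta1, rho, max_backtracks=60):
    """Largest rho**j (j = 1, 2, ...) with f(x + a d) <= f(x) - delta1 * a^2 * ||d||^2."""
    fx, dd, alpha = f(x), dot(d, d), rho
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - delta1 * alpha ** 2 * dd:
            return alpha
        alpha *= rho
    return alpha  # fall back to the smallest trial step

def dprp_minimize(f, grad, x0, eps=1e-6, t=1.3, eta=0.01,
                  delta1=1e-3, rho=0.5, max_iter=10000):
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                       # d_0 = -g_0
    for k in range(max_iter):
        if norm(g) <= eps:                      # Steps 1 and 4: stopping test
            return x, k
        alpha = armijo_type_step(f, x, d, delta1, rho)          # Step 3
        x = [xi + alpha * di for xi, di in zip(x, d)]            # Step 4
        g_new = grad(x)
        beta = dprp_plus_beta(g_new, g, d, t, eta)               # Step 2
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
    return x, max_iter

# Hypothetical test problem: a convex quadratic with minimizer at the origin.
def f(x):
    return 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2)

def grad(x):
    return [x[0], 2.0 * x[1]]

x_star, iterations = dprp_minimize(f, grad, [2.0, 1.0])
```

The line search is deliberately kept separate from the direction update, mirroring the remark that the algorithm does not prescribe a particular line search.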

It is easy to see that Algorithm 2.1 is well-defined. We do not specify the line search used to determine the steplength $\alpha_k$; it can be exact or inexact. In the next two sections, we analyse the convergence of Algorithm 2.1 with a modified strong Wolfe line search and with an Armijo-type line search.

3 Convergence of Algorithm 2.1 under modified strong Wolfe line search

In this section, we prove the global convergence of the Algorithm 2.1. We first make the following assumptions.

Assumption 3.1.

(I) The level set

$$\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$$

is bounded.

(II) In some neighborhood $N$ of $\Omega$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that

$$\|g(x) - g(y)\| \le L\|x - y\|, \quad \forall x, y \in N. \tag{3.1}$$

From now on, we always suppose that this assumption holds. It follows directly from Assumption 3.1 that there exist two positive constants $B$ and $\gamma_1$ such that

$$\|x\| \le B \quad \text{and} \quad \|g(x)\| \le \gamma_1, \quad \forall x \in \Omega. \tag{3.2}$$

In the remainder of this section, we study the global convergence of Algorithm 2.1 under a modified strong Wolfe line search, which determines the steplength $\alpha_k$ satisfying the following conditions

$$\begin{cases} f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k, \\ |g(x_k + \alpha_k d_k)^T d_k| \le \min\{M, \sigma |g_k^T d_k|\}, \end{cases} \tag{3.3}$$

where $0 < \delta < \sigma < 1$ and $M > 0$ are constants. The purpose of this modification is to make $|g_k^T d_{k-1}| \le M$ always hold with the given $M$. Moreover, we can set $M$ large enough that $\min\{M, \sigma|g_k^T d_k|\} = \sigma|g_k^T d_k|$ holds for almost all indices $k$. First, we give the following useful lemma, which was essentially proved by Zoutendijk [19] and Wolfe [20, 21].

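Lemma 3.2. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the above line search. Then

$$\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \tag{3.4}$$

Proof. We get from the second inequality in (3.3)

$$\sigma g_k^T d_k \le g_{k+1}^T d_k \le -\sigma g_k^T d_k.$$

From the Lipschitz condition (3.1), we have

$$(\sigma - 1) g_k^T d_k \le (g_{k+1} - g_k)^T d_k \le L \alpha_k \|d_k\|^2.$$

Then

$$\alpha_k \ge \frac{\sigma - 1}{L} \cdot \frac{g_k^T d_k}{\|d_k\|^2}. \tag{3.6}$$

Since $f$ is bounded below and the gradient $g(x)$ is Lipschitz continuous, from the first inequality in (3.3) we have

$$-\sum_{k \ge 0} \alpha_k g_k^T d_k < \infty. \tag{3.7}$$

Then the Zoutendijk condition (3.4) follows from (3.6) and (3.7).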
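The modified strong Wolfe conditions used in Lemma 3.2 can be tested for a given trial step as in the sketch below. It assumes the conditions read $f(x_k + \alpha d_k) \le f(x_k) + \delta\alpha g_k^T d_k$ and $|g(x_k + \alpha d_k)^T d_k| \le \min\{M, \sigma|g_k^T d_k|\}$. The defaults `delta=0.1` and `sigma=0.9` are illustrative assumptions, while `M=1e30` follows the value reported in Section 5; the one-dimensional test function is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def satisfies_modified_strong_wolfe(f, grad, x, d, alpha,
                                    delta=0.1, sigma=0.9, M=1e30):
    """Check both modified strong Wolfe inequalities for the trial step alpha."""
    slope0 = dot(grad(x), d)                       # g_k^T d_k, negative for descent
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + delta * alpha * slope0
    curvature = abs(dot(grad(x_new), d)) <= min(M, sigma * abs(slope0))
    return sufficient_decrease and curvature

# Hypothetical one-dimensional example: f(x) = x^2 / 2 from x = 1 along d = -1.
def f(x):
    return 0.5 * x[0] ** 2

def grad(x):
    return [x[0]]

exact_step = satisfies_modified_strong_wolfe(f, grad, [1.0], [-1.0], 1.0)
tiny_step = satisfies_modified_strong_wolfe(f, grad, [1.0], [-1.0], 0.01)
long_step = satisfies_modified_strong_wolfe(f, grad, [1.0], [-1.0], 2.0)
```

In this example the exact minimizing step passes, the tiny step fails the curvature condition, and the overly long step fails the decrease condition.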

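The following lemma is similar to Lemma 4.1 in [9] and Lemma 3.1 in [13]. It is used to prove that the gradients cannot be bounded away from zero.

Lemma 3.3. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search. If there exists a constant $\gamma > 0$ such that

$$\|g_k\| \ge \gamma, \quad \forall k \ge 0, \tag{3.8}$$

then

$$d_k \ne 0 \ \text{for each } k \quad \text{and} \quad \sum_{k \ge 1} \|u_k - u_{k-1}\|^2 < \infty, \tag{3.9}$$

where $u_k = d_k/\|d_k\|$.

Proof. Since $\|g_k\| \ge \gamma > 0$, it follows from (1.6) that $d_k \ne 0$ for each $k$. We get from (3.4), (3.8) and (1.6)

$$\gamma^4 \sum_{k \ge 0} \frac{1}{\|d_k\|^2} \le \sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} \le \left(1 - \frac{1}{4t}\right)^{-2} \sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty, \tag{3.10}$$

where $t > 1/4$ is a constant. Here, with $\beta_k^{DPRP+} = \max\{\beta_k^{DPRP}, \eta_k\}$ as in (2.2), we define:

$$\beta_k^{+} = \max\{\beta_k^{DPRP+}, 0\}, \qquad \beta_k^{-} = \min\{\beta_k^{DPRP+}, 0\}, \tag{3.11}$$

$$r_k = \frac{-g_k + \beta_k^{-} d_{k-1}}{\|d_k\|}, \qquad \delta_k = \beta_k^{+} \frac{\|d_{k-1}\|}{\|d_k\|}. \tag{3.12}$$

By (2.2), (3.11) and (3.12), we have

$$u_k = \frac{d_k}{\|d_k\|} = \frac{-g_k + (\beta_k^{+} + \beta_k^{-}) d_{k-1}}{\|d_k\|} = r_k + \delta_k u_{k-1}.$$

Since the $u_k$ are unit vectors,

$$\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|.$$

Since $\delta_k \ge 0$, we have

$$\|u_k - u_{k-1}\| \le \|(1 + \delta_k) u_k - (1 + \delta_k) u_{k-1}\| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2\|r_k\|. \tag{3.13}$$

We get from the definition of $\beta_k^{-}$ and $\beta_k^{DPRP+} \ge \eta_k$

$$\|-g_k + \beta_k^{-} d_{k-1}\| \le \|g_k\| - \eta_k \|d_{k-1}\| \le \gamma_1 + \frac{1}{\min\{\eta, \gamma\}}. \tag{3.14}$$

Setting $c = \gamma_1 + 1/\min\{\eta, \gamma\}$ and combining (3.13) with (3.14), we have

$$\|u_k - u_{k-1}\| \le \frac{2c}{\|d_k\|}. \tag{3.15}$$

Summing (3.15) over $k$ and combining with the relation (3.10), we have

$$\sum_{k \ge 1} \|u_k - u_{k-1}\|^2 \le 4c^2 \sum_{k \ge 1} \frac{1}{\|d_k\|^2} < \infty.$$

The proof is completed.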

Now we state a property of the scalar $\beta_k^{DPRP+} = \max\{\beta_k^{DPRP}, \eta_k\}$ in (2.2), called Property (*), which was proposed by Gilbert and Nocedal in [9].

Property (*). Suppose that Assumption 3.1 and inequality (3.8) hold. We say that the method has Property (*) if there exist constants $b > 1$ and $\lambda > 0$ such that for all $k$

$$|\beta_k| \le b, \qquad \text{and} \qquad \|s_{k-1}\| \le \lambda \implies |\beta_k| \le \frac{1}{2b},$$

where $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$.

Lemma 3.4. If the modified strong Wolfe line search is used in Algorithm 2.1, then the scalar $\beta_k^{DPRP+}$ satisfies Property (*).

Proof. By the definition of $\beta_k^{DPRP+}$ in (2.2), we have

Under Assumption 3.1, if (3.8) holds, we get from (3.3) and (1.6)

If $\|s_{k-1}\| \le \lambda$, then

(3.17) together with (3.18) implies that the conditions in Property (*) hold by setting

The proof is completed.

Let $N^*$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, we define the index set:

$$K_{k,\Delta}^{\lambda} := \{ i \in N^* : k \le i \le k + \Delta - 1, \ \|s_{i-1}\| > \lambda \}.$$

Let $|K_{k,\Delta}^{\lambda}|$ denote the number of elements in $K_{k,\Delta}^{\lambda}$. We have the following lemma, which was essentially proved by Gilbert and Nocedal in [9].

Lemma 3.5. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search. If (3.8) holds, then there exists a constant $\lambda > 0$ such that for any $\Delta \in N^*$ and any index $k_0$, there is an index $k \ge k_0$ such that

$$|K_{k,\Delta}^{\lambda}| > \frac{\Delta}{2}.$$

The proof of Lemma 3.5 is similar to that of Lemma 4.2 in [9], so we omit it here. The next theorem, which is a modification of Theorem 4.3 in [9], shows that the proposed Algorithm 2.1 is globally convergent.

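Theorem 3.6. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search (3.3). Then either $\|g_k\| = 0$ for some $k$ or

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Proof. We suppose for contradiction that both

$$\|g_k\| \ne 0 \ \text{for all } k \ge 0 \quad \text{and} \quad \liminf_{k \to \infty} \|g_k\| > 0.$$

Denote $\gamma = \inf\{\|g_k\| : k \ge 0\}$. It is clear that

$$\|g_k\| \ge \gamma > 0, \quad \forall k \ge 0.$$

Therefore the conditions of Lemmas 3.3 and 3.5 hold. Making use of Lemma 3.4, we can complete the proof in the same way as for Theorem 4.3 in [9], so we omit it here as well.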

4 Convergence of Algorithm 2.1 with an Armijo-type line search

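In this section, we prove the global convergence of Algorithm 2.1 under an Armijo-type line search which determines

$$\alpha_k = \max\{\rho^j, \ j = 1, 2, \ldots\}$$

satisfying

$$f(x_k + \alpha_k d_k) \le f(x_k) - \delta_1 \alpha_k^2 \|d_k\|^2, \tag{4.1}$$

where $\delta_1 > 0$ and $\rho \in (0, 1)$.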
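A backtracking implementation of such a line search might look as follows. The sketch assumes the acceptance test $f(x_k + \alpha d_k) \le f(x_k) - \delta_1 \alpha^2 \|d_k\|^2$ and tries $\alpha = \rho, \rho^2, \ldots$ in turn; the parameter values, the cap on the number of trials and the test function are illustrative assumptions.

```python
def armijo_type_step(f, x, d, delta1=1e-3, rho=0.5, max_backtracks=60):
    """Return the largest rho**j (j = 1, 2, ...) passing the Armijo-type test."""
    fx = f(x)
    dd = sum(di * di for di in d)          # ||d_k||^2
    alpha = rho                            # j = 1
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - delta1 * alpha ** 2 * dd:
            return alpha
        alpha *= rho                       # move on to the next power of rho
    raise RuntimeError("no acceptable step found; is d a descent direction?")

# Hypothetical example: f(x) = x^2 / 2 starting from x = 1.
def f(x):
    return 0.5 * x[0] ** 2

step_short = armijo_type_step(f, [1.0], [-1.0])    # first trial 0.5 is accepted
step_long = armijo_type_step(f, [1.0], [-10.0])    # needs three halvings
```

Unlike the Wolfe conditions, this test needs only function values, so no gradient is evaluated at the trial points.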

It follows from the definition of Algorithm 2.1 that the function value sequence $\{f(x_k)\}$ is decreasing. Moreover, if $f$ is bounded from below, from (4.1) we have

$$\sum_{k \ge 0} \alpha_k^2 \|d_k\|^2 < \infty. \tag{4.2}$$

Under Assumption 3.1, we have the following useful lemma.

Lemma 4.1. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the above Armijo-type line search. If there exists a constant $\gamma > 0$ such that

$$\|g_k\| \ge \gamma, \quad \forall k \ge 0, \tag{4.3}$$

then there exists a constant $T > 0$ such that

$$\|d_k\| \le T, \quad \forall k \ge 0. \tag{4.4}$$

Proof. By the definition of $d_k$ and the inequalities (3.1) and (3.2), we have

(4.2) implies that $\alpha_k \|d_k\| \to 0$ as $k \to \infty$. Then for any constant $b \in (0, 1)$, there exists an index $k_0$ such that

Then

where $c = \gamma_1 + 1/\min\{\eta, \gamma\}$ is a constant. For any $k > k_0$ we have

We can get (4.4) by setting

Based on Lemma 4.1, we give the following global convergence theorem for Algorithm 2.1 under the above Armijo-type line search.

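Theorem 4.2. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ be generated by Algorithm 2.1 with the above Armijo-type line search. Then either $\|g_k\| = 0$ for some $k$ or

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Proof. We suppose for the sake of contradiction that $\|g_k\| \ne 0$ for all $k \ge 0$ and $\liminf_{k \to \infty} \|g_k\| \ne 0$. Denote $\gamma = \inf\{\|g_k\| : k \ge 0\}$. It is clear that

$$\|g_k\| \ge \gamma > 0, \quad \forall k \ge 0. \tag{4.6}$$

First, we consider the case $\liminf_{k \to \infty} \alpha_k > 0$. From (4.2) we have $\liminf_{k \to \infty} \|d_k\| = 0$. This together with (1.6) gives $\liminf_{k \to \infty} \|g_k\| = 0$, which contradicts (4.6).

On the other hand, if $\liminf_{k \to \infty} \alpha_k = 0$, then there exists an infinite index set $K$ such that

$$\lim_{k \in K,\, k \to \infty} \alpha_k = 0. \tag{4.7}$$

Step 3 of Algorithm 2.1 implies that, for $k \in K$ large enough, $\rho^{-1}\alpha_k$ does not satisfy (4.1). Namely

$$f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) > -\delta_1 \rho^{-2} \alpha_k^2 \|d_k\|^2. \tag{4.8}$$

By the Lipschitz condition (3.1) and the mean value theorem, there is a $\xi_k \in [0, 1]$ such that

$$\begin{aligned} f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) &= \rho^{-1}\alpha_k\, g(x_k + \xi_k \rho^{-1}\alpha_k d_k)^T d_k \\ &= \rho^{-1}\alpha_k\, g_k^T d_k + \rho^{-1}\alpha_k \left(g(x_k + \xi_k \rho^{-1}\alpha_k d_k) - g_k\right)^T d_k \\ &\le \rho^{-1}\alpha_k\, g_k^T d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2. \end{aligned}$$

This together with (4.8), (4.4) and (1.6) gives

$$\left(1 - \frac{1}{4t}\right)\|g_k\|^2 \le -g_k^T d_k \le (L + \delta_1)\rho^{-1}\alpha_k \|d_k\|^2 \le (L + \delta_1)\rho^{-1} T^2 \alpha_k. \tag{4.9}$$

(4.7) and (4.9) imply that $\liminf_{k \in K,\, k \to \infty} \|g_k\| = 0$. This also yields a contradiction, and the proof is completed.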

5 Numerical results

In this section, we report numerical experiments that test Algorithm 2.1 with the modified strong Wolfe line search and compare its performance with some existing conjugate gradient methods: the PRP+ method developed by Gilbert and Nocedal [9], the CG_DESCENT method proposed by Hager and Zhang [13], and the MPRP method proposed by Zhang et al. in [10].

The PRP+ code was obtained from Nocedal's web page at

http://www.ece.northwestern.edu/~nocedal/software.html

and the CG_DESCENT code from Hager's web page at

http://www.math.ufl.edu/~hager/papers/CG

All codes were written in Fortran77 and run on a PC with a 2.8 GHz CPU and 2 GB of RAM under Linux (Fedora 10 with GCC 4.3.2). We stop the iteration if $\|g_k\| < 10^{-6}$ or the total number of iterations exceeds 10000.

We tested all the unconstrained problems in the CUTEr library [23] with their default dimensions. By the end of 2010, there were 164 unconstrained optimization problems in the CUTEr library. Five of them could not be installed on our system because of limited storage space.

In order to find a relatively good value of $t$ in Algorithm 2.1, we tested the problems with t = 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5. The data showed that Algorithm 2.1 with t = 1.3 performed slightly better than with the other values. So in this section, we only compare the numerical results for Algorithm 2.1 with t = 1.3 against the CG_DESCENT, MPRP and PRP+ methods. The problems and their dimensions are listed in Table 1. Table 2 lists all the numerical results, which include the total number of iterations (Iter), the total number of function evaluations (Nf), the total number of gradient evaluations (Ng) and the CPU time (Time) in seconds. In Table 2, "–" means the method failed and "NaN" means that the cost function generated a "NaN" for the function value.

We used the performance profiles of Dolan and Moré [24] to compare the performance of those methods. We discarded the data of the problems for which different methods converged to different local minimizers. We consider that two methods converged to the same solution if the optimal costs $f_1$ and $f_2$ satisfy the following conditions

The problems which were discarded by this process were BROYDN7D, CHAINWOO, NONCVXUN and SENSORS.

Figures 1-4 show the performance of the above four methods relative to the CPU time (in seconds), the total number of iterations, the total number of function evaluations and the total number of gradient evaluations, respectively. For example, the performance of the four algorithms relative to CPU time means that, for each method, we plot the fraction P of problems for which the method is within a factor τ of the best time. The left side of the figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that were successfully solved by each of the methods. The top curve is the method that solved the most problems in a time that was within a factor τ of the best time. For convenience, we give the meanings of these methods in the figures.

• "CG_DESCENT" stands for the CG_DESCENT method with the approximate Wolfe line search [13]. Here we use the Fortran77 (version 1.4) code obtained from Hager's web site and use the default parameters there.

• "PRP+" shows the PRP+ method with Wolfe line search proposed in [22].

• "MPRP" means the three-term PRP method proposed in [10] with the same line search as the "CG_DESCENT" method.

• "Algorithm 2.1" means Algorithm 2.1 with the modified strong Wolfe line search (3.3). We determine the steplength $\alpha_k$ which satisfies the conditions in (3.3) or the following approximate conditions:

where $A = \sigma|\phi'(0)|$, $\phi(\alpha) = f(x_k + \alpha d_k)$, and the constant $\epsilon$ is an estimate of the error in $\phi(0)$ at the iterate $x_k$. The first inequality in (5.1) is obtained by replacing $\phi(\alpha)$ with its approximation function

in the modified strong Wolfe conditions (3.3). The approximate conditions in (5.1) are modifications of the approximate Wolfe conditions proposed by Hager and Zhang in [13]. We refer to [13] for more details. The Algorithm 2.1 code is a modification of the CG_DESCENT subroutine of Hager and Zhang [13]. We use the default parameters there. Moreover, we set $M = 10^{30}$ in (3.3) to ensure that $M > \sigma|g_k^T d_k|$ holds for most indices $k$. If $\sigma|g_k^T d_k| > M$, we determine the steplength $\alpha_k$ which satisfies the first inequality in (3.3). The code of Algorithm 2.1 was posted on the web page at: http://blog.sina.com.cn/liminmath.
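The performance profiles of Dolan and Moré [24] behind Figures 1-4 can be computed along the following lines. This is a minimal sketch: the data layout (one list of positive costs per solver, with `None` marking a failure), the function name and the sample numbers are assumptions for illustration and are not the paper's data.

```python
def performance_profile(costs, tau):
    """Fraction of problems each solver handles within a factor tau of the best cost.

    costs maps a solver name to a list of positive costs (e.g. CPU time),
    one entry per problem, with None where the solver failed.
    """
    n_problems = len(next(iter(costs.values())))
    best = []
    for p in range(n_problems):
        solved = [c[p] for c in costs.values() if c[p] is not None]
        best.append(min(solved) if solved else None)
    profile = {}
    for solver, c in costs.items():
        within = sum(
            1 for p in range(n_problems)
            if c[p] is not None and best[p] is not None and c[p] <= tau * best[p]
        )
        profile[solver] = within / n_problems
    return profile

# Made-up costs for two hypothetical solvers on three problems.
sample = {"A": [1.0, 4.0, None], "B": [2.0, 2.0, 3.0]}
at_one = performance_profile(sample, 1.0)   # share of problems where each solver is best
at_two = performance_profile(sample, 2.0)   # share solved within twice the best cost
```

Evaluating the function on a grid of τ values and plotting the result per solver gives curves of the kind described above: the value at τ = 1 is the left end of the figure, and the limit for large τ is the fraction of problems solved.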

We can see from Figures 1-4 that the curves "Algorithm 2.1", "CG_DESCENT" and "MPRP" are very close. This means that the performances of the Algorithm 2.1, CG_DESCENT and MPRP methods are similar. That is to say, Algorithm 2.1 is efficient for solving unconstrained optimization problems, since CG_DESCENT has been regarded as one of the most efficient nonlinear conjugate gradient methods. We also note that the performances of the Algorithm 2.1, CG_DESCENT and MPRP methods are much better than that of the PRP+ method.

6 Conclusion

We have established the convergence of a sufficient descent PRP conjugate gradient method with a modified strong Wolfe line search and with an Armijo-type line search. The numerical results show that Algorithm 2.1 is efficient for solving the unconstrained problems in the CUTEr library, and that its numerical performance is similar to that of the well-known CG_DESCENT and MPRP methods.

Acknowledgements. The authors would like to thank the referees and Prof. DongHui Li for many valuable suggestions and comments, which improved this paper greatly. We are very grateful to Prof. W.W. Hager, H. Zhang and Prof. J. Nocedal for providing their codes, and to Zhang, Zhou and Li for the MPRP method as well.

REFERENCES

[1] M.R. Hestenes and E.L. Stiefel, Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49 (1952), 409-436.

[2] R. Fletcher and C. Reeves, Function minimization by conjugate gradients. Comput. J., 7 (1964), 149-154.

[3] E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 16 (1969), 35-43.

[4] B.T. Polyak, The conjugate gradient method in extreme problems. USSR Comp. Math. Math. Phys., 9 (1969), 94-112.

[5] R. Fletcher, Practical Method of Optimization I: Unconstrained Optimization. John Wiley & Sons, New York (1987).

[6] Y. Liu and C. Storey, Efficient generalized conjugate gradient algorithms, Part 1: Theory. J. Optim. Theory Appl., 69 (1991), 177-182.

[7] Y.H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim., 10 (2000), 177-182.

[8] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics. Springer-Verlag, Berlin (1984).

[9] J.C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim., 2 (1992), 21-42.

[10] L. Zhang, W. Zhou and D. Li, A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal., 26 (2006), 629-640.

[11] G.H. Yu and L.T. Guan, Modified PRP Methods with sufficient descent property and their convergence properties. Acta Scientiarum Naturalium Universitatis Sunyatseni (Chinese), 45(4) (2006), 11-14.

[12] Y.H. Dai, Conjugate Gradient Methods with Armijo-type Line Searches. Acta Mathematicae Applicatae Sinica (English Series), 18 (2002), 123-130.

[13] W.W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim., 16 (2005), 170-192.

[14] M. Al-Baali, Descent property and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal., 5 (1985), 121-124.

[15] Y.H. Dai and Y. Yuan, Convergence properties of the Fletcher-Reeves method. IMA J. Numer. Anal., 16(2) (1996), 155-164.

[16] Y.H. Dai and Y. Yuan, Convergence Properties of the Conjugate Descent Method. Advances in Mathematics, 25(6) (1996), 552-562.

[17] L. Zhang, New versions of the Hestenes-Stiefel nonlinear conjugate gradient method based on the secant condition for optimization. Computational and Applied Mathematics, 28 (2009), 111-133.

[18] D. Li and M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math., 129 (2001), 15-35.

[19] G. Zoutendijk, Nonlinear Programming, Computational Methods, in: Integer and Nonlinear Programming. J. Abadie (Ed.), North-Holland, Amsterdam (1970), 37-86.

[20] P. Wolfe, Convergence conditions for ascent methods. SIAM Rev., 11 (1969), 226-235.

[21] P. Wolfe, Convergence conditions for ascent methods. II: Some corrections. SIAM Rev., 13 (1969), 185-188.

[22] J.J. Moré and D.J. Thuente, Line search algorithms with guaranteed sufficient decrease. ACM Trans. Math. Software, 20 (1994), 286-307.

[23] I. Bongartz, A.R. Conn, N.I.M. Gould and P.L. Toint, CUTE: Constrained and unconstrained testing environments. ACM Trans. Math. Software, 21 (1995), 123-160.

[24] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Math. Program., 91 (2002), 201-213.

[25] W.W. Hager and H. Zhang, A survey of nonlinear conjugate gradient methods. Pacific J. Optim., 2 (2006), 35-58.

[26] D.F. Shanno, Conjugate gradient methods with inexact searches. Math. Oper. Res., 3 (1978), 244-256.

[27] L. Zhang, W. Zhou and D. Li, Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search. Numer. Math., 104 (2006), 561-572.

[28] W.W. Hager and H. Zhang, Algorithm 851: CG_DESCENT, A conjugate gradient method with guaranteed descent. ACM Trans. Math. Software, 32 (2006), 113-137.