Computational & Applied Mathematics
Online version ISSN 1807-0302
Comput. Appl. Math. vol. 31 no. 1 São Carlos 2012
http://dx.doi.org/10.1590/S1807-03022012000100004
The global convergence of a descent PRP conjugate gradient method
Min Li^{I,*,**}; Heying Feng^{II}; Jianguo Liu^{I}
^{I}Department of Mathematics and Applied Mathematics, Huaihua University, Huaihua, Hunan 418008, P.R. China. E-mail: liminmathmatic@hotmail.com
^{II}School of Energy and Power Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P.R. China
ABSTRACT
Recently, Yu and Guan proposed a modified PRP method (called the DPRP method) which generates sufficient descent directions for the objective function. They established the global convergence of the DPRP method under the assumption that the stepsize is bounded away from zero. In this paper, without requiring a positive lower bound on the stepsize, we prove that the DPRP method is globally convergent with a modified strong Wolfe line search. Moreover, we establish the global convergence of the DPRP method with an Armijo-type line search. The numerical results show that the proposed algorithms are efficient.
Mathematical subject classification: Primary: 90C30; Secondary: 65K05.
Key words: PRP method, sufficient descent property, modified strong Wolfe line search, Armijo-type line search, global convergence.
1 Introduction
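In this paper, we consider the unconstrained problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1.1)$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Nonlinear conjugate gradient methods are efficient for problem (1.1). They generate iterates by letting

$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, 2, \ldots, \qquad (1.2)$$

with

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases}$$

where $\alpha_k$ is the steplength, $g_k = g(x_k) = \nabla f(x_k)$ denotes the gradient of $f$ at $x_k$, and $\beta_k$ is a suitable scalar.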
Well-known nonlinear conjugate gradient methods include the Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), Conjugate Descent (CD), Liu-Storey (LS) and Dai-Yuan (DY) methods [1-7]. We are particularly interested in the PRP method, in which the parameter $\beta_k$ is defined by

$$\beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}.$$

Here and throughout, $\|\cdot\|$ stands for the Euclidean norm of a vector and $y_{k-1} = g_k - g_{k-1}$. The PRP method has been regarded as one of the most efficient conjugate gradient methods in practical computation and has been studied extensively.
Now let us briefly review some results on the PRP method. The global convergence of the PRP method with exact line search was proved by Polak and Ribière [3] under a strong convexity assumption on $f$. However, Powell constructed an example showing that the PRP method with exact line searches can cycle infinitely without approaching a solution point [8]. Powell also suggested restricting $\beta_k = \max\{\beta_k^{PRP}, 0\}$ to ensure convergence. Based on Powell's work, Gilbert and Nocedal [9] conducted an elegant analysis and showed that the PRP method is globally convergent if $\beta_k^{PRP}$ is restricted to be nonnegative and the line search ensures the sufficient descent condition $g_k^T d_k \le -c\|g_k\|^2$ ($c > 0$) in each iteration. In [10], Zhang, Zhou and Li proposed a three-term PRP method. They calculate $d_k$ ($k \ge 1$, with $d_0 = -g_0$) by

$$d_k = -g_k + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}, \qquad (1.4)$$

where $\theta_k = g_k^T d_{k-1}/\|g_{k-1}\|^2$. It is easy to see from (1.4) that $g_k^T d_k = -\|g_k\|^2$. Zhang, Zhou and Li proved that this three-term PRP method converges globally with an Armijo-type line search.
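As an illustration of why (1.4) gives $g_k^T d_k = -\|g_k\|^2$ independently of the line search, the following minimal Python sketch computes the three-term direction with plain lists. The helper names (`dot`, `mprp_direction`) and the sample vectors are ours, chosen only for illustration; this is not the MPRP code of [10].

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mprp_direction(g, g_prev, d_prev):
    """Three-term PRP direction (1.4) for k >= 1:
    d_k = -g_k + beta_k^PRP * d_{k-1} - theta_k * y_{k-1}."""
    y = [a - b for a, b in zip(g, g_prev)]   # y_{k-1} = g_k - g_{k-1}
    gp2 = dot(g_prev, g_prev)                # ||g_{k-1}||^2, assumed nonzero
    beta = dot(g, y) / gp2                   # beta_k^PRP
    theta = dot(g, d_prev) / gp2             # theta_k
    return [-gi + beta * di - theta * yi for gi, di, yi in zip(g, d_prev, y)]


# Hypothetical sample data standing in for g_k, g_{k-1}, d_{k-1}.
g = [1.0, -2.0, 0.5]
g_prev = [0.3, 1.0, -1.0]
d_prev = [2.0, 0.0, 1.0]
d = mprp_direction(g, g_prev, d_prev)
# The beta and theta contributions cancel in g_k^T d_k,
# so both printed values agree up to rounding.
print(dot(g, d), -dot(g, g))
```

Expanding $g_k^T d_k$ shows why: the $\beta_k^{PRP}$ term contributes $(g_k^T y_{k-1})(g_k^T d_{k-1})/\|g_{k-1}\|^2$ and the $\theta_k$ term subtracts exactly the same quantity.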
In [11], Yu and Guan proposed a modified PRP conjugate gradient formula:

$$\beta_k^{DPRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2} - t\,\frac{\|y_{k-1}\|^2}{\|g_{k-1}\|^4}\, g_k^T d_{k-1}, \qquad (1.5)$$

where $t > 1/4$ is a constant. In this paper we simply call it the DPRP method. This modified formula is similar to the well-known CG_DESCENT method proposed by Hager and Zhang in [13]. Yu and Guan proved that the directions $d_k$ generated by the DPRP method always satisfy the following sufficient descent condition

$$g_k^T d_k \le -\left(1 - \frac{1}{4t}\right)\|g_k\|^2. \qquad (1.6)$$
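The DPRP update (1.5) and the bound (1.6) can be explored numerically. The sketch below uses hypothetical helper names of our own (`beta_dprp`, `dprp_direction`, `worst_descent_ratio`) and assumes the forms of (1.5) and (1.6) written above; the random test is only a sanity check of the inequality, not a proof.

```python
import random


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def beta_dprp(g, g_prev, d_prev, t):
    """DPRP parameter (1.5); t > 1/4 is the method's constant."""
    y = [a - b for a, b in zip(g, g_prev)]   # y_{k-1}
    gp2 = dot(g_prev, g_prev)                # ||g_{k-1}||^2
    return dot(g, y) / gp2 - t * dot(y, y) * dot(g, d_prev) / gp2 ** 2


def dprp_direction(g, g_prev, d_prev, t):
    """d_k = -g_k + beta_k^DPRP * d_{k-1}."""
    beta = beta_dprp(g, g_prev, d_prev, t)
    return [-gi + beta * di for gi, di in zip(g, d_prev)]


def worst_descent_ratio(t, trials=1000, n=5, seed=0):
    """Largest observed g_k^T d_k / ||g_k||^2 over random (g_k, g_{k-1}, d_{k-1}).
    By (1.6) it should never exceed -(1 - 1/(4t))."""
    rng = random.Random(seed)
    worst = -float("inf")
    for _ in range(trials):
        g, g_prev, d_prev = ([rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3))
        d = dprp_direction(g, g_prev, d_prev, t)
        worst = max(worst, dot(g, d) / dot(g, g))
    return worst


t = 1.3
print(worst_descent_ratio(t), -(1.0 - 1.0 / (4.0 * t)))
```

The bound itself follows from the elementary inequality $u^T v \le \frac{1}{4t}\|u\|^2 + t\|v\|^2$ with $u = g_k$ and $v = (g_k^T d_{k-1})\,y_{k-1}/\|g_{k-1}\|^2$, which gives $\beta_k^{DPRP} g_k^T d_{k-1} = u^T v - t\|v\|^2 \le \|g_k\|^2/(4t)$.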
In order to obtain the global convergence of the algorithm, Yu and Guan [11] made the following assumption: there exists a positive constant $\alpha^*$ such that $\alpha_k \ge \alpha^*$ for all indices $k$. In fact, under this assumption, if the sufficient descent condition $g_k^T d_k \le -c\|g_k\|^2$ is satisfied, conjugate gradient methods with the Wolfe line search are globally convergent.
In this paper, we focus on the global convergence of the DPRP method. In Section 2, we describe the algorithm. The convergence properties of the algorithm are analyzed in Sections 3 and 4. In Section 5, we report some numerical results obtained by testing the proposed algorithms on problems from the CUTEr [23] library.
2 Algorithm
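From Theorem 1 in [11], we can directly obtain the following useful lemma.
Lemma 2.1. Let $\{d_k\}$ be generated by

$$d_k = -g_k + \tau d_{k-1}, \qquad d_0 = -g_0. \qquad (2.1)$$

If $d_{k-1} \ne 0$, then (1.6) holds for all $\tau \in [\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}]$.
Proof. The inequality (1.6) holds for $\tau = \beta_k^{DPRP}$ by Theorem 1 in [11]. For any $\tau \in (\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}]$, it is clear that $\beta_k^{DPRP} < \tau \le 0$. If $g_k^T d_{k-1} \ge 0$, then (1.6) follows from (2.1), since $g_k^T d_k = -\|g_k\|^2 + \tau g_k^T d_{k-1} \le -\|g_k\|^2$. If $g_k^T d_{k-1} < 0$, then we get from (2.1) that

$$g_k^T d_k = -\|g_k\|^2 + \tau g_k^T d_{k-1} < -\|g_k\|^2 + \beta_k^{DPRP} g_k^T d_{k-1} \le -\left(1 - \frac{1}{4t}\right)\|g_k\|^2.$$

The proof is completed.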
□
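Lemma 2.1 shows that the directions generated by (2.1) are sufficient descent directions and that this feature is independent of the line search used. In particular, if we set

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \bar\beta_k d_{k-1}, & k \ge 1, \end{cases} \qquad \bar\beta_k = \max\{\beta_k^{DPRP}, \eta_k\}, \qquad (2.2)$$

$$\eta_k = \frac{-1}{\|d_{k-1}\|\min\{\eta, \|g_{k-1}\|\}}, \qquad (2.3)$$

where $\eta > 0$ is a constant, it follows from $\eta_k < 0$ that

$$\bar\beta_k \in [\beta_k^{DPRP}, \max\{0, \beta_k^{DPRP}\}].$$

So the directions generated by (2.2) and (2.3) are sufficient descent directions by Lemma 2.1. The update rule (2.3), which adjusts the lower bound on $\beta_k^{DPRP}$, originated in [13].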
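A small sketch of the truncated parameter $\bar\beta_k$ in (2.2)-(2.3), assuming the forms written above, with random vectors standing in for $g_k$, $g_{k-1}$ and $d_{k-1}$. The function names and the test values $\eta = 0.01$ and $\eta = 10$ are our own choices; the larger $\eta$ merely makes the lower bound $\eta_k$ active more often, so that both branches of the maximum are exercised.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def beta_bar(g, g_prev, d_prev, t, eta):
    """Return (beta_bar_k, beta_k^DPRP) with
    beta_bar_k = max{beta_k^DPRP, eta_k}                  (2.2)
    eta_k = -1 / (||d_{k-1}|| * min{eta, ||g_{k-1}||})    (2.3)"""
    y = [a - b for a, b in zip(g, g_prev)]
    gp2 = dot(g_prev, g_prev)
    beta = dot(g, y) / gp2 - t * dot(y, y) * dot(g, d_prev) / gp2 ** 2
    eta_k = -1.0 / (math.sqrt(dot(d_prev, d_prev)) * min(eta, math.sqrt(gp2)))
    return max(beta, eta_k), beta


def count_truncations(t, eta, trials=1000, n=4, seed=42):
    """Check the interval claim and (1.6) on random data; return how often
    the lower bound eta_k was active (beta_bar_k > beta_k^DPRP)."""
    rng = random.Random(seed)
    active = 0
    for _ in range(trials):
        g, g_prev, d_prev = ([rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3))
        bbar, beta = beta_bar(g, g_prev, d_prev, t, eta)
        assert beta <= bbar <= max(0.0, beta)                 # interval of Lemma 2.1
        d = [-gi + bbar * di for gi, di in zip(g, d_prev)]    # direction (2.2)
        assert dot(g, d) <= -(1.0 - 1.0 / (4.0 * t)) * dot(g, g) + 1e-9  # (1.6)
        active += bbar > beta
    return active


print(count_truncations(t=1.3, eta=0.01), count_truncations(t=1.3, eta=10.0))
```

Because $\eta_k < 0$, taking the maximum can only move a negative $\beta_k^{DPRP}$ toward zero, which keeps $\bar\beta_k$ inside the interval covered by Lemma 2.1.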
Now we present the concrete algorithm.
Algorithm 2.1. (The DPRP method) [11]
Step 0. Given a constant $\varepsilon > 0$ and $x_0 \in \mathbb{R}^n$, set $k := 0$;
Step 1. Stop if $\|g_k\|_\infty < \varepsilon$;
Step 2. Compute $d_k$ by (2.2);
Step 3. Determine the steplength $\alpha_k$ by a line search;
Step 4. Let $x_{k+1} = x_k + \alpha_k d_k$; if $\|g_{k+1}\|_\infty < \varepsilon$, then stop;
Step 5. Set $k := k + 1$ and go to Step 2.
It is easy to see that Algorithm 2.1 is well-defined. We do not specify the line search used to determine the steplength $\alpha_k$; it can be an exact or an inexact line search. In the next two sections, we analyze the convergence of Algorithm 2.1 with the modified strong Wolfe line search and with an Armijo-type line search.
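Since Algorithm 2.1 leaves the line search open, the simplest runnable instance is a strictly convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, for which the exact line search has the closed form $\alpha_k = -g_k^T d_k/(d_k^T A d_k)$. The sketch below is a hypothetical, minimal implementation under that assumption (pure Python; the defaults $t = 1.3$, $\eta = 0.01$ and the tolerance are illustrative choices). It is not the authors' Fortran code.

```python
import math


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product with A given as a list of rows."""
    return [dot(row, x) for row in A]


def dprp_quadratic(A, b, x0, t=1.3, eta=0.01, eps=1e-8, max_iter=1000):
    """Algorithm 2.1 on f(x) = 0.5 x^T A x - b^T x (A symmetric positive
    definite), using the exact line search, which is explicit for quadratics."""
    x = list(x0)
    g = [ai - bi for ai, bi in zip(matvec(A, x), b)]       # g_0 = A x_0 - b
    d = [-gi for gi in g]                                   # d_0 = -g_0
    for k in range(max_iter):
        if max(abs(gi) for gi in g) < eps:                  # Steps 1/4: ||g_k||_inf < eps
            return x, k
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                     # Step 3: exact step along d_k
        x = [xi + alpha * di for xi, di in zip(x, d)]       # Step 4: x_{k+1}
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]  # g_{k+1} = g_k + alpha A d_k
        y = [a - c for a, c in zip(g_new, g)]               # y_k = g_{k+1} - g_k
        gp2 = dot(g, g)
        beta = dot(g_new, y) / gp2 - t * dot(y, y) * dot(g_new, d) / gp2 ** 2  # (1.5)
        eta_k = -1.0 / (math.sqrt(dot(d, d)) * min(eta, math.sqrt(gp2)))       # (2.3)
        beta = max(beta, eta_k)                             # truncation (2.2)
        d = [-gi + beta * di for gi, di in zip(g_new, d)]   # Step 2 for the next k
        g = g_new
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = dprp_quadratic(A, b, [0.0, 0.0])
print(x, iters)   # x should be close to A^{-1} b = [1/11, 7/11]
```

With an exact line search $g_{k+1}^T d_k = 0$, so the second term of (1.5) vanishes and the update reduces to PRP, which on a quadratic with exact steps coincides with the linear conjugate gradient method; the $2 \times 2$ example therefore stops after about two iterations, up to rounding.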
3 Convergence of Algorithm 2.1 under the modified strong Wolfe line search
In this section, we prove the global convergence of Algorithm 2.1. We first make the following assumptions.
Assumption 3.1.
(I) The level set
$$\Omega = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$$
is bounded.
(II) In some neighborhood $N$ of $\Omega$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L\|x - y\|, \qquad \forall x, y \in N. \qquad (3.1)$$
From now on, throughout this paper, we always suppose that this assumption holds. It follows directly from Assumption 3.1 that there exist two positive constants $B$ and $\gamma_1$ such that
$$\|x\| \le B, \qquad \|g(x)\| \le \gamma_1, \qquad \forall x \in \Omega. \qquad (3.2)$$
In the remainder of this section, we establish the global convergence of Algorithm 2.1 under a modified strong Wolfe line search, which determines the steplength $\alpha_k$ satisfying the following conditions

$$f(x_k + \alpha_k d_k) \le f(x_k) + \delta\alpha_k g_k^T d_k, \qquad |g(x_k + \alpha_k d_k)^T d_k| \le \min\{M, -\sigma g_k^T d_k\}, \qquad (3.3)$$

where $0 < \delta < \sigma < 1$ and $M > 0$ are constants. The purpose of this modification is to ensure that $|g_k^T d_{k-1}| \le M$ always holds for the given $M$. Moreover, we can choose $M$ large enough that $\min\{M, -\sigma g_k^T d_k\} = -\sigma g_k^T d_k$ holds for almost all indices $k$. First, we give the following useful lemma, which was essentially proved by Zoutendijk [19] and Wolfe [20, 21].
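Lemma 3.2. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the above line search. Then

$$\sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty. \qquad (3.4)$$

Proof. From the second inequality in (3.3), we get $g_{k+1}^T d_k \ge \sigma g_k^T d_k$, that is,

$$(g_{k+1} - g_k)^T d_k \ge -(1 - \sigma) g_k^T d_k. \qquad (3.5)$$

From the Lipschitz condition (3.1), we have

$$(g_{k+1} - g_k)^T d_k \le L\alpha_k\|d_k\|^2.$$

Then

$$\alpha_k \ge \frac{-(1 - \sigma) g_k^T d_k}{L\|d_k\|^2}. \qquad (3.6)$$

Since $f$ is bounded below on $\Omega$ and the gradient $g(x)$ is Lipschitz continuous, the first inequality in (3.3) gives

$$\sum_{k \ge 0} -\alpha_k g_k^T d_k < \infty. \qquad (3.7)$$

Then the Zoutendijk condition (3.4) follows from (3.6) and (3.7).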
□
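The two tests in (3.3) are easy to check for a trial step once $\phi(\alpha) = f(x_k + \alpha d_k)$ and $\phi'(\alpha) = g(x_k + \alpha d_k)^T d_k$ are available. The following Python predicate is a minimal sketch under that assumption; the one-dimensional function $\phi(\alpha) = (\alpha - 1)^2$ and the values $\delta = 10^{-4}$, $\sigma = 0.5$ are hypothetical choices for illustration. It tests a given step; it is not a complete line search procedure.

```python
def modified_strong_wolfe(phi0, dphi0, phi_a, dphi_a, alpha, delta, sigma, M):
    """Test the conditions (3.3) at a trial step alpha, where
    phi(a) = f(x_k + a d_k), phi'(0) = g_k^T d_k and
    phi'(alpha) = g(x_k + alpha d_k)^T d_k."""
    if not (0.0 < delta < sigma < 1.0 and M > 0.0):
        raise ValueError("need 0 < delta < sigma < 1 and M > 0")
    if dphi0 >= 0.0:
        raise ValueError("d_k must be a descent direction (phi'(0) < 0)")
    sufficient_decrease = phi_a <= phi0 + delta * alpha * dphi0   # first inequality
    curvature = abs(dphi_a) <= min(M, -sigma * dphi0)             # second inequality
    return sufficient_decrease and curvature


# Hypothetical 1-D example: phi(a) = (a - 1)^2, so phi(0) = 1 and phi'(0) = -2.
def phi(a):
    return (a - 1.0) ** 2


def dphi(a):
    return 2.0 * (a - 1.0)


for alpha, M in [(1.0, 1e30), (0.9, 1e30), (0.9, 0.1), (0.1, 1e30)]:
    ok = modified_strong_wolfe(phi(0.0), dphi(0.0), phi(alpha), dphi(alpha), alpha,
                               delta=1e-4, sigma=0.5, M=M)
    print(alpha, M, ok)
```

With $M = 10^{30}$ the step $\alpha = 0.9$ passes, but with $M = 0.1$ it fails because $|\phi'(0.9)| = 0.2 > M$: the cap $M$ is what enforces $|g_{k+1}^T d_k| \le M$. The step $\alpha = 0.1$ fails the curvature test since $|\phi'(0.1)| = 1.8 > -\sigma\phi'(0) = 1$.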
The following lemma is similar to Lemma 4.1 in [9] and Lemma 3.1 in [13], and it is very useful for proving that the gradients cannot be bounded away from zero.
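Lemma 3.3. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search. If there exists a constant $\gamma > 0$ such that

$$\|g_k\| \ge \gamma, \qquad \forall k \ge 0, \qquad (3.8)$$

then

$$\sum_{k \ge 1} \|u_k - u_{k-1}\|^2 < \infty, \qquad (3.9)$$

where $u_k = d_k/\|d_k\|$.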
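Proof. Since $\|g_k\| \ge \gamma > 0$, it follows from (1.6) that $d_k \ne 0$ for each $k$. We get from (1.6), (3.4) and (3.8) that

$$\gamma^4 \sum_{k \ge 0} \frac{1}{\|d_k\|^2} \le \sum_{k \ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} \le \left(\frac{4t}{4t - 1}\right)^2 \sum_{k \ge 0} \frac{(g_k^T d_k)^2}{\|d_k\|^2} < \infty, \qquad (3.10)$$

where $t > 1/4$ is a constant. Here, we define

$$\beta_k^+ = \max\{\bar\beta_k, 0\}, \qquad \beta_k^- = \min\{\bar\beta_k, 0\}, \qquad (3.11)$$

$$r_k = \frac{-g_k + \beta_k^- d_{k-1}}{\|d_k\|}, \qquad \delta_k = \beta_k^+ \frac{\|d_{k-1}\|}{\|d_k\|}. \qquad (3.12)$$

By (2.2), (3.11) and (3.12), we have $u_k = r_k + \delta_k u_{k-1}$. Since the $u_k$ are unit vectors,

$$\|r_k\| = \|u_k - \delta_k u_{k-1}\| = \|\delta_k u_k - u_{k-1}\|.$$

Since $\delta_k \ge 0$, we have

$$\|u_k - u_{k-1}\| \le (1 + \delta_k)\|u_k - u_{k-1}\| \le \|u_k - \delta_k u_{k-1}\| + \|\delta_k u_k - u_{k-1}\| = 2\|r_k\|. \qquad (3.13)$$

We get from the definitions of $\beta_k^-$ and $\eta_k$ in (2.3) that $\eta_k \le \beta_k^- \le 0$; hence, using (3.2) and (3.8),

$$\|-g_k + \beta_k^- d_{k-1}\| \le \|g_k\| + |\eta_k|\,\|d_{k-1}\| \le \gamma_1 + \frac{1}{\min\{\eta, \gamma\}}. \qquad (3.14)$$

Setting $c = \gamma_1 + 1/\min\{\eta, \gamma\}$ and combining (3.13) with (3.14), we have

$$\|u_k - u_{k-1}\| \le 2\|r_k\| \le \frac{2c}{\|d_k\|}. \qquad (3.15)$$

Squaring (3.15), summing over $k$ and combining with (3.10), we have

$$\sum_{k \ge 1} \|u_k - u_{k-1}\|^2 \le 4c^2 \sum_{k \ge 1} \frac{1}{\|d_k\|^2} < \infty.$$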
The proof is completed.
Now we state a property of $\bar\beta_k$ in (2.2), called Property(*), which was proposed by Gilbert and Nocedal in [9].
Property(*). Suppose that Assumption 3.1 and inequality (3.8) hold. We say that the method has Property(*) if there exist constants $b > 1$ and $\lambda > 0$ such that, for all $k$,

$$|\bar\beta_k| \le b, \qquad \text{and} \qquad \|s_{k-1}\| \le \lambda \implies |\bar\beta_k| \le \frac{1}{2b}, \qquad (3.16)$$

where $s_{k-1} = x_k - x_{k-1} = \alpha_{k-1} d_{k-1}$.
Lemma 3.4. If the modified strong Wolfe line search is used in Algorithm 2.1, then the scalar $\bar\beta_k$ satisfies Property(*).
Proof. By the definition of $\bar\beta_k$ in (2.2), we have
Under Assumption 3.1, if (3.8) holds, we get from (3.3) and (1.6) that
If $\|s_{k-1}\| \le \lambda$, then
Together, (3.17) and (3.18) imply that the conditions in Property(*) hold if we set
The proof is completed.
□
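Let $N^*$ denote the set of positive integers. For $\lambda > 0$ and a positive integer $\Delta$, we define the index set

$$\mathcal{K}_{k,\Delta}^{\lambda} = \{i \in N^* : k \le i \le k + \Delta - 1,\ \|s_{i-1}\| > \lambda\}.$$

Let $|\mathcal{K}_{k,\Delta}^{\lambda}|$ denote the number of elements in $\mathcal{K}_{k,\Delta}^{\lambda}$. We have the following lemma, which was essentially proved by Gilbert and Nocedal in [9].
Lemma 3.5. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search. If (3.8) holds, then there exists a constant $\lambda > 0$ such that for any $\Delta \in N^*$ and any index $k_0$, there is an index $k \ge k_0$ such that

$$|\mathcal{K}_{k,\Delta}^{\lambda}| > \frac{\Delta}{2}.$$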
The proof of Lemma 3.5 is similar to that of Lemma 4.2 in [9], so we omit it here. The next theorem, which is a modification of Theorem 4.3 in [9], shows that Algorithm 2.1 is globally convergent.
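Theorem 3.6. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ be generated by Algorithm 2.1 with the modified strong Wolfe line search (3.3). Then either $g_k = 0$ for some $k$ or

$$\liminf_{k \to \infty} \|g_k\| = 0.$$

Proof. Suppose, for contradiction, that both

$$g_k \ne 0 \ \text{for all } k \qquad \text{and} \qquad \liminf_{k \to \infty} \|g_k\| > 0.$$

Denote $\gamma = \inf\{\|g_k\| : k \ge 0\}$. It is clear that

$$\|g_k\| \ge \gamma > 0, \qquad \forall k \ge 0.$$

Therefore the conditions of Lemmas 3.3 and 3.5 hold. Making use of Lemma 3.4, we can complete the proof in the same way as that of Theorem 4.3 in [9], so we omit it here too.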
4 Convergence of Algorithm 2.1 with an Armijo-type line search
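In this section, we prove the global convergence of Algorithm 2.1 under an Armijo-type line search, which determines

$$\alpha_k = \max\{\rho^j : j = 1, 2, \ldots\}$$

satisfying

$$f(x_k + \alpha_k d_k) \le f(x_k) - \delta_1 \alpha_k^2 \|d_k\|^2, \qquad (4.1)$$

where $\delta_1 > 0$ and $\rho \in (0, 1)$.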
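A minimal backtracking sketch of this Armijo-type rule, assuming the form of (4.1) given above. Following the text, the trial exponents start at $j = 1$; the cap `j_max` is a safeguard we added and is not part of the method. The test function and the values $\delta_1 = 10^{-4}$, $\rho = 0.5$ are hypothetical. The loop terminates for any descent direction: $f(x_k + \alpha d_k) - f(x_k) = \alpha g_k^T d_k + o(\alpha)$ decreases at first order in $\alpha$, while the required decrease $\delta_1\alpha^2\|d_k\|^2$ is only of second order.

```python
def armijo_type_step(f, x, d, delta1=1e-4, rho=0.5, j_start=1, j_max=60):
    """Return alpha_k = max{rho^j : j = j_start, j_start + 1, ...} satisfying (4.1):
    f(x_k + alpha d_k) <= f(x_k) - delta1 * alpha^2 * ||d_k||^2."""
    fx = f(x)
    dd = sum(di * di for di in d)                # ||d_k||^2
    alpha = rho ** j_start
    for _ in range(j_max):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx - delta1 * alpha * alpha * dd:
            return alpha
        alpha *= rho                             # move to the next exponent j
    raise RuntimeError("no acceptable step found; is d_k a descent direction?")


def f(x):
    """Hypothetical test function f(x) = x_1^2 + 10 x_2^2."""
    return x[0] ** 2 + 10.0 * x[1] ** 2


x = [1.0, 1.0]
d = [-2.0, -20.0]          # steepest descent direction -g(x)
alpha = armijo_type_step(f, x, d)
print(alpha)               # 0.0625 (the trials 0.5, 0.25 and 0.125 are rejected)
```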
It follows from the definition of Algorithm 2.1 that the sequence of function values $\{f(x_k)\}$ is decreasing. Moreover, if $f$ is bounded from below, (4.1) gives

$$\sum_{k \ge 0} \alpha_k^2 \|d_k\|^2 < \infty. \qquad (4.2)$$
Under Assumption 3.1, we have the following useful lemma.
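Lemma 4.1. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 2.1 with the above Armijo-type line search. If there exists a constant $\gamma > 0$ such that

$$\|g_k\| \ge \gamma, \qquad \forall k \ge 0, \qquad (4.3)$$

then there exists a constant $T > 0$ such that

$$\|d_k\| \le T, \qquad \forall k \ge 0. \qquad (4.4)$$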
Proof. By the definition of d_{k} and the inequalities (3.1) and (3.2), we have
(4.2) implies that $\alpha_k^2\|d_k\|^2 \to 0$, that is, $\|\alpha_k d_k\| \to 0$, as $k \to \infty$. Then for any constant $b \in (0, 1)$, there exists an index $k_0$ such that
Then
where c = γ_{1 }+ is a constant. For any k > k_{0} we have
We can get (4.4) by setting
□
Based on Lemma 4.1, we give the next global convergence theorem for Algorithm 2.1 under the above Armijo-type line search.
Theorem 4.2. Let the conditions in Assumption 3.1 hold, and let $\{x_k\}$ be generated by Algorithm 2.1 with the above Armijo-type line search. Then either $g_k = 0$ for some $k$ or

$$\liminf_{k \to \infty} \|g_k\| = 0.$$
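Proof. Suppose, for the sake of contradiction, that $g_k \ne 0$ for all $k \ge 0$ and $\liminf_{k \to \infty} \|g_k\| \ne 0$. Denote $\gamma = \inf\{\|g_k\| : k \ge 0\}$. It is clear that

$$\|g_k\| \ge \gamma > 0, \qquad \forall k \ge 0. \qquad (4.6)$$

First, we consider the case $\liminf_{k \to \infty} \alpha_k > 0$. From (4.2) we have $\lim_{k \to \infty} \|d_k\| = 0$. Since (1.6) gives $(1 - \frac{1}{4t})\|g_k\|^2 \le -g_k^T d_k \le \|g_k\|\,\|d_k\|$, that is, $\|g_k\| \le \frac{4t}{4t - 1}\|d_k\|$, we obtain $\liminf_{k \to \infty} \|g_k\| = 0$, which contradicts (4.6).
On the other hand, if $\liminf_{k \to \infty} \alpha_k = 0$, then there exists an infinite index set $K$ such that

$$\lim_{k \in K,\, k \to \infty} \alpha_k = 0. \qquad (4.7)$$

Step 3 of Algorithm 2.1 implies that $\rho^{-1}\alpha_k$ does not satisfy (4.1), namely,

$$f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) > -\delta_1 \rho^{-2}\alpha_k^2 \|d_k\|^2. \qquad (4.8)$$

By the mean value theorem and the Lipschitz condition (3.1), there is a $\xi_k \in [0, 1]$ such that

$$f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) = \rho^{-1}\alpha_k\, g(x_k + \xi_k\rho^{-1}\alpha_k d_k)^T d_k \le \rho^{-1}\alpha_k g_k^T d_k + L\rho^{-2}\alpha_k^2 \|d_k\|^2.$$

This together with (4.8), (4.4) and (1.6) gives

$$\left(1 - \frac{1}{4t}\right)\|g_k\|^2 \le -g_k^T d_k \le (L + \delta_1)\rho^{-1}\alpha_k \|d_k\|^2 \le (L + \delta_1)\rho^{-1} T^2 \alpha_k, \qquad k \in K. \qquad (4.9)$$

(4.7) and (4.9) imply that $\liminf_{k \in K,\, k \to \infty} \|g_k\| = 0$. This also yields a contradiction, which completes the proof.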
□
5 Numerical results
In this section, we report numerical experiments that test Algorithm 2.1 with the modified strong Wolfe line search, and compare its performance with some existing conjugate gradient methods, including the PRP+ method developed by Gilbert and Nocedal [9], the CG_DESCENT method proposed by Hager and Zhang [13], and the MPRP method proposed by Zhang et al. in [10].
The PRP+ code was obtained from Nocedal's web page at
http://www.ece.northwestern.edu.nocedalsoftware.html
and the CG_DESCENT code from Hager's web page at
http://www.math.ufl.edu/~hager/papers/CG. All codes were written in Fortran77 and run on a PC with a 2.8 GHz CPU, 2 GB of RAM and the Linux (Fedora 10 with GCC 4.3.2) operating system. We stop the iteration if $\|g_k\|_\infty < 10^{-6}$ or the total number of iterations exceeds 10000.
We tested all the unconstrained problems in the CUTEr library [23] with their default dimensions. By the end of 2010, there were 164 unconstrained optimization problems in the CUTEr library; five of them failed to install on our system because of limited storage space.
In order to obtain a relatively good value of $t$ in Algorithm 2.1, we tested the problems with t = 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5. The data showed that Algorithm 2.1 with t = 1.3 performed slightly better than with the other values. So in this section, we only compare the numerical results of Algorithm 2.1 with t = 1.3 with those of the CG_DESCENT, MPRP and PRP+ methods. The problems and their dimensions are listed in Table 1. Table 2 lists all the numerical results, which include the total number of iterations (Iter), the total number of function evaluations (Nf), the total number of gradient evaluations (Ng) and the CPU time (Time) in seconds. In Table 2, "–" means the method failed and "NaN" means that the cost function generated a "NaN" for the function value.
We used the performance profiles of Dolan and Moré [24] to compare the performance of these methods. We discarded the data of the problems for which different methods converged to different local minimizers. We consider that two methods converged to the same solution if the optimal costs f_{1} and f_{2} satisfy the following conditions
The problems which were discarded by this process were BROYDN7D, CHAINWOO, NONCVXUN and SENSORS.
Figures 1-4 show the performance of the above four methods relative to the CPU time (in seconds), the total number of iterations, the total number of function evaluations and the total number of gradient evaluations, respectively. For example, the performance of the four algorithms relative to CPU time means that, for each method, we plot the fraction P of problems for which the method is within a factor τ of the best time. The left side of each figure gives the percentage of the test problems for which a method is the fastest; the right side gives the percentage of the test problems that were successfully solved by each of the methods. The top curve is the method that solved the most problems in a time within a factor τ of the best time. For convenience, we give the meanings of the method labels used in the figures.

"CG_DESCENT" stands for the CG_DESCENT method with the approximate Wolfe line search [13]. Here we use the Fortran77 (version 1.4) code obtained from Hager's web site with the default parameters.

"PRP+" denotes the PRP+ method with the Wolfe line search proposed in [22].

"MPRP" denotes the three-term PRP method proposed in [10] with the same line search as the "CG_DESCENT" method.

"Algorithm 2.1" denotes Algorithm 2.1 with the modified strong Wolfe line search (3.3). We determine a steplength $\alpha_k$ which satisfies the conditions in (3.3) or the following approximate conditions:
where $A = \sigma\phi'(0)$, $\phi(\alpha) = f(x_k + \alpha d_k)$, and the constant $\epsilon$ is an estimate of the error in $\phi(0)$ at the iterate $x_k$. The first inequality in (5.1) is obtained by replacing $\phi(\alpha)$ with its approximation function
in the modified strong Wolfe conditions (3.3). The approximate conditions in (5.1) are modifications of the approximate Wolfe conditions proposed by Hager and Zhang in [13]; we refer to [13] for more details. The Algorithm 2.1 code is a modification of the CG_DESCENT subroutine of Hager and Zhang [13], and we use its default parameters. Moreover, we set $M = 10^{30}$ in (3.3) to ensure that $M \ge -\sigma g_k^T d_k$ holds for most indices $k$. If $-\sigma g_k^T d_k > M$, we determine a steplength $\alpha_k$ which satisfies the first inequality in (3.3). The code of Algorithm 2.1 was posted on the web page at: http://blog.sina.com.cn/liminmath.
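For reference, the standard derivation behind the approximate Wolfe condition of Hager and Zhang [13], on which the first inequality in (5.1) is modeled, runs as follows. Only the unmodified Hager-Zhang case is shown; the exact form of (5.1), with $A = \sigma\phi'(0)$ and the bound $M$, is not recoverable from this text. Replace $\phi$ by the quadratic $q$ that matches $\phi(0)$, $\phi'(0)$ and $\phi'(\alpha)$, and impose the sufficient decrease test on $q$:

```latex
\begin{aligned}
q(s) &= \phi(0) + \phi'(0)\,s + \frac{\phi'(\alpha) - \phi'(0)}{2\alpha}\,s^{2},
  \qquad q'(\alpha) = \phi'(\alpha),\\
q(\alpha) &= \phi(0) + \frac{\phi'(0) + \phi'(\alpha)}{2}\,\alpha,\\
q(\alpha) \le \phi(0) + \delta\alpha\phi'(0)
  &\iff \frac{\phi'(0) + \phi'(\alpha)}{2} \le \delta\,\phi'(0)
  \iff \phi'(\alpha) \le (2\delta - 1)\,\phi'(0).
\end{aligned}
```

The resulting test involves only derivative values, which is why it remains reliable near a minimizer, where the difference $f(x_k + \alpha d_k) - f(x_k)$ is dominated by rounding error.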
We can see from Figures 1-4 that the curves "Algorithm 2.1", "CG_DESCENT" and "MPRP" are very close, which means that the performances of Algorithm 2.1, CG_DESCENT and MPRP are similar. Since CG_DESCENT is regarded as one of the most efficient nonlinear conjugate gradient methods, this indicates that Algorithm 2.1 is efficient for solving unconstrained optimization problems. We also note that Algorithm 2.1, CG_DESCENT and MPRP perform much better than the PRP+ method.
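The performance profiles of Dolan and Moré [24] behind Figures 1-4 can be computed directly from a table of costs. The Python function below is a minimal sketch of that computation; its name, the data layout (one row per solver, `None` for a failure) and the toy numbers in the usage example are our assumptions and are not taken from Table 2.

```python
import math


def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs[s][p] is the cost (e.g. CPU time) of solver s on problem p, or None
    if solver s failed on p; costs are assumed positive. Returns profile[s][i],
    the fraction of problems with performance ratio r_{p,s} <= taus[i], where
    r_{p,s} = costs[s][p] / (best cost on problem p over all solvers)."""
    n_solvers, n_problems = len(costs), len(costs[0])
    best = []
    for p in range(n_problems):
        solved = [costs[s][p] for s in range(n_solvers) if costs[s][p] is not None]
        best.append(min(solved) if solved else None)
    profile = []
    for s in range(n_solvers):
        # A failure gets ratio +inf, so it never counts as "within a factor tau".
        ratios = [
            costs[s][p] / best[p] if costs[s][p] is not None else math.inf
            for p in range(n_problems)
        ]
        profile.append([sum(r <= tau for r in ratios) / n_problems for tau in taus])
    return profile


# Toy data: solver 0 fails on the third problem.
prof = performance_profile([[1.0, 2.0, None], [2.0, 2.0, 4.0]], [1.0, 2.0])
print(prof)
```

At τ = 1 the profile gives the fraction of problems on which a solver was the fastest (ties count for every tied solver), and for large τ it approaches the fraction of problems the solver solved, which is how the left and right sides of Figures 1-4 are read.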
6 Conclusion
We have established the convergence of a sufficient descent PRP conjugate gradient method with a modified strong Wolfe line search and with an Armijo-type line search. The numerical results show that Algorithm 2.1 is efficient for solving the unconstrained problems in the CUTEr library, and that its numerical performance is similar to that of the well-known CG_DESCENT and MPRP methods.
Acknowledgements. The authors would like to thank the referees and Prof. Dong-Hui Li for their many valuable suggestions and comments, which improved this paper greatly. We are very grateful to Prof. W.W. Hager, H. Zhang and Prof. J. Nocedal for providing their codes, and to Zhang, Zhou and Li for the MPRP method as well.
REFERENCES
[1] M.R. Hestenes and E.L. Stiefel, Methods of conjugate gradients for solving linear systems. J. Research Nat. Bur. Standards, 49 (1952), 409-436.
[2] R. Fletcher and C. Reeves, Function minimization by conjugate gradients. Comput. J., 7 (1964), 149-154.
[3] E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 16 (1969), 35-43.
[4] B.T. Polyak, The conjugate gradient method in extreme problems. USSR Comp. Math. Math. Phys., 9 (1969), 94-112.
[5] R. Fletcher, Practical Methods of Optimization I: Unconstrained Optimization. John Wiley & Sons, New York (1987).
[6] Y. Liu and C. Storey, Efficient generalized conjugate gradient algorithms, Part 1: Theory. J. Optim. Theory Appl., 69 (1991), 129-137.
[7] Y.H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim., 10 (1999), 177-182.
[8] M.J.D. Powell, Nonconvex minimization calculations and the conjugate gradient method, in: Lecture Notes in Mathematics. Springer-Verlag, Berlin (1984).
[9] J.C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim., 2 (1992), 21-42.
[10] L. Zhang, W. Zhou and D. Li, A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. IMA J. Numer. Anal., 26 (2006), 629-640.
[11] G.H. Yu and L.T. Guan, Modified PRP methods with sufficient descent property and their convergence properties. Acta Scientiarum Naturalium Universitatis Sunyatseni (Chinese), 45(4) (2006), 11-14.
[12] Y.H. Dai, Conjugate gradient methods with Armijo-type line searches. Acta Mathematicae Applicatae Sinica (English Series), 18 (2002), 123-130.
[13] W.W. Hager and H. Zhang, A new conjugate gradient method with guaranteed descent and an efficient line search. SIAM J. Optim., 16 (2005), 170-192.
[14] M. Al-Baali, Descent property and global convergence of the Fletcher-Reeves method with inexact line search. IMA J. Numer. Anal., 5 (1985), 121-124.
[15] Y.H. Dai and Y. Yuan, Convergence properties of the Fletcher-Reeves method. IMA J. Numer. Anal., 16(2) (1996), 155-164.
[16] Y.H. Dai and Y. Yuan, Convergence properties of the conjugate descent method. Advances in Mathematics, 25(6) (1996), 552-562.
[17] L. Zhang, New versions of the Hestenes-Stiefel nonlinear conjugate gradient method based on the secant condition for optimization. Computational and Applied Mathematics, 28 (2009), 111-133.
[18] D. Li and M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math., 129 (2001), 15-35.
[19] G. Zoutendijk, Nonlinear programming, computational methods, in: Integer and Nonlinear Programming, J. Abadie (Ed.), North-Holland, Amsterdam (1970), 37-86.
[20] P. Wolfe, Convergence conditions for ascent methods. SIAM Rev., 11 (1969), 226-235.
[21] P. Wolfe, Convergence conditions for ascent methods. II: Some corrections. SIAM Rev., 13 (1971), 185-188.
[22] J.J. Moré and D.J. Thuente, Line search algorithms with guaranteed sufficient decrease. ACM Trans. Math. Software, 20 (1994), 286-307.
[23] I. Bongartz, A.R. Conn, N.I.M. Gould and P.L. Toint, CUTE: Constrained and unconstrained testing environment. ACM Trans. Math. Software, 21 (1995), 123-160.
[24] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Math. Program., 91 (2002), 201-213.
[25] W.W. Hager and H. Zhang, A survey of nonlinear conjugate gradient methods. Pacific J. Optim., 2 (2006), 35-58.
[26] D.F. Shanno, Conjugate gradient methods with inexact searches. Math. Oper. Res., 3 (1978), 244-256.
[27] L. Zhang, W. Zhou and D. Li, Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search. Numer. Math., 104 (2006), 561-572.
[28] W.W. Hager and H. Zhang, Algorithm 851: CG_DESCENT, a conjugate gradient method with guaranteed descent. ACM Trans. Math. Software, 32 (2006), 113-137.
Received: 12/XII/10.
Accepted: 17/IV/11.
#CAM-307/10.
* Corresponding author
** This work is supported by NSF grant HHUY0710 of HuaiHua University, China.