
On the divergence of line search methods

Walter F. Mascarenhas

Instituto de Matemática e Estatística, Universidade de São Paulo, Rua do Matão 1010, Cidade Universitária, 05508-090 São Paulo, SP, Brazil, E-mail: walterfm@ime.usp.br

ABSTRACT

We discuss the convergence of line search methods for minimization. We explain how Newton's method and the BFGS method can fail even if the restrictions of the objective function to the search lines are strictly convex functions, the level sets of the objective functions are compact, the line searches are exact and the Wolfe conditions are satisfied. This explanation illustrates a new way to combine general mathematical concepts and symbolic computation to analyze the convergence of line search methods. It also illustrates the limitations of the asymptotic analysis of the iterates of nonlinear programming algorithms.

Mathematical subject classification: 20E28, 20G40, 20C20.

Key words: line search methods, convergence.

1 Introduction

Line search methods are fundamental algorithms in nonlinear programming. Their theory started with Cauchy [4] and they were implemented in the first electronic computers in the late 1940's and early 1950's. They have been intensively studied since then and today they are widely used by scientists and engineers. Their convergence theory is well developed and is described at length in many good surveys, such as [11], and even in textbooks, like [2] and [12].

Line search methods, as discussed in this work, are used to solve the unconstrained minimization problem for a smooth function f: ℝⁿ → ℝ:

min { f(x) : x ∈ ℝⁿ }.     (1)

We see them as discrete dynamical systems of the form

where {xk} ⊂ ℝⁿ is the sequence we expect to converge to the solution of problem (1) and ek contains auxiliary information specific to each method. At the kth step we choose a search direction dk and analyze f along the line {xk + wdk, w ∈ ℝ}. We search for a step size αk ∈ ℝ such that the sequence xk satisfies constraints

that are simple and try to force the xk to converge to a local minimizer of f.

For example, Newton's method for minimizing a function can be written in the framework (2) by taking

D(f, xk, αk, dk, ek) = −∇²f(xk)⁻¹ ∇f(xk)

and E = 0. The BFGS method can be considered by taking ek ∈ ℝⁿˣⁿ and setting

for gk = ∇f(xk), gk+1 = ∇f(xk + αkdk) and sk = αkdk. A typical example of constraints C on the step size αk in (3) are the Wolfe conditions:

f(xk + αkdk) ≤ f(xk) + σ αk ∇f(xk)ᵀdk,     (6)

∇f(xk + αkdk)ᵀdk ≥ β ∇f(xk)ᵀdk,     (7)

where 0 < σ < β < 1. Usually, condition (6) enforces a sufficient decrease in the value of f from step k to k + 1 and condition (7) leads to steps sk = xk+1 − xk which are not too short.
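As an aside, the role of σ and β is easy to see in code. The sketch below is our own illustration (the function, names and values are arbitrary, and the conditions are written in the standard form used above) of testing whether a trial step size satisfies (6)-(7):

import numpy as np

def satisfies_wolfe(f, grad, x, d, alpha, sigma=1e-4, beta=0.9):
    """Check the Wolfe conditions for step size alpha along d,
    assuming 0 < sigma < beta < 1 and that d is a descent direction."""
    g0 = grad(x) @ d                    # directional derivative at x (negative for descent)
    x_new = x + alpha * d
    sufficient_decrease = f(x_new) <= f(x) + sigma * alpha * g0   # condition (6)
    not_too_short = grad(x_new) @ d >= beta * g0                  # condition (7)
    return sufficient_decrease and not_too_short

# Toy example: quadratic objective, steepest-descent direction.
f = lambda x: 0.5 * x @ x
grad = lambda x: x
x = np.array([1.0, -2.0])
print(satisfies_wolfe(f, grad, x, -grad(x), alpha=1.0))   # True: exact minimizer on the line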

A typical theorem about the convergence of line search methods looks like this one, adapted from page 212 in Nocedal and Wright's book [12]:

Theorem 1. Suppose f: ℝⁿ → ℝ has continuous second order derivatives,

is bounded and there exists C > 0 such that

If the n × n matrix e0 is symmetric positive definite and the iterates xk generated by the BFGS method, as described in (2)-(5), satisfy the Wolfe conditions (6)-(7) then the sequence {xk} converges to a minimizer of f (which is unique since f is strictly convex by (9)).
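As a purely numerical aside (ours, not part of the original argument), the convex setting of Theorem 1 is easy to reproduce with off-the-shelf software: a standard BFGS implementation with a Wolfe line search converges quickly on a strongly convex function. A minimal sketch with an arbitrary test function:

import numpy as np
from scipy.optimize import minimize

# Strongly convex test function (ours); its unique minimizer is the origin.
f = lambda x: np.log(np.cosh(x[0])) + x[0] ** 2 + 2.0 * x[1] ** 2 + x[0] * x[1]
res = minimize(f, x0=np.array([3.0, -4.0]), method='BFGS')   # Wolfe line search inside
print(res.x, res.fun)   # res.x is close to (0, 0)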

Starting in the 1960's, similar theorems have been proved for several line search methods. Since that time people have tried to find more general results regarding the convergence of line search methods. Most researchers were happy with constraints like the Wolfe conditions and acknowledged their need, because it is easy to enforce them in practice and it is also easy to build examples in which results like theorem 1 are false if similar constraints are not imposed. However, convexity constraints like (9) were regarded as too strong and undesirable. Much effort was devoted to eliminating them but progress was slow and frustrating due to the nonlinear nature of expressions like (5). As a consequence, M. Powell, one of the leading researchers in this area, wrote in [16] that:

Moreover, theoretical studies have suggested several improvements to algorithms, and they provide a broad view of the subject that is very helpful to research. However, because of the difficulty of analyzing nonlinear calculations, the vast majority of theoretical questions that are important to the performance of optimization algorithms in practice are unanswered...

A final answer regarding the need for the convexity hypothesis (9) in theorem 1 was first published in 2002 by Y. Dai [6] and it was somewhat surprising: if f is not convex then the iterates generated by the BFGS method may never approach any point z such that ∇f(z) = 0.

We were unaware of Dai's work and in the year his result was published we found a similar answer regarding the convergence of the BFGS method [13]. Our approach, however, was quite different from his. Our work was based on the observation that equations (4)-(7) have symmetries which can be exploited to build a counterexample for theorem 1 without the hypothesis (9). After the publication of [13] we generalized the argument used in that paper and our purpose in this work is to present this generalization.

Our approach is motivated by the work started by S. Lie in the late 1800's, in which symmetries have led to remarkable solutions of nonlinear differential equations [3]. We developed a technique to produce examples with search lines as in Figure 1.


The line search methods we discuss are invariant with respect to orthogonal changes of variables and scaling, in the sense that if they assign a step sk = xk+1 − xk to the point xk and objective function F, Q is an orthogonal matrix and λ ∈ ℝ, then the step s̃k corresponding to the objective function F̃(x) = F(λ⁻¹Qᵀx) at the point x̃k = λQxk is s̃k = λQsk. We argue that in relevant cases these symmetries lead to iterates as in Figure 1.
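For Newton's method this invariance can be checked directly. The sketch below is our own code, with an arbitrary smooth test function; it verifies numerically that the Newton step transforms as s̃k = λQsk:

import numpy as np

rng = np.random.default_rng(0)
n, lam = 3, 0.7
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an orthogonal matrix

# An arbitrary smooth test function with explicit gradient and Hessian.
b = rng.standard_normal(n)
F      = lambda x: 0.25 * np.dot(x, x) ** 2 + b @ x
grad_F = lambda x: np.dot(x, x) * x + b
hess_F = lambda x: np.dot(x, x) * np.eye(n) + 2.0 * np.outer(x, x)

# Gradient and Hessian of F~(y) = F(lambda^{-1} Q^T y), by the chain rule.
grad_Ft = lambda y: Q @ grad_F(Q.T @ y / lam) / lam
hess_Ft = lambda y: Q @ hess_F(Q.T @ y / lam) @ Q.T / lam ** 2

x  = rng.standard_normal(n)
s  = -np.linalg.solve(hess_F(x), grad_F(x))        # Newton step for F at x
xt = lam * Q @ x
st = -np.linalg.solve(hess_Ft(xt), grad_Ft(xt))    # Newton step for F~ at x~

print(np.allclose(st, lam * Q @ s))                # True: s~ = lambda*Q*s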

Actually, the possibility of cyclic behavior for line search methods was already mentioned by Curry in 1944 [5]. It was also discussed in [6, 9, 13, 15]. Here we go one step further and present a systematic way to build examples that display this behavior. The qualitative behavior of the iterates in our examples is captured by the concepts of flower and dandelion described in section 3. In intuitive terms the iterates can be seen as defining the petals of a flower and their accumulation points lie in a cycle that defines the flower's core. When the iterates approach the core along well defined directions we say that the flower is a dandelion.

In sections 2 and 7 we present concrete examples of dandelions, for Newton's method and the BFGS method, respectively. In these examples the line searches are exact, the first Wolfe condition is satisfied, the restrictions of the objective functions to the search lines are strictly convex and the level sets of f are compact, and yet the iterates have the cyclic asymptotic behavior illustrated in Figure 1. The BFGS and Newton's methods are among the most important line search methods and our examples refute the following conjecture:

If when applying the BFGS or Newton's methods we choose the first local minimizer along the search line then the iterates converge to a local minimizer of the objective function.

Besides symmetries, this work is based on a theorem proved by H. Whitney in 1934 [17]. Whitney's theorem regards the extension of Cm functions from subsets of ℝⁿ to all of ℝⁿ. It says that if a function F and its partial derivatives up to order m are defined in a subset E of ℝⁿ and F's Taylor series up to order m behave properly in E then F can be extended to a Cm function in ℝⁿ. Whitney's theorem is a handy tool to highlight the weak points of nonlinear programming algorithms.

In section 2 we illustrate how symmetries and Whitney's theorem can be combined to analyze nonlinear line search methods in particular situations as if they were linear. This analysis is not adversely affected by the number of dimensions. To the contrary, as we go to higher dimensions the number of free parameters at our disposal increases. We are then able to observe phenomena contrary to our 2 or 3 dimensional intuition. However, as the experience with Lax Pairs has shown [3], "exploiting symmetry" is easier said than done. The algebraic manipulations necessary to implement our ideas can be overwhelming. Although the example for Newton's method presented in section 2 is a direct consequence of symmetry, we would not be able to build the example for the BFGS method in section 7 without the software Mathematica. Fortunately, today we have the luxury of tools like Mathematica and can focus on the fundamental geometrical aspects of the line search methods.

Our arguments can be adapted to objective functions with mth order Lipschitz continuous derivatives or to the more general classes of functions discussed by C. Fefferman in [7], but we do not aim for utmost generality and restrict ourselves to objective functions with Lipschitz continuous second order derivatives, so that we can speak in terms of gradients and Hessians and avoid the use of higher order multilinear forms. On the other hand, [1] indicates that things are different for analytic objective functions, mainly because these functions are "rigid" and it is not possible to change them only locally, or, more technically, due to the lack of analytic partitions of unity.

This work has six more sections and an appendix. Section 2 motivates our approach by using it to analyze the convergence of Newton's method. The technical concepts that formalize our arguments are presented in sections 3 and 4. Section 5 discusses the Wolfe conditions and section 6 explains how to build examples in which the objective function is convex along the search lines. In section 7 we combine the results from the previous sections to build an example of divergence for the BFGS method. In the appendix we prove our claims.

Finally, we would like to emphasize that it is important to look at the results in this work from a broad perspective. The examples presented here should not be taken as evidence against the use of Newton's method or the BFGS method. To the contrary, these methods perform quite well in practice. In real life numerical algorithms are implemented in floating point arithmetic and rounding errors would break our examples apart (and introduce other subtle problems). This work highlights the limitations of the asymptotic analysis of these algorithms. Our examples show that, even if taken to extremes, complex nonlinear calculations may not be able to explain the practical behavior of nonlinear programming algorithms. The main advice one can extract from this work is the following rule of thumb:

If you find that it is difficult to prove that a line search method converges under certain conditions, and other people have tried the same for a couple of decades and did not succeed, then consider the possibility that in theory the method may actually fail under these conditions, even if all your numerical experiments indicate that it always converges.

2 Newton's method

We now describe a family of examples of divergence for Newton's method for minimization. The examples are parameterized by the step size α: given α > 0 we build an example in which all step sizes αk are equal to α. This section motivates the theory presented later on. Although the geometry underlying the examples is accurately described by Figure 1, the algebraic details make them look more complex than they really are. Thus, we suggest that you pay little attention to the formulae and focus on the structure of our argument, which can be summarized as follows:

(a) we guess general expressions for the iterates xk, function values fk, gradients gk and Hessians hk which we believe to be compatible with the symmetries in Newton's method and the theory presented below.

(b) we plug these expressions into the formula that define Newton's method and obtain equations relating our guesses in item (a).

(c) we solve these equations and the next sections guarantee the existence of an objective function F such that F(xk) = fk, ∇F(xk) = gk, ∇²F(xk) = hk and skᵀ∇²F(xk + wsk)sk > 0 for w ∈ ℝ and k big enough.

Following this recipe, we decomposed ℝ⁶ as a direct sum of a three dimensional "horizontal" subspace and a three dimensional "vertical" subspace and tried iterates xk, function values fk, gradients gk and Hessians hk of the form¹:

for λ ∈ (0,1), x̄ = (x̄h, x̄v) and ḡ = (ḡh, ḡv) with x̄h, x̄v, ḡh and ḡv ∈ ℝ³ and

where I is the 3 × 3 identity matrix, Qh and Qv are 3 × 3 orthogonal matrices, A is a symmetric 3 × 3 matrix and C is a 3 × 3 matrix. We then concluded that

are convenient: they are simple and after picking them we still have the freedom to choose A, C and the remaining data in order to satisfy the hypothesis of the theory presented in the next sections and obtain iterates xk which are consistent with the formula

that defines Newton's method with step size α. If we replace ∇F(xk) and ∇²F(xk) in (14) by gk and hk in (11) then λ, Qh and Qv cancel out and we obtain the equations

where

Notice that, due to the invariance of Newton's method with respect to orthogonal changes of variables and scaling, there is no "k" in (15)-(16). Equation (15) yields

Equations (10) and (11) show that gk equals 2⁻ᵏ times a fixed vector, gk+1 equals 2⁻⁽ᵏ⁺¹⁾ times that vector multiplied by D(2)Q, and fk+1 − fk = −2⁻⁽ᵏ⁺¹⁾. Therefore, if

then sk = xk+1 − xk is a descent direction, the line searches are exact (gk+1 = 0) and the first Wolfe condition

fk+1 − fk ≤ σ gkᵀsk

holds for every σ ∈ (0,1) below an explicit positive threshold.

We now apply the results from the next sections: items 4c and 4d in the definition of seed in section 4 require that

and section 6 says that to guarantee the convexity of the objective function along the search lines we should ask for

To complete the specification of the terms in (10)-(11) we chose the 9 entries of C and the 6 independent entries of A in order to satisfy (15)-(20). These equations and inequalities are linear in A and C and the following matrices satisfy them:

Equations (10)-(13), (16)-(19) and (21) define iterates and the function values, gradients and Hessians of the objective function at them. The next sections guarantee the existence of an objective function F with Lipschitz continuous second order derivatives such that F(xk) = fk, ∇F(xk) = gk and ∇²F(xk) = hk. Moreover, neither the pair of vectors obtained from D′(0) nor the pair obtained from D(0) and D(0)D(2) are aligned², the lines ℓk = {D(0)xk + wD(0)sk, w ∈ ℝ} are such that ℓr ∩ ℓk = ∅ for all r + 1 ≤ k ≤ r + 5, and theorem 3 and lemmas 1 and 3 in the following sections show that we have much freedom to choose the value of F(x) along the search segments {xk + wsk, w ∈ [0,1]}: if the function ψ: [0,1] → ℝ has Lipschitz continuous second order derivatives and

then F can be chosen so that F(xk + wsk) = (1/2)ᵏ ψ(w) for w ∈ [0,1] and k large. In fact, condition (20) and theorem 4 in section 6 show that F can be chosen so that skᵀ∇²F(xk + wsk)sk > 0 for w ∈ ℝ and k large and the level sets

Ω(f,z) = {x ∈ ℝⁿ such that f(x) ≤ z}

are bounded.

3 Flowers and Dandelions

We now present a framework to apply Whitney's theorem to study the convergence of line search methods. We describe examples in which the iterates xk and the function values fk, the gradients gk and the Hessians hk of the objective function are grouped into p converging subsequences, which we call petals (see Figure 2). The limits of these subsequences {xk}, {fk}, {gk} and {hk} are the members of periodic sequences {ck}, {φk}, {γk} and {θk}, so that limq→∞ xpq+r = cr for all r and cr+p = cr, limq→∞ fpq+r = φr for all r and φr+p = φr, γr+p = γr and θr+p = θr. In formal terms:

Definition 1. A flower (n, p, λ, xk, fk, gk, hk) is a collection formed by

1. λ ∈ (0,1) and positive integers n and p.

2. Sequences {xk} and {ck} in ℝⁿ and a constant M > 0 such that

(a) xi = xj ⇔ i = j.

(b) cj = ck ⇔ j ≡ k mod p.

(c) λᵏ ≤ M||xk − ck|| ≤ M²λᵏ.

3. Sequences {fk}, {φk} ⊂ ℝ, {gk}, {γk} ⊂ ℝⁿ and {hk}, {θk} ⊂ 𝒮ⁿ such that

where

𝒮ⁿ is the set of n × n symmetric matrices.


Notice that the mod in item 2.b and equation (22) in the definition above imply that the sequences {ck}, {γk} and {θk} have period p and item 2.c implies that as k → ∞ the sequence xk accumulates at the limit cycle defined by the ck. The next definitions and theorems relate the fk, gk and hk in a flower to an objective function F with Lipschitz continuous second derivatives:

Definition 2. Suppose U ⊂ ℝⁿ and V ⊂ ℝᵖ. We define Lipm(U,V) as the space of functions F: U → V with Lipschitz continuous mth derivatives. If V = ℝ then we denote this space simply by Lipm(U).

Definition 3. We define LC2(ℝⁿ) as the set of functions f in Lip2(ℝⁿ) for which there exist constants Cf and Rf ∈ ℝ, which depend on f, such that if ||x|| > Rf then ∇²f(x) is positive definite and ||∇²f(x)⁻¹|| ≤ Cf.
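As a sketch, in our own notation and using only the constants Cf and Rf of Definition 3, here is the growth estimate behind the claims made in the next paragraph. Fix x with ‖x‖ ≥ Rf + 1, set t0 = (Rf + 1)/‖x‖ and φ(t) = f(tx); then
\[
  \varphi''(t) \;=\; x^{T}\nabla^{2} f(tx)\, x \;\ge\; \|x\|^{2}/C_f
  \qquad\text{for } t \in [t_0,1],
\]
so, writing m and M for the minimum of f and the maximum of \(\|\nabla f\|\) on the sphere of radius \(R_f+1\), a Taylor expansion of \(\varphi\) at \(t_0\) gives
\[
  f(x) \;\ge\; m \;-\; M\,(\|x\|-R_f-1) \;+\; \frac{(\|x\|-R_f-1)^{2}}{2\,C_f}
  \;\longrightarrow\; +\infty \quad (\|x\|\to\infty).
\]
This makes every level set bounded, hence compact, and the same expansion applied to \(\varphi'\) shows that \(\nabla f(x)^{T}x > 0\) once \(\|x\| > R_f + 1 + M C_f\), so the critical points lie in a fixed ball.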

The class LC2(ℝⁿ) is interesting because if f ∈ LC2(ℝⁿ) then there exists a constant Df such that

As a consequence, the level sets

Ω(f,z) = {x ∈ ℝⁿ such that f(x) ≤ z}

are compact, as required by many theorems regarding the convergence of line search methods, and all points x with ∇f(x) = 0 are such that ||x|| < DfCf. Thus, the elements of LC2(ℝⁿ) have a compact set of critical points and if an algorithm fails to find one of them then the algorithm is to be blamed, not the objective function. The next theorem shows that flowers can be interpolated by functions in LC2(ℝⁿ):

Theorem 2. Given a flower (n, p, λ, xk, fk, gk, hk) there exists F ∈ LC2(ℝⁿ) such that F(xk) = fk, ∇F(xk) = gk and ∇²F(xk) = hk for all k.

If the flower is a dandelion then we can improve this result and specify the objective function and its derivatives along the segments {xk + wsk, w ∈ [0,1]}:

Theorem 3. If the functions {Fk}, {Gk} and {Hk} and the intervals {[ak,bk]} are compatible with the dandelion (m, n, p, λ, xk, fk, gk, hk) then there exist k0 ∈ ℕ and F ∈ LC2(ℝⁿ) such that F(xk + wsk) = Fk(w, λᵏ), ∇F(xk + wsk) = Gk(w, λᵏ) and ∇²F(xk + wsk) = Hk(w, λᵏ) for k > k0 and w ∈ [ak, bk].

This theorem will make sense after you read the following definitions:

Definition 4. A flower (n, p, λ, xk, fk, gk, hk) is a dandelion if there exist functions Xk ∈ Lip2([0,1], ℝⁿ) such that, for all k and Sk(z) = Xk+1(z) − Xk(z),

(a) Xk+p = Xk and xk = Xk(λᵏ).

(b) The vectors Sk(0), (0) and (0) are linearly independent.

(c) The vectors Sk(0), Sk+1(0) and (0) are linearly independent.

Definition 5. The intervals [ak,bk], defined for k ∈ ℤ, are compatible with a dandelion if

(a) ak = ak+p < 0 and bk+p = bk > 1 for k ∈ ℤ.

(b) The segments {Xk(0) + wSk(0), w ∈ [ak, bk]} and {Xr(0) + wSr(0), w ∈ [ar, br]} are disjoint if r + 1 ≤ k ≤ r + p − 1.

Definition 6. For all k ∈ ℤ, consider functions Fk ∈ Lip2(ℝ²), Gk ∈ Lip1(ℝ², ℝⁿ) and Hk ∈ Lip0(ℝ², 𝒮ⁿ). We say that {Fk}, {Gk} and {Hk} are compatible with a dandelion

if Fk+p = Fk, Gk+p = Gk and Hk+p = Hk and there exists k0 such that if k > k0, i ∈ {0,1}, w ∈ ℝ and z = λᵏ then

for Vk(w) = (1 − w)(0) + w(0) and Uk(w) = (1 − w)(0) + w(0).

If we do not care about the derivatives of F in directions normal to the segments {xk + wsk, w ∈ [0,1]} then the search for compatible functions Fk, Gk and Hk is simplified by the following lemma:

Lemma 1. Let (m, n, p, λ, xk, fk, gk, hk) be a dandelion and, for k ∈ ℤ, functions Fk ∈ Lip2(ℝ²) for which Fk+p = Fk. If there exists k0 ∈ ℕ such that

for i ∈ {0,1} and k > k0 then there exist functions Gk ∈ Lip1(ℝ², ℝⁿ) and Hk ∈ Lip0(ℝ², 𝒮ⁿ) such that {Fk}, {Gk} and {Hk} are compatible.

Therefore, we can focus on Fk(w,z) and neglect Hk(w,z) and Gk(w,z) for w ∉ {0,1}. In the next section we describe a class of dandelions for which we can restrict ourselves to functions Fk of the form Fk(w,z) = ψk(w).

4 Symmetric dandelions and their seeds

In this section we present a family of dandelions that includes the examples for the BFGS and Newton's methods mentioned in the introduction. These dandelions combine symmetries imposed by an orthogonal matrix Q with contractions dictated by a diagonal matrix D. They are defined by their seeds:

Definition 7. A seed S(n, p, d, Q, k, k, k, k) is a collection formed by

1. n, p ∈ ℕ and d = (d1, ..., dn) ∈ ℝⁿ such that p ≥ 3 and dn = max di.

2. An n × n orthogonal matrix Q such that Qᵖ = I and the diagonal matrix D(z) with ith diagonal entry equal to

commutes with Q for z ∈ ℝ.

3. A sequence {k} Ì n such that k+p = k, the points cr = D(0)Qr

r are distinct for 0< r < p and the vectors

are such that, for 0 < r < p, the vectors D'(0)r and D'(0)r are not aligned and neither are the vectors D(0)r and D(0)r+1.

4. Sequences {k} Ì , {k} Ì n and {k} Ì n such that

(a)

k+p = k, k+p = k and k+p = k for all k.

(b) (

k)ij = 0 if di + dj > dn.

(c) If dl = dn - 1 and J = {j | dj > 0} then

(d) If dn< 2, L = {l | dl = dn} and S = {(i,j) | di + dj = dn, di > 0, dj > 0} then

A seed and λ ∈ (0,1) generate a dandelion through the formulae

where

This result is formalized by the following lemma:

Lemma 2. If S(n, p, d, Q, k, k, k, k) is a seed and λ ∈ (0,1) then there exists a unique dandelion (S) with Xk, fk, gk, hk, φk, γk and θk as in (34)-(39). Moreover, (S)'s steps sk = xk+1 − xk are given by

for

k defined in (31).

It is easy to apply lemma 1 and theorem 3 to (S):

Lemma 3. If S(n, p, d, Q, k, k, k, k) is a seed, λ ∈ (0,1), xk, fk, gk and hk are given by (34)-(37) and the functions ψr ∈ Lip2(ℝ), defined for 0 ≤ r < p, satisfy

for i ∈ {0,1} then the functions Fk(w,z) = ψ(k mod p)(w) satisfy (30).

To build a symmetric dandelion and specify the value of the objective function along the search segments {xk + wsk, w ∈ [0,1]} it is enough to find the right seed and functions ψr. Once we find them, the existence of an objective function F with ∇F(xk) = gk, ∇²F(xk) = hk and F(xk + wsk) = ψ(k mod p)(w) for xk, fk, gk and hk in (34)-(37) is guaranteed by lemma 3 and theorem 3. To find a seed we proceed as in section 2: we plug (34)-(37) into the expressions that define

(a) the method of our interest,

(b) the compatibility conditions in the definition of seed and theorem 3,

(c) additional constraints, like the Wolfe conditions in section 5 and the convexity conditions in section 6.

and analyze the result. If equations (34)-(37) are compatible with the method's symmetries, as they are for the BFGS and Newton's methods, then we have a chance of handling these constraints and may even find closed form solutions for them. If equations (34)-(37) are not related to the method's symmetries then the dandelion will not bloom.

5 The Wolfe conditions

In this section we show that the Wolfe conditions

and

may not prevent the cyclic behavior illustrated in figures 1 and 2. In fact, we can have cyclic behavior even if we replace the second Wolfe condition (43) by the stronger requirement gk+1 = 0, which is called exact line search condition. These conditions are invariant with respect to orthogonal changes of variables and scaling and can be easily checked for the dandelions coming from a seed:

Lemma 4. Let S(n, p, d, Q, k, k, k, k) be a seed and (m, n, p, λ, xk, fk, gk, hk) the corresponding dandelion.

(a) If

= 0 for 0< r < p then
gk+1 = 0 for all k.

(b) If

< 0 for 0< r < p then the first Wolfe condition (42) holds for

(c) If

< 0 for 0< r < p then the second Wolfe condition (43) is verified for

6 Convexity along the search lines

Convexity is an important simplifying assumption in optimization. It is tempting to conjecture that strict convexity along the search lines guarantees the convergence of line search methods³. Sometimes even stronger conjectures are made, like having convergence if we choose a global minimizer or the first local minimizer along the search line. We now show that strict convexity along the search lines does not rule out the cyclic behavior depicted in figures 1 and 2. To do that, we present a theorem that yields an objective function which is strictly convex along the search lines of a dandelion that comes from a seed:

Theorem 4. Let S(n, p, d, Q, k, k, k, k) be a seed, λ ∈ (0,1) and xk, ck, fk, gk and hk be given by (34)-(37). Consider the convex hull of {ck, 0 ≤ k < p} and the lines ℓk = {ck + w(ck+1 − ck), w ∈ ℝ}, and assume that ℓr ∩ ℓk does not meet this convex hull for r + 1 ≤ k ≤ r + p − 1. If, for 0 ≤ r < p,

then there exist a function F ∈ LC2(ℝⁿ) and k0 such that if k > k0 then F(xk) = fk, ∇F(xk) = gk, ∇²F(xk) = hk and skᵀ∇²F(xk + wsk)sk > 0 for all w ∈ ℝ.
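For reference, the last inequality in Theorem 4 is exactly strict convexity of the restriction of F to the search lines: writing \(\varphi_k(w) = F(x_k + w s_k)\), a direct computation (our notation) gives
\[
  \varphi_k'(w) = \nabla F(x_k + w s_k)^{T} s_k,
  \qquad
  \varphi_k''(w) = s_k^{T}\, \nabla^{2} F(x_k + w s_k)\, s_k ,
\]
so requiring \(\varphi_k''(w) > 0\) for all \(w \in \mathbb{R}\) is the convexity property referred to in footnote 3.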

If the vector d that defines the seed S has more than two entries equal to 0 then the lines ℓr and ℓk with r + 1 ≤ k ≤ r + p − 1 are disjoint for almost all choices of the points that define the seed. Therefore, to build examples of divergence in which the objective function is strictly convex along the search lines we can focus on (44) and check the intersections later, just to make sure we were not unlucky. This is what we did in section 2 and will do again in the next section.

7 The BFGS method

In this section we present an example of divergence for the BFGS method in which the objective function is strictly convex along the search lines. The essence of this example is already present in [13] and we suggest that you read this reference as an introduction to this section. Our purpose here is to show that the concepts of flower, dandelion and seed reduce the construction of examples like the one in [13] to the solution of algebraic problems. Software like Mathematica or Maple can decide if such algebraic problems have solutions and find accurate approximations to these solutions. The validation of the example becomes then a question of using these approximate solutions wisely to check the requirements in the definitions of flower, dandelion and seed and the hypotheses of the lemmas and theorems in the previous sections.

We analyze the BFGS method with exact line searches, i.e., gk+1 = 0. In this case the Hessian approximations Bk are updated according to the formula:

where gk = ∇F(xk). The iterates xk evolve according to

Equations (45)-(46) are invariant with respect to orthogonal changes of variables and scaling, in the sense that if Q is an orthogonal matrix, λ ∈ ℝ and F, Bk and xk satisfy equations (45)-(46) then F̃(x) = F(λ⁻¹Qᵀx), B̃k = λ⁻²QBkQᵀ and x̃k = λQxk satisfy them too. It is hard to exploit these symmetries because the BFGS method was conceived to correct the matrices Bk. However, it has an additional symmetry: if we take Bk of the form

combined with the conditions

then equation (46) is automatically satisfied and (45) holds if

for ρk such that

Notice that if we assume (48) and take αk = 1 for 0 ≤ k < n and αk as in (50) for k ≥ n, then the αk are positive and the vectors gk, ..., gk+n−1 are linearly independent, because if we suppose that a combination ∑ µj gk+j with µ0 = 1 vanishes then (48) leads to a contradiction. Therefore, under (48)-(50) the matrices Bk defined by (47) are positive definite and to build an example of divergence for the BFGS method we can ignore αk and Bk and focus on (48)-(49).
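For readers who want to experiment numerically, here is a minimal sketch of the textbook BFGS recursion with (numerically) exact line searches. The test function, the starting data and the assumption that (45)-(46) are the standard update and step written below are ours, not the paper's:

import numpy as np
from scipy.optimize import minimize_scalar

def bfgs_exact(f, grad, x, B, iters=20):
    for _ in range(iters):
        d = -np.linalg.solve(B, grad(x))                  # search direction (assumed form of (46))
        alpha = minimize_scalar(lambda a: f(x + a * d)).x  # numerically "exact" line search
        s = alpha * d
        if np.linalg.norm(s) < 1e-12:
            break
        y = grad(x + s) - grad(x)
        # standard BFGS update of the Hessian approximation (assumed form of (45))
        B = B - np.outer(B @ s, B @ s) / (s @ B @ s) + np.outer(y, y) / (y @ s)
        x = x + s
    return x

f    = lambda x: (x[0] - 1.0) ** 4 + 2.0 * (x[1] + 0.5) ** 2
grad = lambda x: np.array([4.0 * (x[0] - 1.0) ** 3, 4.0 * (x[1] + 0.5)])
print(bfgs_exact(f, grad, np.array([3.0, 2.0]), np.eye(2)))   # converges toward (1, -0.5)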

To apply the theory above we plug (34)-(37) into (48) and (49) and deduce that a seed S(n, p, d, Q, k, k, k, k) is consistent with the BFGS method if

where Z = D(λ⁻¹)Q. If we fix Z then (52) becomes similar to an eigenvalue problem, with the ρ's playing the role of eigenvalues and the corresponding vectors playing the role of eigenvectors. We solved equations (52) for the seed vectors and the ρk in the case

λ equal to a specific value in (0,1) and n = 6, as described in the following lemma:

Lemma 5. If Q and D(λ) are the matrices above, dn = 3, n = 6 and λ is the specific value just mentioned, then there exist vectors in ℝ⁶ and coefficients ρk ∈ ℝ that satisfy (52), are periodic with period 11 in k, and are such that the six vectors in the set

are linearly independent for each k ∈ ℤ.

The linear independence of the vectors D(l-j)Qj

k+j in lemma 5 implies that we can solve the following linear systems of equations on the vectors k:

The vectors

k obtained from (54) define steps sk by (40):

sk = Xkk for X = D(l)Q

and we claim that the points

satisfy

and lead to points xk = Xk

k such that sk = xk+1 - xk. In fact, using (56) it is straightforward to deduce that xk+1 - xk = Xk+1
k+1 - Xk
k = Xk
k = sk. To verify (57), notice that (56) yields

where

Now, since

k+11 = k, equation (54) implies that k+11 = k and

Combining this equation, (55) and (59) we conclude that D = 0 and equation (58) and the identity

j-66 = j lead to (57).

In the proof of lemma 5 we show that the

k can be chosen so that the vectors k defined by (54) satisfy the linear independence requirements in the definition of seed. The k and k above, d = (0,0,0,1,1,3),

are also compatible with this definition. As a consequence, n = 6, p = 66, d, Q

k, k, k and k define a seed S. This seed leads to a dandelion (S) with steps sk and gradients gk compatible with the BFGS method with the matrices Bk in (45) and step sizes ak in (50). Equations (34)-(37) and (54) imply that the function values and gradients associated to (S) satisfy

Lemma 4 shows that the corresponding iterates xk satisfy the first Wolfe condition for 0 < σ < 1 − λ³. Moreover, almost all choices of the vectors in the proof of lemma 5 lead to lines ℓk as in the hypothesis of theorem 4 such that ℓr ∩ ℓk = ∅ for r + 1 ≤ k ≤ r + 65 and D′(0)sk+1 ≠ 0. Equation (60) shows that skᵀhksk > 0 and skᵀhk+1sk > 0. As a result, equation (61) and theorem 4 yield an objective function F ∈ LC2(ℝ⁶) such that the iterates xk generated by applying the BFGS method to F with x0 = 0 and the Bk above satisfy skᵀ∇²F(xk + wsk)sk > 0 for k large and w ∈ ℝ.

This completes the presentation of an example showing that there is not enough strength in the definition of the BFGS method, the exact line search condition and the Wolfe conditions to prevent the cyclic behavior in figures 1 and 2, even when the objective function is strictly convex along the search lines and has compact level sets.

You may now be wondering if we could not find a simpler example, in dimension less than 6. Unfortunately, we were not able to find such an example because the eigenvalue problem (52) does not seem to have real solutions ρk ≠ 0 for n < 6. Notice that this is a purely algebraic issue, which has nothing to do with nonlinear programming. Similarly, the minimum dimension at which we could build counterexamples with the techniques described here for other methods could be bigger or smaller than 6, depending on the particular symmetries of the method and the existence of solutions for the corresponding algebraic problems.

8 Appendix

We now prove the results stated in the previous sections. Our main tool is the following corollary of Whitney's extension theorem:

Lemma 6. Let E be a bounded subset of ℝⁿ and suppose F: E → ℝ, G: E → ℝⁿ and H: E → 𝒮ⁿ are functions with domain E and M is a constant. If

then there exists F̃ ∈ Lip2(ℝⁿ) such that F̃(x) = F(x), ∇F̃(x) = G(x) and ∇²F̃(x) = H(x) for x ∈ E. Moreover, there exist constants C and R such that if ||x|| > R then ∇²F̃(x) is positive definite and ||∇²F̃(x)⁻¹|| < C.

Whitney's theorem is stated in different levels of generality in the pure mathematics literature. The most complete and general approach is due to C. Fefferman [7, 8] but, unfortunately, it is stated in a language that is too abstract for the average nonlinear programming researcher. In page 48 of [10], L. Hörmander presents a more concrete version of the theorem for functions in Cm. This version of the theorem is stated using n-tuples (multi-indices) to denote partial derivatives, i.e., if α = (α1,...,αn) ∈ ℕⁿ, |α| = ∑ αi and f is a function with partial derivatives of order |α| then ∂αf(x) is defined as ∂^|α| f(x) / (∂x1^α1 ⋯ ∂xn^αn).

Moreover, α! = ∏ αi! and xα = ∏ xi^αi. Using this notation, the arguments used to prove theorem 2.3.6 in [10] can be adapted to prove the following version of Whitney's theorem:

Theorem 5. Let E be a bounded set in ℝⁿ and consider, for each α ∈ ℕⁿ with |α| ≤ m, a Lipschitz continuous function uα: E → ℝ. If there exists a constant M such that

for all x, y ∈ E and α ∈ ℕⁿ with |α| ≤ m then there exists f ∈ Lipm(ℝⁿ) such that ∂αf(x) = uα(x) for all α ∈ ℕⁿ with |α| ≤ m and x ∈ E. Moreover, there exists a constant C such that |∂αf(x)| ≤ C for all x ∈ ℝⁿ and α ∈ ℕⁿ with |α| ≤ m.
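For the reader's convenience, the compatibility condition usually required in this setting (our transcription of the classical Whitney condition for functions with Lipschitz continuous mth derivatives, to be checked against theorem 2.3.6 in [10]) reads:
\[
  \Bigl|\, u_{\alpha}(x) \;-\; \sum_{|\beta| \le m - |\alpha|}
  \frac{u_{\alpha+\beta}(y)}{\beta !}\,(x-y)^{\beta} \Bigr|
  \;\le\; M\, \|x-y\|^{\,m-|\alpha|+1}
  \qquad \text{for all } x, y \in E,\ |\alpha| \le m .
\]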

This theorem allows us to extend much of the discussion in this work to Lipm(ℝⁿ) for m > 2. However, for m > 2 we cannot group the partial derivatives in vectors (gradients) and matrices (Hessians) as we did in the statement of lemma 6. As a consequence, the geometry behind the examples gets blurred by the technicalities as m increases and we decided it would be best to focus on the consequences of lemma 6 instead of exploring more general results like theorem 5.

We can apply Whitney's result to a dandelion with xk = Xk(lk) because if

and d > 0 is small and the intervals {[ak,bk]} are compatible with then the distance between points in the 2 dimensional surface

can be estimated in terms of w and z:

Lemma 7. Given a dandelion with xk = Xk(lk), compatible intervals {[ak, bk]} and k in (65) there exists d > 0 such that if wj,wkÎ [ak,bk], zj, zkÎ [0,d], yj = j(wj,zj), yk = k(wk,zk) and ||yj - yk|| < d then either

and in case (a)

in case (b)

and in case (c)

We also use the next lemmas. After the statement of these lemmas we prove the theorems and the paper ends with the proofs of the lemmas.

Lemma 8. Consider E Ì n, constants K > 1, d > 0 and functions F: E ® , G: E ® n and H: E ® n. If for all x, z Î E there exist m Î and y1, ..., ymÎ E such that, for y0 = x and ym+1 = z,

then all x,y Î E with ||x - y|| < d satisfy (62)-(64) with M = 3 K4.

Lemma 9. If the dandelion (m, n, p, l, xk, fk, gk, hk) and the functions {Fk}, {Gk} and {Hk} are compatible and the equations (26)-(29) hold for k > k0 then there exists M Î such that if j,k > k0 and u,w Î then

for Dh = k(u, lk) -k(w, lk), Dv = k(w, lj) -k(w, lk) and as in (65).

Lemma 10. If f0, f1, g0, g1, h0 and h1 ∈ ℝ are such that g0 < f1 − f0 < g1, h0 > 0 and h1 > 0 then there exists ψ ∈ Lip2(ℝ) such that ψ(0) = f0, ψ(1) = f1, ψ′(0) = g0, ψ′(1) = g1, ψ″(0) = h0, ψ″(1) = h1 and ψ″(w) > 0 for all w ∈ ℝ.

Lemma 11. Given a function ψ ∈ Lip1(ℝ²) such that ψ(i,0) = 0 and ∇ψ(i,0) = 0 for i ∈ {0,1} there exists φ ∈ Lip1(ℝ²) such that

(a) φ(i,z) = 0 and ∇φ(i,z) = 0 for i ∈ {0,1} and z ∈ ℝ.

(b) φ(w,0) = ψ(w,0) and ∇φ(w,0) = ∇ψ(w,0) for w ∈ ℝ.

Lemma 12. Consider l Î (0,1), a function F Î Lip2(2), functions Y1 and Y2 in Lip2([0,1],n) and Y3(z) = Y2(z) - Y1(z). If the vectors (0), (0) and Y3(0) are linearly independent and, for i Î {0,1},

then there exist d > 0, G Î Lip1(2,n) and H Î Lip0(2, n) such that if z Î [0,d] and i Î {0,1} then

for

Proof of Theorem 2. Item 2 in the definition of flower in section 3 implies that there exists δ > 0 such that if ||xk − xj|| < δ then j ≡ k mod p. The periodicity of φk, γk and θk given by (22), item 2 in definition 1 and the bounds (22)-(25) imply that if x and z belong to the set E = {xk} ∪ {ck} and ||x − z|| < δ then there exists k such that m = 1 and y1 = ck satisfy the conditions (70)-(73) in lemma 8. Therefore, lemmas 6 and 8 imply that there exists a function F as required by theorem 2.

Proof of Theorem 3. Consider k0 and d obtained from lemma 7 and

Lemma 7 and the three equations in (26) imply that the expressions

define functions F, G and H with domain E, i.e., if j(wj, lj) = k(wk, lk) then

Lemmas 7 and 9 show that the functions F, G and H above satisfy the hypothesis of lemma 8 and theorem 3 follows from lemma 6.

Proof of Theorem 4. For r > 0, consider the compact convex set

Since

k Ç r Ç = for r + 1 < k < r + p - 1 there exists > 0 such that kÇ r Ç = for the same k and r. For each 0 < r < p there exist ar < 0 and br > 1 such that

r Ç = {Xr(0) + wSr(0), w Î [ar,br]}.

Extending the definition of ak and bk by periodicity, ak = ak mod p and bk = bk mod p , we obtain intervals [ak,bk] compatible with . Combining lemmas 1, 3 and 10 and theorem 3 we obtain k1 and Î Lip2(n) such that if k > k1 then (xk) = fk, Ñ(xk) = gk, Ñ2(xk) = hk and Ñ2(xk + wsk) sk > 0 for w Î [ak,bk].

The points Xk(0) belong to the interior of the compact convex set and there exist ck < 0 and dk > 1 such that ck+p = ck, dk+p = dk and

k Ç = Xk(0) + wSk(0), w Î [ck,dk]

and vectors Uk, Vk such that Uk+p = Uk, Vk+p = Vk, Sk(0) < 0, Sk(0) > 0 and

Let L be a Lipschitz constant for the second derivatives of , µ > 0 such that

and t: ® be the function t(w) = max(w,0)3. The function

and k0 > k1 such that and ,

(xk + bksk - Xk(0) - ckSk(0)) > 0 and (xk + bksk - Xk(0) - dkSk(0)) > 0

for k > k0 are as required by theorem 4. In fact, F coincides with in and if k > k0 and w > bk then

Proof of Lemma 1. Let F be the function obtained by applying theorem 2 to . Lemma 12 applied to the functions = Fk - F, Y1 = Xk and Y2 = Xk+1 yield functions k and k such that Gk(x) = ÑF(x) + k(x) and Hk(x) = Ñ2F(x) + k(x) are as claimed in lemma 1.

Proof of Lemma 2. Equations (34)-(37) show that

where ei Î n is the vector with eii = 1 and eij = 0 for i ¹ j, Item 4b in definition 7 implies that if T = {(i,j) such that di + dj < dn} then

Since Qp = I and k+p = k, the Xk in (34) satisfy item (a) in the definition of dandelion. The relations k+p = k, k+p = k and k+p = k imply that fk, gk and hk accumulate at the limits jk, gk and qk in the last column of (34)-(37) and these limits satisfy (22). We now verify (23)-(25). Equation (85) yields (23). The first equation in (84) leads to

where J = {j | dj > 0}. The second equation in (84) and (85) imply that

for U = {u |du = dn - 1} and equations (87) and (32) show that the gk satisfy (24). Equations (38), (39) and (86) lead to

for L and S in item 4b of definition 7, and (25) follows from (33). Finally, the linear independence requirements in items (b) and (c) of definition 4 follow from item 3 in definition 7.

Proof of Lemma 3. Direct computation using (34)-(37) and (41) show that the function Fk satisfies (30).

Proof of Lemma 4. To prove this lemma, plug (34)-(37) into the expressions in the hypothesis of lemma 4 and compare the results with (42)-(43).

Proof of Lemma 5. The matrix Z in (52) can be written as

where the Zi's are the following 1 × 1 and 2 × 2 blocks:

If we identify the blocks Zi with the complex numbers

then (52) can be interpreted as the equations

in the complex variables

j,k given by

In the case rk+11 = rk and r+11 = r equation (91) is equivalent to

where

j = (j,0, j,1, j,2, j,3, ..., j,7, j,10)t and

Equation (94) suggests that we take r0, r1, ..., r10 such that the corresponding matrices A in (95) are singular. Given such r's we can find vectors j ¹ 0 that satisfy (94) and use them to define the vectors taking real and imaginary parts of (92)-(93). This approach leads to the polynomial equations

on the yr = 1/ρr. To obtain accurate approximations for appropriate ρr we take

y6 = -1.9948, y7 = -0.3737, y8 = -1.2355, y9 = 0.9857, y10 = 0.11717

and apply Newton's method to find the remaining yi. Expression (96) corresponds to a system of six real equations and Newton's method starting with

y0 = 2.04831, y1 = 3.33798, y2 = -1.15867, y3 = 0.300795, y4 = -0.634211, y5 = -2.44761,

converges quickly to a solution of this system. This approach led to rational approximations to the yi at which ||det(A(·, −zj))|| < 10⁻¹⁰⁰⁰ for j = 1, 2, 3, 4 and 5 and at which the Jacobian of the system (96) is well conditioned. A standard argument using Kantorovich's theorem proves the existence of an exact y in a neighborhood of radius 10⁻⁵⁰⁰ of our rational approximation. Using the approximation we dropped the last row in each of the matrices A(y, −zj) and computed highly accurate approximations for the five complex vectors in (94), normalized so that their first entry equals 1. Using (92)-(93) we obtained accurate approximations to the vectors required by lemma 5. Using these approximations we verified that the vectors in (53) are indeed linearly independent and solved the systems (54). Finally, we computed approximations for the quantities in (55)-(56) and verified that the corresponding lines ℓk and ℓr in the hypothesis of theorem 4 are at least 10⁻⁵ apart. Our computations indicate that the exact lines ℓk and ℓr do not cross. We did a rigorous sensitivity analysis of the computations above and it indicated that 500 digits of precision are enough to guarantee that our conclusions apply to the exact values of all the quantities involved.
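The workflow just described (fix some of the variables, solve the remaining equations by Newton's method in very high precision, then certify the root) can be reproduced with standard multiprecision software. The sketch below is generic: the two-variable polynomial system is a placeholder, not the system (96), which is not reproduced here.

from mpmath import mp, mpf, findroot

mp.dps = 500                      # work with 500 significant digits

def placeholder_system(y0, y1):
    # stands in for the real polynomial system obtained from (96)
    return [y0 ** 2 + y1 ** 2 - mpf(4), y0 - y1 - mpf(1)]

root = findroot(placeholder_system, (mpf('1.8'), mpf('0.8')))
print(root)                       # a root accurate to roughly 500 digits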

Proof of Lemma 6. Applying the version of Whitney's theorem in page 6 of [8] to w(t) = t and a properly scaled version of the polynomials

Px(y) = F(x) + G(x)t(y - x) + (y - x)tH(x)(y - x)

we obtain a function Î Lip2(n) and a constant C > 0 such that (x) = F(x), Ñ(x) = ÑF(x) and Ñ2(x) = Ñ2F(x) for all x Î E and ||Ñ2(x)|| < 1/C for all x Î n. Let R > 0 be such that 2||x|| < R for all x Î E and consider a polynomial F such that

The constants C and R and the function defined by

satisfies the requirement of lemma 6.

Proof of Lemma 7. By compatibility of {[ak,bk]} and and periodicity (Xk+p = Xk), there exists > 0 for which the segments = {k(w, 0), w Î [ak,bk]} are such that dist > if r + 1 < k < r + p - 1. If d > 0 is small enough then the surfaces

are such that

The definition of dandelion and the Lipschitz continuity of the derivatives of Xk imply that for d > 0 small if v, z Î [0,d], k Î and a, b, c Î then

We also have that

where Dk(v,z) satisfies the inequality

when v,z Î [0,d]. The linear independence requirements in the definition of dandelion imply that (0) ¹ 0 and (100) implies that if d > 0 is small and v,z Î [0,d] then

We now show that d > 0 for which the expressions (97)-(101) above are valid fulfils the requirements in lemma 7. Given yj and yk as in the hypothesis of lemma 7 there exists i Î [k,k - p) such that i º j mod p Since ||yj - yk|| < d and i = j the surfaces and are at most d apart and (97) leaves only the three possibilities: (a) i = k, (b) i = k + 1 or (c) i = k + p - 1. This corresponds to (66) and to complete this proof we now verify the bounds (67)-(69). In case (a) Xj = Xk and Sj = Sk and the definition of yj and yk in the hypothesis and (100) lead to

The bounds (98) and (101) and the abbreviation (wj - wk)Sk(zk) = G yield

and (67) follows from the equations G =

j(wj,zk) - yk and

yj - j(wj,zk) = (1 - wj) (Xk(zj) - Xk(zk)) + wj(Xk+1(zj) - Xk+1(zk)).

In case (b) Xj = Xk+1 and Sj = Sk+1 and (100) lead to

yj - yk = wj Sk+1(zj) + (zj - zk) X'j(zk) + (1 - wk)Sk(zk) + Dj(zk,zj)

and using (99) and (101) we deduce that

||yj - yk|| > 3d ( ||wj Sj(zj)|| + || (zj - zk) X'j(zk)|| + || (1 - wk)Sk(zk)|| )

-|| Dj(zk,zj)|| > d( ||wj Sj(zj)|| + ||Xj(zj) - Xj(zk)|| + || (1 - wk)Sk(zk)|| )

and (68) follows at once. Finally, case (c) is analogous to case (b).

Proof of Lemma 8. Given x,z Î E, let y1,...,ym be as in the hypothesis of lemma 8 and define

y0 = x, ym+1 = z and fj = F(yj), gj = G(yj) and hj = H(yj).

The bounds (71) and (70) yield

and (62) is obtained by taking j = m + 1 in (102). The identity

and the bounds (70), (72) and (102) and imply that

Taking j = m + 1 we deduce that x and z satisfy (63). Finally, notice that

for D = (yi + yi-1 - 2y0)th0(yi - yi-1). The last terms in (104) cancel because

Thus, if we take j = m + 1 then (70), (103) and (104) yield (64) for M = 3K4.

Proof of Lemma 9. The bounds (74) and (77) follow from the Lipschitz continuity of Hk. Equation (75) can be derived from the first equation in (29),

and the Lipschitz continuity of the first derivatives of Gk. Equation (76) is a consequence of the first equation in (27) and (29), (105) and the Lipschitz continuity of the second derivatives of Fk. The bounds (78) and (79) are clearly satisfied if j = k and from now on we assume that j ¹ k. In this case

and Xk(lj) = Xk(lk) + (0)(lj - lk) + (0) (l2j - l2k) + µj,k, where µj,k = (X''(z) - X''(0)) dzdx satisfies

for a Lipschitz constant L for . The bound (106) and Xk Î Lip2([0,1],n) lead to

for

The conditions Fk Î Lip2(2) and Gk Î Lip1(2,n) imply that

The bound (78) follows from the second equation in (29), (107) and (109). Finally, the bound (79) can be deduced from the second equation in (27), equations (28), (106), (107), equation (108) with l = j and l = k and equation (110).

Proof of Lemma 10. Given > 0, consider the function

where (w) is the piecewise cubic given by

with

i = gi - f1 + f0 and i = hi/2 for i Î {0,1}. The function belongs to Lip2() if and only if it has continuous second order derivatives at w = 2 and w = 1 - 2. This condition leads to a linear system of six equations on the six variables and . Solving this system we obtain

The second derivative of is a piecewise linear function with values

at the nodes w = 0, w = , w = 2, w = 1 - 2, w = 1 - and w = 1. The hypothesis implies that 0 < 0 and 1 > 0 and (111) shows that > 0 and < 0 if > 0 is small. Therefore, (112) implies that ''(w) > 0 for all w if is small enough.

Proof of Lemma 11. Applying Whitney's theorem to the set

E = {(0,y) | |y| < 3} È {(1,y) | |y| < 3} È {(x,0) | |x| < 3 } Ì 2

and the functions F: E ® and G : E ® 2 given by

F(w,0) = y(w,0), F(i,z) = 0, G(w,0) = Ñy(w,0), G(i,z) = 0,

for i = 0,1 we obtain a function Î Lip1(2) such that

(w,0) = y(w,0), Ñ(w,0) = Ñy(w,0), (i,z) = 0, Ñ(i,z) = 0

for i = 0,1 and |w|,|z| < 3. Let t : ® be a C¥ function such that t(x) = 1 for |x| < 2 and t(x) = 0 for |x| > 3. The function

f(w,z) = t(z) (t(w)(w,z) + (1 - t(w))y(w,z))

satisfies items (a) and (b) in lemma 11.

Proof of Lemma 12. Let Z1, Z2 and Z3 be the functions defined by

The vectors Z1(0), Z2(0) and Z3(0) are linearly independent. Therefore, there exist W1, W2, W3Î n such that

The implicit function theorem guarantees the existence of d > 0 and functions aij Î Lip1([0,d]), defined for 1 < i,j < 3, such that

and the vectors

are such that

for z Î [0,d]. Lemma 11 applied to y = yields f Î Lip1(2) such that

for w Î and i Î {0,1}. We now show that any G Î Lip1(2,n) such that

for z Î [0,d] satisfies (81) and (82). Equations (80) and (117) imply that G(i,z) = 0 and second and third equations in (81) follows from (114)-(118) and the definition of S(z) and V(w). To verify (82), notice that (117) and (118) lead to

Equation (116) yields (0)t Zj(0) + Ai(0)t (0) = 0 and (114)-(116) imply that

If j = 3 then (0) = (0) = (0) - (0) = Z2(0) -Z1(0) and (120) leads to

Reminding that (0) = (0) and using (116) and (120)-(121) we obtain

Equations (117) and (118) show that

and (82) follows from (116), (118), (122)-(124) and the fact that

To complete this proof we define H as

for

Equations (80) and (117) imply that H(i,z) = 0 for i Î {0,1} and (118) and (116) imply the second equation in (83). Using (117), (126) and (127)-(130) we get

Equations (113), (119) and (122)-(124) show that

Finally, equations (113) shows that the right hand side of (131) is the expansion of ¶Gz (w,0) on the basis {W1,W2,W3} of a tri-dimensional subspace that contains ¶Gz (w,0) and we have shown (83).

Received: 15/IV/06. Accepted: 15/I/07.

#699/06.

  • [1] P. Absil, R. Mahony and B. Andrews, Convergence of the iterates of descent methods for analytic functions. Siam Journal of Optimization, 16(2) (2005), 531-547.
  • [2] D. Bertsekas, Nonlinear Programming, second edition, second printing, Athena Scientific, Belmont, Massachusetts, 2003.
  • [3] B. Cantwell, Introduction to Symmetry Analysis, Cambridge University Press, Cambridge UK, (2002).
  • [4] A. Cauchy, Méthodes générales pour la résolution des systémes d'équations simultanées. C.R. Acad. Sci. Par., 25 (1847), 536-538.
  • [5] H. Curry, The Method of steepest descent for non-linear minimization problems. Quart. Appl. Math., 2 (1944), 258-261.
  • [6] Y. Dai, Convergence properties of the BFGS Algorithm. SIAM J. Optim., 13(3) (2002), 693-701.
  • [7] C. Fefferman, A generalized sharp Whitney theorem for jets. Revista Matemática Iberoamericana, 21(2) (2005).
  • [8] C. Fefferman, Extension of C^{m,ω}-Smooth Functions by Linear Operators, manuscript available at: www.math.princeton.edu/facultypapers/Fefferman. To appear in Revista Matemática Iberoamericana.
  • [9] C. Gonzaga, Two Facts on the convergence of the Cauchy Algorithm. Journal of Optimization Theory and Applications, 107(3) (2000), 591-600.
  • [10] L. Hörmander, The Analysis of Linear Partial Differential Operators I Springer Verlag, Heidelberg, 1983.
  • [11] J. Nocedal, Theory of algorithms for unconstrained optimization. Acta numerica, 1 (1992), 199-242.
  • [12] J. Nocedal and S. Wright, Numerical Optimization Springer series in operations research, Springer, 1999.
  • [13] W.F. Mascarenhas, The BFGS method with exact line searches fails for non-convex objective functions. Math. Prog. Ser B., 99 (2004), 49-61.
  • [14] W.F. Mascarenhas, Newton's iterates can converge to non-stationary points. Manuscript submitted to Mathematical Programming. Preliminary version available online at: www.ime.usp.br/~walterfm/pub/newtonFails.pdf
  • [15] M. Powell: Nonconvex minimization calculations and the conjugate gradient method. In: D.F. Griffiths, ed., Numerical Analysis, Lecture Notes in Mathematics 1066, Springer Verlag, Berlin, (1984), 122-141.
  • [16] M. Powell, Convergence Properties of Algorithms for Nonlinear Optimization SIAM Review, 28(4) (1986), 487-500.
  • [17] H. Whitney, Analytic extensions of differentiable functions defined in closed sets. Trans. Amer. Math. Soc., 36 (1934), 63-89.
  • 1 The matrices hk are not positive definite and in practice one would take another search direction if, for example, this fact was detected during a Cholesky factorization of hk. However, to keep the algebra as simple as possible, in this work we do not enforce the condition that the Hessians ∇²f(xk) are s.p.d. In [14] we show that Newton's method may fail even if ∇²f(xk) is s.p.d. for all k and the Wolfe conditions are satisfied.
  • 2 D′(λ) here is the derivative of D(λ) with respect to λ.
  • 3 By strict convexity along the search line we mean that the directional second derivatives skᵀ∇²F(xk + wsk)sk are positive.