

Global convergence of a regularized factorized quasi-Newton method for nonlinear least squares problems

Weijun Zhou (I) and Li Zhang (II)

(I) College of Mathematics and Computational Science, Changsha University of Science and Technology, Changsha 410004, China. E-mail: weijunzhou@126.com

(II) College of Mathematics and Computational Science, Changsha University of Science and Technology, Changsha 410004, China. E-mail: mathlizhang@126.com

ABSTRACT

In this paper, we propose a regularized factorized quasi-Newton method with a new Armijo-type line search and prove its global convergence for nonlinear least squares problems. This convergence result is extended to the regularized BFGS and DFP methods for solving strictly convex minimization problems. Some numerical results are presented to show the efficiency of the proposed method.

Mathematical subject classification: 90C53, 65K05.

Key words: factorized quasi-Newton method, nonlinear least squares, global convergence.

1 Introduction

The objective of this paper is to present a globally convergent factorized quasi-Newton method for the nonlinear least squares problem

min_{x∈Rn} ƒ(x) = ½║r(x)║².

Here r(x) = (r1(x), ..., rm(x))T is called the residual function, which is nonlinear, and ║·║ denotes the Euclidean norm. It is easy to show that the gradient and the Hessian of ƒ are given by

g(x) = ∇ƒ(x) = J(x)Tr(x)   and   ∇²ƒ(x) = J(x)TJ(x) + Σ_{i=1}^{m} ri(x)∇²ri(x),

where J(x) is the Jacobian matrix of r(x),

J(x) = (∇r1(x), ..., ∇rm(x))T ∈ Rm×n,

and ∇²ri(x) is the Hessian matrix of ri(x).
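For a concrete illustration of this structure, consider the following toy Matlab example (our own, not from the paper), with a two-dimensional residual:

```matlab
% Toy illustration (our own example) of the gradient and Hessian structure
% for r(x) = [x1^2 + x2 - 1; x1 - x2^2].
r  = @(x) [x(1)^2 + x(2) - 1; x(1) - x(2)^2];
J  = @(x) [2*x(1), 1; 1, -2*x(2)];           % Jacobian of r
f  = @(x) 0.5*norm(r(x))^2;                  % least squares objective
g  = @(x) J(x)'*r(x);                        % gradient J(x)^T r(x)
x0 = [2; 1];
rk = r(x0);
H1 = [2 0; 0 0];  H2 = [0 0; 0 -2];          % Hessians of r1 and r2
H  = J(x0)'*J(x0) + rk(1)*H1 + rk(2)*H2;     % full Hessian of f at x0
```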

The nonlinear least squares problem has many applications in applied areas such as data fitting and nonlinear regression in statistics. Many efficient algorithms have been proposed to solve this problem. The special structure of the Hessian leads to a lot of special methods which have different convergence properties for zero and nonzero residual problems [5, 6, 8, 20].

The Gauss-Newton method and the Levenberg-Marquardt method are two well-known methods which have locally quadratic convergence rate for zero residual problems [6]. However, they may converge slowly or even diverge when the problem is nonzero residual or the residual function is highly nonlinear [1]. The main reason is that both methods only use the first order information of ƒ.

To solve nonzero residual problems more efficiently, some structured quasi-Newton methods using the second order information of ƒ have been proposed. These methods are shown to be superlinearly convergent for both zero and nonzero residual problems [5, 8]. However, the iterative matrices of structured quasi-Newton methods are not necessarily positive definite. Therefore the search directions may not be descent directions of ƒ when some line search is used.

To guarantee the positive definite property of the iterative matrices, some factorized quasi-Newton methods have been proposed [15, 16, 18, 19, 20, 21, 22], where the search direction dk is given by

(Jk + Lk)T(Jk + Lk)dk = –JkTrk,

where Jk = J(xk), rk = r(xk), and Lk ∈ Rm×n is updated according to a certain quasi-Newton formula. These methods have been proved to be superlinearly convergent for both zero and nonzero residual problems, but they may not possess a quadratic convergence rate for zero residual problems. Based on the idea of [10], Zhang et al. [24] proposed a family of scaled factorized quasi-Newton methods which not only have a superlinear convergence rate for nonzero residual problems, but also have a quadratic convergence rate for zero residual problems. Under suitable conditions, the iterative matrices of factorized quasi-Newton methods are proved to be positive definite if the initial point is close to the solution point [20, 24].
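To make this structure concrete, a factorized direction of this type can be computed in a few lines of Matlab (an illustrative sketch under the reconstruction above, with Jk, Lk and rk given):

```matlab
% Factorized quasi-Newton direction (illustrative sketch):
% solve (Jk + Lk)'*(Jk + Lk)*d = -Jk'*rk.
A  = Jk + Lk;                 % m-by-n factor
dk = -((A'*A)\(Jk'*rk));      % search direction
```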

However, the iterative matrices of factorized quasi-Newton methods may not be positive definite if the initial point is far from the solution, so the search direction may not be a descent direction. This makes it difficult to establish global convergence of factorized quasi-Newton methods when line search methods are used. Another difficulty for global convergence is that the iterative matrices and their inverses may not be uniformly bounded. To the best of our knowledge, global convergence of factorized quasi-Newton methods has not been established.

The paper is organized as follows. In Section 2, we propose a regularized factorized quasi-Newton method which guarantees that the iterative matrix is positive definite at each step, and we use a new Armijo-type line search to compute the stepsize. Under suitable conditions, we prove that the proposed method converges globally for nonlinear least squares problems. In Section 3, we extend this result to the BFGS and DFP methods for general nonlinear optimization and show that the regularized BFGS and DFP methods converge globally for strictly convex objective functions. In Section 4, we compare the performance of the proposed method with some existing methods and present numerical results to show its efficiency.

2 Algorithm and global convergence

In this section, we first present the regularized factorized quasi-Newton algorithm and then analyze its global convergence. The motivation of the method is that the positive definiteness of the iterative matrix can be guaranteed by the Levenberg-Marquardt regularization technique; for global convergence, the regularization parameter and the stepsize must be chosen carefully. We now give the details of the method.

Algorithm 2.1 (Regularized factorized quasi-Newton method)

Step 1. Give the starting point x0 ∈ Rn, L0 ∈ Rm×n, δ, ρ ∈ (0,1), ε1 > 0, ε2 > 0, r > 0, M > 0. Choose a positive constant β > 1. Let k := 0.

Step 2. Compute dk by solving the linear equations

(Bk + µkI)dk = –gk,    (2.1)

where gk = ∇ƒ(xk) = JkTrk, Bk = (Jk + Lk)T(Jk + Lk), and

µk = ε1║Bk║ if ║Bk║ > M,   µk = ε2║gk║^r if ║Bk║ ≤ M.    (2.2)

Here ║Bk║ refers to the Frobenius norm of Bk. For simplicity, we denote the index sets

K1 = {k : ║Bk║ > M}   and   K2 = {k : ║Bk║ ≤ M}.

Step 3. (i) If k ∈ K2, compute stepsize αk = max{ρ⁰, ρ¹, ...} such that

ƒ(xk + αkdk) ≤ ƒ(xk) + δαkgkTdk.    (2.3)

(ii) If k ∈ K1, we consider the following two cases.

Case (1): ƒ(xk + dk) ≤ ƒ(xk) + δgkTdk. If ƒ(xk + βdk) > ƒ(xk + dk) + δβgkTdk, set αk = 1; otherwise, compute stepsize αk = max{β¹, β², ...} satisfying

ƒ(xk + αkdk) ≤ ƒ(xk) + δαkgkTdk.    (2.4)

Case (2): ƒ(xk + dk) > ƒ(xk) + δgkTdk. Compute stepsize αk = max{ρ¹, ρ², ...} such that

ƒ(xk + αkdk) ≤ ƒ(xk) + δαkgkTdk.    (2.5)

Step 4. Set xk+1 = xk + αkdk. Update Lk to get Bk+1 = (Jk+1 + Lk+1)T(Jk+1 + Lk+1) by a certain quasi-Newton formula. Let k := k + 1 and go to Step 2.

Remark 2.1. (i) Since the matrix Bk is positive semidefinite, the iterative matrix Bk + µkI is positive definite, which ensures that the search direction dk is a descent direction, that is, gkTdk < 0. The choice of µk is based on the ideas of [9, 12, 23].

(ii) The line search (2.4) is different from the standard Armijo line search (2.3) since β > 1 in (2.4). This new Armijo-type line search can accept as large a stepsize as possible in Case (1). The following proposition shows that it is well-defined.
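The following Matlab-style sketch (ours, not the authors' code) illustrates one iteration of Algorithm 2.1 under the reconstructions above. It assumes that the sufficient decrease tests (2.3)-(2.5) all have the usual Armijo form, that µk switches on whether ║Bk║ exceeds M, and it simplifies Case (1) by comparing the expanded step against ƒ(xk) rather than ƒ(xk + dk). The objective handle f, the current gradient gk and matrix Bk, and the parameters delta, rho, beta, eps1, eps2, r, M are assumed to be defined.

```matlab
% One iteration of a regularized factorized quasi-Newton method
% (illustrative sketch only, not the authors' code).
n  = length(x);
fk = f(x);
if norm(Bk, 'fro') > M                     % k in K1
    mu = eps1*norm(Bk, 'fro');
else                                       % k in K2
    mu = eps2*norm(gk)^r;
end
d  = -((Bk + mu*eye(n)) \ gk);             % regularized direction, cf. (2.1)
gd = gk'*d;                                % directional derivative, gd < 0
if norm(Bk, 'fro') > M && f(x + d) <= fk + delta*gd
    % Case (1): try to enlarge the step by powers of beta > 1
    alpha = 1; t = beta;
    while f(x + t*d) <= fk + delta*t*gd
        alpha = t; t = t*beta;
    end
else
    % standard Armijo backtracking, cf. (2.3) and (2.5)
    alpha = 1;
    while f(x + alpha*d) > fk + delta*alpha*gd
        alpha = rho*alpha;
    end
end
x = x + alpha*d;                           % Step 4; Lk and Bk are then updated
```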

Proposition 2.1. Algorithm 2.1 is well-defined.

Proof. We only need to prove that the line search (2.4) terminates finitely. If this is not true, then for infinitely many m we have

ƒ(xk + β^m dk) ≤ ƒ(xk) + δβ^m gkTdk.

Since ƒ(x) ≥ 0 for all x and ƒ(xk) ≤ ƒ(x0), this gives

0 ≤ ƒ(xk + β^m dk) ≤ ƒ(x0) + δβ^m gkTdk.

Letting m → ∞ and using β > 1 together with gkTdk < 0, the right-hand side tends to –∞, which leads to a contradiction. □

In the convergence analysis of Algorithm 2.1, we need the following assumption.

Assumption A.

The level set Ω = {x ∈ Rn | ƒ(x) ≤ ƒ(x0)} is bounded.

In some neighborhood N of Ω, ƒ is continuously differentiable and its gradient is Lipschitz continuous, namely, there exists a constant L > 0 such that

║∇ƒ(x) – ∇ƒ(y)║ ≤ L║x – y║   for all x, y ∈ N.    (2.6)

It is clear that the sequence {xk} generated by Algorithm 2.1 is contained in Ω. Moreover, the sequence {ƒ(xk)} is decreasing, so it has a limit ƒ*, that is,

limk→∞ ƒ(xk) = ƒ*.    (2.7)

In addition, from Assumption A we get that there is a positive constant γ such that

║g(x)║ ≤ γ   for all x ∈ Ω.    (2.8)

Now we give some useful lemmas for the convergence analysis of the algorithm.

Lemma 2.2. Let Assumption A hold. Then we have

limk→∞ αkgkTdk = 0.    (2.9)

Proof. It follows directly from (2.3), (2.4), (2.5) and (2.7). □

For convenience, we denote the index sets

K3 = {k ∈ K1 : ƒ(xk + dk) ≤ ƒ(xk) + δgkTdk}   and   K4 = {k ∈ K1 : ƒ(xk + dk) > ƒ(xk) + δgkTdk}.

Then we have

K1 = K3 ∪ K4.

Lemma 2.3. If k ∈ K1, then there exists a constant c1 > 0 such that

Proof. If k ∈ K3, then from the line search (2.4), we have

By the mean value theorem and (2.6), we have

The above two inequalities imply

where c1 > 0 is a constant. Here we use the conditions β > 1 and δ ∈ (0,1).

If k ∈ K4, then from the line search (2.5), we have

Therefore the inequality (2.10) also holds by a similar argument to that in the case k ∈ K3. The proof is then completed since K1 = K3 ∪ K4. □

Lemma 2.4. If k ∈ K2, then there exists a constant c2 > 0 such that

Proof. For k ∈ K2, if αk ≠ 1, then by the line search (2.3), we also have

Then the conclusion follows directly from the same argument as in the proof of Lemma 2.3. □
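For the standard Armijo search, a lower bound of this type follows from a textbook argument, sketched below for the reader's convenience; it assumes that (2.3) is the usual test ƒ(xk + αdk) ≤ ƒ(xk) + δαgkTdk as reconstructed above, and the constants are ours and need not coincide with the paper's c1, c2. If αk ≠ 1, the test fails at the previous trial stepsize αk/ρ, and the Lipschitz condition (2.6) yields

```latex
\begin{align*}
f\!\left(x_k + \tfrac{\alpha_k}{\rho} d_k\right)
  &> f(x_k) + \delta \tfrac{\alpha_k}{\rho}\, g_k^{T} d_k
  && \text{(Armijo test fails at $\alpha_k/\rho$)},\\
f\!\left(x_k + \tfrac{\alpha_k}{\rho} d_k\right)
  &\le f(x_k) + \tfrac{\alpha_k}{\rho}\, g_k^{T} d_k
       + \tfrac{L}{2}\Big(\tfrac{\alpha_k}{\rho}\Big)^{2} \|d_k\|^{2}
  && \text{(mean value theorem and (2.6))},\\
\Longrightarrow\quad
\alpha_k &\ \ge\ \frac{2(1-\delta)\rho\,|g_k^{T} d_k|}{L\,\|d_k\|^{2}}.
\end{align*}
```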

The proof of the following lemma is similar to that of Theorem 2.2.2 in [12]; for completeness, we present it here.

Lemma 2.5. Let Assumption A hold and the sequence {xk} be generated by Algorithm 2.1. If K1 is infinite, then we have

lim infk→∞, k∈K1 ║gk║ = 0.

Proof. If k ∈ K1, then µk = ε1║Bk║. Now we set α̂k = αk/║Bk║ and d̂k = ║Bk║dk; then we have

We have from (2.1) that

From Lemma 2.3 and (2.12), we get

This inequality together with (2.13) shows that

It follows from (2.9) and (2.12) that

The inequalities (2.15), (2.16) and (2.13) imply that

Therefore from the above equality and (2.1) we have

The proof is then finished. □

The following theorem shows that Algorithm 2.1 is globally convergent.

Theorem 2.6. Let Assumption A hold and the sequence {xk} be generated by Algorithm 2.1. Then we have

lim infk→∞ ║gk║ = 0.

Proof. We suppose that the conclusion of the theorem is not true. Then there exists a constant ε > 0 such that for all k ≥ 0,

║gk║ ≥ ε.    (2.17)

From Lemma 2.5, we only need to consider the case that K1 is finite. Therefore there exists k0 > 0 such that for all k > k0, µk = ε2║gk║^r.

It follows from (2.1) and (2.17) that

This inequality together with Lemma 2.4 and (2.9) means that

From (2.1), (2.17) and the definition of K2, we have

It follows from the above inequality and (2.18) that

which contradicts (2.17). This finishes the proof. □

3 Application to nonlinear optimization

In this section, we extend the result of Section 2 to the BFGS method and the DFP method for the general nonlinear optimization problem

min_{x∈Rn} ƒ(x),    (3.1)

where ƒ : Rn → R is continuously differentiable.

The BFGS method and the DFP method are two well-known quasi-Newton methods for solving (3.1). Their update formulas are given by

Bk+1 = Bk – (Bk sk skT Bk)/(skT Bk sk) + (yk ykT)/(ykT sk)    (BFGS),

Bk+1 = (I – (yk skT)/(ykT sk)) Bk (I – (sk ykT)/(ykT sk)) + (yk ykT)/(ykT sk)    (DFP),

where yk = gk+1 – gk and sk = xk+1 – xk. An important property of both methods is that Bk+1 inherits the positive definiteness of Bk whenever ykTsk > 0 [6]. If ƒ is strictly convex or the Wolfe line search is used, then ykTsk > 0 and Bk+1 is well-defined.
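As an illustration (our own sketch, not from the paper), the following Matlab function applies the BFGS or DFP update to a given matrix B and simply skips the update when the curvature condition ykTsk > 0 fails, so that positive definiteness is preserved; the safeguard threshold is our own choice.

```matlab
function Bnew = qn_update(B, s, y, method)
% Quasi-Newton update of the Hessian approximation B (illustrative sketch).
% Skips the update when the curvature condition y'*s > 0 is not satisfied.
sy = s'*y;
if sy <= 1e-12*norm(s)*norm(y)       % curvature safeguard (our own choice)
    Bnew = B;                        % keep the old matrix
    return
end
if strcmp(method, 'BFGS')
    Bs   = B*s;
    Bnew = B - (Bs*Bs')/(s'*Bs) + (y*y')/sy;
else                                 % 'DFP'
    n    = length(s);
    V    = eye(n) - (y*s')/sy;
    Bnew = V*B*V' + (y*y')/sy;
end
end
```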

During the past three decades, global convergence of the BFGS and DFP methods has received growing interest. When ƒ is convex and the exact line search is used, Powell [17] proved that both methods converge globally. When the Wolfe line search is used, Byrd et al. [3] proved the global convergence of the convex Broyden class except for the DFP method. Byrd and Nocedal [2] obtained global convergence of the BFGS method with the standard Armijo line search for strongly convex functions. For nonconvex optimization, counterexamples in [4, 13] show that the BFGS method may not converge globally when the exact line search or the Wolfe line search is used.

Li and Fukushima [11] proposed a modified BFGS method which possesses global and superlinear convergence even for nonconvex functions. Zhou and Zhang [25] extended this result to the nonmonotone case. Zhou and Li [26] proposed a modified BFGS method for nonlinear monotone equations and established its global convergence. However, these modified BFGS formulas destroy the affine invariance of the standard BFGS formula. To overcome this drawback, Liu [12] proposed a regularized BFGS method and proved that this method converges globally for nonconvex functions when the Wolfe line search is used. Liu [12] also raised the question of whether the regularized BFGS method is globally convergent for strictly convex or nonconvex functions when an Armijo-type line search is used. For strictly convex functions, the answer is positive.

In fact, if we update Bk in Algorithm 2.1 by the BFGS update formula or the DFP update formula, then a direct consequence of Theorem 2.6 is that the regularized BFGS and DFP methods converge globally for strictly convex functions.

Corollary 3.1. Let Assumption A hold. If ƒ is strictly convex, then for the regularized BFGS and DFP methods, we have lim infk → ∞gk║ = 0.

4 Numerical experiments

In this section we compare the proposed method with some existing methods, such as the Gauss-Newton method, in terms of the number of iterations and function evaluations for nonlinear least squares problems. The details of the six methods are listed as follows.

(1) The Gauss-Newton method, denoted GN, whose search direction is given by Bkd = –gk with Bk = JkTJk. We compute the stepsize by the standard Armijo line search, that is, αk = max{ρ⁰, ρ¹, ...} satisfying

ƒ(xk + αkdk) ≤ ƒ(xk) + δαkgkTdk,    (4.1)

where we choose δ = 0.1 and ρ = 0.5.

(2) The Levenberg-Marquardt method, denoted LM, whose search direction is defined by Bkd = –gk with Bk = JkTJk + µkI, µk = ║gk║. The stepsize is computed by (4.1). (A small Matlab sketch of the GN and LM directions is given after the list of methods below.)

(3) The factorized BFGS method [20], denoted FBFGS, whose search direction is given by Bkd = –gk, and Bk is updated by

where

The stepsize is obtained by (4.1).

(4) Algorithm 2.1, denoted R-FBFGS, where Bk is updated by (4.2). Here we use the following parameter values: δ = 0.1, ρ = 0.5, β = 2, M = 10⁴, ε1 = 10⁻⁸, ε2 = 1 and r = 1.

(5) The scaled factorized BFGS method [24], denoted SFBFGS, whose search direction is given by Bkd = –gk, and Bk is updated by

where

The stepsize is computed by (4.1).

(6) Algorithm 2.1, denoted R-SFBFGS, where Bk is updated by (4.3). Here we use the same parameters as R-FBFGS.
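For concreteness, the GN and LM directions used above can be formed in a few lines of Matlab (our own illustration; J and r are assumed function handles, and the stepsize would then be obtained from (4.1)):

```matlab
% Gauss-Newton and Levenberg-Marquardt directions (illustrative sketch).
Jk  = J(x);  rk = r(x);
gk  = Jk'*rk;                              % gradient of f = 0.5*||r||^2
n   = length(x);
dGN = -((Jk'*Jk)\gk);                      % Gauss-Newton direction
dLM = -((Jk'*Jk + norm(gk)*eye(n))\gk);    % Levenberg-Marquardt direction
```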

All codes were written in Matlab 7.4. In our experiments, we stopped whenever ║gk║ < 10⁻⁴, or ƒ(xk) – ƒ(xk+1) < 10⁻¹² max(1, ƒ(xk+1)), or the number of iterations exceeded 10⁴. In the methods R-FBFGS and R-SFBFGS, we set Lk+1 = 0 whenever zk < 10⁻²⁰. The test problems are from [14]. Table 0 lists the names and related information of the test problems, where Z, S and L stand for zero residual, small residual and large residual problems, respectively. Tables 1-4 report the numerical results of these methods, where

  • iter (it1) is the number of iterations (the number of iterations using the line search (2.4) for the methods R-FBFGS and R-SFBFGS);

  • fn and normg are the number of function evaluations and the norm of the gradient at the stopping point, respectively;

  • fv and Av stand for the function value at the stopping point and the average of the corresponding measure, respectively;

  • * means that the number of iterations exceeded 10⁴;

  • Sp is the starting point for the problems BD, VARDIM and KOWOSB. Here we use seven different starting points, that is, x1 = (10³, ..., 10³)T, x2 = (10², ..., 10²)T, x3 = (10¹, ..., 10¹)T, x4 = (1, ..., 1)T, x5 = (10⁻¹, ..., 10⁻¹)T, x6 = (10⁻², ..., 10⁻²)T, x7 = (10⁻³, ..., 10⁻³)T.

As can be seen in Table 2, the method R-SFBFGS performed best for the problem BD since it requires the smallest number of iterations and function evaluations. The method R-FBFGS was faster than the methods LM and GN. Table 2 also shows that the method LM may fail to converge for this large residual problem, since it does not solve it within 10⁴ iterations when the starting points x1 and x2 are chosen. Tables 1 and 3 indicate that the method GN was the most efficient method for zero residual problems.

From Tables 1-4, we can see that the methods R-FBFGS and R-SFBFGS were the most stable, which shows that both methods can ultimately converge to some local stationary point for the given test problems. Moreover, we observe that the line search (2.4) was rarely used, but it is very efficient for the large residual problems.

In order to compare the six methods more clearly in terms of the number of iterations and function evaluations, we made Figures 1-2 from the data in Tables 1-4 by using the performance profiles of Dolan and Moré [7].
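The performance profile of a solver is the fraction ρ(τ) of test problems on which its measure (number of iterations or of function evaluations) is within a factor τ of the best solver [7]. A minimal Matlab sketch of this computation (our own; T is an np-by-ns matrix of measures with Inf marking failures) is:

```matlab
% Dolan-More performance profiles (illustrative sketch).
% T(p,s): measure of solver s on problem p (Inf if solver s failed on p).
best   = min(T, [], 2);                         % best measure per problem
ratios = T ./ repmat(best, 1, size(T,2));       % performance ratios r(p,s)
tau    = linspace(1, 16, 200);
rho    = zeros(numel(tau), size(T,2));
for i = 1:numel(tau)
    rho(i,:) = mean(ratios <= tau(i), 1);       % fraction solved within factor tau
end
plot(tau, rho);
legend('GN','LM','FBFGS','R-FBFGS','SFBFGS','R-SFBFGS');
```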


Since the top curve in Figures 1-2 corresponds to GN when τ < 1, this method is clearly the most efficient for this set of 35 test problems: GN needs the smallest number of iterations and function evaluations for about 48% and 51% of the test problems, respectively. A possible reason is that most test problems are zero or very small residual problems. However, GN only solves 87% of the test problems successfully, while the methods R-FBFGS and R-SFBFGS solve all test problems. For τ > 2, the top curves correspond to R-FBFGS and R-SFBFGS, which shows that both methods perform best within a factor τ with respect to the number of iterations and function evaluations.

5 Conclusions

We have proposed a regularized factorized quasi-Newton method with a new Armijo-type line search for nonlinear least squares problems. Under suitable conditions, global convergence of the proposed method is established. We also compared the performance of the proposed method with some existing methods; the numerical results show that the proposed method is promising.

Acknowledgements. Part of this work was done while the first author was visiting Hirosaki University. This work was partially supported by the NSF of China (grant 10901026), a project (09B001) of the Scientific Research Fund of the Hunan Provincial Education Department, the Hong Kong Polytechnic University Postdoctoral Fellowship Scheme and a scholarship from the Japanese Government.

Received: 07/IX/09.

Accepted: 11/X/09.

#CAM-126/09.

  • [1] M. Al-Baali and R. Fletcher, Variational methods for non-linear least squares. J. Oper. Res. Soc., 36 (1985), 405-421.
  • [2] R.H. Byrd and J. Nocedal, A tool for the analysis of quasi-Newton methods with application to unconstrained minimization. SIAM J. Numer. Anal., 26 (1989), 727-739.
  • [3] R.H. Byrd, J. Nocedal and Y. Yuan, Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal., 24 (1987), 1171-1190.
  • [4] Y.H. Dai, Convergence properties of the BFGS algorithm. SIAM J. Optim., 13 (2002), 693-701.
  • [5] J.E. Dennis, H.J. Martínez and R.A. Tapia, Convergence theory for the structured BFGS secant method with application to nonlinear least squares. J. Optim. Theory Appl., 61 (1989), 161-178.
  • [6] J.E. Dennis and R.B. Schnabel, Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, Englewood Cliffs, NJ (1983).
  • [7] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Math. Program., 91 (2002), 201-213.
  • [8] J.R. Engels and H.J. Martínez, Local and superlinear convergence for partially known quasi-Newton methods. SIAM J. Optim., 1 (1991), 42-56.
  • [9] J. Fan and Y. Yuan, On the quadratic convergence of the Levenberg-Marquardt method without nonsingularity assumption. Computing, 74 (2005), 23-39.
  • [10] J. Huschens, On the use of product structure in secant methods for nonlinear least squares problems. SIAM J. Optim., 4 (1994), 108-129.
  • [11] D.H. Li and M. Fukushima, A modified BFGS method and its global convergence in nonconvex minimization. J. Comput. Appl. Math., 129 (2001), 15-35.
  • [12] T.W. Liu, BFGS method and its application to constrained optimization. Ph.D. thesis, College of Mathematics and Econometrics, Hunan University, Changsha, China (2007).
  • [13] W.F. Mascarenhas, The BFGS method with exact line searches fails for non-convex objective functions. Math. Program., 99 (2004), 49-61.
  • [14] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software. ACM Trans. Math. Softw., 7 (1981), 17-41.
  • [15] H. Ogasawara, A note on the equivalence of a class of factorized Broyden families for nonlinear least squares problems. Linear Algebra Appl., 297 (1999), 183-191.
  • [16] H. Ogasawara and H. Yabe, Superlinear convergence of the factorized structured quasi-Newton methods for nonlinear optimization. Asia-Pacific J. Oper. Res., 17 (2000), 55-80.
  • [17] M.J.D. Powell, On the convergence of the variable metric algorithms. J. Inst. Math. Appl., 7 (1971), 21-36.
  • [18] C.X. Xu, X.F. Ma and M.Y. Kong, A class of factorized quasi-Newton methods for nonlinear least squares problems. J. Comput. Math., 14 (1996), 143-158.
  • [19] H. Yabe and T. Takahashi, Structured quasi-Newton methods for nonlinear least squares problems. TRU Math., 24 (1988), 195-209.
  • [20] H. Yabe and T. Takahashi, Factorized quasi-Newton methods for nonlinear least squares problems. Math. Program., 51 (1991), 75-100.
  • [21] H. Yabe and N. Yamaki, Convergence of a factorized Broyden-like family for nonlinear least squares problems. SIAM J. Optim., 5 (1995), 770-791.
  • [22] H. Yabe and N. Yamaki, Local and superlinear convergence of structured quasi-Newton methods for nonlinear optimization. J. Oper. Res. Soc. Japan, 39 (1996), 541-557.
  • [23] N. Yamashita and M. Fukushima, On the rate of convergence of the Levenberg-Marquardt method. Computing (Supp.), 15 (2001), 237-249.
  • [24] J.Z. Zhang, L.H. Chen and N.Y. Deng, A family of scaled factorized Broyden-like methods for nonlinear least squares problems. SIAM J. Optim., 10 (2000), 1163-1179.
  • [25] W. Zhou and L. Zhang, Global convergence of the nonmonotone MBFGS method for nonconvex unconstrained minimization. J. Comput. Appl. Math., 223 (2009), 40-47.
  • [26] W. Zhou and D. Li, A globally convergent BFGS method for nonlinear monotone equations without any merit functions. Math. Comput., 77 (2008), 2231-2240.
