
On the Preconditioned Delayed Weighted Gradient Method

ABSTRACT

In this article a preconditioned version of the Delayed Weighted Gradient Method (DWGM) is presented and analyzed. In addition to convergence, some nice properties, such as the A-orthogonality of the current transformed gradient with respect to all previous gradient vectors and finite termination, are demonstrated. Numerical experiments are also reported, exposing the benefits of preconditioning.

Keywords:
gradient methods; convex quadratic optimization; Krylov subspace methods; preconditioning

1 INTRODUCTION

Many real-life applications lead to large-scale convex quadratic optimization problems. Among others, low-cost gradient methods have been widely effective in solving challenging instances of this class of optimization problems (see for instance [8], [16], [26] and references therein). Gradient methods for the unconstrained minimization problem

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x)$$

generate a sequence of solution approximations x k satisfying

$$x_{k+1} = x_k - \alpha_k g_k,$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, $g_k = \nabla f(x_k)$ and $\alpha_k > 0$. The selection of the step length $\alpha_k$ depends on the chosen method. The classical steepest descent (SD) method was proposed in [10] to solve nonlinear systems of equations. In this case,

$$\alpha_k^{SD} = \arg\min_{\alpha} f(x_k - \alpha g_k). \qquad (1.1)$$

Assuming the objective function f to be a strictly convex quadratic function, that is, for a symmetric and positive definite (SPD) matrix $A \in \mathbb{R}^{n \times n}$, the unconstrained optimization problem becomes

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x) = \frac{1}{2} x^T A x - b^T x. \qquad (1.2)$$

It is well known that the unique minimizer of this problem also solves the linear system Ax = b, and so we can denote it by $A^{-1}b$. Under this assumption, simple calculations on (1.1) give

$$\alpha_k^{SD} = \frac{g_k^T g_k}{g_k^T A g_k}.$$

It has been proven that the SD method converges Q-linearly [1]. Instead of minimizing the objective function f, the Minimum Gradient (MG) step length [3] aims to minimize the gradient norm, i.e.

$$\alpha_k^{MG} = \arg\min_{\alpha} \| \nabla f(x_k - \alpha g_k) \|_2,$$

which can be written as

$$\alpha_k^{MG} = \frac{g_k^T A g_k}{g_k^T A^2 g_k}. \qquad (1.3)$$

Despite the optimality properties in the definitions of $\alpha_k^{SD}$ and $\alpha_k^{MG}$, the steepest descent method converges slowly and is badly affected by ill conditioning (see [1] and [18]). A remedy for this issue was proposed by Barzilai and Borwein [5]. The Barzilai-Borwein (BB) method is based on a secant condition; they propose two possible step sizes

$$\alpha_k^{BB1} = \arg\min_{\alpha} \left\| \frac{1}{\alpha}\tilde{s}_{k-1} - \tilde{y}_{k-1} \right\|_2 \quad \text{and} \quad \alpha_k^{BB2} = \arg\min_{\alpha} \left\| \tilde{s}_{k-1} - \alpha \tilde{y}_{k-1} \right\|_2,$$

where $\tilde{s}_{k-1} = x_k - x_{k-1}$ and $\tilde{y}_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1})$, thus obtaining, respectively,

$$\alpha_k^{BB1} = \frac{\|\tilde{s}_{k-1}\|_2^2}{\tilde{s}_{k-1}^T \tilde{y}_{k-1}} \quad \text{and} \quad \alpha_k^{BB2} = \frac{\tilde{s}_{k-1}^T \tilde{y}_{k-1}}{\|\tilde{y}_{k-1}\|_2^2}.$$

If we restrict f to be a strictly convex quadratic function, we obtain

$$\alpha_k^{BB1} = \frac{g_{k-1}^T g_{k-1}}{g_{k-1}^T A g_{k-1}} = \alpha_{k-1}^{SD} \quad \text{and} \quad \alpha_k^{BB2} = \frac{g_{k-1}^T A g_{k-1}}{g_{k-1}^T A^2 g_{k-1}} = \alpha_{k-1}^{MG},$$

which satisfy [14]

$$\frac{1}{\lambda_1} \le \alpha_k^{BB2} \le \alpha_k^{BB1} \le \frac{1}{\lambda_n},$$

where $\lambda_1$ and $\lambda_n$ are the maximum and the minimum eigenvalues of A, respectively. Basically, the BB step sizes coincide with the SD and MG step sizes with a delay of one iteration. The BB methods converge R-linearly [12]. Friedlander et al. [19] generalized the BB approach and proposed the Gradient Methods with Retards (GMR). In this case,

$$\alpha_k^{GMR} = \frac{g_{\nu(k)}^T g_{\nu(k)}}{g_{\nu(k)}^T A g_{\nu(k)}} = \alpha_{\nu(k)}^{SD},$$

where $\nu(k)$ is chosen in the set $\{k, k-1, \ldots, \max\{0, k-m\}\}$ and m is a given positive integer. Yuan [29] proposed the following step length, which also includes retard in its definition,

$$\alpha_k^{Y} = 2\left[ \frac{1}{\alpha_{k-1}^{SD}} + \frac{1}{\alpha_k^{SD}} + \sqrt{ \left( \frac{1}{\alpha_{k-1}^{SD}} - \frac{1}{\alpha_k^{SD}} \right)^{2} + \frac{4\|g_k\|_2^2}{\left( \alpha_{k-1}^{SD}\,\|g_{k-1}\|_2 \right)^{2}} } \right]^{-1}.$$

Yuan’s method presents a good performance for small-scale problems. Note that all the gradient method variants presented so far are one-step methods.
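
For illustration only, the following minimal NumPy sketch (the random SPD test matrix and all variable names are our own choices, not from the paper) verifies numerically the identities $\alpha_k^{BB1} = \alpha_{k-1}^{SD}$ and $\alpha_k^{BB2} = \alpha_{k-1}^{MG}$ stated above for strictly convex quadratics.

import numpy as np

# Build a small SPD test problem f(x) = 0.5*x'Ax - b'x with gradient g = Ax - b.
rng = np.random.default_rng(0)
n = 5
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(rng.uniform(1.0, 10.0, n)) @ Q.T      # SPD matrix
b = rng.standard_normal(n)

def grad(x):
    return A @ x - b

x_prev = rng.standard_normal(n)
g_prev = grad(x_prev)

alpha_sd_prev = g_prev @ g_prev / (g_prev @ (A @ g_prev))                 # SD step at x_{k-1}
alpha_mg_prev = g_prev @ (A @ g_prev) / ((A @ g_prev) @ (A @ g_prev))     # MG step at x_{k-1}

x = x_prev - alpha_sd_prev * g_prev    # one exact (Cauchy) step
g = grad(x)

s, y = x - x_prev, g - g_prev          # BB secant pair
alpha_bb1 = s @ s / (s @ y)            # should equal alpha_sd_prev
alpha_bb2 = s @ y / (y @ y)            # should equal alpha_mg_prev

print(abs(alpha_bb1 - alpha_sd_prev), abs(alpha_bb2 - alpha_mg_prev))     # both ~1e-16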

Methods that use two step sizes are alternatives to accelerate gradient based methods by imposing retard on the process (see [11]). Recently, Oviedo-Leon [25] proposed to combine a smoothing technique with a delayed gradient step to construct the promising Delayed Weighted Gradient Method (DWGM), which exhibits an impressive fast convergence behavior that compares favorably with the conjugate gradient method (CG), sharing other nice properties such as finite termination and A-orthogonality of its iterated gradients (see [2], [25]). The DWGM can be seen as a variant of the parallel tangent method (PARTAN) in the sense that it uses two line searches per iteration, with information from previous points to accelerate the gradient method [18], [27], [28]. In [24] a smoothing technique is introduced to prevent the so-called zigzagging behavior of the sequence of gradient norms, which is characteristic of CG methods.

In most practical applications, it is convenient to introduce preconditioning to accelerate the convergence of the process. Preconditioners are useful in iterative methods to solve a linear system. Since, for most iterative solvers, the larger the condition number the slower the convergence, and preconditioning is meant to decrease the condition number, preconditioning is expected to improve the convergence rate (see for example [7]). Given the positive definite matrix $C^2$ (the preconditioner), two-sided preconditioning applied to the system of equations Ax = b first solves the related system $C^{-1}AC^{-1}y = C^{-1}b$, which has a better condition number, and then Cx = y to obtain the solution x. There are different choices of preconditioning matrices, such as Jacobi, incomplete LU factorization, incomplete Cholesky factorization, successive over-relaxation, etc. Preconditioning has been widely implemented for conjugate gradient type methods (see [7]). In [2] it was noticed that DWGM terminates in p < n iterations when the n × n Hessian matrix has only p distinct eigenvalues, as also happens with CG methods, motivating the use of preconditioning strategies when solving large-scale symmetric and positive definite linear systems. The aim of this work is to propose, analyze and exhibit the numerical behavior of the preconditioned version of the recently introduced DWGM algorithm.
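
As a small illustration of the two-sided preconditioning just described (the diagonal choice of C below is ours, used only for this check), the following NumPy snippet solves the transformed system and then Cx = y, recovering the solution of Ax = b.

import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # SPD test matrix (an assumption of this sketch)
b = rng.standard_normal(n)

C = np.diag(np.sqrt(np.diag(A)))     # example preconditioner factor, M = C^2
Ci = np.linalg.inv(C)                # explicit inverses only for this small check

A_hat = Ci @ A @ Ci                  # transformed matrix, usually better conditioned
y = np.linalg.solve(A_hat, Ci @ b)   # solve C^{-1} A C^{-1} y = C^{-1} b
x = np.linalg.solve(C, y)            # then C x = y

print(np.allclose(A @ x, b))                         # True
print(np.linalg.cond(A), np.linalg.cond(A_hat))      # compare condition numbers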

The remainder of this article is organized as follows: In the next section we describe the DWGM and expose some of its properties. In section 3 we develop the preconditioned version. The analysis of our preconditioned version is the topic of section 4. Numerical experimentation and conclusions are the topics of sections 5 and 6, respectively.

2. DELAYED WEIGHTED GRADIENT METHOD

We consider the strictly convex quadratic minimization problem (1.2). Since the gradient is $g(x) \equiv \nabla f(x) = Ax - b$, the unique global solution $A^{-1}b$ of problem (1.2) also solves the linear system Ax = b. For large n, many low cost iterative methods have been proposed and analyzed. The so-called gradient type methods emerge as competitive choices since they show fast linear convergence (see [3], [4], [9], [11], [14], [21]).

From a starting point $x_0 \in \mathbb{R}^n$, consider $g_k = g(x_k)$. The minimum gradient method (MG) (see [3]) is given by the iteration

$$x_{k+1} = x_k - \alpha_k^{MG} g_k.$$

Here, the step size $\alpha_k^{MG}$ was defined in (1.3). Denoting $w_k := A g_k$, we can write the step size as

$$\alpha_k^{MG} = \frac{g_k^T w_k}{\|w_k\|_2^2}.$$

The minimum gradient norm method calculates the next iterate as the point along the current gradient direction at which the norm of the next gradient is minimized.

As a two-step gradient method, DWGM incorporates a delaying step defined as follows [25]: The first stage uses the ordinary minimum gradient point $y_k = x_k - \alpha_k^{MG} g_k$. Then, by defining $\beta_k = \arg\min_{\beta} \|\nabla f(x_{k-1} + \beta(y_k - x_{k-1}))\|_2$, at the second stage

$$x_{k+1} = x_{k-1} + \beta_k (y_k - x_{k-1})$$

is calculated. It is straightforward to see that $\nabla f(x_{k-1} + \beta(y_k - x_{k-1})) = g_{k-1} - \beta(g_{k-1} - r_k)$, for $r_k = g_k - \alpha_k w_k$. This leads to

$$\beta_k = \frac{g_{k-1}^T (g_{k-1} - r_k)}{\|g_{k-1} - r_k\|_2^2}.$$

By merging the definition of $y_k$ into $x_{k+1}$, after simple manipulation the next iterated point can be rewritten as

$$x_{k+1} = (1 - \beta_k)\, x_{k-1} + \beta_k x_k - \beta_k \alpha_k g_k. \qquad (2.1)$$

This expression shows that the choice of $\beta_k$ simultaneously selects a point on the line passing through the two previous iterates and decides how far to advance along the current gradient direction, in such a way that the norm of the gradient at the next iteration is minimized. The resulting DWGM algorithm is:

Algorithm 1
DWGM
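
For concreteness, the following is a minimal NumPy sketch of the two-stage iteration just described; the stopping test, the variable names and the first-iteration convention $x_{-1} = x_0$, $g_{-1} = g_0$ (which makes the first step an ordinary MG step) are our own assumptions and need not coincide with the exact organization of Algorithm 1.

import numpy as np

def dwgm(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimal sketch of the DWGM iteration described in this section."""
    x_old = x = x0.copy()
    g_old = g = A @ x - b                    # convention: x_{-1} = x_0, g_{-1} = g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        w = A @ g
        alpha = (g @ w) / (w @ w)            # MG step length (1.3)
        y = x - alpha * g                    # first (MG) stage
        r = g - alpha * w                    # gradient at y
        d = g_old - r
        beta = (g_old @ d) / (d @ d)         # smoothing (delayed) step
        x_new = x_old + beta * (y - x_old)   # update, equivalent to (2.1)
        g_new = g_old + beta * (r - g_old)
        x_old, g_old, x, g = x, g, x_new, g_new
    return x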

Some of the properties that DWGM enjoys, established in [25] and [2], include the nonnegativity of $\beta_k$ for all k, the monotonic decrease of $\{\|g_k\|_2\}$, the Q-linear convergence of $\{g_k\}$ to zero as k goes to infinity (which implies that $\{x_k\}$ converges to the unique global minimizer of f), and finite termination, obtained from the A-orthogonality of the gradient at the current iteration with respect to all previous gradient vectors.

3. PRECONDITIONED DWGM

In this section we present the derivation of the preconditioned DWGM. Preconditioners play an important role in the theory of iterative methods. Quoting [7]: “... lack of robustness and sometimes erratic convergence behavior are recognized weakness of iterative solvers. These issues hamper the acceptance of iterative methods despite their intrinsic appeal for very large linear systems. Both the efficiency and robustness of iterative techniques can be very much improved by using preconditioning.”

Let M be a positive definite preconditioner; then there exists a unique symmetric positive definite matrix C such that $M = C^2$ (see [20]). This positive definite matrix, as well as its inverse, allows us to define the so-called “hat” transformations, which map the space where the original iterates $x_k$ live into the transformed space where the $\hat{x}_k$ are calculated:

$$\hat{A} = C^{-1} A C^{-1}; \qquad \hat{b} = C^{-1} b; \qquad \hat{x} = C x. \qquad (3.1)$$

Note that  is SPD. Let us define the auxiliary convex quadratic minimization problem

$$\underset{\hat{x} \in \mathbb{R}^n}{\text{minimize}} \; \hat{f}(\hat{x}) = \frac{1}{2} \hat{x}^T \hat{A} \hat{x} - \hat{b}^T \hat{x}. \qquad (3.2)$$

Evaluating the function f at $x = C^{-1}\hat{x}$ we obtain $\hat{f}(\hat{x}) = f(x)$, where f(x) is given in (1.2). Once we have the “hat” quadratic minimization problem (3.2), one could apply the delayed weighted gradient method to solve it. However, the algorithm is better defined if the preconditioning part is incorporated into the DWGM algorithm. The preconditioned DWGM algorithm generates a sequence $\{\hat{x}_k\}$ in the transformed space, which is translated back to the original space of x. Let us denote the current iterate by $\hat{x}_k \in \mathbb{R}^n$, and consider $\hat{g}_k = \nabla \hat{f}(\hat{x}_k)$ and $\hat{w}_k = \hat{A}\hat{g}_k$. Then the MG step size $\hat{\alpha}_k$ is defined by $\hat{\alpha}_k = \arg\min_{\alpha>0} \|\nabla \hat{f}(\hat{x}_k - \alpha \hat{g}_k)\|_2$. Note that

$$\hat{g}_k = \hat{A}\hat{x}_k - \hat{b} = C^{-1}(A x_k - b) = C^{-1} g_k,$$

and

$$\hat{w}_k = \hat{A}\hat{g}_k = C^{-1} A C^{-1} C^{-1} g_k.$$

Now define $z_k$, $q_k$ and $p_k$ such that

$$M z_k := g_k, \qquad q_k := C\hat{w}_k = A z_k \qquad \text{and} \qquad M p_k := q_k. \qquad (3.3)$$

The first and third definitions in (3.3) are meant to avoid explicit inverse matrix calculations. From the definition of the step size we obtain $\hat{\alpha}_k = \hat{g}_k^T \hat{w}_k / \|\hat{w}_k\|_2^2$. Through (3.3) we transform the numerator as

$$\hat{g}_k^T \hat{w}_k = (C^{-1}g_k)^T \hat{A} (C^{-1}g_k) = g_k^T M^{-1} A M^{-1} g_k = z_k^T A z_k = z_k^T q_k,$$

and the denominator becomes

$$\hat{w}_k^T \hat{w}_k = z_k^T A C^{-1} C^{-1} A z_k = q_k^T M^{-1} q_k = q_k^T p_k,$$

obtaining

$$\hat{\alpha}_k = \frac{z_k^T q_k}{q_k^T p_k}. \qquad (3.4)$$

Now, we shall reproduce the smoothing technique as in [24]. Consider the next MG iterated point

$$\hat{y}_k = \hat{x}_k - \hat{\alpha}_k \hat{g}_k = C x_k - \hat{\alpha}_k C^{-1} g_k = C(x_k - \hat{\alpha}_k z_k),$$

and

$$\hat{r}_k = \hat{g}_k - \hat{\alpha}_k \hat{w}_k = C^{-1}(g_k - \hat{\alpha}_k A z_k) = C^{-1}(g_k - \hat{\alpha}_k q_k),$$

from which we express $u_k := C^{-1}\hat{y}_k$ and $v_k := C\hat{r}_k$ by

$$u_k = x_k - \hat{\alpha}_k z_k \qquad \text{and} \qquad v_k = g_k - \hat{\alpha}_k q_k. \qquad (3.5)$$

The delaying smoothing step-size is defined for the “hat” system as

$$\hat{\beta}_k := \arg\min_{\beta} \|\nabla \hat{f}(\hat{x}_{k-1} + \beta(\hat{y}_k - \hat{x}_{k-1}))\|_2,$$

which can be written as

$$\hat{\beta}_k = \frac{\hat{g}_{k-1}^T(\hat{g}_{k-1} - \hat{r}_k)}{\|\hat{g}_{k-1} - \hat{r}_k\|_2^2}.$$

Each factor gives

$$\hat{g}_{k-1}^T(\hat{g}_{k-1} - \hat{r}_k) = g_{k-1}^T C^{-1}(C^{-1}g_{k-1} - \hat{r}_k) = g_{k-1}^T M^{-1}(g_{k-1} - v_k)$$

and

$$\|\hat{g}_{k-1} - \hat{r}_k\|_2^2 = (\hat{g}_{k-1} - \hat{r}_k)^T(\hat{g}_{k-1} - \hat{r}_k) = (g_{k-1} - v_k)^T M^{-1}(g_{k-1} - v_k),$$

from which we obtain

$$\hat{\beta}_k = \frac{g_{k-1}^T s_k}{(g_{k-1} - v_k)^T s_k}, \qquad \text{where } M s_k := g_{k-1} - v_k. \qquad (3.6)$$

Lastly, from

$$\hat{x}_{k+1} = \hat{x}_{k-1} + \hat{\beta}_k(\hat{y}_k - \hat{x}_{k-1})$$

and

$$\hat{g}_{k+1} = \hat{g}_{k-1} + \hat{\beta}_k(\hat{r}_k - \hat{g}_{k-1})$$

we obtain

$$x_{k+1} = x_{k-1} + \hat{\beta}_k(u_k - x_{k-1}) \qquad \text{and} \qquad g_{k+1} = g_{k-1} + \hat{\beta}_k(v_k - g_{k-1}). \qquad (3.7)$$

The preconditioned delayed weighted gradient method, summarized by equations (3.3)-(3.7), is stated in Algorithm 2. The input data for the algorithm are the coefficient matrix A, the preconditioner M, the observation vector b, and the starting point $x_0$.

Algorithm 2
PDWGM
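
The following minimal NumPy sketch assembles equations (3.3)-(3.7) into an iteration. The helper solve_M (which returns the solution of a system with M), the stopping test and the first-iteration convention $x_{-1} = x_0$, $g_{-1} = g_0$ are assumptions of this sketch and need not match the numbered steps of Algorithm 2 used in the analysis of section 4.

import numpy as np

def pdwgm(A, b, x0, solve_M, tol=1e-10, max_iter=1000):
    """Minimal sketch of PDWGM assembled from equations (3.3)-(3.7).

    `solve_M(u)` must return the solution of M z = u (e.g. a diagonal or
    triangular solve supplied by the caller).
    """
    x_old = x = x0.copy()
    g_old = g = A @ x - b                    # convention: x_{-1} = x_0, g_{-1} = g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        z = solve_M(g)                       # M z_k = g_k           (3.3)
        q = A @ z                            # q_k = A z_k           (3.3)
        p = solve_M(q)                       # M p_k = q_k           (3.3)
        alpha = (z @ q) / (q @ p)            # step length           (3.4)
        u = x - alpha * z                    # u_k                   (3.5)
        v = g - alpha * q                    # v_k                   (3.5)
        s = solve_M(g_old - v)               # M s_k = g_{k-1} - v_k (3.6)
        beta = (g_old @ s) / ((g_old - v) @ s)
        x_new = x_old + beta * (u - x_old)   # updates               (3.7)
        g_new = g_old + beta * (v - g_old)
        x_old, g_old, x, g = x, g, x_new, g_new
    return x

# Hypothetical usage with a Jacobi preconditioner M = diag(A):
#   d = np.diag(A); x = pdwgm(A, b, np.zeros_like(b), lambda u: u / d)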

The computational cost of the preconditioned delayed weighted gradient method is the computational cost of DWGM plus the cost of solving three linear systems. Of course, these three linear systems increase the computational cost. However, since we are working with symmetric positive definite systems, Jacobi, SSOR or incomplete Cholesky preconditioners are good choices that keep the extra cost moderate. In short, although it is necessary to solve three linear systems per iteration, they all have the same associated matrix, which is typically diagonal or triangular. Thus, as mentioned in section 5, each linear system can be solved at a cost of 𝒪(n) or 𝒪(n²) operations. In the end, we expect the higher computational cost per iteration of the PDWGM algorithm to be compensated by the smaller number of iterations when compared to the DWGM algorithm. The convergence properties of PDWGM are inherited from DWGM, since the iterations are equivalent through the “hat” transformations.

4. CONVERGENCE

We now present the convergence results for PDWGM. Since the matrix C is a linear operator on a finite-dimensional linear space, C is continuous. This means that convergence properties for $\{\hat{x}_k\}$ or $\{\hat{g}_k\}$ follow directly from the convergence of $\{x_k\}$ or $\{g_k\}$, and vice versa. Associated with a positive definite matrix B, let us denote the B-norm by $\|\cdot\|_B = \|B(\cdot)\|_2$.

Lemma 4.1. Let $\{x_k\}$ be the sequence generated by Algorithm 2. Then for each k

$$\|g_{k+1}\|_{M^{-1/2}}^2 \, \|r_k\|_{M^{-1/2}}^2 < \|g_k\|_{M^{-1/2}}^2 \, \|r_{k-1}\|_{M^{-1/2}}^2. \qquad (4.1)$$

Proof. From Lemma 1 in [25], applied to the “hat” problem (3.2), we have $\|\hat{g}_{k+1}\|_2 \|\hat{r}_k\|_2 < \|\hat{g}_k\|_2 \|\hat{r}_{k-1}\|_2$, which in turn can be written as (4.1). □

The next lemma establishes bounds on the eigenvalues of a product of positive definite matrices in terms of products of the eigenvalues of their factors.

Lemma 4.2. Given positive definite matrices A and B, with respective eigenvalues $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A) > 0$ and $\lambda_1(B) \ge \lambda_2(B) \ge \cdots \ge \lambda_n(B) > 0$, then for all $i, j, k \in \{1, 2, \ldots, n\}$ such that $j + k \le i + 1$,

$$\lambda_i(AB) \le \lambda_j(A)\,\lambda_k(B) \qquad (4.2)$$

and

$$\lambda_{n-j+1}(A)\,\lambda_{n-k+1}(B) \le \lambda_{n-i+1}(AB). \qquad (4.3)$$

In particular, for i = 1, …, n,

$$\lambda_i(A)\,\lambda_n(B) \le \lambda_i(AB) \le \lambda_1(A)\,\lambda_i(B). \qquad (4.4)$$

For a proof see [6, Fact 8.18.17]. We shall apply this lemma to express the convergence rate of Algorithm 1 when running with preconditioning:

Theorem 4.1. Consider the problem (1.2), the sequence $\{x_k\}$ generated by Algorithm 2, and let the eigenvalues of $A^{1/2}$ and $M^{-1/2}$ be $\lambda_1(A^{1/2}) > \lambda_2(A^{1/2}) > \cdots > \lambda_n(A^{1/2}) > 0$ and $\lambda_1(M^{-1/2}) > \lambda_2(M^{-1/2}) > \cdots > \lambda_n(M^{-1/2}) > 0$, respectively. Then the sequence $\{x_k\}$ converges to $A^{-1}b$ Q-linearly with convergence factor

$$\frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})}.$$

Proof. By Theorem 1 in [25], applied to the “hat” problem (3.2), we have for each k that

$$\|\hat{g}_{k+1}\| \le \frac{\hat{\lambda}_1 - \hat{\lambda}_n}{\hat{\lambda}_1 + \hat{\lambda}_n} \|\hat{g}_k\|, \qquad (4.5)$$

where $\hat{\lambda}_i$ stands for the i-th eigenvalue of $\hat{A}^{1/2}$. Now, since $\hat{A} = M^{-1/2} A M^{-1/2}$, we have for each i

$$\lambda_i(\hat{A}^{1/2}) = \lambda_i\big((M^{-1/2} A M^{-1/2})^{1/2}\big) = \lambda_i(A^{1/2} M^{-1/2}).$$

By using (4.4) twice, with i = 1 and i = n we get

$$\lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}) \le \lambda_1(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_1(M^{-1/2})$$

and

$$\lambda_n(A^{1/2})\,\lambda_n(M^{-1/2}) \le \lambda_n(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}).$$

From the above relations we obtain

$$\lambda_1(A^{1/2} M^{-1/2}) - \lambda_n(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})$$

and

$$\lambda_1(A^{1/2} M^{-1/2}) + \lambda_n(A^{1/2} M^{-1/2}) \ge \lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}) + \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2}) = \big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2}),$$

which leads to the factor. Now, by using the relation $\|\hat{g}_k\|_2 = \|g_k\|_{M^{-1/2}}$, we can write (4.5) as

$$\|g_{k+1}\|_{M^{-1/2}} \le \frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})} \, \|g_k\|_{M^{-1/2}}.$$

It follows that {g k } converges to zero Q-linearly with convergence factor

$$\frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})},$$

and hence, since A is positive definite, we also conclude that {x k } tends to the unique minimizer of f when k goes to infinity. □

Let us consider the spectral condition number $\kappa_2(\hat{A})$ of Â. Since preconditioners are meant to improve the conditioning of a matrix, in general we will have $\kappa_2(\hat{A}) \le \kappa_2(A)$. The convergence factor can be rewritten as $(\kappa_2(\hat{A}) - 1)/(\kappa_2(\hat{A}) + 1)$. Since the function f(x) = (x − 1)/(x + 1) is increasing and $\kappa_2(\hat{A}) \le \kappa_2(A)$, the preconditioned DWGM is expected to converge in fewer iterations than the ordinary DWGM.

Now, we want to prove that PDWGM admits finite termination, in exact arithmetic.

In [2] the A-orthogonality of the current gradient with respect to all previous gradients was demonstrated. Applied to the “hat” problem (3.2), it follows that for all k,

$$\hat{g}_k^T \hat{A} \hat{g}_j = 0, \qquad j \le k - 1. \qquad (4.6)$$

By translating to the original problem we obtain for each k,

$$g_k^T M^{-1} A M^{-1} g_j = 0, \qquad j \le k - 1. \qquad (4.7)$$

In other words, $g_k$ is $M^{-1}AM^{-1}$-orthogonal to all previous gradient vectors. Notice that (4.7) can be written as $z_k^T A z_j = 0$, $j \le k - 1$. Property (4.7) leads to the finite convergence theorem.

Theorem 4.2. For any initial guess $x_0 \in \mathbb{R}^n$, PDWGM applied to problem (1.2) generates iterates $x_k$, k ≥ 1, such that $x_n = A^{-1}b$.

Proof. From (4.7) we have that the n vectors $g_k$, k = 0, 1, …, n − 1, form an $M^{-1}AM^{-1}$-orthogonal set, and hence a linearly independent set of n vectors in $\mathbb{R}^n$. Therefore, the next vector $g_n \in \mathbb{R}^n$ must be zero in order to keep the $M^{-1}AM^{-1}$-orthogonality with all the previous gradient vectors. Since $g_n = A x_n - b = 0$, we obtain our claim. □

Another nice property of the DWGM algorithm preserved by our preconditioned version is that it terminates in p iterations, where p stands for the number of distinct eigenvalues of $M^{-1}AM^{-1}$. The key related results are Lemma 7 and Theorem 9 in [2], which we state below without proof. Notice that for each k the gradient vector generated by PDWGM belongs to the Krylov subspace $\mathcal{K}_{k+1}(M^{-1}AM^{-1}, g_0)$, depending on A, $g_0$ and M.

Lemma 4.3. In the algorithm PDWGM, for all k ≥ 1,

$$g_k \in \mathcal{K}_{k+1}(M^{-1}AM^{-1}, g_0) := \operatorname{span}\{g_0, M^{-1}AM^{-1}g_0, \ldots, (M^{-1}AM^{-1})^k g_0\}.$$

Theorem 4.3. If $M^{-1}AM^{-1}$ has only p < n distinct eigenvalues, then for any initial guess $x_0 \in \mathbb{R}^n$, the algorithm PDWGM generates iterates $x_k$, k ≥ 1, such that $x_p = A^{-1}b$.

Notice from (2.1) that, to move from $x_k$ to $x_{k+1}$, Algorithm 1 looks for a point on the line passing through $x_{k-1}$ and $x_k$ from which to walk in the gradient direction, thus involving these two search directions. Analogously, in Algorithm 2 the search involves the direction $x_{k-1} - x_k$ and the transformed gradient $z_k$. The next theorem establishes the A-orthogonality of the sequences of these search directions. First notice that

$$\hat{g}_{k+1}^T \hat{A}(\hat{x}_k - \hat{x}_{k-1}) = g_{k+1}^T M^{-1}(g_k - g_{k-1}) = z_{k+1}^T(g_k - g_{k-1}).$$

Lemma 4.4. In the algorithm PDWGM, the following statements hold for all k ≥ 1.

  1. $z_{k+1}^T(v_k - g_{k-1}) = 0$.

  2. $z_{k+1}^T(g_k - g_{k-1}) = 0$.

  3. $z_{k+1}^T g_{k+1} = z_{k+1}^T g_k = z_{k+1}^T g_{k-1}$.

  4. $\beta_k = z_{k-1}^T(g_{k-1} - v_k) \big/ \big( \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - (z_k^T A z_k)^2 / \|A z_k\|_{M^{-1/2}}^2 \big) > 1$.

Proof.

  • a) From steps 10, 12 and 13 of the algorithm we get

$$\begin{aligned}
z_{k+1}^T(v_k - g_{k-1}) &= g_{k+1}^T M^{-1}(v_k - g_{k-1}) = \big(g_{k-1} + \beta_k(v_k - g_{k-1})\big)^T M^{-1}(v_k - g_{k-1}) \\
&= g_{k-1}^T M^{-1}(v_k - g_{k-1}) + \frac{g_{k-1}^T M^{-1}(g_{k-1} - v_k)}{(g_{k-1} - v_k)^T M^{-1}(g_{k-1} - v_k)}\,(v_k - g_{k-1})^T M^{-1}(v_k - g_{k-1}) \\
&= 0.
\end{aligned}$$

  • b) From step 8, (a) and (4.7) we have

$$\begin{aligned}
z_{k+1}^T(g_k - g_{k-1}) &= g_{k+1}^T M^{-1}(g_k - g_{k-1}) = g_{k+1}^T M^{-1}(v_k + \alpha_k A z_k - g_{k-1}) \\
&= g_{k+1}^T M^{-1}(v_k - g_{k-1}) + \alpha_k g_{k+1}^T M^{-1} A M^{-1} g_k = 0.
\end{aligned}$$

  • c) By steps 8, 12 and 13

$$\begin{aligned}
z_{k+1}^T g_{k+1} &= g_{k+1}^T M^{-1} g_{k+1} = g_{k+1}^T M^{-1}\big(g_{k-1} + \beta_k(v_k - g_{k-1})\big) \\
&= g_{k+1}^T M^{-1}\big(g_{k-1} + \beta_k(g_k - g_{k-1}) - \alpha_k \beta_k A z_k\big) \\
&= g_{k+1}^T M^{-1} g_{k-1} + \beta_k g_{k+1}^T M^{-1}(g_k - g_{k-1}) - \alpha_k \beta_k g_{k+1}^T M^{-1} A M^{-1} g_k.
\end{aligned}$$

The second term is zero by part (b) and the third by (4.7). Then, we obtain $z_{k+1}^T g_{k+1} = z_{k+1}^T g_{k-1}$. Again from (b) we notice that $z_{k+1}^T g_k = z_{k+1}^T g_{k-1}$, obtaining our claim.

  • d) By the Cauchy-Schwarz inequality and step 13 we obtain

$$z_{k-1}^T g_k = g_{k-1}^T M^{-1} g_k = \hat{g}_{k-1}^T \hat{g}_k < \|\hat{g}_{k-1}\|_2^2 = g_{k-1}^T M^{-1} g_{k-1} = z_{k-1}^T g_{k-1}.$$

  • Also,

$$z_{k-1}^T v_k = z_{k-1}^T(g_k - \alpha_k A z_k) = z_{k-1}^T g_k - \alpha_k z_{k-1}^T A z_k.$$

  • By using the A-orthogonality property (4.7) we obtain $z_{k-1}^T v_k = z_{k-1}^T g_k$. Therefore, the last two expressions lead to

$$z_{k-1}^T(g_{k-1} - v_k) = z_{k-1}^T(g_{k-1} - g_k) > 0.$$

  • This means that the numerator in step 10 is positive, and so, since the denominator can be written as $\|g_{k-1} - v_k\|_{M^{-1/2}}^2$, also $\beta_k > 0$ for all k ≥ 0. Now we examine the denominator. By steps 4, 6 and 8, and algebraic manipulation, we get

$$\begin{aligned}
\|g_{k-1} - v_k\|_{M^{-1/2}}^2 &= \|g_{k-1} - g_k + \alpha_k A z_k\|_{M^{-1/2}}^2 \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 + 2\alpha_k z_k^T A M^{-1}(g_{k-1} - g_k) + \alpha_k^2 z_k^T A M^{-1} A z_k \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 + 2\alpha_k\big(z_k^T A z_{k-1} - z_k^T A z_k\big) + \alpha_k^2 z_k^T A M^{-1} A z_k \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - 2\frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k} + \frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k} \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - \frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k}.
\end{aligned}$$

  • So, $\beta_k = z_{k-1}^T(g_{k-1} - v_k) \big/ \big(\|g_{k-1} - g_k\|_{M^{-1/2}}^2 - (z_k^T A z_k)^2/\|A z_k\|_{M^{-1/2}}^2\big)$. Since $\beta_k$ and the numerator $z_{k-1}^T(g_{k-1} - v_k)$ are strictly positive, the denominator must also be strictly positive. Then,

$$0 < (z_k^T A z_k)^2 / \|A z_k\|_{M^{-1/2}}^2 < \|g_{k-1} - g_k\|_{M^{-1/2}}^2.$$

  • From (c) we know that $z_k^T(g_k - g_{k-1}) = 0$, and hence

$$0 = (g_k - g_{k-1} + g_{k-1})^T M^{-1}(g_k - g_{k-1}) = \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - z_{k-1}^T(g_{k-1} - g_k),$$

  • obtaining that $z_{k-1}^T(g_{k-1} - g_k) = \|g_{k-1} - g_k\|_{M^{-1/2}}^2$. By step 8 and (4.7) we have

$$z_{k-1}^T(g_k - v_k) = \alpha_k z_{k-1}^T A z_k = 0,$$

  • so $z_{k-1}^T g_k = z_{k-1}^T v_k$. This fact is used to conclude that the numerator in (d) is strictly bigger than the denominator, and since both are positive we finally have $\beta_k > 1$ for all k. □

Now we are ready to establish the abovementioned orthogonality result:

Theorem 4.4. The algorithm PDWGM generates sequences $g_k$ and $z_k$ such that, for k ≥ 2,

$$z_k^T(g_j - g_{j-1}) = 0, \qquad 1 \le j \le k. \qquad (4.8)$$

Proof. From Lemma 4.4 (b) we get $z_2^T(g_1 - g_0) = 0$, and so the result is true for k = 2. Let us assume by induction that (4.8) holds up to $k = \hat{k} \ge 3$ and consider the next iteration. We then need to show that $z_{\hat{k}+1}^T(g_j - g_{j-1}) = 0$ for all $1 \le j \le \hat{k}$. When $j = \hat{k}$ the result follows directly from part (b) of the lemma above. Now, for $j \le \hat{k} - 2$, by using steps 8 and 12, the inductive hypothesis and (4.7), we have

$$\begin{aligned}
z_{\hat{k}+1}^T(g_j - g_{j-1}) &= g_{\hat{k}+1}^T M^{-1}(g_j - g_{j-1}) = \big[g_{\hat{k}-1} + \beta_{\hat{k}}(v_{\hat{k}} - g_{\hat{k}-1})\big]^T M^{-1}(g_j - g_{j-1}) \\
&= \big[(1 - \beta_{\hat{k}})\, g_{\hat{k}-1} + \beta_{\hat{k}} v_{\hat{k}}\big]^T M^{-1}(g_j - g_{j-1}) = \beta_{\hat{k}}\, v_{\hat{k}}^T M^{-1}(g_j - g_{j-1}) \\
&= \beta_{\hat{k}}(g_{\hat{k}} - \alpha_{\hat{k}} A z_{\hat{k}})^T M^{-1}(g_j - g_{j-1}) = -\beta_{\hat{k}} \alpha_{\hat{k}}\, g_{\hat{k}}^T M^{-1} A M^{-1}(g_j - g_{j-1}) = 0.
\end{aligned}$$

Finally, when $j = \hat{k} - 1$, by using steps 8, 12 and 13, as well as the inductive hypothesis and (4.7), we have

$$\begin{aligned}
z_{\hat{k}+1}^T(g_{\hat{k}-1} - g_{\hat{k}-2}) &= g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}-1} - g_{\hat{k}-2}) = g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}-1} - v_{\hat{k}} + v_{\hat{k}} - g_{\hat{k}-2}) \\
&= g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}} - \alpha_{\hat{k}} A z_{\hat{k}} - g_{\hat{k}-2}) = g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}} - g_{\hat{k}-2}) \\
&= g_{\hat{k}+1}^T M^{-1}\big(g_{\hat{k}-2} + \beta_{\hat{k}-1}(v_{\hat{k}-1} - g_{\hat{k}-2}) - g_{\hat{k}-2}\big) \\
&= \beta_{\hat{k}-1}\, g_{\hat{k}+1}^T M^{-1}(v_{\hat{k}-1} - g_{\hat{k}-2}) = \beta_{\hat{k}-1}\, z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}).
\end{aligned}$$

Therefore, noticing that $z_{\hat{k}+1}^T v_{\hat{k}-1} = z_{\hat{k}+1}^T g_{\hat{k}-1}$ (by step 8 and (4.7)), the left-hand side above equals $z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2})$, so $(\beta_{\hat{k}-1} - 1)\, z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}) = 0$. Since $\beta_k > 1$ for all k ≥ 1, we conclude that $z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}) = 0$, hence $z_{\hat{k}+1}^T(g_{\hat{k}-1} - g_{\hat{k}-2}) = 0$, and so (4.8) is established. □

5. NUMERICAL EXPERIMENTS

In this section we show that the preconditioned DWGM algorithm has a numerical behaviour similar to that of the preconditioned CG algorithm. We present some numerical experiments to validate our claim. In all examples we compare the performance of the CG [22], preconditioned CG [22], DWGM [25] and preconditioned DWGM algorithms. All the experiments were performed on an Intel(R) Core(TM) i7-4770 CPU, 3.40 GHz, with 16 GB of RAM.

We propose, initially, an experiment to compare the performance of DWGM and PDWGM with the most robust iterative method for solving strictly convex quadratic problems: the conjugate gradient method and its preconditioned version.

We obtained seventy matrices from the SuiteSparse Matrix Collection¹ [13], [23]. Then we solved the resulting seventy linear systems with the CG, preconditioned CG, DWGM and preconditioned DWGM algorithms, with $b = [1, 1, \ldots, 1]^T$ and $x_0 = [0, 0, \ldots, 0]^T$. The stopping criterion used is

$$\|\nabla f(x_k)\|_2 \le 10^{-5}.$$

For all seventy experiments we used the Jacobi preconditioner [20]. Similar results are obtained with different preconditioners, such as the incomplete Cholesky factorization [17] or SSOR [17], for example.
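
As an illustration, the following minimal sketch shows how a Jacobi preconditioner could be plugged into the PDWGM sketch given earlier; with M = diag(A), each solve with M is a componentwise division of cost 𝒪(n). The function name jacobi_solve is ours.

import numpy as np

def jacobi_solve(A):
    # M = diag(A); for an SPD matrix the diagonal entries are positive.
    d = np.asarray(A.diagonal()).ravel()   # works for dense or scipy.sparse A
    return lambda u: u / d                 # returns z such that M z = u

# Hypothetical usage with the pdwgm sketch shown earlier:
#   solve_M = jacobi_solve(A)
#   x = pdwgm(A, b, np.zeros_like(b), solve_M)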

Performance profiles [15] comparing CPU time and number of iterations are presented in Figure 1. Note that the CG and DWGM algorithms have similar behaviour in terms of CPU time and number of iterations. On the other hand, the comparison between PCG and PDWGM presents some differences. First, we note that, in general, PDWGM converges in fewer iterations than PCG, but with a higher CPU time. We emphasize, however, that the differences are not significant.

Figure 1
Performance profiles comparing CG, PCG, DWGM and PDWGM.

Table 1 presents, for fourteen selected matrices, the number of iterations (niter), the error norm $e_k = \|x_k - x^*\|_2$ and the residual (gradient) norm $r_k = \|A x_k - b\|_2$.

Table 1
Iteration information of CG, PCG, DWGM and PDWGM for fourteen models.

The most evident characteristic that we can observe in Table 1 is that the DWGM algorithm reaches the given tolerance in fewer iterations than the CG algorithm; this fact was also observed in [2]. A second observation is that PCG and PDWGM have a similar behavior, in the sense that they reach the given tolerance in a similar number of iterations. The CPU times of PCG and PDWGM are also similar: sometimes PCG is faster, sometimes PDWGM is faster, but overall the CPU times do not differ much. Moreover, as shown in the next figure, the gradient curves of PCG and PDWGM resemble each other. In short, the differences between PDWGM and PCG are not significant.

To corroborate the results presented in Table 1 we display six graphs comparing the conjugate gradient, preconditioned conjugate gradient, DWGM and PDWGM algorithms in terms of number of iterations versus $\log_{10}(\|\nabla f(x_k)\|_2)$. As pointed out above, PCG and PDWGM have a similar behaviour; note that the gradient curves of PCG and PDWGM follow a similar pattern. Nevertheless, as pointed out in [25], the computational cost per iteration of CG is 2n² + 9n − 2 flops, while the computational cost per iteration of DWGM is 2n² + 17n − 4 flops. Thus, the computational cost of PCG is the cost of CG plus the solution of one linear system, while the computational cost of PDWGM is the cost of DWGM plus the solution of three linear systems. Note that, by using the Jacobi preconditioner or a preconditioner based on the incomplete Cholesky factorization, the linear systems to be solved at each iteration are either diagonal or triangular, with computational costs of 𝒪(n) and 𝒪(n²) flops, respectively. As a final observation, we also note the smoothness of the DWGM and PDWGM curves when compared to the well known “crispness” of the CG and PCG curves.

Figure 2
Comparison between CG, PCG, DWGM and PDWGM for six different models.

The experiment described above had as its objective the comparison between the DWGM/PDWGM and CG/PCG algorithms. Now we propose a second experiment in order to illustrate the finite termination property of the PDWGM algorithm in the number of distinct eigenvalues (Theorem 4.3), and how preconditioning can influence the number of steps.

This experiment is based on the construction of a preconditioned matrix with a predefined number of distinct eigenvalues. In the following procedure we present how to construct a matrix A and a preconditioner M = C² such that Â = C⁻¹AC⁻¹ has a prescribed number of distinct eigenvalues (a NumPy sketch of this construction is given after the procedure).

  • Let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix;

  • Let $v \in \mathbb{R}^n$ be a vector with random entries between 0 and 1;

  • Define $A = Q T_1 Q^T$, with $T_1 = \mathrm{diag}(v)$;

  • Let $l \in \mathbb{R}^p$ be such that $l_1 > l_2 > \cdots > l_p$ (the prescribed eigenvalues);

  • Define the algebraic multiplicity $n_i$ of each $l_i$, $i = 1, \ldots, p$, such that $n_1 + n_2 + \cdots + n_p = n$;

  • Define $L = \mathrm{diag}(l_1, \ldots, l_1, l_2, \ldots, l_2, \ldots, l_p, \ldots, l_p) \in \mathbb{R}^{n \times n}$, where each $l_i$ appears $n_i$ times;

  • Define a diagonal matrix $T_2$ such that $T_2(i,i) = L(i,i)/T_1(i,i)$, $i = 1, \ldots, n$;

  • Define $M^{-1} = Q T_2 Q^T$ or, equivalently, $C^{-1} = Q T_2^{1/2} Q^T$.

Note that,

$$\hat{A} = C^{-1} A C^{-1} = Q T_2^{1/2} Q^T \, Q T_1 Q^T \, Q T_2^{1/2} Q^T = Q T_2^{1/2} T_1 T_2^{1/2} Q^T = Q L Q^T.$$

Then, Â is a preconditioned matrix with clustered eigenvalues.
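
A minimal NumPy sketch of this construction follows; the dimensions, the random seed and the small shift added to v (to keep $T_1$ safely away from singularity) are our own illustrative choices, not part of the procedure above.

import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # orthogonal Q
v = rng.uniform(0.0, 1.0, n) + 1e-3                     # small shift keeps T1 nonsingular
T1 = np.diag(v)
A = Q @ T1 @ Q.T                                        # SPD matrix A

l = np.sort(rng.uniform(1.0, 10.0, p))[::-1]            # l_1 > ... > l_p
counts = np.full(p, n // p); counts[: n % p] += 1       # multiplicities n_i summing to n
L = np.diag(np.repeat(l, counts))                       # p distinct eigenvalues

T2 = np.diag(np.diag(L) / v)                            # T2(i,i) = L(i,i)/T1(i,i)
M_inv = Q @ T2 @ Q.T                                    # preconditioner M^{-1}
C_inv = Q @ np.diag(np.sqrt(np.diag(T2))) @ Q.T         # C^{-1} = Q T2^{1/2} Q^T

A_hat = C_inv @ A @ C_inv                               # equals Q L Q^T
print(np.unique(np.round(np.linalg.eigvalsh(A_hat), 8)).size)   # -> p distinct eigenvalues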

The objective of this experiment is to present some examples of the finite termination of DWGM and PDWGM. First, a random orthogonal matrix Q is obtained from the QR decomposition [20] of a given random matrix. Then the procedure above is run 15 times, generating 15 pairs (A, M). As in the first example, we compare DWGM and PDWGM on these 15 pairs and then take the average and the standard deviation of the results.

The results are presented in Table 2. For DWGM and PDWGM we compare the number of iterations, the absolute error, the norm of the residual and the CPU time. Moreover, n and p represent the problem dimension and the number of distinct eigenvalues of Â, respectively. Observe that for small instances the number of iterations of the algorithm without preconditioning is proportional to the dimension n, while the number of iterations of the preconditioned algorithm is proportional to the number p of distinct eigenvalues of the problem, as Theorem 4.3 claims.

Table 2
Iteration information of CG, PCG, DWGM and PDWGM.

6. CONCLUSIONS

The delayed weighted gradient method is a two-step gradient method that aims to correct the Cauchy point and produce a faster solution than the classical gradient method. We have presented and discussed the derivation of the preconditioned delayed weighted gradient method. We have also shown that the preconditioned conjugate gradient and the preconditioned delayed weighted gradient methods produce similar results, and that the preconditioned conjugate gradient method has a computational cost per iteration similar to that of the preconditioned delayed weighted gradient method. Convergence and other theoretical properties of the preconditioned delayed weighted gradient method are proved.

REFERENCES

  • 1
    H. Akaike. On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Annals of the Institute of Statistical Mathematics, 11 (1959), 1-16.
  • 2
    R. Andreani & M. Raydan. Properties of the delayed weighted gradient method. Computational Optimization and Applications, 78 (2021), 167-180.
  • 3
    R.D. Asmundis, D. di Serafino, W.W. Hager, G. Toraldo & H. Zhang. An efficient gradient method using the Yuan steplength. Computational Optimization and Applications , 59 (2014), 541-563.
  • 4
    R.D. Asmundis, D. di Serafino, F. Riccio & G. Toraldo. On spectral properties of steepest descent methods. IMA Journal of Numerical Analysis, 33(4) (2013), 1416-1435.
  • 5
    J. Barzilai & J.M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis , 8(1) (1988), 141-148.
  • 6
    D.S. Bernstein. “Matrix Mathematics: Theory, Facts, and Formulas”. Princeton University Press, second ed. (2011).
  • 7
    D. Bertaccini & F. Durastante. “Iterative Methods and Preconditioning for Large and Sparse Linear Systems with Applications”. Chapman and Hall/CRC, New York (2018).
  • 8
    E. Birgin, I. Chambouleyron & J.M. Martínez. Estimation of the optical constants and the thickness of thin films using unconstrained optimization. Computational Physics, (151) (1999), 862-880.
  • 9
    E.G. Birgin, J.M. Martínez & M. Raydan. Spectral projected gradient methods: review and perspectives. Journal of Statistical Software, 60(3) (2014), 1-21.
  • 10
    A. Cauchy. Méthode générale pour la résolution des systemes d’équations simultanées. Comptes Rendus Sci. Paris, 25 (1847), 536-538.
  • 11
    Y.H. Dai, Y. Huang & X.W. Liu. A family of spectral gradient methods for optimization. Computational Optimization and Applications , 74 (2019), 43-65.
  • 12
    Y.H. Dai & L.Z. Liao. R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis , 22(1) (2002), 1-10.
  • 13
    T.A. Davis & Y. Hu. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38(1) (2011). doi:10.1145/2049662.2049663.
    » https://doi.org/10.1145/2049662.2049663
  • 14
    D. di Serafino, V. Ruggiero, G. Toraldo & L. Zanni. On the steplength selection in gradient methods for unconstrained optimization. Applied Mathematics and Computation, 318 (2018), 176-195.
  • 15
    E.D. Dolan & J.J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91 (2002), 201-213. doi:10.1007/s101070100263.
    » https://doi.org/10.1007/s101070100263
  • 16
    M.A. Figueiredo, R. Nowak & S. Wright. Projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process, (1) (2007), 586-597.
  • 17
    W. Ford. “Numerical Linear Algebra with Applications: Using MATLAB”. Elsevier Inc, New York (2015).
  • 18
    G.E. Forsythe & T.S. Motzkin. Asymptotic properties of the optimum gradient method, Preliminary report. Bull. Amer. Math. Soc, 57 (1951), 304-305.
  • 19
    A. Friedlander, J.M. Martínez, B. Molina & M. Raydan. Gradient method with retards and generalizations. SIAM Journal on Numerical Analysis, 36(1) (1999), 275-289.
  • 20
    G.H. Golub & C.F. Van Loan. “Matrix Computations”. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 4th ed. (2012).
  • 21
    Y. Huang, Y.H. Dai, X.W. Liu & H. Zhang. Gradient methods exploiting spectral properties. Optimization Methods and Software, 35(4) (2020), 681-705.
  • 22
    C.T. Kelley. “Iterative Methods for Linear and Nonlinear Equations”. Society for Industrial and Applied Mathematics (1995).
  • 23
    S.P. Kolodziej, M. Aznaveh, M. Bullock, J. David, T.A. Davis, M. Henderson, Y. Hu & R. Sandstrom. The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4(35) (2019), 1244. doi:10.21105/joss.01244.
    » https://doi.org/10.21105/joss.01244
  • 24
    J.L. Lamotte, B. Molina & M. Raydan. Smooth and adaptive gradient method with retards. Mathematical and Computer Modelling, 36 (2002), 1161-1168.
  • 25
    H.F. Oviedo-Leon. A delayed weighted gradient method for strictly convex quadratic minimization. Computational Optimization and Applications , 74 (2019), 729-746.
  • 26
    T. Serafini, G. Zanghirati & L. Zanni. Gradient projection methods for large quadratic programs and applications in training support vector machines. Optimization Methods and Software , (20) (2005), 353-378.
  • 27
    B.V. Shah, R.J. Buehler & O. Kempthorne. Some Algorithms for Minimizing a Function of Several Variables. Journal of the Society for Industrial and Applied Mathematics, 12(1) (1964), 74-92.
  • 28
    H. Sorenson. Comparison of some conjugate direction procedures for function minimization. Journal of the Franklin Institute, 288(6) (1969), 421-441.
  • 29
    Y.X. Yuan. A new stepsize for the steepest decent method. Journal of Computational Mathematics, 24(2) (2006), 149-156.
  • 1
    formerly the University of Florida Sparse Matrix Collection.

Publication Dates

  • Publication in this collection
    28 July 2023
  • Date of issue
    Jul-Sep 2023

History

  • Received
    17 Jan 2022
  • Accepted
    16 Nov 2022