
On the Preconditioned Delayed Weighted Gradient Method

ABSTRACT

In this article a preconditioned version of the Delayed Weighted Gradient Method (DWGM) is presented and analyzed. In addition to convergence, some nice properties, such as the A-orthogonality of the current transformed gradient with respect to all previous gradient vectors and finite termination, are demonstrated. Numerical experiments are also reported, exposing the benefits of preconditioning.

Keywords:
gradient methods; convex quadratic optimization; Krylov subspace methods; preconditioning

1 INTRODUCTION

Many real-life applications lead to large-scale convex quadratic optimization problems. Among others, low-cost gradient methods have been widely effective in solving challenging instances of this class of optimization problems (see for instance [8], [16], [26] and references therein). Gradient methods for the unconstrained minimization problem

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x)$$

generate a sequence of solution approximations x k satisfying

$$x_{k+1} = x_k - \alpha_k g_k,$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, $g_k = \nabla f(x_k)$ and $\alpha_k > 0$. The selection of the step length $\alpha_k$ depends on the chosen method. The classical steepest descent (SD) method was proposed in [10] to solve nonlinear systems of equations. In this case,

$$\alpha_k^{SD} = \arg\min_{\alpha} f(x_k - \alpha g_k). \qquad (1.1)$$

Assuming the objective function f to be a strictly convex quadratic function, that is, for a symmetric and positive definite (SPD) matrix $A \in \mathbb{R}^{n \times n}$, the unconstrained optimization problem becomes

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \; f(x) = \frac{1}{2} x^T A x - b^T x. \qquad (1.2)$$

It is well known that the unique minimizer of this problem also solves the linear system Ax = b, and so we can denote it by $A^{-1}b$. Under this assumption, simple calculations on (1.1) give

$$\alpha_k^{SD} = \frac{g_k^T g_k}{g_k^T A g_k}.$$

It has been proven that the SD method converges Q-linearly [1]. Instead of minimizing the objective function f, the Minimum Gradient (MG) step length [3] aims to minimize the gradient norm, i.e.

$$\alpha_k^{MG} = \arg\min_{\alpha} \| \nabla f(x_k - \alpha g_k) \|_2,$$

which can be written as

$$\alpha_k^{MG} = \frac{g_k^T A g_k}{g_k^T A^2 g_k}. \qquad (1.3)$$

Despite the optimality properties in the definitions of $\alpha_k^{SD}$ and $\alpha_k^{MG}$, the steepest descent method converges slowly and is badly affected by ill conditioning (see [1] and [18]). A remedy for this issue was proposed by Barzilai and Borwein [5]. The Barzilai-Borwein (BB) method is based on a secant condition; they propose two possible step sizes

$$\alpha_k^{BB1} = \arg\min_{\alpha} \left\| \frac{1}{\alpha}\tilde{s}_{k-1} - \tilde{y}_{k-1} \right\|_2 \quad \text{and} \quad \alpha_k^{BB2} = \arg\min_{\alpha} \left\| \tilde{s}_{k-1} - \alpha \tilde{y}_{k-1} \right\|_2,$$

where $\tilde{s}_{k-1} = x_k - x_{k-1}$ and $\tilde{y}_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1})$, thus obtaining, respectively,

$$\alpha_k^{BB1} = \frac{\|\tilde{s}_{k-1}\|_2^2}{\tilde{s}_{k-1}^T \tilde{y}_{k-1}} \quad \text{and} \quad \alpha_k^{BB2} = \frac{\tilde{s}_{k-1}^T \tilde{y}_{k-1}}{\|\tilde{y}_{k-1}\|_2^2}.$$

If we restrict f to be a strictly convex quadratic function, we obtain

$$\alpha_k^{BB1} = \frac{g_{k-1}^T g_{k-1}}{g_{k-1}^T A g_{k-1}} = \alpha_{k-1}^{SD} \quad \text{and} \quad \alpha_k^{BB2} = \frac{g_{k-1}^T A g_{k-1}}{g_{k-1}^T A^2 g_{k-1}} = \alpha_{k-1}^{MG},$$

which satisfy [14]

$$\frac{1}{\lambda_1} \le \alpha_k^{BB2} \le \alpha_k^{BB1} \le \frac{1}{\lambda_n},$$

where $\lambda_1$ and $\lambda_n$ are the maximum and the minimum eigenvalues of A, respectively. Basically, the BB step sizes coincide with the SD and MG step sizes with a delay of one iteration. The BB methods converge R-linearly [12]. Friedlander et al. [19] generalized the BB approach and proposed the Gradient Methods with Retards (GMR). In this case,

$$\alpha_k^{GMR} = \frac{g_{\nu(k)}^T g_{\nu(k)}}{g_{\nu(k)}^T A g_{\nu(k)}} = \alpha_{\nu(k)}^{SD},$$

where $\nu(k)$ is chosen in the set $\{k, k-1, \ldots, \max\{0, k-m\}\}$ and m is a given positive integer. Yuan [29] proposed the following step length, which also includes retard in its definition,

$$\alpha_k^{Y} = 2\left[ \frac{1}{\alpha_{k-1}^{SD}} + \frac{1}{\alpha_k^{SD}} + \sqrt{ \left( \frac{1}{\alpha_{k-1}^{SD}} - \frac{1}{\alpha_k^{SD}} \right)^{2} + \frac{4\|g_k\|_2^2}{\left( \alpha_{k-1}^{SD}\,\|g_{k-1}\|_2 \right)^{2}} } \right]^{-1}.$$

Yuan’s method presents a good performance for small-scale problems. Note that all the gradient method variants presented so far are one-step methods.
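
For illustration only, the following minimal NumPy sketch (the random SPD test matrix and all variable names are our own choices, not from the paper) verifies numerically the identities $\alpha_k^{BB1} = \alpha_{k-1}^{SD}$ and $\alpha_k^{BB2} = \alpha_{k-1}^{MG}$ stated above for strictly convex quadratics.

import numpy as np

# Build a small SPD test problem f(x) = 0.5*x'Ax - b'x with gradient g = Ax - b.
rng = np.random.default_rng(0)
n = 5
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q @ np.diag(rng.uniform(1.0, 10.0, n)) @ Q.T      # SPD matrix
b = rng.standard_normal(n)

def grad(x):
    return A @ x - b

x_prev = rng.standard_normal(n)
g_prev = grad(x_prev)

alpha_sd_prev = g_prev @ g_prev / (g_prev @ (A @ g_prev))                 # SD step at x_{k-1}
alpha_mg_prev = g_prev @ (A @ g_prev) / ((A @ g_prev) @ (A @ g_prev))     # MG step at x_{k-1}

x = x_prev - alpha_sd_prev * g_prev    # one exact (Cauchy) step
g = grad(x)

s, y = x - x_prev, g - g_prev          # BB secant pair
alpha_bb1 = s @ s / (s @ y)            # should equal alpha_sd_prev
alpha_bb2 = s @ y / (y @ y)            # should equal alpha_mg_prev

print(abs(alpha_bb1 - alpha_sd_prev), abs(alpha_bb2 - alpha_mg_prev))     # both ~1e-16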

Methods that use two step sizes are alternatives to accelerate gradient based methods by imposing retard on the process (see [11]). Recently, Oviedo-Leon [25] proposed to combine a smoothing technique with a delayed gradient step to construct the promising Delayed Weighted Gradient Method (DWGM), which exhibits an impressive fast convergence behavior that compares favorably with the conjugate gradient method (CG), sharing other nice properties such as finite termination and A-orthogonality of its iterated gradients (see [2], [25]). The DWGM can be seen as a variant of the parallel tangent method (PARTAN) in the sense that it uses two line searches per iteration, with information from previous points to accelerate the gradient method [18], [27], [28]. In [24] a smoothing technique is introduced to prevent the so-called zigzagging behavior of the sequence of gradient norms, which is characteristic of CG methods.

In most practical applications, it is convenient to introduce preconditioning to accelerate the convergence of the process. Preconditioners are useful in iterative methods to solve a linear system. Since, for most iterative solvers, the larger the condition number the slower the convergence, and preconditioning is meant to decrease the condition number, preconditioning is expected to improve the convergence rate (see for example [7]). Given the positive definite matrix $C^2$ (the preconditioner), two-sided preconditioning applied to the system of equations Ax = b first solves the related system $C^{-1}AC^{-1}y = C^{-1}b$, which has a better condition number, and then Cx = y to obtain the solution x. There are different choices of preconditioning matrices, such as Jacobi, incomplete LU factorization, incomplete Cholesky factorization, successive over-relaxation, etc. Preconditioning has been widely implemented for conjugate gradient type methods (see [7]). In [2] it was noticed that DWGM terminates in p < n iterations when the n × n Hessian matrix has only p distinct eigenvalues, as also happens with CG methods, motivating the use of preconditioning strategies when solving large-scale symmetric and positive definite linear systems. The aim of this work is to propose, analyze and exhibit the numerical behavior of the preconditioned version of the recently introduced DWGM algorithm.
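
As a small illustration of the two-sided preconditioning just described (the diagonal choice of C below is ours, used only for this check), the following NumPy snippet solves the transformed system and then Cx = y, recovering the solution of Ax = b.

import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # SPD test matrix (an assumption of this sketch)
b = rng.standard_normal(n)

C = np.diag(np.sqrt(np.diag(A)))     # example preconditioner factor, M = C^2
Ci = np.linalg.inv(C)                # explicit inverses only for this small check

A_hat = Ci @ A @ Ci                  # transformed matrix, usually better conditioned
y = np.linalg.solve(A_hat, Ci @ b)   # solve C^{-1} A C^{-1} y = C^{-1} b
x = np.linalg.solve(C, y)            # then C x = y

print(np.allclose(A @ x, b))                         # True
print(np.linalg.cond(A), np.linalg.cond(A_hat))      # compare condition numbers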

The remainder of this article is organized as follows: In the next section we describe the DWGM and expose some of its properties. In section 3 we develop the preconditioned version. The analysis of our preconditioned version is the topic of section 4. Numerical experimentation and conclusions are the topics of sections 5 and 6, respectively.

2. DELAYED WEIGHTED GRADIENT METHOD

We consider the strictly convex quadratic minimization problem (1.2). Since the gradient is $g(x) \equiv \nabla f(x) = Ax - b$, the unique global solution $A^{-1}b$ of problem (1.2) also solves the linear system Ax = b. For large n, many low cost iterative methods have been proposed and analyzed. The so-called gradient type methods emerge as competitive choices since they show fast linear convergence (see [3], [4], [9], [11], [14], [21]).

From a starting point $x_0 \in \mathbb{R}^n$, consider $g_k = g(x_k)$. The minimum gradient method (MG) (see [3]) is given by the iteration

$$x_{k+1} = x_k - \alpha_k^{MG} g_k.$$

Here, the step size $\alpha_k^{MG}$ was defined in (1.3). Denoting $w_k := A g_k$, we can write the step size as

$$\alpha_k^{MG} = \frac{g_k^T w_k}{\|w_k\|_2^2}.$$

The minimum gradient norm method calculates the next iterate as the point along the current gradient direction at which the norm of the next gradient is minimized.

As a two-step gradient method, DWGM incorporates a delaying step defined as follows [25]: The first stage uses the ordinary minimum gradient point $y_k = x_k - \alpha_k^{MG} g_k$. Then, by defining $\beta_k = \arg\min_{\beta} \|\nabla f(x_{k-1} + \beta(y_k - x_{k-1}))\|_2$, at the second stage

$$x_{k+1} = x_{k-1} + \beta_k (y_k - x_{k-1})$$

is calculated. It is straightforward to see that $\nabla f(x_{k-1} + \beta(y_k - x_{k-1})) = g_{k-1} - \beta(g_{k-1} - r_k)$, for $r_k = g_k - \alpha_k w_k$. This leads to

$$\beta_k = \frac{g_{k-1}^T (g_{k-1} - r_k)}{\|g_{k-1} - r_k\|_2^2}.$$

By merging the definition of $y_k$ into $x_{k+1}$, after simple manipulation the next iterated point can be rewritten as

$$x_{k+1} = (1 - \beta_k)\, x_{k-1} + \beta_k x_k - \beta_k \alpha_k g_k. \qquad (2.1)$$

This expression shows that the choice of $\beta_k$ simultaneously selects a point on the line passing through the two previous iterates and decides how far to advance along the current gradient direction, in such a way that the norm of the gradient at the next iteration is minimized. The resulting DWGM algorithm is:

Algorithm 1
DWGM
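
For concreteness, the following is a minimal NumPy sketch of the two-stage iteration just described; the stopping test, the variable names and the first-iteration convention $x_{-1} = x_0$, $g_{-1} = g_0$ (which makes the first step an ordinary MG step) are our own assumptions and need not coincide with the exact organization of Algorithm 1.

import numpy as np

def dwgm(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimal sketch of the DWGM iteration described in this section."""
    x_old = x = x0.copy()
    g_old = g = A @ x - b                    # convention: x_{-1} = x_0, g_{-1} = g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        w = A @ g
        alpha = (g @ w) / (w @ w)            # MG step length (1.3)
        y = x - alpha * g                    # first (MG) stage
        r = g - alpha * w                    # gradient at y
        d = g_old - r
        beta = (g_old @ d) / (d @ d)         # smoothing (delayed) step
        x_new = x_old + beta * (y - x_old)   # update, equivalent to (2.1)
        g_new = g_old + beta * (r - g_old)
        x_old, g_old, x, g = x, g, x_new, g_new
    return x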

Some of the properties that DWGM enjoys, established in [25] and [2], include the nonnegativity of $\beta_k$ for all k, the monotonic decrease of $\{\|g_k\|_2\}$, the Q-linear convergence of $\{g_k\}$ to zero as k goes to infinity (which implies that $\{x_k\}$ converges to the unique global minimizer of f), and finite termination, obtained from the A-orthogonality of the gradient at the current iteration with respect to all previous gradient vectors.

3. PRECONDITIONED DWGM

In this section we present the derivation of the preconditioned DWGM. Preconditioners play an important role in the theory of iterative methods. Quoting [7]: “... lack of robustness and sometimes erratic convergence behavior are recognized weakness of iterative solvers. These issues hamper the acceptance of iterative methods despite their intrinsic appeal for very large linear systems. Both the efficiency and robustness of iterative techniques can be very much improved by using preconditioning.”

Let M be a positive definite preconditioner; then there exists a unique symmetric positive definite matrix C such that $M = C^2$ (see [20]). This positive definite matrix, as well as its inverse, allows us to define the so-called “hat” transformations, which map the space where the original iterates $x_k$ live into the transformed space where the $\hat{x}_k$ are calculated:

$$\hat{A} = C^{-1} A C^{-1}; \qquad \hat{b} = C^{-1} b; \qquad \hat{x} = C x. \qquad (3.1)$$

Note that  is SPD. Let us define the auxiliary convex quadratic minimization problem

$$\underset{\hat{x} \in \mathbb{R}^n}{\text{minimize}} \; \hat{f}(\hat{x}) = \frac{1}{2} \hat{x}^T \hat{A} \hat{x} - \hat{b}^T \hat{x}. \qquad (3.2)$$

Evaluating the function f at $x = C^{-1}\hat{x}$ we obtain $\hat{f}(\hat{x}) = f(x)$, where f(x) is given in (1.2). Once we have the “hat” quadratic minimization problem (3.2), one could apply the delayed weighted gradient method to solve it. However, the algorithm is better defined if the preconditioning part is incorporated into the DWGM algorithm. The preconditioned DWGM algorithm generates a sequence $\{\hat{x}_k\}$ in the transformed space, which is translated back to the original space of x. Let us denote the current iterate by $\hat{x}_k \in \mathbb{R}^n$, and consider $\hat{g}_k = \nabla \hat{f}(\hat{x}_k)$ and $\hat{w}_k = \hat{A}\hat{g}_k$. Then the MG step size $\hat{\alpha}_k$ is defined by $\hat{\alpha}_k = \arg\min_{\alpha>0} \|\nabla \hat{f}(\hat{x}_k - \alpha \hat{g}_k)\|_2$. Note that

$$\hat{g}_k = \hat{A}\hat{x}_k - \hat{b} = C^{-1}(A x_k - b) = C^{-1} g_k,$$

and

$$\hat{w}_k = \hat{A}\hat{g}_k = C^{-1} A C^{-1} C^{-1} g_k.$$

Now define $z_k$, $q_k$ and $p_k$ such that

$$M z_k := g_k, \qquad q_k := C\hat{w}_k = A z_k \qquad \text{and} \qquad M p_k := q_k. \qquad (3.3)$$

The first and third definitions in (3.3) are meant to avoid explicit inverse matrix calculations. From the definition of the step size we obtain $\hat{\alpha}_k = \hat{g}_k^T \hat{w}_k / \|\hat{w}_k\|_2^2$. Through (3.3) we transform the numerator as

$$\hat{g}_k^T \hat{w}_k = (C^{-1}g_k)^T \hat{A} (C^{-1}g_k) = g_k^T M^{-1} A M^{-1} g_k = z_k^T A z_k = z_k^T q_k,$$

and the denominator becomes

$$\hat{w}_k^T \hat{w}_k = z_k^T A C^{-1} C^{-1} A z_k = q_k^T M^{-1} q_k = q_k^T p_k,$$

obtaining

$$\hat{\alpha}_k = \frac{z_k^T q_k}{q_k^T p_k}. \qquad (3.4)$$

Now, we shall reproduce the smoothing technique as in [24]. Consider the next MG iterated point

$$\hat{y}_k = \hat{x}_k - \hat{\alpha}_k \hat{g}_k = C x_k - \hat{\alpha}_k C^{-1} g_k = C(x_k - \hat{\alpha}_k z_k),$$

and

$$\hat{r}_k = \hat{g}_k - \hat{\alpha}_k \hat{w}_k = C^{-1}(g_k - \hat{\alpha}_k A z_k) = C^{-1}(g_k - \hat{\alpha}_k q_k),$$

from which we express $u_k := C^{-1}\hat{y}_k$ and $v_k := C\hat{r}_k$ by

$$u_k = x_k - \hat{\alpha}_k z_k \qquad \text{and} \qquad v_k = g_k - \hat{\alpha}_k q_k. \qquad (3.5)$$

The delaying smoothing step-size is defined for the “hat” system as

$$\hat{\beta}_k := \arg\min_{\beta} \|\nabla \hat{f}(\hat{x}_{k-1} + \beta(\hat{y}_k - \hat{x}_{k-1}))\|_2,$$

which can be written as

$$\hat{\beta}_k = \frac{\hat{g}_{k-1}^T(\hat{g}_{k-1} - \hat{r}_k)}{\|\hat{g}_{k-1} - \hat{r}_k\|_2^2}.$$

Each factor gives

$$\hat{g}_{k-1}^T(\hat{g}_{k-1} - \hat{r}_k) = g_{k-1}^T C^{-1}(C^{-1}g_{k-1} - \hat{r}_k) = g_{k-1}^T M^{-1}(g_{k-1} - v_k)$$

and

$$\|\hat{g}_{k-1} - \hat{r}_k\|_2^2 = (\hat{g}_{k-1} - \hat{r}_k)^T(\hat{g}_{k-1} - \hat{r}_k) = (g_{k-1} - v_k)^T M^{-1}(g_{k-1} - v_k),$$

from which we obtain

$$\hat{\beta}_k = \frac{g_{k-1}^T s_k}{(g_{k-1} - v_k)^T s_k}, \qquad \text{where } M s_k := g_{k-1} - v_k. \qquad (3.6)$$

Lastly, from

$$\hat{x}_{k+1} = \hat{x}_{k-1} + \hat{\beta}_k(\hat{y}_k - \hat{x}_{k-1})$$

and

$$\hat{g}_{k+1} = \hat{g}_{k-1} + \hat{\beta}_k(\hat{r}_k - \hat{g}_{k-1})$$

we obtain

$$x_{k+1} = x_{k-1} + \hat{\beta}_k(u_k - x_{k-1}) \qquad \text{and} \qquad g_{k+1} = g_{k-1} + \hat{\beta}_k(v_k - g_{k-1}). \qquad (3.7)$$

The preconditioned delayed weighted gradient method, summarized by equations (3.3)-(3.7), is stated in Algorithm 2. The input data for the algorithm are the coefficient matrix A, the preconditioner M, the observation vector b, and the starting point $x_0$.

Algorithm 2
PDWGM
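
The following minimal NumPy sketch assembles equations (3.3)-(3.7) into an iteration. The helper solve_M (which returns the solution of a system with M), the stopping test and the first-iteration convention $x_{-1} = x_0$, $g_{-1} = g_0$ are assumptions of this sketch and need not match the numbered steps of Algorithm 2 used in the analysis of section 4.

import numpy as np

def pdwgm(A, b, x0, solve_M, tol=1e-10, max_iter=1000):
    """Minimal sketch of PDWGM assembled from equations (3.3)-(3.7).

    `solve_M(u)` must return the solution of M z = u (e.g. a diagonal or
    triangular solve supplied by the caller).
    """
    x_old = x = x0.copy()
    g_old = g = A @ x - b                    # convention: x_{-1} = x_0, g_{-1} = g_0
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        z = solve_M(g)                       # M z_k = g_k           (3.3)
        q = A @ z                            # q_k = A z_k           (3.3)
        p = solve_M(q)                       # M p_k = q_k           (3.3)
        alpha = (z @ q) / (q @ p)            # step length           (3.4)
        u = x - alpha * z                    # u_k                   (3.5)
        v = g - alpha * q                    # v_k                   (3.5)
        s = solve_M(g_old - v)               # M s_k = g_{k-1} - v_k (3.6)
        beta = (g_old @ s) / ((g_old - v) @ s)
        x_new = x_old + beta * (u - x_old)   # updates               (3.7)
        g_new = g_old + beta * (v - g_old)
        x_old, g_old, x, g = x, g, x_new, g_new
    return x

# Hypothetical usage with a Jacobi preconditioner M = diag(A):
#   d = np.diag(A); x = pdwgm(A, b, np.zeros_like(b), lambda u: u / d)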

The computational cost of the preconditioned delayed weighted gradient method is the computational cost of DWGM plus the cost of solving three linear systems. Of course, these three linear systems increase the computational cost. However, since we are working with symmetric positive definite systems, Jacobi, SSOR or incomplete Cholesky preconditioners are good choices that keep the extra cost moderate. In short, although it is necessary to solve three linear systems per iteration, they all have the same associated matrix, which is typically diagonal or triangular. Thus, as mentioned in section 5, each linear system can be solved at a cost of 𝒪(n) or 𝒪(n²) operations. In the end, we expect the higher computational cost per iteration of the PDWGM algorithm to be compensated by the smaller number of iterations when compared to the DWGM algorithm. The convergence properties of PDWGM are inherited from DWGM, since the iterations are equivalent through the “hat” transformations.

4. CONVERGENCE

We now present the convergence results for PDWGM. Since the matrix C is a linear operator on a finite-dimensional linear space, C is continuous. This means that convergence properties for $\{\hat{x}_k\}$ or $\{\hat{g}_k\}$ follow directly from the convergence of $\{x_k\}$ or $\{g_k\}$, and vice versa. Associated with a positive definite matrix B, let us denote the B-norm by $\|\cdot\|_B = \|B(\cdot)\|_2$.

Lemma 4.1. Let $\{x_k\}$ be the sequence generated by Algorithm 2. Then for each k

$$\|g_{k+1}\|_{M^{-1/2}}^2 \, \|r_k\|_{M^{-1/2}}^2 < \|g_k\|_{M^{-1/2}}^2 \, \|r_{k-1}\|_{M^{-1/2}}^2. \qquad (4.1)$$

Proof. From Lemma 1 in [25], applied to the “hat” problem (3.2), we have $\|\hat{g}_{k+1}\|_2 \|\hat{r}_k\|_2 < \|\hat{g}_k\|_2 \|\hat{r}_{k-1}\|_2$, which in turn can be written as (4.1). □

The next lemma establishes bounds on the eigenvalues of a product of positive definite matrices in terms of products of the eigenvalues of their factors.

Lemma 4.2. Given positive definite matrices A and B, with respective eigenvalues $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A) > 0$ and $\lambda_1(B) \ge \lambda_2(B) \ge \cdots \ge \lambda_n(B) > 0$, then for all $i, j, k \in \{1, 2, \ldots, n\}$ such that $j + k \le i + 1$,

$$\lambda_i(AB) \le \lambda_j(A)\,\lambda_k(B) \qquad (4.2)$$

and

$$\lambda_{n-j+1}(A)\,\lambda_{n-k+1}(B) \le \lambda_{n-i+1}(AB). \qquad (4.3)$$

In particular, for i = 1, …, n,

$$\lambda_i(A)\,\lambda_n(B) \le \lambda_i(AB) \le \lambda_1(A)\,\lambda_i(B). \qquad (4.4)$$

For a proof see [6, Fact 8.18.17]. We shall apply this lemma to express the convergence rate of Algorithm 1 when running with preconditioning:

Theorem 4.1. Consider the problem (1.2), the sequence $\{x_k\}$ generated by Algorithm 2, and let the eigenvalues of $A^{1/2}$ and $M^{-1/2}$ be $\lambda_1(A^{1/2}) > \lambda_2(A^{1/2}) > \cdots > \lambda_n(A^{1/2}) > 0$ and $\lambda_1(M^{-1/2}) > \lambda_2(M^{-1/2}) > \cdots > \lambda_n(M^{-1/2}) > 0$, respectively. Then the sequence $\{x_k\}$ converges to $A^{-1}b$ Q-linearly with convergence factor

$$\frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})}.$$

Proof. By Theorem 1 in [25], applied to the “hat” problem (3.2), we have for each k that

$$\|\hat{g}_{k+1}\| \le \frac{\hat{\lambda}_1 - \hat{\lambda}_n}{\hat{\lambda}_1 + \hat{\lambda}_n} \|\hat{g}_k\|, \qquad (4.5)$$

where $\hat{\lambda}_i$ stands for the i-th eigenvalue of $\hat{A}^{1/2}$. Now, since $\hat{A} = M^{-1/2} A M^{-1/2}$, we have for each i

$$\lambda_i(\hat{A}^{1/2}) = \lambda_i\big((M^{-1/2} A M^{-1/2})^{1/2}\big) = \lambda_i(A^{1/2} M^{-1/2}).$$

By using (4.4) twice, with i = 1 and i = n we get

$$\lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}) \le \lambda_1(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_1(M^{-1/2})$$

and

$$\lambda_n(A^{1/2})\,\lambda_n(M^{-1/2}) \le \lambda_n(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}).$$

From the above relations we obtain

$$\lambda_1(A^{1/2} M^{-1/2}) - \lambda_n(A^{1/2} M^{-1/2}) \le \lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})$$

and

$$\lambda_1(A^{1/2} M^{-1/2}) + \lambda_n(A^{1/2} M^{-1/2}) \ge \lambda_1(A^{1/2})\,\lambda_n(M^{-1/2}) + \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2}) = \big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2}),$$

which leads to the factor. Now, by using the relation $\|\hat{g}_k\|_2 = \|g_k\|_{M^{-1/2}}$, we can write (4.5) as

$$\|g_{k+1}\|_{M^{-1/2}} \le \frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})} \, \|g_k\|_{M^{-1/2}}.$$

It follows that {g k } converges to zero Q-linearly with convergence factor

$$\frac{\lambda_1(A^{1/2})\,\lambda_1(M^{-1/2}) - \lambda_n(A^{1/2})\,\lambda_n(M^{-1/2})}{\big(\lambda_1(A^{1/2}) + \lambda_n(A^{1/2})\big)\,\lambda_n(M^{-1/2})},$$

and hence, since A is positive definite, we also conclude that {x k } tends to the unique minimizer of f when k goes to infinity. □

Let us consider the spectral condition number $\kappa_2(\hat{A})$ of Â. Since preconditioners are meant to improve the conditioning of a matrix, in general we will have $\kappa_2(\hat{A}) \le \kappa_2(A)$. The convergence factor can be rewritten as $(\kappa_2(\hat{A}) - 1)/(\kappa_2(\hat{A}) + 1)$. Since the function f(x) = (x − 1)/(x + 1) is increasing and $\kappa_2(\hat{A}) \le \kappa_2(A)$, the preconditioned DWGM is expected to converge in fewer iterations than the ordinary DWGM.

Now, we want to prove that PDWGM admits finite termination, in exact arithmetic.

In [2] the A-orthogonality of the current gradient with respect to all previous gradients was demonstrated. Applied to the “hat” problem (3.2), it follows that for all k,

$$\hat{g}_k^T \hat{A} \hat{g}_j = 0, \qquad j \le k - 1. \qquad (4.6)$$

By translating to the original problem we obtain for each k,

$$g_k^T M^{-1} A M^{-1} g_j = 0, \qquad j \le k - 1. \qquad (4.7)$$

In other words, $g_k$ is $M^{-1}AM^{-1}$-orthogonal to all previous gradient vectors. Notice that (4.7) can be written as $z_k^T A z_j = 0$, $j \le k - 1$. Property (4.7) leads to the finite convergence theorem.

Theorem 4.2. For any initial guess $x_0 \in \mathbb{R}^n$, PDWGM applied to problem (1.2) generates iterates $x_k$, k ≥ 1, such that $x_n = A^{-1}b$.

Proof. From (4.7) we have that the n vectors $g_k$, k = 0, 1, …, n − 1, form an $M^{-1}AM^{-1}$-orthogonal set, and hence a linearly independent set of n vectors in $\mathbb{R}^n$. Therefore, the next vector $g_n \in \mathbb{R}^n$ must be zero in order to keep the $M^{-1}AM^{-1}$-orthogonality with all the previous gradient vectors. Since $g_n = A x_n - b = 0$, we obtain our claim. □

Another nice property of the DWGM algorithm preserved by our preconditioned version is that it terminates in p iterations, where p stands for the number of distinct eigenvalues of $M^{-1}AM^{-1}$. The key related results are Lemma 7 and Theorem 9 in [2], which we state below without proof. Notice that for each k the gradient vector generated by PDWGM belongs to the Krylov subspace $\mathcal{K}_{k+1}(M^{-1}AM^{-1}, g_0)$, depending on A, $g_0$ and M.

Lemma 4.3. In the algorithm PDWGM, for all k ≥ 1,

$$g_k \in \mathcal{K}_{k+1}(M^{-1}AM^{-1}, g_0) := \operatorname{span}\{g_0, M^{-1}AM^{-1}g_0, \ldots, (M^{-1}AM^{-1})^k g_0\}.$$

Theorem 4.3. If $M^{-1}AM^{-1}$ has only p < n distinct eigenvalues, then for any initial guess $x_0 \in \mathbb{R}^n$, the algorithm PDWGM generates iterates $x_k$, k ≥ 1, such that $x_p = A^{-1}b$.

Notice from (2.1) that, to move from $x_k$ to $x_{k+1}$, Algorithm 1 looks for a point on the line passing through $x_{k-1}$ and $x_k$ from which to walk in the gradient direction, thus involving these two search directions. Analogously, in Algorithm 2 the search involves the direction $x_{k-1} - x_k$ and the transformed gradient $z_k$. The next theorem establishes the A-orthogonality of the sequences of these search directions. First notice that

$$\hat{g}_{k+1}^T \hat{A}(\hat{x}_k - \hat{x}_{k-1}) = g_{k+1}^T M^{-1}(g_k - g_{k-1}) = z_{k+1}^T(g_k - g_{k-1}).$$

Lemma 4.4. In the algorithm PDWGM, the following statements hold for all k ≥ 1.

  1. $z_{k+1}^T(v_k - g_{k-1}) = 0$.

  2. $z_{k+1}^T(g_k - g_{k-1}) = 0$.

  3. $z_{k+1}^T g_{k+1} = z_{k+1}^T g_k = z_{k+1}^T g_{k-1}$.

  4. $\beta_k = z_{k-1}^T(g_{k-1} - v_k) \big/ \big( \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - (z_k^T A z_k)^2 / \|A z_k\|_{M^{-1/2}}^2 \big) > 1$.

Proof.

  • a) From steps 10, 12 and 13 of the algorithm we get

$$\begin{aligned}
z_{k+1}^T(v_k - g_{k-1}) &= g_{k+1}^T M^{-1}(v_k - g_{k-1}) = \big(g_{k-1} + \beta_k(v_k - g_{k-1})\big)^T M^{-1}(v_k - g_{k-1}) \\
&= g_{k-1}^T M^{-1}(v_k - g_{k-1}) + \frac{g_{k-1}^T M^{-1}(g_{k-1} - v_k)}{(g_{k-1} - v_k)^T M^{-1}(g_{k-1} - v_k)}\,(v_k - g_{k-1})^T M^{-1}(v_k - g_{k-1}) \\
&= 0.
\end{aligned}$$

  • b) From step 8, (a) and (4.7) we have

$$\begin{aligned}
z_{k+1}^T(g_k - g_{k-1}) &= g_{k+1}^T M^{-1}(g_k - g_{k-1}) = g_{k+1}^T M^{-1}(v_k + \alpha_k A z_k - g_{k-1}) \\
&= g_{k+1}^T M^{-1}(v_k - g_{k-1}) + \alpha_k g_{k+1}^T M^{-1} A M^{-1} g_k = 0.
\end{aligned}$$

  • c) By steps 8, 12 and 13

$$\begin{aligned}
z_{k+1}^T g_{k+1} &= g_{k+1}^T M^{-1} g_{k+1} = g_{k+1}^T M^{-1}\big(g_{k-1} + \beta_k(v_k - g_{k-1})\big) \\
&= g_{k+1}^T M^{-1}\big(g_{k-1} + \beta_k(g_k - g_{k-1}) - \alpha_k \beta_k A z_k\big) \\
&= g_{k+1}^T M^{-1} g_{k-1} + \beta_k g_{k+1}^T M^{-1}(g_k - g_{k-1}) - \alpha_k \beta_k g_{k+1}^T M^{-1} A M^{-1} g_k.
\end{aligned}$$

The second term is zero by part (b) and the third by (4.7). Then, we obtain $z_{k+1}^T g_{k+1} = z_{k+1}^T g_{k-1}$. Again from (b) we notice that $z_{k+1}^T g_k = z_{k+1}^T g_{k-1}$, obtaining our claim.

  • d) By the Cauchy-Schwarz inequality and step 13 we obtain

$$z_{k-1}^T g_k = g_{k-1}^T M^{-1} g_k = \hat{g}_{k-1}^T \hat{g}_k < \|\hat{g}_{k-1}\|_2^2 = g_{k-1}^T M^{-1} g_{k-1} = z_{k-1}^T g_{k-1}.$$

  • Also,

$$z_{k-1}^T v_k = z_{k-1}^T(g_k - \alpha_k A z_k) = z_{k-1}^T g_k - \alpha_k z_{k-1}^T A z_k.$$

  • By using the A-orthogonality property (4.7) we obtain $z_{k-1}^T v_k = z_{k-1}^T g_k$. Therefore, the last two expressions lead to

$$z_{k-1}^T(g_{k-1} - v_k) = z_{k-1}^T(g_{k-1} - g_k) > 0.$$

  • This means that the numerator in step 10 is positive, and so, since the denominator can be written as $\|g_{k-1} - v_k\|_{M^{-1/2}}^2$, also $\beta_k > 0$ for all k ≥ 0. Now we examine the denominator. By steps 4, 6 and 8, and algebraic manipulation, we get

$$\begin{aligned}
\|g_{k-1} - v_k\|_{M^{-1/2}}^2 &= \|g_{k-1} - g_k + \alpha_k A z_k\|_{M^{-1/2}}^2 \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 + 2\alpha_k z_k^T A M^{-1}(g_{k-1} - g_k) + \alpha_k^2 z_k^T A M^{-1} A z_k \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 + 2\alpha_k\big(z_k^T A z_{k-1} - z_k^T A z_k\big) + \alpha_k^2 z_k^T A M^{-1} A z_k \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - 2\frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k} + \frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k} \\
&= \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - \frac{(z_k^T A z_k)^2}{z_k^T A M^{-1} A z_k}.
\end{aligned}$$

  • So, $\beta_k = z_{k-1}^T(g_{k-1} - v_k) \big/ \big(\|g_{k-1} - g_k\|_{M^{-1/2}}^2 - (z_k^T A z_k)^2/\|A z_k\|_{M^{-1/2}}^2\big)$. Since $\beta_k$ and the numerator $z_{k-1}^T(g_{k-1} - v_k)$ are strictly positive, the denominator must also be strictly positive. Then,

$$0 < (z_k^T A z_k)^2 / \|A z_k\|_{M^{-1/2}}^2 < \|g_{k-1} - g_k\|_{M^{-1/2}}^2.$$

  • From (c) we know that $z_k^T(g_k - g_{k-1}) = 0$, and hence

$$0 = (g_k - g_{k-1} + g_{k-1})^T M^{-1}(g_k - g_{k-1}) = \|g_{k-1} - g_k\|_{M^{-1/2}}^2 - z_{k-1}^T(g_{k-1} - g_k),$$

  • obtaining that $z_{k-1}^T(g_{k-1} - g_k) = \|g_{k-1} - g_k\|_{M^{-1/2}}^2$. By step 8 and (4.7) we have

$$z_{k-1}^T(g_k - v_k) = \alpha_k z_{k-1}^T A z_k = 0,$$

  • so $z_{k-1}^T g_k = z_{k-1}^T v_k$. This fact is used to conclude that the numerator in (d) is strictly bigger than the denominator, and since both are positive we finally have $\beta_k > 1$ for all k. □

Now we are ready to establish the abovementioned orthogonality result:

Theorem 4.4. The algorithm PDWGM generates sequences $g_k$ and $z_k$ such that, for k ≥ 2,

$$z_k^T(g_j - g_{j-1}) = 0, \qquad 1 \le j \le k. \qquad (4.8)$$

Proof. From Lemma 4.4 (b) we get $z_2^T(g_1 - g_0) = 0$, and so the result is true for k = 2. Let us assume by induction that (4.8) holds up to $k = \hat{k} \ge 3$ and consider the next iteration. We then need to show that $z_{\hat{k}+1}^T(g_j - g_{j-1}) = 0$ for all $1 \le j \le \hat{k}$. When $j = \hat{k}$ the result follows directly from part (b) of the lemma above. Now, for $j \le \hat{k} - 2$, by using steps 8 and 12, the inductive hypothesis and (4.7), we have

$$\begin{aligned}
z_{\hat{k}+1}^T(g_j - g_{j-1}) &= g_{\hat{k}+1}^T M^{-1}(g_j - g_{j-1}) = \big[g_{\hat{k}-1} + \beta_{\hat{k}}(v_{\hat{k}} - g_{\hat{k}-1})\big]^T M^{-1}(g_j - g_{j-1}) \\
&= \big[(1 - \beta_{\hat{k}})\, g_{\hat{k}-1} + \beta_{\hat{k}} v_{\hat{k}}\big]^T M^{-1}(g_j - g_{j-1}) = \beta_{\hat{k}}\, v_{\hat{k}}^T M^{-1}(g_j - g_{j-1}) \\
&= \beta_{\hat{k}}(g_{\hat{k}} - \alpha_{\hat{k}} A z_{\hat{k}})^T M^{-1}(g_j - g_{j-1}) = -\beta_{\hat{k}} \alpha_{\hat{k}}\, g_{\hat{k}}^T M^{-1} A M^{-1}(g_j - g_{j-1}) = 0.
\end{aligned}$$

Finally, when $j = \hat{k} - 1$, by using steps 8, 12 and 13, as well as the inductive hypothesis and (4.7), we have

$$\begin{aligned}
z_{\hat{k}+1}^T(g_{\hat{k}-1} - g_{\hat{k}-2}) &= g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}-1} - g_{\hat{k}-2}) = g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}-1} - v_{\hat{k}} + v_{\hat{k}} - g_{\hat{k}-2}) \\
&= g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}} - \alpha_{\hat{k}} A z_{\hat{k}} - g_{\hat{k}-2}) = g_{\hat{k}+1}^T M^{-1}(g_{\hat{k}} - g_{\hat{k}-2}) \\
&= g_{\hat{k}+1}^T M^{-1}\big(g_{\hat{k}-2} + \beta_{\hat{k}-1}(v_{\hat{k}-1} - g_{\hat{k}-2}) - g_{\hat{k}-2}\big) \\
&= \beta_{\hat{k}-1}\, g_{\hat{k}+1}^T M^{-1}(v_{\hat{k}-1} - g_{\hat{k}-2}) = \beta_{\hat{k}-1}\, z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}).
\end{aligned}$$

Therefore, noticing that $z_{\hat{k}+1}^T v_{\hat{k}-1} = z_{\hat{k}+1}^T g_{\hat{k}-1}$ (by step 8 and (4.7)), the left-hand side above equals $z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2})$, so $(\beta_{\hat{k}-1} - 1)\, z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}) = 0$. Since $\beta_k > 1$ for all k ≥ 1, we conclude that $z_{\hat{k}+1}^T(v_{\hat{k}-1} - g_{\hat{k}-2}) = 0$, hence $z_{\hat{k}+1}^T(g_{\hat{k}-1} - g_{\hat{k}-2}) = 0$, and so (4.8) is established. □

5. NUMERICAL EXPERIMENTS

In this section we show that the preconditioned DWGM algorithm has a numerical behaviour similar to that of the preconditioned CG algorithm. We present some numerical experiments to validate our claim. In all examples we compare the performance of the CG [22], preconditioned CG [22], DWGM [25] and preconditioned DWGM algorithms. All the experiments were performed on an Intel(R) Core(TM) i7-4770 CPU, 3.40 GHz, with 16 GB of RAM.

We propose, initially, an experiment to compare the performance of DWGM and PDWGM with the most robust iterative method for solving strictly convex quadratic problems: the conjugate gradient method and its preconditioned version.

We obtained seventy matrices from the SuiteSparse Matrix Collection¹ [13], [23]. Then we solved the resulting seventy linear systems with the CG, preconditioned CG, DWGM and preconditioned DWGM algorithms, with $b = [1, 1, \ldots, 1]^T$ and $x_0 = [0, 0, \ldots, 0]^T$. The stopping criterion used is

$$\|\nabla f(x_k)\|_2 \le 10^{-5}.$$

For all seventy experiments we used the Jacobi preconditioner [20]. Similar results are obtained with different preconditioners, such as the incomplete Cholesky factorization [17] or SSOR [17], for example.
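
As an illustration, the following minimal sketch shows how a Jacobi preconditioner could be plugged into the PDWGM sketch given earlier; with M = diag(A), each solve with M is a componentwise division of cost 𝒪(n). The function name jacobi_solve is ours.

import numpy as np

def jacobi_solve(A):
    # M = diag(A); for an SPD matrix the diagonal entries are positive.
    d = np.asarray(A.diagonal()).ravel()   # works for dense or scipy.sparse A
    return lambda u: u / d                 # returns z such that M z = u

# Hypothetical usage with the pdwgm sketch shown earlier:
#   solve_M = jacobi_solve(A)
#   x = pdwgm(A, b, np.zeros_like(b), solve_M)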

Performance profiles [15] comparing CPU time and number of iterations are presented in Figure 1. Note that the CG and DWGM algorithms have similar behaviour in terms of CPU time and number of iterations. On the other hand, the comparison between PCG and PDWGM presents some differences. First, we note that, in general, PDWGM converges in fewer iterations than PCG, but with a higher CPU time. We emphasize, however, that the differences are not significant.

Figure 1
Performance profiles comparing CG, PCG, DWGM and PDWGM.

Table 1 presents, for fourteen selected matrices, the number of iterations (niter), the error norm $e_k = \|x_k - x^*\|_2$ and the residual (gradient) norm $r_k = \|A x_k - b\|_2$.

Table 1
Iteration information of CG, PCG, DWGM and PDWGM for fourteen models.

The most evident characteristic that we can observe in Table 1 is that the DWGM algorithm reaches the given tolerance in fewer iterations than the CG algorithm; this fact was also observed in [2]. A second observation is that PCG and PDWGM have a similar behavior, in the sense that they reach the given tolerance in a similar number of iterations. The CPU times of PCG and PDWGM are also similar: sometimes PCG is faster, sometimes PDWGM is faster, but overall the CPU times do not differ much. Moreover, as shown in the next figure, the gradient curves of PCG and PDWGM resemble each other. In short, the differences between PDWGM and PCG are not significant.

To corroborate the results presented in Table 1 we display six graphs comparing the conjugate gradient, preconditioned conjugate gradient, DWGM and PDWGM algorithms in terms of number of iterations versus $\log_{10}(\|\nabla f(x_k)\|_2)$. As pointed out above, PCG and PDWGM have a similar behaviour; note that the gradient curves of PCG and PDWGM follow a similar pattern. Nevertheless, as pointed out in [25], the computational cost per iteration of CG is 2n² + 9n − 2 flops, while the computational cost per iteration of DWGM is 2n² + 17n − 4 flops. Thus, the computational cost of PCG is the cost of CG plus the solution of one linear system, while the computational cost of PDWGM is the cost of DWGM plus the solution of three linear systems. Note that, by using the Jacobi preconditioner or a preconditioner based on the incomplete Cholesky factorization, the linear systems to be solved at each iteration are either diagonal or triangular, with computational costs of 𝒪(n) and 𝒪(n²) flops, respectively. As a final observation, we also note the smoothness of the DWGM and PDWGM curves when compared to the well known “crispness” of the CG and PCG curves.

Figure 2
Comparison between CG, PCG, DWGM and PDWGM for six different models.

The experiment described above had as its objective the comparison between the DWGM/PDWGM and CG/PCG algorithms. Now we propose a second experiment in order to illustrate the finite termination property of the PDWGM algorithm in the number of distinct eigenvalues (Theorem 4.3), and how preconditioning can influence the number of steps.

This experiment is based on the construction of a preconditioned matrix with a predefined number of distinct eigenvalues. In the following procedure we present how to construct a matrix A and a preconditioner M = C² such that Â = C⁻¹AC⁻¹ has a prescribed number of distinct eigenvalues (a NumPy sketch of this construction is given after the procedure).

  • Let $Q \in \mathbb{R}^{n \times n}$ be an orthogonal matrix;

  • Let $v \in \mathbb{R}^n$ be a vector with random entries between 0 and 1;

  • Define $A = Q T_1 Q^T$, with $T_1 = \mathrm{diag}(v)$;

  • Let $l \in \mathbb{R}^p$ be such that $l_1 > l_2 > \cdots > l_p$ (the prescribed eigenvalues);

  • Define the algebraic multiplicity $n_i$ of each $l_i$, $i = 1, \ldots, p$, such that $n_1 + n_2 + \cdots + n_p = n$;

  • Define $L = \mathrm{diag}(l_1, \ldots, l_1, l_2, \ldots, l_2, \ldots, l_p, \ldots, l_p) \in \mathbb{R}^{n \times n}$, where each $l_i$ appears $n_i$ times;

  • Define a diagonal matrix $T_2$ such that $T_2(i,i) = L(i,i)/T_1(i,i)$, $i = 1, \ldots, n$;

  • Define $M^{-1} = Q T_2 Q^T$ or, equivalently, $C^{-1} = Q T_2^{1/2} Q^T$.

Note that,

$$\hat{A} = C^{-1} A C^{-1} = Q T_2^{1/2} Q^T \, Q T_1 Q^T \, Q T_2^{1/2} Q^T = Q T_2^{1/2} T_1 T_2^{1/2} Q^T = Q L Q^T.$$

Then, Â is a preconditioned matrix with clustered eigenvalues.
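
A minimal NumPy sketch of this construction follows; the dimensions, the random seed and the small shift added to v (to keep $T_1$ safely away from singularity) are our own illustrative choices, not part of the procedure above.

import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 5

Q, _ = np.linalg.qr(rng.standard_normal((n, n)))        # orthogonal Q
v = rng.uniform(0.0, 1.0, n) + 1e-3                     # small shift keeps T1 nonsingular
T1 = np.diag(v)
A = Q @ T1 @ Q.T                                        # SPD matrix A

l = np.sort(rng.uniform(1.0, 10.0, p))[::-1]            # l_1 > ... > l_p
counts = np.full(p, n // p); counts[: n % p] += 1       # multiplicities n_i summing to n
L = np.diag(np.repeat(l, counts))                       # p distinct eigenvalues

T2 = np.diag(np.diag(L) / v)                            # T2(i,i) = L(i,i)/T1(i,i)
M_inv = Q @ T2 @ Q.T                                    # preconditioner M^{-1}
C_inv = Q @ np.diag(np.sqrt(np.diag(T2))) @ Q.T         # C^{-1} = Q T2^{1/2} Q^T

A_hat = C_inv @ A @ C_inv                               # equals Q L Q^T
print(np.unique(np.round(np.linalg.eigvalsh(A_hat), 8)).size)   # -> p distinct eigenvalues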

The objective of this experiment is to present some examples of the finite termination of DWGM and PDWGM. First, a random orthogonal matrix Q is obtained from the QR decomposition [20] of a given random matrix. Then the procedure above is run 15 times, generating 15 pairs (A, M). As in the first example, we compare DWGM and PDWGM on these 15 pairs and then take the average and the standard deviation of the results.

The results are presented in Table 2. For DWGM and PDWGM we compare the number of iterations, the absolute error, the norm of the residual and the CPU time. Moreover, n and p represent the problem dimension and the number of distinct eigenvalues of Â, respectively. Observe that for small instances the number of iterations of the algorithm without preconditioning is proportional to the dimension n, while the number of iterations of the preconditioned algorithm is proportional to the number p of distinct eigenvalues of the problem, as Theorem 4.3 claims.

Table 2
Iteration information of CG, PCG, DWGM and PDWGM.

6. CONCLUSIONS

The delayed weighted gradient method is a two-step gradient method that aims to correct the Cauchy point and produce a faster solution than the classical gradient method. We have presented and discussed the derivation of the preconditioned delayed weighted gradient method. We have also shown that the preconditioned conjugate gradient and the preconditioned delayed weighted gradient methods produce similar results, and that the preconditioned conjugate gradient method has a computational cost per iteration similar to that of the preconditioned delayed weighted gradient method. Convergence and other theoretical properties of the preconditioned delayed weighted gradient method are proved.

REFERENCES

  • 1
    H. Akaike. On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Annals of the Institute of Statistical Mathematics, 11 (1959), 1-16.
  • 2
    R. Andreani & M. Raydan. Properties of the delayed weighted gradient method. Computational Optimization and Applications, 78 (2021), 167-180.
  • 3
    R.D. Asmundis, D. di Serafino, W.W. Hager, G. Toraldo & H. Zhang. An efficient gradient method using the Yuan steplength. Computational Optimization and Applications , 59 (2014), 541-563.
  • 4
    R.D. Asmundis, D. di Serafino, F. Riccio & G. Toraldo. On spectral properties of steepest descent methods. IMA Journal of Numerical Analysis, 33(4) (2013), 1416-1435.
  • 5
    J. Barzilai & J.M. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis , 8(1) (1988), 141-148.
  • 6
    D.S. Bernstein. “Matrix Mathematics: Theory, Facts, and Formulas”. Princeton University Press, second ed. (2011).
  • 7
    D. Bertaccini & F. Durastante. “Iterative Methods and Preconditioning for Large and Sparse Linear Systems with Applications”. Chapman and Hall/CRC, New York (2018).
  • 8
    E. Birgin, I. Chambouleyron & J.M. Martínez. Estimation of the optical constants and the thickness of thin films using unconstrained optimization. Computational Physics, (151) (1999), 862-880.
  • 9
    E.G. Birgin, J.M. Martínez & M. Raydan. Spectral projected gradient methods: review and perspectives. Journal of Statistical Software, 60(3) (2014), 1-21.
  • 10
    A. Cauchy. Méthode générale pour la résolution des systemes d’équations simultanées. Comptes Rendus Sci. Paris, 25 (1847), 536-538.
  • 11
    Y.H. Dai, Y. Huang & X.W. Liu. A family of spectral gradient methods for optimization. Computational Optimization and Applications , 74 (2019), 43-65.
  • 12
    Y.H. Dai & L.Z. Liao. R-linear convergence of the Barzilai and Borwein gradient method. IMA Journal of Numerical Analysis , 22(1) (2002), 1-10.
  • 13
    T.A. Davis & Y. Hu. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw., 38(1) (2011). doi:10.1145/2049662.2049663.
    » https://doi.org/10.1145/2049662.2049663
  • 14
    D. di Serafino, V. Ruggiero, G. Toraldo & L. Zanni. On the steplength selection in gradient methods for unconstrained optimization. Applied Mathematics and Computation, 318 (2018), 176-195.
  • 15
    E.D. Dolan & J.J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91 (2002), 201-213. doi:10.1007/s101070100263.
    » https://doi.org/10.1007/s101070100263
  • 16
    M.A. Figueiredo, R. Nowak & S. Wright. Projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process, (1) (2007), 586-597.
  • 17
    W. Ford. “Numerical Linear Algebra with Applications: Using MATLAB”. Elsevier Inc, New York (2015).
  • 18
    G.E. Forsythe & T.S. Motzkin. Asymptotic properties of the optimum gradient method, Preliminary report. Bull. Amer. Math. Soc, 57 (1951), 304-305.
  • 19
    A. Friedlander, J.M. Martínez, B. Molina & M. Raydan. Gradient method with retards and generalizations. SIAM Journal on Numerical Analysis, 36(1) (1999), 275-289.
  • 20
    G.H. Golub & C.F. Van Loan. “Matrix Computations”. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, 4th ed. (2012).
  • 21
    Y. Huang, Y.H. Dai, X.W. Liu & H. Zhang. Gradient methods exploiting spectral properties. Optimization Methods and Software, 35(4) (2020), 681-705.
  • 22
    C.T. Kelley. “Iterative Methods for Linear and Nonlinear Equations”. Society for Industrial and Applied Mathematics (1995).
  • 23
    S.P. Kolodziej, M. Aznaveh, M. Bullock, J. David, T.A. Davis, M. Henderson, Y. Hu & R. Sandstrom. The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4(35) (2019), 1244. doi:10.21105/joss.01244.
    » https://doi.org/10.21105/joss.01244
  • 24
    J.L. Lamotte, B. Molina & M. Raydan. Smooth and adaptive gradient method with retards. Mathematical and Computer Modelling, 36 (2002), 1161-1168.
  • 25
    H.F. Oviedo-Leon. A delayed weighted gradient method for strictly convex quadratic minimization. Computational Optimization and Applications , 74 (2019), 729-746.
  • 26
    T. Serafini, G. Zanghirati & L. Zanni. Gradient projection methods for large quadratic programs and applications in training support vector machines. Optimization Methods and Software , (20) (2005), 353-378.
  • 27
    B.V. Shah, R.J. Buehler & O. Kempthorne. Some Algorithms for Minimizing a Function of Several Variables. Journal of the Society for Industrial and Applied Mathematics, 12(1) (1964), 74-92.
  • 28
    H. Sorenson. Comparison of some conjugate direction procedures for function minimization. Journal of the Franklin Institute, 288(6) (1969), 421-441.
  • 29
    Y.X. Yuan. A new stepsize for the steepest decent method. Journal of Computational Mathematics, 24(2) (2006), 149-156.
  • 1
    formerly the University of Florida Sparse Matrix Collection.

Publication Dates

  • Publication in this collection
    28 July 2023
  • Date of issue
    Jul-Sep 2023

History

  • Received
    17 Jan 2022
  • Accepted
    16 Nov 2022