Spectral projected gradient method for the Procrustes problem

J.B. Francisco I; T. Martini II,* (* Corresponding author: Tiara Martini.)

I Department of Mathematics, Federal University of Santa Catarina, 88040-900 Florianópolis, SC, Brazil, grant 479729/2011-5. E-mail: juliano@mtm.ufsc.br

II Department of Applied Mathematics, Institute of Mathematics, Statistics and Scientific Computing, State University of Campinas, 13083-970 Campinas, SP, Brazil. This author was supported by CAPES-PROF, Brazil. E-mail: tiaramartini@gmail.com

ABSTRACT

We study and analyze a nonmonotone globally convergent method for minimization on closed sets. This method is based on ideas from trust-region and Levenberg-Marquardt methods. Thus, the subproblems consist in minimizing a quadratic model of the objective function subject to a given constraint set. We incorporate concepts of bidiagonalization and of an "inexact" SVD calculation to improve the performance of the algorithm, since solving the subproblem by traditional techniques, which is required at each iteration, is computationally expensive. Other feasible methods are mentioned, including a curvilinear search algorithm and a minimization along geodesics algorithm. Finally, we illustrate the numerical performance of the methods when applied to the Orthogonal Procrustes Problem.

Keywords: orthogonality constraints, nonmonotone algorithm, Orthogonal Procrustes Problem, Spectral Projected Gradient Method.

RESUMO

Estudamos e analisamos um método globalmente convergente e não monótono para minimização em conjuntos fechados. Este método está baseado nas ideias dos métodos de região de confiança e Levenberg-Marquardt. Dessa maneira, os subproblemas consistem em minimizar um modelo quadrático da função objetivo sujeito a um conjunto de restrições. Incorporamos conceitos de bidiagonalização e de cálculo da SVD de maneira "inexata" buscando melhorar o desempenho do algoritmo, visto que a solução do subproblema por técnicas tradicionais, necessária em cada iteração, é computacionalmente muito cara. Outros métodos viáveis são citados, entre eles um método de busca curvilinear e um de minimização ao longo de geodésicas. Finalmente, ilustramos o desempenho numérico dos métodos quando aplicados ao Problema de Procrustes Ortogonal.

Palavras-chave: restrições de ortogonalidade, algoritmo não monótono, Método do Gradiente Projetado Espectral, Problema de Procrustes Ortogonal.

1 INTRODUCTION

In this paper, we consider the Weighted Orthogonal Procrustes Problem (WOPP): given A ∈ ℝ^(p×m), B ∈ ℝ^(p×q) and C ∈ ℝ^(n×q), find X ∈ ℝ^(m×n) solving the constrained optimization problem

minimize f(X) = (1/2) || AXC - B ||F²  subject to  XT X = I.     (1)

We also consider the problem in which the objective function of (1) is replaced by (1/2) || AX - B ||F² (i.e., C = I), called here the Orthogonal Procrustes Problem (OPP).
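To fix this notation in concrete form, the objective and its gradient (f(X) = ½||AXC − B||²F, g(X) = Aᵀ(AXC − B)Cᵀ, cf. Section 2) can be sketched numerically; the helper names below are ours, not from the paper. Since f is quadratic in X, a central finite difference matches the directional derivative essentially to machine precision.

```python
import numpy as np

def wopp_objective(X, A, B, C):
    """f(X) = 0.5 * || A X C - B ||_F^2."""
    R = A @ X @ C - B
    return 0.5 * np.sum(R * R)

def wopp_gradient(X, A, B, C):
    """g(X) = A^T (A X C - B) C^T."""
    return A.T @ (A @ X @ C - B) @ C.T

rng = np.random.default_rng(0)
p, m, n, q = 7, 5, 3, 3
A = rng.standard_normal((p, m))
B = rng.standard_normal((p, q))
C = rng.standard_normal((n, q))
X = rng.standard_normal((m, n))

# Central finite difference along a random direction D recovers <g(X), D>
D = rng.standard_normal((m, n))
h = 1e-6
fd = (wopp_objective(X + h * D, A, B, C) - wopp_objective(X - h * D, A, B, C)) / (2 * h)
assert abs(fd - np.sum(wopp_gradient(X, A, B, C) * D)) < 1e-4
```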

The Procrustes Problem belongs to the class of minimization problems restricted to the set of matrices with orthonormal columns, which often appears in important classes of optimization problems and, like linear and quadratic constraints, has a special structure to be exploited. As examples, we mention applications in rigid body movements [23], psychometrics [17], multidimensional scaling [9] and the Global Positioning System [4].

However, computing its solution is a costly task. In most cases, a solution can only be obtained computationally, through an iterative method capable of dealing with nonstationary points and nonconvex constraints, generating a sequence of feasible points converging to a local minimizer. Generating this feasible sequence carries a high computational cost, because it involves either matrix reorthogonalization or generating points along geodesics. In the first case, some matrix decomposition, usually the Singular Value Decomposition (SVD), is needed; the second is related to matrix exponentials or the solution of partial differential equations. Moreover, these problems are generally classified as NP-hard. Recently, a method free of SVDs and geodesics was proposed [24], which is very promising in terms of reduced computational time. In the algorithms section we say more about these methods.

The aim of this study is to numerically compare the performance of these methods and improve the performance of the algorithm proposed in [13]. For this purpose we incorporate an alternative approach for calculating the SVD and a bidiagonalization strategy. With the first, our goal is to reduce the computational effort involved in this process, and with the second, we aim to solve a smaller problem, equivalent to the original.

This paper is organized as follows. In Section 2 we present the optimality conditions of the Procrustes Problem. In Section 3 we comment on the methods studied. Section 4 is dedicated to the numerical results. Finally, conclusions are presented in the last section.

Notation: The gradient of f will be denoted by g and the Hessian matrix by H, that is, g(X)ij = ∂f(X)/∂Xij and H(X)ij,kl = ∂²f(X)/∂Xij∂Xkl. Given A ∈ ℝ^(n×n), tr(A) denotes the trace of the matrix A, and R(A) denotes the column space of A.

2 THE ORTHOGONAL PROCRUSTES PROBLEM

The Procrustes Problem arises in the analysis of processes that apply a transformation to a set of shapes. This process involves rotating and scaling a certain data set in order to move it to a common frame of reference where the shapes can be compared. The known data are arranged in matrices, and the rotation transformation to be determined is a matrix with orthonormal columns.

According to Eldén & Park [11], the OPP with m = n is said to be balanced, and the case n < m is called unbalanced. For the balanced case there is a closed formula for the solution, namely X* = UVT, where AT B = UΣVT is the reduced SVD of AT B. For the unbalanced OPP and for the WOPP, the solution is found through iterative methods, and therefore is not necessarily the global minimum.
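This closed formula is easy to check numerically: planting B = AQ with Q orthogonal, the factor UVᵀ from the SVD of AᵀB recovers Q. An illustrative sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
p, m = 8, 6
A = rng.standard_normal((p, m))
Q_true, _ = np.linalg.qr(rng.standard_normal((m, m)))  # a known orthogonal matrix
B = A @ Q_true                                         # data with planted solution

# Balanced OPP solution: X* = U V^T, where A^T B = U Sigma V^T
U, _, Vt = np.linalg.svd(A.T @ B)
X_star = U @ Vt

assert np.allclose(X_star.T @ X_star, np.eye(m))  # feasibility
assert np.allclose(X_star, Q_true)                # recovers the planted solution
```

The second assertion holds because AᵀB = (AᵀA)Q_true with AᵀA symmetric positive definite, so UVᵀ is exactly the orthogonal polar factor Q_true.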

Let f : ℝ^(m×n) → ℝ be defined by f(X) = (1/2) || AXC - B ||F² and h : ℝ^(m×n) → ℝ^(n×n), m > n, defined by h(X) = XT X - I, and consider Ω = {X ∈ ℝ^(m×n) : h(X) = 0}.

Note that f is of class C² and g(X) = AT(AXC - B)CT ∈ ℝ^(m×n). Furthermore, since the Frobenius norm satisfies || X ||F = √n for all X ∈ Ω, it follows that Ω is compact and that g is Lipschitz continuous on Ω.

The next theorem provides conditions equivalent to a matrix X ∈ Ω being a stationary point of problem (1), which greatly simplifies the verification of stationarity.

Theorem 1 Let X ∈ Ω. Then X satisfies the KKT Conditions of (1) if, and only if, g(X) ∈ R(X) and XTg(X) is a symmetric matrix.

Proof. Note that the Lagrangian function of (1) is

l(X, Θ) = f(X) + tr(Θ(XT X - I)),

where Θ ∈ ℝ^(n×n) contains the Lagrange multipliers of the problem. It follows that the KKT Conditions of (1) are:

g(X) + X(Θ + ΘT) = 0,  XT X = I.     (2)

Assume that X satisfies (2). Then g(X) = -X(Θ + ΘT), i.e., g(X) ∈ R(X). Furthermore, since XT X = I, we have XT g(X) = -(Θ + ΘT). Therefore, XT g(X) is symmetric.

Conversely, suppose that g(X) ∈ R(X) and that XT g(X) is symmetric. Define Λ = -XT g(X)/2. Since g(X) ∈ R(X) implies g(X) = XXT g(X), it follows that g(X) + X(Λ + ΛT) = 0, i.e., X satisfies (2).
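Theorem 1 gives a cheap numerical test for stationarity: check that XᵀG is symmetric and that G = XXᵀG. A sketch (function name ours), exercised on the balanced OPP, where the closed-form solution of this section is available:

```python
import numpy as np

def is_stationary(X, G, tol=1e-6):
    """Theorem 1: X is stationary iff X^T G is symmetric and G = X X^T G."""
    XtG = X.T @ G
    return (np.linalg.norm(XtG - XtG.T) <= tol
            and np.linalg.norm(G - X @ XtG) <= tol)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
B = rng.standard_normal((8, 6))
U, _, Vt = np.linalg.svd(A.T @ B)
X_star = U @ Vt                          # closed-form balanced-OPP solution
grad = lambda X: A.T @ (A @ X - B)       # gradient of 0.5 || A X - B ||_F^2

assert is_stationary(X_star, grad(X_star))
assert not is_stationary(np.eye(6), grad(np.eye(6)))  # a generic feasible point is not
```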

3 ALGORITHMS

The current section presents a special case of the method for minimization on closed sets proposed in [13], and briefly describes the minimization along geodesics method [1] and the curvilinear search method [24].

3.1 Spectral Projected Gradient Method

The algorithm of [13] is based on trust-region methods [20] and on the ideas of the Levenberg-Marquardt method [21], generating new iterates through the minimization of a quadratic model of the function around the current iterate. We will apply a particular case of this method to solve problem (1).

Consider constants 0 < σmin < σmax < ∞ and Lu > Lf, where Lf is the Lipschitz constant associated with the gradient of f. For each k = 0, 1, 2, ..., choose σk ∈ [σmin, σmax] and set a regularization parameter ρ > 0 from σk and Lu.

Thus, the quadratic model Qk : ℝ^(m×n) → ℝ is given by:

Qk(X) = f(Xk) + tr(g(Xk)T(X - Xk)) + (ρ/2) || X - Xk ||F².

For the parameter σk let us choose, whenever possible, the Barzilai-Borwein parameter [3]. Thus, for two consecutive iterates, Xk-1 and Xk, it follows that

Sk = Xk - Xk-1  and  Yk = g(Xk) - g(Xk-1),

and define

σk = tr(SkT Yk) / tr(SkT Sk), whenever this value lies in [σmin, σmax].

With these choices, the algorithm studied is a variation of the Nonmonotone Spectral Projected Gradient Method [5] for minimization with orthogonality constraints, and the solution of the subproblem

minimize Qk(X)  subject to  XT X = I

is X* = UVT, where UΣVT is the reduced SVD of the matrix Wk = Xk - (1/ρ) g(Xk).

We mention that the nonmonotone line search is based on the one devised in [15] and is presented in the following definition.

Definition 1 Let {xk} be a sequence defined by xk+1 = xk + tk dk, with dk ≠ 0, tk ∈ (0, 1), γ ∈ (0, 1), M ∈ ℤ+ and m(0) = 0. For each k ≥ 1, choose m(k) recursively by 0 ≤ m(k) ≤ min{m(k-1) + 1, M}, and set fmax(k) = max{f(xk-j) : 0 ≤ j ≤ m(k)}. We say that xk+1 satisfies the nonmonotone Armijo condition if f(xk+1) ≤ fmax(k) + γ tk ∇f(xk)T dk.

Thus, we can now establish the algorithm for the problem (1).
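Since the display of Algorithm 1 did not survive extraction, the following sketch assembles the ingredients above (spectral/BB parameter, SVD projection onto Ω, nonmonotone Armijo test) into one possible SPG iteration for problem (1). It follows the spirit of [13] rather than its exact safeguards; all function and variable names are ours.

```python
import numpy as np

def proj_stiefel(W):
    """argmin_{X^T X = I} ||X - W||_F = U V^T, with W = U S V^T (reduced SVD)."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def spg_procrustes(A, B, C, X0, gamma=1e-4, M=7, sig_min=1e-10, sig_max=1e10,
                   tol=1e-8, max_it=1000):
    f = lambda X: 0.5 * np.linalg.norm(A @ X @ C - B, 'fro') ** 2
    g = lambda X: A.T @ (A @ X @ C - B) @ C.T
    X, rho, hist = X0, 1.0, [f(X0)]
    for _ in range(max_it):
        G = g(X)
        XtG = X.T @ G
        if max(np.linalg.norm(XtG - XtG.T), np.linalg.norm(G - X @ XtG)) <= tol:
            break                                  # stationary by Theorem 1
        f_ref = max(hist[-M:])                     # nonmonotone reference value
        t = 1.0
        while True:                                # backtracking on the projected step
            Xt = proj_stiefel(X - (t / rho) * G)
            if f(Xt) <= f_ref + gamma * np.sum(G * (Xt - X)) or t < 1e-12:
                break
            t *= 0.5
        S, Y = Xt - X, g(Xt) - G                   # Barzilai-Borwein update for rho
        sty = np.sum(S * Y)
        rho = min(max(sty / max(np.sum(S * S), 1e-16), sig_min), sig_max) if sty > 0 else 1.0
        X = Xt
        hist.append(f(X))
    return X

# Demo on a small balanced problem with a planted solution
rng = np.random.default_rng(7)
m = n = q = 4
A = rng.standard_normal((m, m))
C = np.eye(n)
Q_star, _ = np.linalg.qr(rng.standard_normal((m, n)))
B = A @ Q_star @ C
X0 = proj_stiefel(rng.standard_normal((m, n)))
Xf = spg_procrustes(A, B, C, X0)
assert np.allclose(Xf.T @ Xf, np.eye(n))          # iterates stay feasible
assert np.linalg.norm(A @ Xf @ C - B) < 1e-3      # planted solution is recovered
```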

The following theorem ensures the global convergence of the sequence of iterates to stationary points; its proof can be found in [13].

Theorem 2 Let X be an accumulation point of the sequence generated by Algorithm 1. Then X is a stationary point of the problem (1), which is not a point of local maximum. In addition, if the set of stationary points of the problem is finite, then the sequence converges.


3.2 Minimization along geodesics method

The minimization along geodesics method [1] is a modification of Newton's method for minimization on Stiefel and Grassmann manifolds that takes advantage of their geometries. In unconstrained minimization, Newton's method generates new iterates by subtracting from the current iterate a multiple of the vector H⁻¹∇f, where H is the Hessian matrix of f evaluated at the current iterate. Arias, Edelman and Smith [1] replaced this subtraction by restricting the function to a geodesic path on the manifold. The gradient remains as usual (tangent to the constraint surface), while the Hessian matrix is obtained from the second differential of the function restricted to the geodesic.

Let FXX(Δ1, Δ2) = Σij,kl H(X)ij,kl (Δ1)ij (Δ2)kl, where Δ1, Δ2 ∈ ℝ^(m×n), and let FXX(Δ) be the unique tangent vector satisfying

FXX(Δ, Z) = 〈FXX(Δ), Z〉, ∀ Z ∈ ℝ^(m×n).

We present below, in general terms, the algorithm of [1], referred to throughout this paper as sg_min. For more details see [1, 10].

In the experiments presented ahead we use the dogleg option of the package sg_min (the solver is available at http://web.mit.edu/~ripper/www/sgmin.html, accessed on 29/01/2012) in order to adjust the parameter of the line search along the geodesic. In this case, instead of t = 1, we obtain a parameter tk in (5) such that f(X(tk)) < f(X).

3.3 Curvilinear search method

The method developed in [24] is a curvilinear search method over a manifold, which constructs a sequence of feasible iterates Xk as follows: for each k, set

Wk = Gk XkT - Xk GkT, where Gk = g(Xk),

and

Y(t) = (I + (t/2) Wk)⁻¹ (I - (t/2) Wk) Xk.

The step of the line search along the curve Y is defined by the Barzilai-Borwein parameter [3]. To ensure global convergence of the method, Wen and Yin [24] adopted the nonmonotone strategy of Zhang & Hager [26].
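Feasibility of this curvilinear update is automatic, because W is skew-symmetric and the Cayley-type factor is therefore orthogonal; the sketch below (variable names ours) verifies this for an arbitrary step t.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))  # feasible: X^T X = I
G = rng.standard_normal((n, k))                   # stand-in for the gradient g(X)

W = G @ X.T - X @ G.T                             # skew-symmetric by construction
t = 0.3
I = np.eye(n)
# Cayley-type curve: Y(t) = (I + (t/2) W)^{-1} (I - (t/2) W) X
Y = np.linalg.solve(I + 0.5 * t * W, (I - 0.5 * t * W) @ X)

assert np.allclose(W.T, -W)                       # skew-symmetry
assert np.allclose(Y.T @ Y, np.eye(k))            # Y(t) stays feasible
assert np.allclose(np.linalg.solve(I + 0.0 * W, (I - 0.0 * W) @ X), X)  # Y(0) = X
```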

In the experiments we used the solver based on Algorithm 2 of [24] (available at http://optman.blogs.rice.edu/, accessed on 29/01/2012). Below we present the algorithm that, throughout this paper, will be referred to as OptStiefel.



4 COMPUTATIONAL EXPERIMENTS

Below, we present each problem and comment on the results obtained when comparing the algorithms.

4.1 Numerical experiments and problems

The experiments were performed using Matlab 7.10 on an Intel Core i3-2310M processor (2.10 GHz) with a 400 GB hard disk and 4 GB of RAM. The following values were used for the constants of Algorithm 1: β1 = 10⁻⁴, ξ1 = ξ2 = 5, σmin = 10⁻¹⁰, Lu = Lf, σmax = Lu and M = 7. It is worth emphasizing that these parameters were selected after a sequence of tests, focusing on the values that, on average, showed the best results. For the other methods we used the default parameters present in the authors' implementations.

We declare convergence when XkT g(Xk) is numerically symmetric and g(Xk) ∈ R(Xk) to within a prescribed tolerance (cf. Theorem 1), limiting the execution time (TMAX) to 600 seconds.

For the numerical experiments we consider n = q, p = m, A = PSRT and C = QΛQT, with P and R random orthogonal matrices, Q ∈ ℝ^(n×n) a Householder matrix, Λ ∈ ℝ^(n×n) diagonal with elements uniformly distributed in the interval [, 2], and S diagonal, defined for each type of problem. As a starting point we generate a random matrix X0 ∈ Ω. All random values were generated with the randn function of Matlab.

Since the exact solution of the problem is not available in general, we plant a known solution Q* ∈ ℝ^(m×n) (a random Householder matrix) by taking B = AQ*C, which allows us to monitor the behavior of the iterates Xk.
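The test-problem construction above can be sketched as follows. This is a hedged illustration: where the text elides the exact distributions, the diagonals of S and Λ below are placeholder choices, labeled as such.

```python
import numpy as np

rng = np.random.default_rng(4)
m = n = q = 5
p = m

# A = P S R^T, with P, R random orthogonal; the diagonal of S is a placeholder
P, _ = np.linalg.qr(rng.standard_normal((p, m)))
R, _ = np.linalg.qr(rng.standard_normal((m, m)))
S = np.diag(np.arange(1, m + 1) + 2 * rng.random(m))   # Problem-2-style diagonal
A = P @ S @ R.T

# C = Q Lam Q^T, with Q an n x n Householder matrix
v = rng.standard_normal((n, 1)); v /= np.linalg.norm(v)
Q = np.eye(n) - 2 * v @ v.T
Lam = np.diag(1 + rng.random(n))                       # placeholder diagonal
C = Q @ Lam @ Q.T

# Planted solution Q* (another Householder matrix) and right-hand side B
w = rng.standard_normal((m, 1)); w /= np.linalg.norm(w)
Q_star = np.eye(m) - 2 * w @ w.T
B = A @ Q_star @ C

assert np.allclose(Q_star.T @ Q_star, np.eye(m))  # Q* is feasible
assert np.allclose(A @ Q_star @ C, B)             # zero residual at the planted solution
```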

The tested problems were taken from [25] and are described below.

Problem 1. The diagonal elements of S are generated by a normal distribution in the interval [10, 12].

Problem 2. The diagonal of S is given by Sii = i + 2ri, where ri are random numbers uniformly distributed in the interval [0, 1].

Problem 3. Each diagonal element of S is generated as follows: Sii = 1 + + 2ri, with ri uniformly distributed in the interval [0, 1].

In the following tables, we denote the number of iterations by Ni and the number of function evaluations by Ne, and report the residual, the norm of the error with respect to the known solution, and the time (in seconds). For these values we show the minimum (min), maximum (max) and average (mean) of the results achieved over ten simulations.

First of all, we test the efficiency of the nonmonotone line search. Note that the matrix A in Problem 1, whose singular values are very close to each other, is well conditioned. Therefore, the use of nonmonotonicity may not produce a significant reduction in the number of line searches, as seen in Table 1. For the matrices of Problems 2 and 3, the singular values of A become more distant from each other as m increases, leaving the matrix ill conditioned. For these problems, the nonmonotone strategy shows its potential, as can be seen in Table 1 and Figure 1.


Table 2 presents the results achieved for the problems. We can see that for well-conditioned problems the PGST and OptStiefel algorithms showed similar performance, but in the presence of ill conditioning the OptStiefel method requires significantly less computation to converge than the other methods. In all cases examined, the method sg_min had performance inferior to the others and, in some situations, reached the maximum allowed time.

4.1.1 Using the Inexact SVD

Seeking to reduce the computational effort required to calculate the SVD in Algorithm 1, we propose to compute this decomposition by an alternative algorithm [7], which recursively builds the SVD through the calculation of the dominant left and right singular subspaces of dimension k (k < n). This technique is appropriate for matrices in which the number of rows is much larger than the number of columns.

The process begins with the QR decomposition of Ak, the matrix formed by the first k columns of A, written in the form AkVk = QkRk, with Vk = Ik. This decomposition is then updated by excluding the singular vector associated with the smallest singular value of the m × (k + i) matrix under consideration and appending the next column of A to be analyzed. In this way, all columns of A are processed, generating only the singular vectors associated with the largest singular values of A. For more information about the method, including an error analysis, see [7].
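The flavor of this column-by-column updating can be conveyed by a simple incremental truncated SVD (an illustrative scheme in the spirit of dominant-subspace tracking, not the exact algorithm of [7]). When no truncation occurs (k ≥ n) it reproduces the singular values of A exactly, which gives a convenient correctness check.

```python
import numpy as np

def incremental_dominant_svd(A, k):
    """Process the columns of A one at a time, keeping at most k left singular
    directions. Illustrative only; not the exact recursion of [7]."""
    m, n = A.shape
    U = A[:, :1] / np.linalg.norm(A[:, 0])
    s = np.array([np.linalg.norm(A[:, 0])])
    for j in range(1, n):
        a = A[:, j]
        proj = U.T @ a                       # component inside the current subspace
        r = a - U @ proj                     # component outside it
        rho = np.linalg.norm(r)
        K = np.block([[np.diag(s), proj[:, None]],
                      [np.zeros((1, s.size)), np.array([[rho]])]])
        Uk, s, _ = np.linalg.svd(K)          # small (t+1) x (t+1) SVD
        U = np.hstack([U, (r / rho)[:, None]]) @ Uk
        if s.size > k:                       # drop the smallest singular direction
            U, s = U[:, :k], s[:k]
    return U, s

rng = np.random.default_rng(5)
A = rng.standard_normal((30, 6))
U, s = incremental_dominant_svd(A, k=6)      # k >= n: no truncation, exact
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
assert np.allclose(U.T @ U, np.eye(6))
```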

Table 3 displays the time, in seconds, spent by Algorithm 1 in its original form and when it uses the alternative SVD; the latter is referred to as SISVD. For the cases q = 5 and q = 10 there is no significant reduction, because the proportion between m and q is not large enough to take advantage of the properties of the method. However, for q = 1 we notice an improvement of 44%, on average, in performance.

4.1.2 Using Golub-Kahan Bidiagonalization

With the purpose of solving large problems, we consider the OPP with q = 1 and incorporate into Algorithm 1 the Golub-Kahan bidiagonalization process presented in [22]. This process is based on the Lanczos algorithm for tridiagonalizing a symmetric matrix [19] and on the Golub and Kahan bidiagonalization process [14]. In this case, we project the problem onto a subspace, obtaining an equivalent problem of smaller dimension.

The Lanczos method consists of generating, from a symmetric matrix A ∈ ℝ^(n×n) and a vector b ∈ ℝ^n, sequences of vectors {υk} and of scalars {αk} and {βk} such that A is reduced to tridiagonal form, that is, VT AV = T, with T ∈ ℝ^(n×n) tridiagonal and V ∈ ℝ^(n×n) satisfying VT V = I. In turn, the Golub-Kahan bidiagonalization process applies the Lanczos process to a particular augmented symmetric system involving A and a shift λ ∈ ℝ [22].

So, after k steps of the Golub-Kahan algorithm we have AVk = Uk+1Bk and AT Uk+1 = VkBkT + αk+1 υk+1 ek+1T, where Bk ∈ ℝ^((k+1)×k) is bidiagonal, and Uk+1 = [u1, u2, ..., uk+1] ∈ ℝ^(m×(k+1)) and Vk = [υ1, υ2, ..., υk] ∈ ℝ^(n×k) satisfy Uk+1T Uk+1 = I and VkT Vk = I.
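These relations can be verified directly with a plain implementation of k steps of the recurrence (a sketch, without the reorthogonalization a production code might add; the function name is ours):

```python
import numpy as np

def golub_kahan(A, b, k):
    """k steps of Golub-Kahan bidiagonalization: A V_k = U_{k+1} B_k."""
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    Bk = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta                       # beta_1 u_1 = b
    v = A.T @ U[:, 0]
    for i in range(k):
        alpha = np.linalg.norm(v)
        V[:, i] = v / alpha                  # alpha_i v_i
        Bk[i, i] = alpha
        u = A @ V[:, i] - alpha * U[:, i]
        beta = np.linalg.norm(u)
        U[:, i + 1] = u / beta               # beta_{i+1} u_{i+1}
        Bk[i + 1, i] = beta
        v = A.T @ U[:, i + 1] - beta * V[:, i]
    return U, Bk, V

rng = np.random.default_rng(6)
A = rng.standard_normal((40, 8))
b = rng.standard_normal(40)
U, Bk, V = golub_kahan(A, b, k=5)

assert np.allclose(A @ V, U @ Bk)        # A V_k = U_{k+1} B_k
assert np.allclose(V.T @ V, np.eye(5))   # orthonormal columns
assert np.allclose(U.T @ U, np.eye(6))
```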

Setting x = Vk y (so that x is a linear combination of the columns υi of Vk) and writing b = β1 u1 with β1 = || b ||, we have || Ax - b || = || Bk y - β1 e1 ||; moreover, yT y = 1 if and only if xT x = 1. Thus, the OPP with q = 1,

minimize || Ax - b ||²  subject to  xT x = 1,

is equivalent to the projected problem

minimize || Bk y - β1 e1 ||²  subject to  yT y = 1.     (7)

Then we apply Algorithm 1 to problem (7), whose dimension is smaller than that of the original problem. We perform the tests with matrices from Problem 1. The tests for Problems 2 and 3 were omitted, since the bidiagonalization process did not reduce the size of those problems, due to the fact that they are ill conditioned.

Algorithm 1 with the bidiagonalization process will be called SGKB. In Table 4 we report the results obtained by applying SGKB, where DSP is the dimension of the projected subspace, given by the number of iterations of the bidiagonalization process. The SGKB method showed promising results, since few steps of the bidiagonalization process were necessary and, therefore, the reduction in the size of the problem was significant. Additionally, the method converges quickly to a point very close to the solution.

5 CONCLUSIONS

We presented a method to solve the Orthogonal Procrustes Problem with global convergence to stationary points. In order to verify the theoretical results and analyze the performance of the presented algorithm, we applied it to some particular cases of the WOPP. The results were promising in the cases analyzed, in terms of reduced computational time, compared with those obtained by the usual methods. Therefore, the algorithm studied can be considered competitive with already consolidated methods.

Into the algorithm we incorporated bidiagonalization ideas and an alternative way of computing the SVD, which yielded a significant improvement in performance when applied to least-squares problems and to problems in which the number of rows of the matrix is much greater than the number of columns, respectively.

Additionally, we finalize this work by noting that the same approach presented here can be employed to minimize functionals over the set of symmetric matrices. In that case, the ideas given in [16] for symmetric Procrustes Problems can be used to solve the subproblems that appear at Step 2 of Algorithm 1.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous referees for their careful reading of the manuscript and for several constructive comments that significantly improved the presentation of this work.

Received on November 30, 2013

Accepted on April 20, 2014

  • [1] T.A. Arias, A. Edelman & S.T. Smith. "The geometry of algorithms with orthogonality constraints". SIAM Journal on Matrix Analysis and Applications, 20(2) (1998), 303-353.
  • [2] A. Aspremont, L.E. Ghaoui, M. Jordan & G.R. Lanckriet. "A direct formulation for sparse pca using semidefinite programming". SIAM Review, 49 (2007), 434-448.
  • [3] J. Barzilai & J.M. Borwein. "Two point step size gradient methods". IMA Journal of Numerical Analysis, 8 (1988), 141-148.
  • [4] T. Bell. "Global Positioning System-Based Attitude Determination and the Orthogonal Procrustes Problem". Journal of guidance, control and dynamics, 26(5) (2003), 820-822.
  • [5] E.G. Birgin, J.M. Martínez & M. Raydan. "Nonmonotone spectral projected gradient methods on convex sets". SIAM Journal on Optimization, 10 (2000), 1196-1211.
  • [6] A.W. Bojanczyk & A. Lutoborski. "The Procrustes Problem for Orthogonal Stiefel Matrices". SIAM Journal on Scientific Computing, 21 (1998), 1291-1304.
  • [7] Y. Chahlaoui, K. Gallivan & P. Van Dooren. "Recursive calculation of dominant singular subspaces". SIAM Journal on Matrix Analysis and Applications, 25(2) (1999), 445-463.
  • [8] W.J. Cook, W.H. Cunningham, W.R. Pulleyblank & A. Schrijver. "Combinatorial optimization". John Wiley & Sons, Canada, (1997).
  • [9] T.F. Cox & M.A.A. Cox. "Multidimensional scaling". Chapman and Hall, London, (2000).
  • [10] A. Edelman & R. Lippert. "Nonlinear Eigenvalue Problems with Orthogonality Constraints". In Templates for the Solution of Algebraic Eigenvalue Problems, A Practical Guide, pp. 290-314, Philadelphia, (2000).
  • [11] L. Eldén & H. Park. "A procrustes problem on Stiefel manifolds". Numerische Mathematik, 82 (1999), 599-619.
  • [12] J.B. Francisco, J.M. Martínez & L. Martínez. "Globally convergent trust-region methods for self-consistent field electronic structure calculations". Journal of Chemical Physics, 121 (2004), 10863-10878.
  • [13] J.B. Francisco & F.S. Viloche Bazán. "Nonmonotone algorithm for minimization on closed sets with application to minimization on Stiefel manifolds". Journal of Computational and Applied Mathematics, 236 (10) (2012), 2717-2727.
  • [14] G.H. Golub & W. Kahan. "Calculating the singular values and pseudoinverse of a matrix". SIAM Journal on Numerical Analysis, 2 (1965), 205-224.
  • [15] L. Grippo, F. Lampariello & S. Lucidi. "A nonmonotone line search technique for Newton's method". SIAM Journal on Numerical Analysis, 23(4) (1986), 707-716.
  • [16] N.J. Higham. "The symmetric procrustes problem". BIT Numerical Mathematics, 28(1) (1988), 133-143.
  • [17] J.R. Hurley & R.B. Cattell. "The procrustes program: Producing direct rotation to test a hypothesized factor structure". Behavioral Science, 7 (1962), 258-262.
  • [18] E. Klerk, M. Laurent & P.A. Parrilo. "A PTAS for the minimization of polynomials of fixed degree over the simplex". Theoretical Computer Science, pp. 210-225, (2006).
  • [19] C. Lanczos. "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators". Journal of Research of the National Bureau of Standards, 45(4) (1950), 255-282.
  • [20] J.M. Martínez & S.A. Santos. "A trust region strategy for minimization on arbitrary domains". Mathematical Programming, 68 (1995), 267-302.
  • [21] J. Nocedal & S.J. Wright. "Numerical Optimization". Springer, New York, (2006).
  • [22] C.C. Paige & M.A. Saunders. "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares". ACM Transactions on Mathematical Software, 8(1) (1982), 43-71.
  • [23] I. Söderkvist & P. Wedin. "On Condition Numbers and Algorithms for Determining a Rigid Body Movement". BIT, 34 (1994), 424-436.
  • [24] Z. Wen & W. Yin. "A feasible method for optimization with orthogonality constraints". Optimization Online, pp. 1-30, (2010).
  • [25] Z. Zhang & K. Du. "Successive projection method for solving the unbalanced procrustes problem". Science in China: Series A Mathematics, 49(7) (2006), 971-986.
  • [26] H. Zhang & W.W. Hager. "A nonmonotone line search technique and its application to unconstrained optimization". SIAM Journal Optimization, 14(4) (2004), 1043-1056.
  • Publication Dates

    • Publication in this collection
      10 June 2014
    • Date of issue
      Apr 2014

    Sociedade Brasileira de Matemática Aplicada e Computacional Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163 , 13561-120 São Carlos - SP, Tel. / Fax: (55 16) 3412-9752 - São Carlos - SP - Brazil
    E-mail: sbmac@sbmac.org.br