
Block linear method for large scale Sylvester equations

Marlliny Monsalve

Scientific Computing Center, Departamento de Computación, Facultad de Ciencias, Universidad Central de Venezuela, Ap. 47002, Caracas 1041-A, Venezuela. E-mail: mmonsalv@kuaimare.ciens.ucv.ve

ABSTRACT

We present and analyze a new iterative scheme for the large-scale solution of the well-known Sylvester equation. The proposed scheme is based on a fixed-point iteration approach and can make good use of recently developed methods for solving block linear systems. It is shown mathematically that the iterative process converges under some assumptions on the coefficient matrices. The results of our numerical experiments with large-scale matrices are quite encouraging. In particular, the method compares favorably with other block methods and with a recently proposed method for the Sylvester equation based on a low-rank approximation of the right-hand side matrix C.

Mathematical subject classification: 65J10, 65F10, 15A24.

Key words: Sylvester equation, block linear systems, iterative methods.

1 Introduction

In this paper, we present a new block algorithm for solving the Sylvester matrix equation (SE):

AX - XB = C,    (1)

where A, B, and C are given matrices of dimensions n × n, p × p, and n × p, respectively, and the matrix X of dimension n × p needs to be found.

It is well known (see [6] and the original paper of Sylvester [16]) that equation (1) has a unique solution if and only if A and B do not have an eigenvalue in common. The need to solve this equation arises in a wide variety of practical applications, including control systems design and analysis [4, 6] and the numerical solution of differential equations, including boundary value problems [2, 8].

Because of its importance, the problem has been well studied in the literature and there now exist many methods for its solution. An account of these methods can be found in the recent book by Datta [6]. In theory, the problem can be reduced to a linear system whose coefficient matrix is a Kronecker product matrix of large dimension. The Kronecker-product approach is not numerically viable even for small and dense systems; for details see again the book by Datta [6]. The best-known and most widely used numerical method for small and dense problems is the Hessenberg-Schur method of Golub, Nash, and Van Loan [9]. The method is based on reducing the larger of the two matrices to Hessenberg form and the other to real Schur form. The Hessenberg-Schur method is an efficient variant of the earlier Bartels-Stewart method [1], which is based on reducing both matrices to Schur form. Unfortunately, these methods are not practical for large and sparse problems.
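To make the dimension issue concrete, the following small sketch (ours, not from the paper) writes equation (1) in its Kronecker-product form (I_p ⊗ A - B^T ⊗ I_n) vec(X) = vec(C); the system matrix has order np, which is why this approach breaks down for anything beyond toy sizes.

```python
# Illustrative sketch: the Kronecker-product formulation of AX - XB = C.
# Only viable for tiny n and p, since the system matrix is (n*p) x (n*p).
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 4
A = n * np.eye(n) + rng.standard_normal((n, n))  # shift keeps the spectra of A and B apart
B = rng.standard_normal((p, p))
C = rng.standard_normal((n, p))

# With column-major vec: vec(AX) = kron(I_p, A) vec(X), vec(XB) = kron(B^T, I_n) vec(X)
K = np.kron(np.eye(p), A) - np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.flatten(order="F"))      # n*p unknowns
X = x.reshape((n, p), order="F")

print(np.linalg.norm(A @ X - X @ B - C))          # residual: about machine precision
```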

For large-scale Sylvester equations, several Krylov subspace methods have recently been proposed [5, 6, 7, 10, 11, 12, 14]. The Krylov methods are basically projection methods. The idea behind these methods is to first project the large problem onto a smaller one by constructing an orthonormal basis of a Krylov subspace, then solve the smaller projected problem using a standard technique such as the Hessenberg-Schur method, and finally recover the solution of the original large problem from the solution of the smaller one. In most cases, the smaller projected problem is itself a Sylvester equation. However, the method proposed in [11] by Hu and Reichel is different in the sense that their idea is to project the associated Kronecker system rather than the Sylvester equation itself. In the recent Ph.D. thesis [14], Peng has proposed two new Krylov subspace methods, one of divide-and-conquer type and the other based on a low-rank approximation of the right-hand side matrix C. Yet another class of iterative methods, which are neither Krylov methods nor based on the solution of Kronecker product systems, has been proposed by several authors; for details see Miller [13] and the references therein. These iterative methods seem to be more of theoretical interest and are not practical for large problems.

In this paper, we propose a new iterative scheme based on fixed-point iteration. The scheme requires the solution of a linear system with multiple right-hand sides at each iteration, which can be solved using block Krylov subspace methods for linear systems (see Brezinski [3], Saad [15]). Our method does not require the solution of a low-dimensional Sylvester equation at every iteration. The results of our numerical experiments show that the proposed method works quite well for large-scale problems and is quite competitive with the existing competing methods. Mathematical results on the convergence of our iterative scheme are also provided in the paper.

2 Methods based on SE and block linear systems

The main idea to solve the SE is to write this equation as a block linear system and then use some suitable iterative scheme. This can be accomplished by the following change of variable: AX = Z, where Z = C + XB. This possibility generates the iterative method:

Algorithm 1. Given an initial guess X_0, repeat for k = 0, 1, 2, ... until convergence:
  1. set Z_k = C + X_k B;
  2. solve the block linear system A X_{k+1} = Z_k.

Notice that the method requires the solution of one block linear system and one matrix-matrix product per iteration. The matrix of coefficients of the internal system is of dimension n × n; therefore, from a computational-cost point of view, if n < p it is convenient to use Algorithm 1 as described, but if n > p it is convenient to transpose the Sylvester equation and then apply Algorithm 1. To be precise, when n > p, solving -B^T X^T + X^T A^T = C^T instead of (1) should be preferred. This guarantees that the matrix of coefficients of the internal system is of dimension p × p. On the other hand, the norms of the matrices A and B could also help to decide whether it is convenient to solve (1) or its transpose, as discussed below after Theorem 2.1.
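As a concrete illustration, the following is a minimal dense-arithmetic sketch of Algorithm 1 (our code, not the author's implementation). It assumes the SE in the form AX - XB = C and uses a direct solve for the internal block system; for truly large sparse problems that solve would be replaced by a block iterative method such as BSOR, BSD, or BMR [3]. The stopping test uses the cheap residual expression R_k = (X_k - X_{k-1})B established in Theorem 2.2 below.

```python
import numpy as np

def sylvester_fixed_point(A, B, C, X0=None, tol=1e-8, maxit=2500):
    """Fixed-point iteration for AX - XB = C: solve A X_{k+1} = C + X_k B."""
    X = np.zeros_like(C) if X0 is None else X0.copy()
    for k in range(1, maxit + 1):
        Z = C + X @ B                   # Z_k = C + X_k B
        X_new = np.linalg.solve(A, Z)   # block system with p right-hand sides
        # cheap residual (Theorem 2.2): R = (X_{k+1} - X_k) B, no extra products with A
        if np.linalg.norm((X_new - X) @ B, "fro") <= tol:
            return X_new, k
        X = X_new
    return X, maxit

# Small synthetic test with ||A^{-1}|| ||B|| < 1, so Theorem 2.1 applies.
rng = np.random.default_rng(0)
n, p = 400, 30
A = n * np.eye(n) + rng.standard_normal((n, n))
B = rng.standard_normal((p, p))
X_true = rng.standard_normal((n, p))
C = A @ X_true - X_true @ B

X, iters = sylvester_fixed_point(A, B, C)
print(iters, np.linalg.norm(X - X_true, "fro"))   # converges in a handful of iterations
```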

Our next result establishes convergence under an additional hypothesis on the matrices A and B. This additional hypothesis is sufficient to guarantee also the existence of a unique solution, and as far as we know, it is the classical hypothesis assumed when dealing with iterative schemes that are not related to Krylov subspace methods, see [13].

Theorem 2.1. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{p×p}, C ∈ ℝ^{n×p} and let ||·|| be an induced norm. If ||A^{-1}|| ||B|| < 1, then equation (1) has a unique solution, and the sequence {X_k}_{k≥0} generated by Algorithm 1 converges q-linearly to the solution.

Proof. Let λ_i(A), 1 ≤ i ≤ n, be the eigenvalues of A and let λ_j(B), 1 ≤ j ≤ p, be the eigenvalues of B. Since ||A|| ≥ |λ_i(A)| for any i and for any induced norm, we have ||A^{-1}|| ≥ max_{1≤i≤n} |λ_i(A^{-1})| = 1 / min_{1≤i≤n} |λ_i(A)|, and thus

min_{1≤i≤n} |λ_i(A)| ≥ 1/||A^{-1}||.    (2)

On the other hand, ||A^{-1}|| ||B|| < 1 ⇒ ||B|| < 1/||A^{-1}||; now using (2) we obtain

|λ_j(B)| ≤ ||B|| < 1/||A^{-1}|| ≤ |λ_i(A)|  for all 1 ≤ i ≤ n, 1 ≤ j ≤ p.    (3)

From (3) we conclude that equation (1) has a unique solution, since the spectra of A and B are disjoint.

For the convergence we proceed as follows. From Algorithm 1 we obtain that X_k = A^{-1}(C + X_{k-1}B), and from the SE we know that X = A^{-1}(C + XB). Combining these two equations it follows that

X - X_k = A^{-1}(X - X_{k-1})B,    (4)

and so,

X - X_k = (A^{-1})^k (X - X_0) B^k.

Therefore

||X - X_k|| ≤ (||A^{-1}|| ||B||)^k ||X - X_0||.

Since ||A^{-1}|| ||B|| < 1, the sequence X_k converges to X as k tends to infinity. On the other hand, let E_k = X - X_k be the absolute error; taking norms in equation (4) we have that

||E_k|| ≤ ||A^{-1}|| ||B|| ||E_{k-1}||,

and since ||A^{-1}|| ||B|| < 1, the sequence X_k converges q-linearly to the solution.

Notice that if no information is available about ||A^{-1}|| or ||B^{-1}||, then ||A|| and ||B|| could be of help to decide whether it is convenient to solve (1) or its transpose to increase the possibility of convergence. For example, if ||A|| is much larger than ||B||, then the condition ||A^{-1}|| ||B|| < 1 is likely to be satisfied, favoring convergence when solving (1). Similarly, the transposed approach should be preferred when ||B|| is much larger than ||A||. Notice also that if we want to solve the SE with B = A (or B = A^T), then Theorem 2.1 does not guarantee convergence, since ||A|| ||A^{-1}|| = cond(A) ≥ 1. Unfortunately, this includes the well-known Lyapunov equation.
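A simple way to automate this choice, sketched below with dense norms (our helper, not part of the paper), is to evaluate the sufficient condition of Theorem 2.1 for both (1) and its transposed form and pick whichever variant satisfies it. In the 2-norm the transposed condition reduces to ||B^{-1}|| ||A|| < 1; for large sparse matrices these norms would be estimated (e.g. with a norm or condition-number estimator) rather than computed exactly.

```python
import numpy as np

def choose_orientation(A, B):
    """Check Theorem 2.1's condition for AX - XB = C and for its transposed form.

    Assumes A and B are nonsingular; uses exact 2-norms, which is only
    affordable for moderate dimensions.
    """
    direct = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(B, 2)       # ||A^{-1}|| ||B||
    transposed = np.linalg.norm(np.linalg.inv(B), 2) * np.linalg.norm(A, 2)   # ||B^{-1}|| ||A||
    if direct < 1 and direct <= transposed:
        return "solve (1) directly", direct
    if transposed < 1:
        return "solve the transposed equation", transposed
    return "Theorem 2.1 gives no guarantee either way", min(direct, transposed)
```

If the transposed variant is chosen, Algorithm 1 is simply applied to -B^T Y + Y A^T = C^T (coefficient matrices -B^T and -A^T, unknown Y), and the solution of (1) is recovered as X = Y^T.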

For the proposed algorithm, the residual matrix at iteration k is defined as R_k = C - (AX_k - X_kB). This expression involves a high computational cost, since it requires two matrix-matrix products per iteration. The following result provides equivalent and less expensive ways to calculate the residual.

Theorem 2.2. Let {X_k}_{k≥0} be the sequence generated by Algorithm 1. The following expressions for the residual matrix are equivalent:

a) R_k = C - (AX_k - X_kB) = C + X_kB - AX_k;

b) R_k = Z_k - Z_{k-1};

c) R_k = (X_k - X_{k-1})B.

Proof. The residual is given by R_k = C + X_kB - AX_k, and from Algorithm 1 we have that Z_k = C + X_kB and AX_k = Z_{k-1}. Therefore

R_k = Z_k - Z_{k-1}.    (5)

On the other hand, substituting Z_k from Algorithm 1 in (5) we obtain R_k = C + X_kB - (C + X_{k-1}B). Therefore

R_k = (X_k - X_{k-1})B.    (6)

Equation (5) provides a less expensive way to calculate the residual, since it does not involve matrix-matrix products. Finally, from (6) we obtain the following corollary, which yields a more convenient stopping criterion.

Corollary 2.1. Let {X_k}_{k≥0} be the sequence generated by Algorithm 1. If ||X_k - X_{k-1}|| → 0, then ||R_k|| → 0.
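The following quick numerical check (ours) illustrates Theorem 2.2 on a single step of Algorithm 1: the three residual expressions coincide up to rounding error, while only expression (b) avoids matrix-matrix products altogether.

```python
# Numerical check of the three equivalent residual expressions of Theorem 2.2
# for one step of Algorithm 1; the identities are exact in exact arithmetic.
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 8
A = n * np.eye(n) + rng.standard_normal((n, n))
B = rng.standard_normal((p, p))
C = rng.standard_normal((n, p))

X_prev = np.zeros((n, p))             # X_{k-1}
Z_prev = C + X_prev @ B               # Z_{k-1}
X_k = np.linalg.solve(A, Z_prev)      # A X_k = Z_{k-1}
Z_k = C + X_k @ B

R_a = C - (A @ X_k - X_k @ B)         # definition (a): two matrix-matrix products
R_b = Z_k - Z_prev                    # Theorem 2.2 (b): one matrix subtraction
R_c = (X_k - X_prev) @ B              # Theorem 2.2 (c): one matrix-matrix product
print(np.linalg.norm(R_a - R_b), np.linalg.norm(R_a - R_c))  # both about machine precision
```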

Finally, the following result establishes a bound that relates the norm of the residual and the norm of the error.

Theorem 2.3. Let {X_k}_{k≥0} be the sequence generated by Algorithm 1. The norm of the residual matrix satisfies

||R_k|| ≤ (||A|| + ||B||) ||E_k||.    (7)

Proof. R_k = C - (AX_k - X_kB) = AX - XB - AX_k + X_kB = A(X - X_k) - (X - X_k)B = AE_k - E_kB. Therefore

||R_k|| = ||AE_k - E_kB|| ≤ (||A|| + ||B||) ||E_k||.

3 Numerical results

In this section we present preliminary numerical experiments to illustrate the performance of the newly proposed algorithm. For solving the internal block linear systems of this algorithm we use the block SOR scheme (BSOR), the non-Hermitian block steepest descent method (BSD), and the block minimal residual iteration (BMR), fully described in [3]. For the proposed algorithm, we will only present the combination that produced the best results in CPU time. For each experiment, we will indicate which method was used to solve the internal block linear system.

We compare the newly proposed algorithm with the following methods for solving the SE:

  • The block Arnoldi-Sylvester algorithm (BAS) and the block GMRES-Sylvester algorithm (BGS), proposed and fully described in [10]. These methods are based on Krylov subspace methods and solve, at every iteration, a small SE whose order is governed by p and the restart value. These internal SEs are solved using the algorithm proposed by Golub, Nash and Van Loan in [9]. BAS and BGS can only be used when n >> p. When n < p we can always transpose the SE and then apply the same techniques.

  • The Galerkin method for the Sylvester equation (GMSE) and the restarted minimal projected residual algorithm (RMPR), proposed and fully described in [11]. These methods, like BAS and BGS, are based on Krylov subspace methods and solve, at every iteration, a small SE whose order is governed by the projection dimensions. The internal SE is written as a linear system of equations that is solved using direct methods, but the operation count for these algorithms is of order np flops. GMSE and RMPR can be used for any values of n and p.

  • The Arnoldi method for a low-rank approximate solution to the Sylvester equation (LRASE), proposed and fully described in [14]. The parameter k is a small value required by the algorithm, and the second parameter is the restart value. This method can be used for any values of n and p. In all our experiments, we tried 1 ≤ k ≤ 5 and report the best result.

In our tests we use matrices from the Harwell-Boeing collection and also some sparse matrices from the Matlab gallery; for these matrices the inequality of Theorem 2.1 holds, and therefore convergence of the newly proposed algorithm is guaranteed. In all cases, the entries of the solution matrix X are x_{ij} = f(x_i, y_j), where x_i = ih_n, y_j = jh_p for 1 ≤ i ≤ n, 1 ≤ j ≤ p, h_n = 1/(n + 1), h_p = 1/(p + 1), and f(x, y) = x e^{xy} sin(πx) sin(πy). This is a typical test function that appears in several works, see e.g. [11, 10]. We stop the process when

||X - X_k|| = ||E_k|| ≤ 10^{-8},

or when 2500 external iterations are reached. The initial iterate, X_0, is the null matrix. To stop the iterative methods for the internal block linear systems we use the norm of the residual. To be precise, we stop the internal iterations of Algorithm 1 when ||AX_{k+1} - Z_k|| ≤ 10^{-2}. The experiments were run on a Pentium IV computer at 3.4 GHz with 2 GB of RAM, using Matlab 7.0. In Examples 3.1-3.4 we compare the performance of Algorithm 1 with the other methods; in these examples we report the number of iterations (Iter), the CPU time, the norm of the residual (Residual), and the norm of the absolute error (Error). In Example 3.5 we compare the different equivalent expressions for computing the residual norm.
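For reference, test problems of this kind can be reproduced with a short routine of the following form (our sketch of the setup described above, not the author's code): the prescribed solution X is built from f, and C = AX - XB is formed from it, so that the exact error ||X - X_k|| is available at every iteration.

```python
import numpy as np

def build_test_problem(A, B):
    """Prescribed solution x_ij = f(x_i, y_j) and matching right-hand side C = AX - XB."""
    n, p = A.shape[0], B.shape[0]
    hn, hp = 1.0 / (n + 1), 1.0 / (p + 1)
    x = hn * np.arange(1, n + 1)
    y = hp * np.arange(1, p + 1)
    f = lambda u, v: u * np.exp(u * v) * np.sin(np.pi * u) * np.sin(np.pi * v)
    X = f(x[:, None], y[None, :])          # X[i, j] = f(x_i, y_j) via broadcasting
    C = A @ X - X @ B
    return X, C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 40
    A = n * np.eye(n) + rng.standard_normal((n, n))
    B = rng.standard_normal((p, p))
    X, C = build_test_problem(A, B)
    print(X.shape, np.linalg.norm(A @ X - X @ B - C))   # consistency check: exactly 0
```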

We use the following notation to characterize the different finalization states of the experiments: "-" means that the algorithm reached the maximum number of external iterations, "s" means that the method stagnated, and "**" means that the memory requirements of the algorithm could not be met by Matlab.

Example 3.1. In this example n < p and n is small. The matrix B is the matrix Hor__131 of the Harwell-Boeing collection with p = 434, and A is a sparse matrix of order n generated with the Matlab function gallery('wathen', nx, ny), which returns a sparse random finite element matrix of order n = 3*nx*ny + 2*nx + 2*ny + 1.

In Example 3.1, reported in Table 1, we can see that all methods converge but GMSE and RMPR are not competitive in CPU time. In this case, Algorithm 1 combined with BSOR required less CPU time than the other methods, but BAS, BGS and LRASE could still be considered competitive choices. Moreover, BGS achieves better accuracy than the other options.

Example 3.2. In this example, A is the matrix orsirr_2 with n = 886 and B is the matrix sherman1 with p = 1000, both from the Harwell-Boeing collection.

As shown in Table 2, BAS and BGS run into storage problems. In this experiment ||E_0|| = 616.24, and we can observe that GMSE, RMPR and LRASE reduce the absolute error norm. RMPR stagnates, while GMSE and LRASE stopped because the maximum number of external iterations was reached. Finally, Algorithm 1 with BSOR converges, but the CPU time required is considerably higher.

Example 3.3. In this example p < n and p is small. The matrix A is the matrix gre__343 of the Harwell-Boeing collection with n = 343, and B is a tridiagonal matrix of order p = 20 that has been used by several authors, see [10, 11, 14]. In this experiment we report Algorithm 1 combined with BMR.

In Table 3, we can observe that all methods converge, but Algorithm 1 obtained the best performance in CPU time. BAS, GMSE and RMPR require similar CPU times to satisfy tolE, and the CPU time required by these methods is almost twice that required by LRASE. On the other hand, BGS converges in one iteration and reduces the norm of the error more than the others, but the CPU time required by BGS is higher than that of the others. It is worth noticing that for this example we use the values of p_1 and p_2 suggested in [11].

Example 3.4. In this example, A is the matrix gre_1107 with n = 1107 and B is the matrix fs_760_1 with p = 760, both from the Harwell-Boeing collection.

As shown in Table 4, BAS and BGS run into storage problems. RMPR stagnates, while GMSE and LRASE stopped because the maximum number of external iterations was reached. Algorithm 1 with BSD converges. In this experiment ||E_0|| = 458.5, and we can see that GMSE and RMPR reduced the absolute error norm while LRASE increased it. Finally, we can observe that Algorithm 1 achieves convergence, but ||R_k|| is much higher than ||E_k||.

Example 3.5. In this example we want to compare ||E_k|| with the equivalent expressions for the norm of the residual given in Theorem 2.2: ||R_k|| = ||C - (AX_k - X_kB)||, ||Z_k - Z_{k-1}||, and ||(X_k - X_{k-1})B||. The matrix A is the matrix fs_680_1 of the Harwell-Boeing collection and B is a sparse matrix with random entries. The dimensions of these matrices are n = 680 and p = 1000, respectively. We use Algorithm 1 combined with BSOR to solve the SE.

In Figure 1, we can see that ||Z_k - Z_{k-1}|| and ||(X_k - X_{k-1})B|| stay close to ||E_k|| throughout the process, while ||R_k|| does not. In this example ||A|| + ||B|| = 7.19e+13, and the behavior observed in Figure 1 agrees with and illustrates (7). Finally, we can conclude that if the stopping criterion is ||R_k|| ≤ tolE, it is convenient to use ||Z_k - Z_{k-1}|| instead of ||R_k||, for two important reasons. The first is computational work: ||R_k|| requires two matrix-matrix products, while ||Z_k - Z_{k-1}|| requires only a matrix subtraction. The second is that ||Z_k - Z_{k-1}|| measures the precision of the approximation generated by the proposed algorithm more accurately.


4 Concluding remarks

We propose a new iterative scheme for solving Sylvester equations (SE). At each iteration a block linear system of equations is solved and, for that, direct or iterative techniques can be used. This new scheme can be applied regardless of the dimensions of the involved matrices and, in most cases, it requires less computational work than its competitors.

We also establish conditions under which convergence is guaranteed. These conditions are sufficient but not necessary, and are therefore strong. Nevertheless, they also guarantee the existence of a unique solution of the SE. Unfortunately, for some important applications these conditions are not satisfied, and our proposed scheme cannot be used. For example, the new scheme cannot be applied to the well-known Lyapunov equation.

Finally, we present an equivalent, stable, and inexpensive way of computing the residual matrix. This equivalent formula yields a very efficient stopping criterion, as shown in our numerical experiments.

Acknowledgement. I wish to thank Wujian Peng from Northern Illinois University for providing the Matlab code for LRASE, and Prof. Biswa Datta for his helpful comments and recommendations, which helped improve the quality of the presentation. I am also indebted to two anonymous referees whose comments helped me improve the quality of this paper.

Received: 15/III/06. Accepted: 16/IV/07.

#658/06.

  • [1] R.H. Bartels and G.W. Stewart, Algorithm 432: Solution of the matrix equation AX + XB = C. Comm. ACM, 15 (1972), 820-826.
  • [2] W.G. Bickley and J. McNamee, Matrix and other direct methods for the solution of systems of linear difference equations. Philos. Trans. Roy. Soc. London Ser. A, 252 (1960), 69-131.
  • [3] C. Brezinski, Block descent methods and hybrid procedures for linear systems. Numerical Algorithms, 29 (2002), 21-32.
  • [4] B. Datta, Linear and numerical linear algebra in control theory: Some research problems. Linear Algebra and its Appl., 198 (1994), 755-790.
  • [5] B. Datta, Krylov subspace methods for large-scale matrix problems in control. Future Generation Computer Systems, 19(7) (2003), 1253-1263.
  • [6] B. Datta, Numerical Methods for Linear Control Systems Design and Analysis. Elsevier Academic Press, New York, 2003.
  • [7] B. Datta and Y. Saad, Arnoldi methods for large Sylvester-like observer matrix equations, and an associated algorithm for partial spectrum assignment. Linear Algebra and its Appl., 156 (1991), 225-244.
  • [8] A. Dou, Method of undetermined coefficients in linear differential systems and the matrix equation YA - AY = F. SIAM J. Appl. Math., 14 (1966), 691-696.
  • [9] G.H. Golub, S. Nash and C. Van Loan, A Hessenberg-Schur method for the problem AX + XB = C. IEEE Trans. Automat. Control, 24 (1979), 909-913.
  • [10] A. El Guennouni, K. Jbilou and A.J. Riquet, Block Krylov subspace methods for solving large Sylvester equations. Numerical Algorithms, 29 (2001), 75-96.
  • [11] D.Y. Hu and L. Reichel, Krylov-subspace methods for the Sylvester equation. Linear Algebra Appl., 172 (1992), 283-313.
  • [12] I.M. Jaimoukha and E.M. Kasenally, Krylov subspace methods for solving large Lyapunov equations. SIAM J. Numer. Anal., 31 (1994), 227-251.
  • [13] D.F. Miller, The iterative solution of the matrix equation XA + BX + C = 0. Linear Algebra Appl., 105 (1988), 131-137.
  • [14] Wujian Peng, On the Krylov subspace solutions of matrix equations in control theory. PhD thesis, Northern Illinois University, 2004.
  • [15] Y. Saad, Iterative Methods for Sparse Linear Systems. International Thomson Publishing Co., London, England, 1996.
  • [16] J.J. Sylvester, Sur l'équation en matrices px = xq. C.R. Acad. Sci. Paris, 99 (1884), 67-71, 115-116.
