

The nearest generalized doubly stochastic matrix to a real matrix with the same first and second moments

William Glunt I; Thomas L. Hayden II; Robert Reams III

IDepartment of Mathematics and Computer Science, Austin Peay State University, Clarksville, TN 37044

IIDepartment of Mathematics, University of Kentucky, Lexington, KY 40506

IIIDepartment of Mathematics, Virginia Commonwealth University, 1001 West Main Street, Richmond, VA 23284 E-mails: gluntw@apsu.edu / hayden@ms.uky.edu / rbreams@vcu.edu

ABSTRACT

Let T be an arbitrary n × n matrix with real entries. We explicitly find the closest (in Frobenius norm) matrix A to T, where A is n × n with real entries, subject to the condition that A is ''generalized doubly stochastic'' (i.e. Ae = e and e^T A = e^T, where e = (1,1,...,1)^T, although A is not necessarily nonnegative) and A has the same first moment as T (i.e. e_1^T A e_1 = e_1^T T e_1). We also explicitly find the closest matrix A to T when A is generalized doubly stochastic, has the same first moment as T, and has the same second moment as T (i.e. e_1^T A^2 e_1 = e_1^T T^2 e_1), when such a matrix A exists.

Mathematical subject classification: 15A51, 65K05, 90C25.

Key words: doubly stochastic, generalized doubly stochastic, moments, nearest matrix, closest matrix, Frobenius norm.

1 Introduction

Let e ∈ ℝ^n be the vector of all ones, i.e. e = (1,1,...,1)^T, and let e_i ∈ ℝ^n denote the vector with a 1 in the ith position and zeroes elsewhere. An n × n matrix A with real entries is said to be generalized doubly stochastic if Ae = e and e^T A = e^T. A generalized doubly stochastic matrix does not necessarily have nonnegative entries, unlike a doubly stochastic matrix, which has all entries nonnegative. The kth moment of A is defined as e_1^T A^k e_1. Let T = (t_ij) ∈ ℝ^{n×n} be an arbitrary given matrix. We will say that a matrix A ∈ ℝ^{n×n} is M_k if e_1^T A^k e_1 = e_1^T T^k e_1, where k is a positive integer. We use the convention that ||·|| refers to either the Frobenius matrix norm ||·||_F or the vector 2-norm ||·||_2, with the context determining which is intended. Frequent use is made of the fact that if x, y ∈ ℝ^n are unit vectors, we can find a Householder matrix Q ∈ ℝ^{n×n} such that Qx = y [5].
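As an illustration of these definitions, the sketch below (assuming NumPy; the matrix A is a hypothetical example of ours, not from the paper) verifies the generalized doubly stochastic conditions and evaluates the first two moments e_1^T A^k e_1.

```python
import numpy as np

n = 4
e = np.ones(n)
e1 = np.zeros(n); e1[0] = 1.0

# A hypothetical generalized doubly stochastic matrix: all row and
# column sums equal 1, but some entries are negative.
A = np.eye(n) + 0.3 * np.array([[ 1, -1,  0,  0],
                                [-1,  1,  0,  0],
                                [ 0,  0,  1, -1],
                                [ 0,  0, -1,  1]])
assert np.allclose(A @ e, e)   # Ae = e
assert np.allclose(e @ A, e)   # e^T A = e^T

# kth moment of A: e_1^T A^k e_1, here for k = 1 and k = 2.
m1 = e1 @ A @ e1
m2 = e1 @ np.linalg.matrix_power(A, 2) @ e1
```

For this particular A, the first moment is the (1,1) entry 1.3 and the second moment is the (1,1) entry of A^2.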

We will determine the closest (in Frobenius norm) matrix A to T, subject to the conditions that A is generalized doubly stochastic and has the same first and second moments as T. The motivation for this problem comes from an application in [2] where it is desired to approximate a certain matrix T, where T comes from a linear system corresponding to a large linear network, subject to the approximating matrix satisfying certain conditions. We outlined these applications in [3] and (among other things there) used a computational algorithm to find the closest matrix A to T, subject to A being generalized doubly stochastic and M1, or subject to A being doubly stochastic and M1. See the references [2] and [3] for more details about the applications. Our extended work here takes only an analytic approach to the problem, includes the second moment condition, and explicitly finds the closest matrix which is generalized doubly stochastic, M1 and M2, having dropped the requirement that the closest matrix be nonnegative. It is worth emphasizing that despite dropping the nonnegativity requirement our solution is still relevant to the original problem. Previous approaches to this problem, in both [3] and [8], did not include the second moment, although they did include the nonnegative condition. A survey of matrix nearness problems and their applications, which include areas of control theory, numerical analysis and statistics, was given by Higham [4], see also [7].

2 The closest generalized doubly stochastic matrix

Although Theorem 1 and Corollary 2 resemble results proved in [3] and [6], the results herein present a different formulation. We include them for clarity and because we will use them later.

Theorem 1. Let A ∈ ℝ^{n×n} and let Q ∈ ℝ^{n×n} be an orthogonal matrix so that Qe_n = e/√n. Then A is generalized doubly stochastic if and only if

    Q^T A Q = [ A_1   0 ]
              [ 0^T   1 ],

for any A_1 ∈ ℝ^{(n-1)×(n-1)}.

Proof. Ae = e if and only if Q^T A Q e_n = e_n (i.e. the last column of Q^T A Q is e_n), and e^T A = e^T if and only if e_n^T Q^T A Q = e_n^T (i.e. the last row of Q^T A Q is e_n^T).

Theorem 1 enables us to easily incorporate the condition that A is generalized doubly stochastic and in Corollary 2 find the closest such matrix to T.

Corollary 2. Let T ∈ ℝ^{n×n} and

    Q^T T Q = [ T_1    t_2 ]
              [ t_3^T  t_4 ],

where T_1 ∈ ℝ^{(n-1)×(n-1)}, t_2, t_3 ∈ ℝ^{n-1}, t_4 ∈ ℝ, and where Q ∈ ℝ^{n×n} is as in Theorem 1. Then the generalized doubly stochastic matrix A ∈ ℝ^{n×n} given by

    A = Q [ T_1   0 ] Q^T
          [ 0^T   1 ]

satisfies the inequality ||A - T|| ≤ ||Z - T|| among all generalized doubly stochastic matrices Z ∈ ℝ^{n×n}.

Proof. The Frobenius norm is invariant under orthogonal similarity, so

    ||A - T||^2 = ||A_1 - T_1||^2 + ||t_2||^2 + ||t_3||^2 + (1 - t_4)^2,

and A is minimal among generalized doubly stochastic matrices when A_1 = T_1.
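Corollary 2 translates directly into a short computation: build a Householder matrix Q with Qe_n = e/√n, keep the leading block of Q^T T Q, and force the trailing row and column. A minimal sketch, assuming NumPy (the function names are ours):

```python
import numpy as np

def householder_mapping(x, y):
    """Return an orthogonal (Householder) Q with Qx = y, for unit x, y."""
    if np.allclose(x, y):
        return np.eye(len(x))
    v = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

def nearest_gds(T):
    """Closest (Frobenius) generalized doubly stochastic matrix to T,
    per Corollary 2: keep T_1, force the last row/column of Q^T T Q."""
    n = T.shape[0]
    en = np.zeros(n); en[-1] = 1.0
    Q = householder_mapping(en, np.ones(n) / np.sqrt(n))  # Q e_n = e/sqrt(n)
    M = Q.T @ T @ Q
    M[:, -1] = 0.0
    M[-1, :] = 0.0
    M[-1, -1] = 1.0
    return Q @ M @ Q.T

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
A = nearest_gds(T)
e = np.ones(5)
assert np.allclose(A @ e, e) and np.allclose(e @ A, e)
```

Any perturbation of A that preserves zero row and column sums gives another generalized doubly stochastic matrix, and by the corollary it can only be farther from T.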

3 The closest generalized doubly stochastic matrix which is M1

For A ∈ ℝ^{m×m} and B ∈ ℝ^{n×n}, the direct sum of A and B, denoted A ⊕ B, is the (m + n) × (m + n) block matrix

    A ⊕ B = [ A  0 ]
            [ 0  B ].

We now construct an orthogonal matrix Q with an additional desirable property which we need for the first and second moments. Let Q_1 ∈ ℝ^{n×n} be an orthogonal matrix such that Q_1 e_n = e/√n, and write

    Q_1^T e_1 = ( u )
                ( β ),

for some u ∈ ℝ^{n-1}, β ∈ ℝ. Let Q_2 ∈ ℝ^{(n-1)×(n-1)} be an orthogonal matrix such that

    Q_2^T u = α e_{n-1},

where α = ||u||, and let Q = Q_1(Q_2 ⊕ 1). Then Qe_n = Q_1(Q_2 ⊕ 1)e_n = Q_1 e_n = e/√n as in Section 2, and we have the additional property that

    Q^T e_1 = (Q_2 ⊕ 1)^T Q_1^T e_1 = α e_{n-1} + β e_n.

Note that if u = 0, then Q_1^T e_1 = β e_n, so e_1 = β Q_1 e_n = β e/√n, which is not possible, so we must have α ≠ 0.
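This construction of Q can be carried out with two Householder matrices. The sketch below (assuming NumPy; function names are ours) builds Q_1, Q_2 and Q = Q_1(Q_2 ⊕ 1), and checks the two defining properties:

```python
import numpy as np

def householder_mapping(x, y):
    """Orthogonal (Householder) Q with Qx = y for unit vectors x, y."""
    if np.allclose(x, y):
        return np.eye(len(x))
    v = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

n = 5
en = np.zeros(n); en[-1] = 1.0
e1 = np.zeros(n); e1[0] = 1.0

# Q1 e_n = e/sqrt(n); split Q1^T e1 into (u, beta).
Q1 = householder_mapping(en, np.ones(n) / np.sqrt(n))
w = Q1.T @ e1
u, beta = w[:-1], w[-1]
alpha = np.linalg.norm(u)

# Q2^T u = alpha * e_{n-1} in R^{n-1} (Householder matrices are symmetric).
Q2 = householder_mapping(u / alpha, np.eye(n - 1)[-1])
Q = Q1 @ np.block([[Q2, np.zeros((n - 1, 1))],
                   [np.zeros((1, n - 1)), np.ones((1, 1))]])

# The two desired properties of Q:
assert np.allclose(Q @ en, np.ones(n) / np.sqrt(n))
assert np.allclose(Q.T @ e1, np.concatenate([np.zeros(n - 2), [alpha, beta]]))
```

Note that β = e_n^T Q_1^T e_1 = (Q_1 e_n)^T e_1 = 1/√n, and α^2 + β^2 = 1 since Q^T e_1 is a unit vector.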

The proof of Theorem 3 will use the fact, as stated in the introduction, that if T = (t_ij) ∈ ℝ^{n×n} is the matrix to be approximated, then saying A ∈ ℝ^{n×n} is M_1 means e_1^T A e_1 = e_1^T T e_1 = t_11. This theorem gives the form of a matrix A ∈ ℝ^{n×n} that is both generalized doubly stochastic and M_1.

Theorem 3. Let A ∈ ℝ^{n×n} and let Q ∈ ℝ^{n×n} be an orthogonal matrix such that

    Qe_n = e/√n  and  Q^T e_1 = α e_{n-1} + β e_n,

where α, β ∈ ℝ, α ≠ 0. Then A ∈ ℝ^{n×n} is generalized doubly stochastic and M_1 if and only if

    Q^T A Q = [ A_1    a_2                 0 ]
              [ a_3^T  (t_11 - β^2)/α^2    0 ]
              [ 0^T    0                   1 ],

for any A_1 ∈ ℝ^{(n-2)×(n-2)}, a_2, a_3 ∈ ℝ^{n-2}.

Proof. From Theorem 1, A is generalized doubly stochastic if and only if

    Q^T A Q = [ A_1    a_2   0 ]
              [ a_3^T  a_4   0 ]
              [ 0^T    0     1 ],

where A_1 ∈ ℝ^{(n-2)×(n-2)}, a_2, a_3 ∈ ℝ^{n-2}, a_4 ∈ ℝ. The matrix A is, in addition, M_1 if and only if

    t_11 = e_1^T A e_1 = (Q^T e_1)^T (Q^T A Q)(Q^T e_1) = (α e_{n-1} + β e_n)^T (Q^T A Q)(α e_{n-1} + β e_n) = α^2 a_4 + β^2,

that is, a_4 = (t_11 - β^2)/α^2.

Theorem 3 gives us the means to now find the closest matrix to T which is both generalized doubly stochastic and M_1.

Corollary 4. Let T ∈ ℝ^{n×n} and

    Q^T T Q = [ T_1    t_2   t_5 ]
              [ t_3^T  t_4   t_6 ]
              [ t_7^T  t_8   t_9 ],

where T_1 ∈ ℝ^{(n-2)×(n-2)}, t_2, t_3, t_5, t_7 ∈ ℝ^{n-2}, t_4, t_6, t_8, t_9 ∈ ℝ, and where Q ∈ ℝ^{n×n} is an orthogonal matrix such that

    Qe_n = e/√n  and  Q^T e_1 = α e_{n-1} + β e_n,

where α, β ∈ ℝ, α ≠ 0. Then the generalized doubly stochastic and M_1 matrix A given by

    A = Q [ T_1    t_2                 0 ] Q^T
          [ t_3^T  (t_11 - β^2)/α^2    0 ]
          [ 0^T    0                   1 ]

is the closest matrix to T in the sense that ||A - T|| ≤ ||Z - T|| for all generalized doubly stochastic and M_1 matrices Z ∈ ℝ^{n×n}.

Proof. As in the proof of Corollary 2, since the Frobenius norm is invariant under orthogonal similarity, we have that

    ||A - T||^2 = ||A_1 - T_1||^2 + ||a_2 - t_2||^2 + ||a_3 - t_3||^2 + ((t_11 - β^2)/α^2 - t_4)^2 + ||t_5||^2 + t_6^2 + ||t_7||^2 + t_8^2 + (1 - t_9)^2,

and A is minimal when A_1 = T_1, a_2 = t_2, and a_3 = t_3.
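The construction of Corollary 4 can be sketched end-to-end, assuming NumPy (the helper `build_Q` assembles the Householder matrices Q_1 and Q_2 of this section; all function names are ours):

```python
import numpy as np

def householder(x, y):
    """Orthogonal (Householder) Q with Qx = y for unit vectors x, y."""
    if np.allclose(x, y):
        return np.eye(len(x))
    v = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

def build_Q(n):
    """Orthogonal Q with Q e_n = e/sqrt(n), Q^T e_1 = alpha*e_{n-1} + beta*e_n."""
    en = np.zeros(n); en[-1] = 1.0
    Q1 = householder(en, np.ones(n) / np.sqrt(n))
    w = Q1.T @ np.eye(n)[0]
    u, beta = w[:-1], w[-1]
    alpha = np.linalg.norm(u)
    Q2 = householder(u / alpha, np.eye(n - 1)[-1])
    Q = Q1 @ np.block([[Q2, np.zeros((n - 1, 1))],
                       [np.zeros((1, n - 1)), np.ones((1, 1))]])
    return Q, alpha, beta

def nearest_gds_m1(T):
    """Closest generalized doubly stochastic M_1 matrix, per Corollary 4."""
    n = T.shape[0]
    Q, alpha, beta = build_Q(n)
    M = Q.T @ T @ Q
    M[:, -1] = 0.0; M[-1, :] = 0.0; M[-1, -1] = 1.0   # generalized doubly stochastic
    M[-2, -2] = (T[0, 0] - beta**2) / alpha**2        # forces e_1^T A e_1 = t_11
    return Q @ M @ Q.T

rng = np.random.default_rng(1)
T = rng.standard_normal((5, 5))
A = nearest_gds_m1(T)
e = np.ones(5); e1 = np.eye(5)[0]
assert np.allclose(A @ e, e) and np.allclose(e @ A, e)
assert np.isclose(e1 @ A @ e1, T[0, 0])
```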

4 The closest generalized doubly stochastic matrix which is M1 and M2

Similar reasoning to that given in Section 3 can be used to find necessary and sufficient conditions for a matrix to be generalized doubly stochastic, M_1 and M_2. However, for finding a nearest point in Corollary 6, a difficulty arises from the fact that the set of M_2 matrices is not convex, so we do not necessarily expect there to be a unique nearest point [1]. If such a nearest point exists, however, it is determined by the conditions given in Corollary 6.

Theorem 5. Let A ∈ ℝ^{n×n} and let Q ∈ ℝ^{n×n} be an orthogonal matrix such that

    Qe_n = e/√n  and  Q^T e_1 = α e_{n-1} + β e_n,

where α, β ∈ ℝ, α ≠ 0. Then A ∈ ℝ^{n×n} is generalized doubly stochastic, M_1 and M_2 if and only if

    Q^T A Q = [ A_1   x                  0 ]
              [ y^T   (t_11 - β^2)/α^2   0 ]
              [ 0^T   0                  1 ],

for any A_1 ∈ ℝ^{(n-2)×(n-2)} and any x, y ∈ ℝ^{n-2} such that

    x^T y = ( e_1^T T^2 e_1 - β^2 - (t_11 - β^2)^2/α^2 ) / α^2.

Proof. A is generalized doubly stochastic, M_1 and M_2 if and only if A both satisfies Theorem 3 and satisfies the second moment condition that e_1^T T^2 e_1 = e_1^T A^2 e_1. However,

    e_1^T A^2 e_1 = (Q^T e_1)^T (Q^T A Q)^2 (Q^T e_1) = α^2 x^T y + α^2 ((t_11 - β^2)/α^2)^2 + β^2,

and then substituting e_1^T A^2 e_1 = e_1^T T^2 e_1 and rearranging gives the result.
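This characterization can be verified numerically: choose any A_1 and any pair x, y satisfying the inner-product constraint, and check that both moments match. A sketch, assuming NumPy (helper names are ours, and y is one convenient choice satisfying x^T y = r):

```python
import numpy as np

def householder(x, y):
    """Orthogonal (Householder) Q with Qx = y for unit vectors x, y."""
    if np.allclose(x, y):
        return np.eye(len(x))
    v = (x - y) / np.linalg.norm(x - y)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

def build_Q(n):
    """Orthogonal Q with Q e_n = e/sqrt(n), Q^T e_1 = alpha*e_{n-1} + beta*e_n."""
    en = np.zeros(n); en[-1] = 1.0
    Q1 = householder(en, np.ones(n) / np.sqrt(n))
    w = Q1.T @ np.eye(n)[0]
    u, beta = w[:-1], w[-1]
    alpha = np.linalg.norm(u)
    Q2 = householder(u / alpha, np.eye(n - 1)[-1])
    return Q1 @ np.block([[Q2, np.zeros((n - 1, 1))],
                          [np.zeros((1, n - 1)), np.ones((1, 1))]]), alpha, beta

n = 5
rng = np.random.default_rng(2)
T = rng.standard_normal((n, n))
Q, alpha, beta = build_Q(n)

e1 = np.eye(n)[0]
m2 = e1 @ T @ T @ e1                                 # e_1^T T^2 e_1
a4 = (T[0, 0] - beta**2) / alpha**2
r = (m2 - beta**2 - a4**2 * alpha**2) / alpha**2     # required x^T y

x = rng.standard_normal(n - 2)                       # any x ...
y = r * x / (x @ x)                                  # ... and a y with x^T y = r

M = np.zeros((n, n))
M[:n-2, :n-2] = rng.standard_normal((n - 2, n - 2))  # any A_1
M[:n-2, n-2] = x
M[n-2, :n-2] = y
M[n-2, n-2] = a4
M[n-1, n-1] = 1.0
A = Q @ M @ Q.T

e = np.ones(n)
assert np.allclose(A @ e, e) and np.allclose(e @ A, e)   # generalized doubly stochastic
assert np.isclose(e1 @ A @ e1, T[0, 0])                  # M_1
assert np.isclose(e1 @ A @ A @ e1, m2)                   # M_2
```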

Preparing for the proof of Corollary 6, and arguing similarly to the preceding corollaries, since the Frobenius norm is invariant under orthogonal similarity we calculate that

    ||A - T||^2 = ||A_1 - T_1||^2 + ||x - t_2||^2 + ||y - t_3||^2 + ((t_11 - β^2)/α^2 - t_4)^2 + ||t_5||^2 + t_6^2 + ||t_7||^2 + t_8^2 + (1 - t_9)^2.

Now A is minimal among generalized doubly stochastic, M_1 and M_2 matrices when A_1 = T_1, and when x and y are such that ||x - t_2||^2 + ||y - t_3||^2 is minimized subject to the constraint

    x^T y = ( e_1^T T^2 e_1 - β^2 - (t_11 - β^2)^2/α^2 ) / α^2.

Thus we have our final corollary.

Corollary 6. Let T ∈ ℝ^{n×n} and

    Q^T T Q = [ T_1    t_2   t_5 ]
              [ t_3^T  t_4   t_6 ]
              [ t_7^T  t_8   t_9 ],

where T_1 ∈ ℝ^{(n-2)×(n-2)}, t_2, t_3, t_5, t_7 ∈ ℝ^{n-2}, t_4, t_6, t_8, t_9 ∈ ℝ, and where Q ∈ ℝ^{n×n} is an orthogonal matrix such that

    Qe_n = e/√n  and  Q^T e_1 = α e_{n-1} + β e_n,

where α, β ∈ ℝ, α ≠ 0. Then the generalized doubly stochastic, M_1 and M_2 matrix A given by

    A = Q [ T_1   x                  0 ] Q^T
          [ y^T   (t_11 - β^2)/α^2   0 ]
          [ 0^T   0                  1 ]

satisfies the requirement that ||A - T|| ≤ ||Z - T|| for all generalized doubly stochastic, M_1 and M_2 matrices Z ∈ ℝ^{n×n}, where x and y have been chosen (where possible) so as to minimize ||x - t_2||^2 + ||y - t_3||^2 subject to the constraint

    x^T y = ( e_1^T T^2 e_1 - β^2 - (t_11 - β^2)^2/α^2 ) / α^2.

Proof. For convenience we write c = t_2, d = t_3 and

    r = ( e_1^T T^2 e_1 - β^2 - (t_11 - β^2)^2/α^2 ) / α^2.

It remains for us to solve the problem

    minimize ||x - c||^2 + ||y - d||^2  subject to  x^T y = r,

for which we find the Kuhn-Tucker conditions to be x = c - μy and y = d - μx, where μ is the Lagrange multiplier. We solve simultaneously x^T y = r, x + μy = c and μx + y = d. The latter two equations imply c - μd = (1 - μ^2)x and d - μc = (1 - μ^2)y, which imply (1 - μ^2)^2 x^T y = (c - μd)^T (d - μc), and with the first equation this implies the quartic equation

    (c - μd)^T (d - μc) = r(1 - μ^2)^2.

If this quartic equation has no real solution μ then there is no minimum.

Case 1: If each real solution μ ≠ ±1 then for each such μ

    x = (c - μd)/(1 - μ^2),   y = (d - μc)/(1 - μ^2),

and we check to see which pair x, y gives a minimum.

Case 2: If μ = 1 and c = d then we must solve x + y = c and x^T y = r, i.e. x^T(x - c) = -r, which by completing the square becomes

    ||x - c/2||^2 = ||c||^2/4 - r.

If ||c||^2/4 - r < 0 there is no minimum in this case. Note that since x + y = c we have

    ||x - c||^2 + ||y - c||^2 = ||x||^2 + ||y||^2 = ||x + y||^2 - 2x^T y = ||c||^2 - 2r.

This is a constant, so if ||c||^2/4 - r > 0 we can take, for any w ∈ ℝ^{n-2}, where w ≠ 0,

    x = c/2 + (||c||^2/4 - r)^{1/2} w/||w||,   y = c - x.

Case 3: If μ = -1 and c = -d then a similar calculation, solving x - y = c and x^T y = r, i.e. x^T(x - c) = r, shows that if ||c||^2/4 + r < 0 there is no minimum. Whereas if ||c||^2/4 + r > 0 we can take, for any w ∈ ℝ^{n-2}, where w ≠ 0,

    x = c/2 + (||c||^2/4 + r)^{1/2} w/||w||,   y = x - c.
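The generic branch of this computation (Case 1) can be sketched numerically. The code below (assuming NumPy; the function name and test vectors are ours) expands the quartic (c - μd)^T (d - μc) = r(1 - μ^2)^2 in powers of μ, solves it, forms each candidate pair x, y, and keeps the pair minimizing the objective:

```python
import numpy as np

def min_pair_with_inner_product(c, d, r):
    """Minimize ||x - c||^2 + ||y - d||^2 subject to x^T y = r
    (generic case mu != +-1 of the Kuhn-Tucker analysis above)."""
    cTd, cc, dd = c @ d, c @ c, d @ d
    # (c - mu d)^T (d - mu c) = r (1 - mu^2)^2, expanded in mu:
    # r mu^4 + (-2r - c^T d) mu^2 + (cc + dd) mu + (r - c^T d) = 0
    roots = np.roots([r, 0.0, -2.0 * r - cTd, cc + dd, r - cTd])
    best = None
    for mu in roots[np.isclose(roots.imag, 0.0)].real:
        if np.isclose(abs(mu), 1.0):
            continue                      # degenerate cases handled separately
        x = (c - mu * d) / (1.0 - mu**2)
        y = (d - mu * c) / (1.0 - mu**2)
        val = np.sum((x - c)**2) + np.sum((y - d)**2)
        if best is None or val < best[0]:
            best = (val, x, y)
    return best

c = np.array([1.0, 0.0, 0.0])
d = np.array([0.0, 1.0, 0.0])
val, x, y = min_pair_with_inner_product(c, d, 0.5)
assert np.isclose(x @ y, 0.5)
```

Each real root μ of the quartic yields a Kuhn-Tucker pair; comparing their objective values selects the minimizer, as in Case 1.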

Acknowledgements. Thomas L. Hayden received partial support from NSF grant CHE-9005960.

Received: 19/X/07. Accepted: 25/III/08.


  • [1] S.K. Berberian, Introduction to Hilbert Space, Oxford University Press, New York (1961).
  • [2] P. Feldmann and R.W. Freund, Efficient Linear Circuit Analysis by Padé Approximation via the Lanczos Process, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 14(5) (1995), 639-649.
  • [3] W. Glunt, T.L. Hayden and R. Reams, The Nearest ''Doubly Stochastic'' Matrix to a Real Matrix with the same First Moment, Numerical Linear Algebra with Applications, 5(1) (1999), 1-8.
  • [4] N.J. Higham, Matrix nearness problems and applications, in Applications of Matrix Theory, pp. 1-27, The Institute of Mathematics and its Applications Conference Series, New Series, 22 (1989), Oxford University Press, New York.
  • [5] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge (1985).
  • [6] R.N. Khoury, Closest Matrices in the Space of Generalized Doubly Stochastic Matrices, Journal of Mathematical Analysis and Applications, 222 (1998), 562-568.
  • [7] A.W. Marshall and I. Olkin, Inequalities: Theory of Majorization and its Applications, Academic Press, New York (1979).
  • [8] Zheng-Jian Bai, Delin Chu and Roger C.E. Tan, Computing the nearest doubly stochastic matrix with a prescribed entry, SIAM Journal on Scientific Computing, 29(2) (2007), 635-655.

Publication Dates

  • Publication in this collection
    21 July 2008
  • Date of issue
    2008
