

Error bound for a perturbed minimization problem related with the sum of smallest eigenvalues

Marcos Vinicio Travaglia

Departamento de Matemática, Centro de Ciências da Natureza, Universidade Federal do Piauí, 64049-550 Teresina, PI, Brazil. E-mail: mvtravaglia@ufpi.edu.br

ABSTRACT

Let C be an n×n symmetric matrix. For each integer 1 ≤ k ≤ n we consider the minimization problem m(ε) := minX{Tr{CX} + εƒ(X)}. Here the variable X is an n×n symmetric matrix whose eigenvalues satisfy

0 ≤ λi(X) ≤ 1 for i = 1, ..., n and λ1(X) + ... + λn(X) = k;

the number ε is a positive (perturbation) parameter and ƒ is a Lipschitz-continuous function (in general nonlinear). It is well known that when ε = 0 the minimum value, m(0), is the sum of the k smallest eigenvalues of C.

Assuming that the eigenvalues of C satisfy λ1(C) ≤ ... ≤ λk(C) < λk+1(C) ≤ ∙∙∙ ≤ λn(C), we establish the following upper and lower bounds for the minimum value m(ε):

m(0) + ε f̄ ≥ m(ε) ≥ m(0) + ε f̄ − ε² 2kL²/(λk+1(C) − λk(C)),

where f̄ is the minimum value of ƒ over the solution set of the unperturbed problem and L is the Lipschitz constant of ƒ. The above inequality shows that the error made by replacing the exact value with the upper bound (or the lower bound) vanishes at least quadratically in the perturbation parameter. We also treat the case λk+1(C) = λk(C). We compare the exact solution with the upper and lower bounds for some examples.

Mathematical subject classification: 15A42, 15A18, 90C22.

Key words: matrix analysis; sum of smallest eigenvalues; minimization problem involving matrices; nonlinear perturbation; semidefinite programming.

1 Introduction

We denote by Mn the set of n×n real matrices. The sum of the diagonal entries of C ∈ Mn is denoted by Tr{C}. If C is a symmetric matrix then the sum of its first (smallest) k eigenvalues, λ1(C) + ... + λk(C) (multiplicities included), can be obtained as the minimum value of the following minimization problem with matrix variable (see Proposition 2.3):

min{Tr{CX} : Tr{X} = k, I − X ⪰ 0 and X ⪰ 0}.   (1)

The notation X ⪰ 0 means that X is an n×n (symmetric) positive semidefinite matrix, that is, X is symmetric and its eigenvalues are nonnegative. We write Y ⪰ X when the difference matrix satisfies Y − X ⪰ 0. We denote by I the identity matrix.

Note that the objective function, Tr{CX}, and the constraint Tr{X} = k of the minimization problem (1) are linear in the variable X. On the other hand, the constraints I − X ⪰ 0 and X ⪰ 0 are convex but nonlinear. Due to these last two constraints, problem (1) is, in general, not a linear program (LP).

We denote by 𝒦 := {X ∈ Mn : Tr{X} = k, I − X ⪰ 0 and X ⪰ 0} the domain of the objective function of the minimization problem (1). This means that the elements of 𝒦 are symmetric matrices whose eigenvalues satisfy 0 ≤ λi(X) ≤ 1 for i = 1, 2, ..., n and λ1(X) + ... + λn(X) = k.
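As a numerical sanity check of this setup (an illustrative sketch, not part of the original paper: the matrix size, the random instance and the sampling of feasible points below are our own assumptions), one can verify that random feasible X ∈ 𝒦 never beat the rank-k spectral projector, whose objective value is the sum of the k smallest eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3

# Random symmetric test matrix C.
A = rng.standard_normal((n, n))
C = (A + A.T) / 2
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

# Minimum value predicted for problem (1): sum of the k smallest eigenvalues.
m0 = eigvals[:k].sum()

# The rank-k spectral projector attains this value and is feasible.
P = eigvecs[:, :k] @ eigvecs[:, :k].T
assert abs(np.trace(C @ P) - m0) < 1e-10

# Random feasible X = Q diag(x) Q^T with 0 <= x_i <= 1 and sum(x) = k
# never does better.
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    x = rng.uniform(0, 1, n)
    x *= k / x.sum()          # rescale to sum k; reject if an entry exceeds 1
    if x.max() > 1:
        continue
    X = Q @ np.diag(x) @ Q.T
    assert np.trace(C @ X) >= m0 - 1e-10
```

The sampling scheme only produces some of the feasible points, but every accepted sample satisfies the eigenvalue constraints defining 𝒦.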

In this work we propose to study a nonlinear perturbation of problem (1) defined as follows:

m(ε) := min{Tr{CX} + εƒ(X) : X ∈ 𝒦},   (2)

where ε (the perturbation parameter) is a nonnegative real number and ƒ : 𝒦 → ℝ is a function that is, in general, nonlinear. The function εƒ(X) is called the perturbation. We denote by m(ε) the minimum value of problem (2). In general the solution of (2) depends on ε. Moreover, according to (1) one has m(0) = λ1(C) + ... + λk(C).

We close the introductory section with a motivation to study the perturbed minimization problem (2):

Semidefinite Programming: In the particular case k = 1 the constraint I − X ⪰ 0 can be dropped in (1). To see this, note that if λi(X) ≥ 0 and λ1(X) + ... + λn(X) = Tr{X} = 1 then 0 ≤ λi(X) ≤ 1. Hence, I − X ⪰ 0. The minimization problem (k = 1)

min{Tr{CX} : Tr{X} = 1 and X ⪰ 0}   (3)

is a special case of the following problem:

min{Tr{CX} : Tr{AiX} = bi for i = 1, ..., m, and X ⪰ 0},   (4)

which is called a standard semidefinite program (SDP). In (4) the Ai are n×n symmetric matrices. SDP has many applications in Combinatorial Optimization. See [5] for a survey on SDP and [6] for the relation between SDP and Eigenvalue Optimization.

A slightly more general case of (3) is min{Tr{CX} : Tr{AX} = b and X ⪰ 0} with A ≻ 0 (that is, A strictly positive definite) and b > 0. We can reduce this case to (3) by plugging C̃ := b A−1/2 C A−1/2 in place of C in (3).

Strictly Convex Perturbation of SDP: It is convenient to add a strictly convex function εƒ(X) in order to solve the unperturbed problem (4). This makes the perturbed minimization problem strictly convex. Hence, it has only one minimizer. One hopes that this minimizer approximates a solution of (4) as ε → 0. The interior point methods [10] for solving SDPs make use of the strictly convex function εƒ(X) = −ε log det(X), which is a log-barrier. Since this function is not Lipschitz-continuous, our error bound cannot be used. On the other hand, the authors of [7] arrived at an algorithm for SDPs that has several advantages over existing techniques, using the perturbation functions εƒ(X) = −ε log det(X) and εƒ(X) = ε Tr{X²}. The latter is strictly convex and Lipschitz-continuous. As the main advantage, this algorithm is a first-order method, which makes it scalable.
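The following toy computation (our own illustration; the instance C = diag(0, 0, 1) with k = 1, the grid resolution, and the restriction to diagonal X are assumptions, not taken from [7] or [10]) shows how a strictly convex, Lipschitz-continuous perturbation such as ε Tr{X²} singles out one point of a degenerate solution set:

```python
import numpy as np

# Toy instance: C = diag(0, 0, 1), k = 1, perturbation f(X) = Tr{X^2}.
# The unperturbed solution set is a whole face (the smallest eigenvalue of C
# is degenerate); the strictly convex perturbation picks out one point.
# For diagonal C we search over diagonal X only (pinching the off-diagonal
# entries to zero does not increase the objective).
eps = 1e-3
best, best_x = None, None
ts = np.linspace(0.0, 1.0, 401)
for x1 in ts:
    for x2 in ts:
        x3 = 1.0 - x1 - x2
        if x3 < 0:     # eigenvalues must lie in [0, 1] and sum to k = 1
            continue
        val = x3 + eps * (x1**2 + x2**2 + x3**2)
        if best is None or val < best:
            best, best_x = val, (x1, x2, x3)

# The minimizer is close to diag(1/2, 1/2, 0): the minimal-norm point
# of the unperturbed solution set.
assert abs(best_x[0] - 0.5) < 0.01 and abs(best_x[1] - 0.5) < 0.01
assert abs(best_x[2]) < 0.01
```

As ε → 0 this unique perturbed minimizer converges to a distinguished solution of the unperturbed problem, which is exactly the behavior one hopes for in the regularization schemes discussed above.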

2 Results

Since we cannot, in general, obtain the exact value of m(ε) for ε > 0, we propose to establish an upper bound u(ε) and a lower bound ℓ(ε) for m(ε). That is, u(ε) ≥ m(ε) ≥ ℓ(ε), from which it follows that 0 ≤ m(ε) − ℓ(ε) ≤ u(ε) − ℓ(ε) and 0 ≤ u(ε) − m(ε) ≤ u(ε) − ℓ(ε). Under some condition on ƒ we prove that u(ε) − ℓ(ε) = constant · ε² (see Theorem 2.8). Hence, both m(ε) − ℓ(ε) and u(ε) − m(ε) go to zero at least quadratically as ε goes to zero. We interpret u(ε) − ℓ(ε) as an error bound. In contrast to the minimization problem (2), the authors of [1] proved that for a perturbed LP (see Section 4) one has uLP(ε) − ℓLP(ε) = 0 for all 0 ≤ ε ≤ εo, with some εo > 0. This means that for a perturbed LP there is no error if we require that ε be small enough. This is, in general, not the case for the minimization problem (2) (see Example 1 of Section 3). The author of [9] derived error bounds for perturbed LPs when ƒ is strictly convex, and in [8] the perturbation results of [1] were extended to convex programs.

As in [1], we assume that ƒ : 𝒦 → ℝ is a Lipschitz-continuous function, that is, there is a constant 0 < L < ∞ such that for all X, Y ∈ 𝒦 the inequality |ƒ(X) − ƒ(Y)| ≤ L‖X − Y‖ holds. Throughout this paper ‖·‖ denotes the Frobenius norm, that is, for X ∈ Mn we define

‖X‖² := Σi,j (Xi,j)².

It is well known that for a symmetric matrix X the equality ‖X‖² = λ1(X)² + ... + λn(X)² holds.

The following proposition establishes that the set 𝒦 is a bounded (and closed) subset of Mn. Hence, 𝒦 is compact.

Proposition 2.1. The set 𝒦 is a subset of the ball of Mn with radius √k, that is, if X ∈ 𝒦 then ‖X‖ ≤ √k.

Proof. For X ∈ 𝒦 we have 0 ≤ λi(X) ≤ 1 for i = 1, ..., n. Therefore ‖X‖² = λ1(X)² + ... + λn(X)² ≤ λ1(X) + ... + λn(X) = k.

Since 𝒦 is compact and Tr{CX} is continuous, the minimum value of problem (1) is attained.

In order to characterize the minimizers of the problem (1) we need the following definition:

Definition 2.2 (The values d, r, s, λr(C), and λs(C)). Consider the eigenvalues of the symmetric matrix C ∈ Mn in increasing order, that is, λ1(C) ≤ λ2(C) ≤ ··· ≤ λn(C). For each k = 1, 2, ..., n we define:

d = d(C, k) := #{i ∈ {1, ..., n} : λi(C) = λk(C)}.

Note that d is the multiplicity (degeneracy) of the k-th eigenvalue. Further we define:

r = r(C, k) := #{i ∈ {1, ..., n} : λi(C) < λk(C)}   (5)

and

s = s(C, k) := n + 1 − #{i ∈ {1, ..., n} : λi(C) > λk(C)}.   (6)

In the case r = 0 we adopt the convention λr(C) = λ0(C) := −∞, and in the case s = n+1 we adopt λs(C) = λn+1(C) := +∞. Note that s − r − 1 = d holds true.

The following example illustrates the above definition:

Example. For n = 10 and k = 6. If the eigenvalues of C are:

a) 2, 3, 4, 4, 4, 4, 4, 4, 11, 12 then λk = 4, d = 6, r = 2, s = 9, λr = 3 and λs = 11.

b) 4, 4, 4, 4, 4, 4, 4, 4, 11, 12 then λk = 4, d = 8, r = 0, s = 9, λr = -∞ and λs = 11.

c) 2, 2, 2, 3, 4, 4, 4, 4, 4, 4 then λk = 4, d = 6, r = 4, s = 11, λr = 3 and λs = +∞.

d) 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 then λk = 4, d = 10, r = 0, s = 11, λr = -∞, λs = +∞. Note that, in this case, C = 4I.
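The quantities in Definition 2.2 are easy to compute from a sorted spectrum. The helper below (an illustrative sketch; the function name and interface are our own) reproduces the four cases a)-d) above:

```python
import numpy as np

def drs(eigs, k):
    """Multiplicity d and the indices r, s of Definition 2.2 (k is 1-based)."""
    eigs = np.sort(np.asarray(eigs, dtype=float))
    lam_k = eigs[k - 1]
    d = int(np.sum(eigs == lam_k))                  # multiplicity of lambda_k
    r = int(np.sum(eigs < lam_k))                   # eigenvalues strictly below
    s = len(eigs) + 1 - int(np.sum(eigs > lam_k))   # first index strictly above
    lam_r = eigs[r - 1] if r >= 1 else -np.inf      # convention lambda_0 = -inf
    lam_s = eigs[s - 1] if s <= len(eigs) else np.inf
    assert s - r - 1 == d                           # identity noted in the text
    return d, r, s, lam_r, lam_s

# The four cases a)-d) of the example (n = 10, k = 6):
assert drs([2,3,4,4,4,4,4,4,11,12], 6) == (6, 2, 9, 3.0, 11.0)
assert drs([4,4,4,4,4,4,4,4,11,12], 6) == (8, 0, 9, -np.inf, 11.0)
assert drs([2,2,2,3,4,4,4,4,4,4], 6) == (6, 4, 11, 3.0, np.inf)
assert drs([4,4,4,4,4,4,4,4,4,4], 6) == (10, 0, 11, -np.inf, np.inf)
```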

Proposition 2.3.

The minimum value of problem (1) is the sum of the k smallest eigenvalues of the symmetric matrix C. Moreover, the set of its minimizers is

X*(0) = {V (Ir ⊕ Z ⊕ 0) Vᵀ : Z ∈ 𝒦d},   (7)

where

𝒦d := {Z ∈ Md : Tr{Z} = k − r and I ⪰ Z ⪰ 0}.   (8)

Here d = d(C, k) is the multiplicity of the k-th eigenvalue of C and V is any n×n orthogonal matrix whose columns are the eigenvectors (in increasing order of their eigenvalues) of the matrix C.

The matrix Ir in (7) is the identity matrix of order r, where r = r(C, k) is defined by (5). If r = 0 then this identity matrix does not appear in (7). The block matrix 0 in the diagonal of (7) is of order n + 1 − s, where s = s(C, k) is defined by (6). If s = n+1 then this block matrix 0 does not appear in (7).

If d = 1 then there is only one minimizer, namely P := v1v1ᵀ + ... + vkvkᵀ (the orthogonal projection of rank k), where vi is any (normalized) eigenvector corresponding to the i-th eigenvalue of C.

Proof. See appendix A.

Remark 2.4. In order to simplify the notation we mean by A ⊕ B (direct sum) the block-diagonal matrix with diagonal blocks A and B. That is, A ⊕ B is a square matrix of order na + nb if A and B are square matrices of orders na and nb respectively.

Since the objective function Tr{CX} + εƒ(X) is the sum of two continuous functions on the compact set 𝒦, its minimum value m(ε) is attained. We denote by X*(ε) the set argmin{Tr{CX} + εƒ(X) : X ∈ 𝒦} (the set of minimizers). Proposition 2.3 states that m(0) = λ1(C) + ... + λk(C) and

X*(0) = {V (Ir ⊕ Z ⊕ 0) Vᵀ : Z ∈ Md with Tr{Z} = k − r and I ⪰ Z ⪰ 0}.

An important value of ƒ for the study of the relation between m(ε) and m(0) is f̄ := min{ƒ(X) : X ∈ X*(0)}.

We now establish the following upper bound for m(ε):

Proposition 2.5 (Upper bound).

For the minimum value m(ε) of problem (2) the following upper bound holds:

u(ε) := m(0) + ε f̄ ≥ m(ε)

for all ε ≥ 0.

Proof. Take Y* ∈ argmin{ƒ(X) : X ∈ X*(0)}. Since Y* ∈ 𝒦, we have by the definition of m(ε) that m(ε) ≤ Tr{CY*} + εƒ(Y*) = m(0) + ε f̄ = u(ε).

Remark 2.6. In the case d(C, k) = 1 we have f̄ = ƒ(P) because X*(0) = {P} (there is only one minimizer). Hence, u(ε) = m(0) + εƒ(P) is an upper bound. On the other hand, in the case d(C, k) > 1, the function U(ε) := m(0) + εƒ(P) = Tr{CP} + εƒ(P) is clearly an upper bound for m(ε) for any fixed P ∈ X*(0). If ƒ(P) > f̄ then U(ε) > u(ε) for ε > 0. Therefore, the upper bound U(ε) is not interesting, because when estimating an error one tries, in general, to find smaller upper bounds and larger lower bounds.

We can easily obtain a crude lower bound for m(ε), as follows:

Proposition 2.7 (A linear error bound). For the minimum value of (2) the following upper and lower bounds hold:

m(0) + ε f̄ ≥ m(ε) ≥ m(0) + ε f̄ − 2√k L ε

for all ε ≥ 0.

Proof. The upper bound, u(ε) := m(0) + ε f̄, was proved in Proposition 2.5. To prove the lower bound take X* ∈ X*(ε) and Y* ∈ argmin{ƒ(X) : X ∈ X*(0)}. Hence

m(ε) = Tr{CX*} + εƒ(X*) ≥ m(0) + εƒ(X*) ≥ m(0) + εƒ(Y*) − εL‖X* − Y*‖ ≥ m(0) + ε f̄ − 2√k L ε.

In the last step we used that X*, Y* ∈ 𝒦 together with Proposition 2.1 (so that ‖X* − Y*‖ ≤ ‖X*‖ + ‖Y*‖ ≤ 2√k). This proves Proposition 2.7.

The main result of this work is the following theorem, which improves Proposition 2.7:

Theorem 2.8 (A quadratic error bound). If ƒ is a Lipschitz-continuous function with Lipschitz constant L then the following upper and lower bounds hold:

m(0) + ε f̄ ≥ m(ε) ≥ m(0) + ε f̄ − (L²/α(C, k)) ε²

for all ε ≥ 0. Here α(C, k) is strictly positive and given by

α(C, k) := min{ (λk(C) − λr(C)) / (2(s − (k+1))), (λs(C) − λk(C)) / (2k) },   (9)

where the values r = r(C, k) and s = s(C, k) are given by Definition 2.2 (a term with an infinite numerator or a zero denominator is understood to be +∞).

Example. In the last example (n = 10 and k = 6) we have: a) α(C, 6) = 1/4; b) α(C, 6) = 7/12; c) α(C, 6) = 1/8; and d) α(C, 6) = +∞.
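Formula (9) can be evaluated directly from the spectrum; the sketch below (our own illustration, with the conventions λ0 = −∞, λn+1 = +∞ and "zero denominator gives +∞" coded explicitly) reproduces the four values above and the half-gap of Remark 2.10:

```python
import numpy as np

def alpha(eigs, k):
    """alpha(C, k) of formula (9), computed from the sorted spectrum of C."""
    eigs = np.sort(np.asarray(eigs, dtype=float))
    n = len(eigs)
    lam_k = eigs[k - 1]
    r = int(np.sum(eigs < lam_k))
    s = n + 1 - int(np.sum(eigs > lam_k))
    lam_r = eigs[r - 1] if r >= 1 else -np.inf
    lam_s = eigs[s - 1] if s <= n else np.inf
    # Convention: an infinite numerator or a zero denominator gives +inf.
    t1 = (lam_k - lam_r) / (2 * (s - (k + 1))) if s > k + 1 and np.isfinite(lam_r) else np.inf
    t2 = (lam_s - lam_k) / (2 * k) if np.isfinite(lam_s) else np.inf
    return min(t1, t2)

# The four cases a)-d) above (n = 10, k = 6):
assert alpha([2,3,4,4,4,4,4,4,11,12], 6) == 0.25
assert abs(alpha([4,4,4,4,4,4,4,4,11,12], 6) - 7/12) < 1e-12
assert alpha([2,2,2,3,4,4,4,4,4,4], 6) == 0.125
assert alpha([4,4,4,4,4,4,4,4,4,4], 6) == np.inf

# Remark 2.10 (k = 1): alpha is half of the spectral gap after lambda_1.
assert alpha([0, 2], 1) == 1.0
```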

Remark 2.9. In the particular case where the eigenvalues of C are ordered as λ1 ≤ λ2 ≤ ... ≤ λk < λk+1 ≤ ... ≤ λn, that is, s = k+1, we have α(C, k) = (λk+1(C) − λk(C))/(2k).

Remark 2.10. In the particular case k = 1 (⇒ r = 0) we obtain:

α(C, 1) = (λd+1(C) − λ1(C))/2,

where d = d(C, 1) is the multiplicity (degeneracy) of λ1(C). That is, if k = 1, then α can be taken as half of the gap after the first eigenvalue.

Remark 2.11. The expression (9) for the strictly positive constant α(C, k) is obtained in Lemma B.1 (see Appendix B). Lemma B.1 states that α(C, k) given by (9) satisfies the inequality

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ α(C, k) min{‖X − Y‖² : Y ∈ X*(0)}   (10)

for all X ∈ 𝒦.

We comment on two cases where α(C, k) = +∞.

Remark 2.12. If d = n then one has C = λ1(C) I (a multiple of the identity matrix). Note that in this case we have r = 0 and s = n+1. Hence, by definition (9), α(C, k) = +∞. Note that the left-hand side (LHS) of (10) is zero, since Tr{CX} = kλ1(C) for every X ∈ 𝒦. This means that X*(0) = 𝒦. The right-hand side (RHS) of (10) is also zero, because in that case every X ∈ 𝒦 belongs to X*(0). Therefore, the largest (best) α for which inequality (10) holds is +∞. Note also that, in the case d = n, the equality m(ε) = m(0) + ε f̄ holds for all ε ≥ 0, since X*(0) = 𝒦. This is in agreement with Theorem 2.8: 0 ≤ m(0) + ε f̄ − m(ε) ≤ (L²/α)ε² = 0, since L²/α = 0. That is, Theorem 2.8 confirms that if d = n the exact minimum value m(ε) coincides with the upper bound for all ε ≥ 0.

Remark 2.13. If k = n then s = n+1. According to definition (9) we also have α(C, k) = +∞. Note that, if k = n, then 𝒦 = {I} = X*(0), because if X ∈ Mn with Tr{X} = n and I ⪰ X ⪰ 0 then X = I. Consequently both the LHS and the RHS of (10) are zero. This means that we can take α(C, n) = +∞. Note also that, in the case k = n, the equality m(ε) = m(0) + ε f̄ = Tr{C} + εƒ(I) holds for all ε ≥ 0, since X*(0) = 𝒦 = {I}.

Proof of Theorem 2.8

Proof. The upper bound was proved in Proposition 2.5. To prove the lower bound take an X* ∈ X*(ε), a Y* ∈ argmin{ƒ(X) : X ∈ X*(0)} and a Y** ∈ argmin{‖X* − Y‖² : Y ∈ X*(0)}.

Since X* ∈ X*(ε) and Y* ∈ 𝒦 we have:

Tr{CX*} + εƒ(X*) ≤ Tr{CY*} + εƒ(Y*).   (11)

Developing (11), using that Tr{CY*} = λ1(C) + ... + λk(C), the Lipschitz continuity of ƒ, the fact that ƒ(Y*) ≤ ƒ(Y**), and (10), we obtain:

α‖X* − Y**‖² ≤ Tr{CX*} − (λ1(C) + ... + λk(C)) ≤ ε(ƒ(Y*) − ƒ(X*)) ≤ εL‖X* − Y**‖.

We conclude two things:

α‖X* − Y**‖² ≤ εL‖X* − Y**‖   (12)

and

Tr{CX*} + εƒ(X*) ≥ Tr{CY*} + εƒ(Y*) − εL‖X* − Y**‖   (13)

(for (13) we used Tr{CX*} ≥ λ1(C) + ... + λk(C) = Tr{CY*} and ƒ(X*) ≥ ƒ(Y**) − L‖X* − Y**‖ ≥ ƒ(Y*) − L‖X* − Y**‖).

Note that Tr{CY*} = λ1(C) + ... + λk(C) = m(0) (because Y* ∈ X*(0)). Further recall that

m(ε) = Tr{CX*} + εƒ(X*)   (14)

because X* ∈ X*(ε). Hence, we rewrite (13) as

m(ε) ≥ m(0) + εƒ(Y*) − εL‖X* − Y**‖,

that is,

m(ε) ≥ m(0) + ε f̄ − εL‖X* − Y**‖.   (15)

Now, we consider two cases in (15):

Case 1: X* = Y**. In this case we obtain directly from (15) that m(ε) ≥ m(0) + ε f̄.

Case 2: X* ≠ Y**. In this case it follows from (12) that ‖X* − Y**‖ ≤ εL/α. Replacing this last inequality in (15) we obtain:

m(ε) ≥ m(0) + ε f̄ − (L²/α) ε².

Since m(0) + ε f̄ ≥ m(0) + ε f̄ − (L²/α) ε², we conclude that in both cases the following lower bound for m(ε) holds:

m(ε) ≥ m(0) + ε f̄ − (L²/α(C, k)) ε².

This proves Theorem 2.8.

3 Examples and comparison between the perturbed matrix minimization problem (2) and perturbed LP

In this section we give two examples of the minimization problem (2) for n = 2 and k = 1. We present the exact minimum value and compare it with the upper and lower bounds. The first example consists of a perturbed SDP that is not a perturbed LP.

Example 1: Consider the matrix C = [[1, 1], [1, 1]], n = 2, k = 1 and the nonlinear function ƒ(X) = (X1,1)². In this example the minimization problem (2) is given by:

m(ε) = min{X1,1 + 2X1,2 + X2,2 + ε(X1,1)² : X1,1 + X2,2 = 1, (X1,2)² ≤ X1,1X2,2, X1,1 ≥ 0 and X2,2 ≥ 0}.   (16)

Remark 3.1. In (16) the constraint (X1,2)² ≤ X1,1X2,2 is nonlinear and the variable X1,2 appears in the objective function. Hence, the above perturbed matrix minimization problem is not a perturbed LP.

In this example we have:

– The eigenvalues and eigenvectors of C are: λ1(C) = 0, λ2(C) = 2, v1 = (1/√2)(1, −1)ᵀ and v2 = (1/√2)(1, 1)ᵀ;

– The solution set of the linear part is

X*(0) = {v1v1ᵀ} = {[[1/2, −1/2], [−1/2, 1/2]]};

– The minimum value of ƒ on the solution set of the linear part is f̄ := min{ƒ(X) : X ∈ X*(0)} = ƒ(v1v1ᵀ) = 1/4;

– The Lipschitz constant is L = √2. To see this, note that for X, Y ∈ 𝒦 one has:

|ƒ(X) − ƒ(Y)| = (X1,1 + Y1,1)|X1,1 − Y1,1| ≤ 2|X1,1 − Y1,1|   (17)

and

2(X1,1 − Y1,1)² = (X1,1 − Y1,1)² + (X2,2 − Y2,2)² ≤ ‖X − Y‖²   (18)

(recall that X2,2 − Y2,2 = −(X1,1 − Y1,1) since Tr{X} = Tr{Y} = 1). Taking (17) and (18) into account, we get

|ƒ(X) − ƒ(Y)| ≤ 2‖X − Y‖/√2.

That is,

|ƒ(X) − ƒ(Y)| ≤ √2 ‖X − Y‖.   (19)

From (19) we conclude that the Lipschitz constant for ƒ(X) = (X1,1)² can be taken as √2. Moreover, L = √2 is the best (the smallest) Lipschitz constant, since Lbest = sup{|ƒ(X) − ƒ(Y)|/‖X − Y‖ : X, Y ∈ 𝒦 with X ≠ Y} ≥ limδ→0 |ƒ(A) − ƒ(Bδ)|/‖A − Bδ‖ = √2 for the choice A = [[1, 0], [0, 0]] and Bδ = [[1−δ, 0], [0, δ]] with 0 < δ < 1;

– The upper bound in this example (see Proposition 2.5) is u(ε) = ε/4;

– Since r = 0 and s = 2, we have α(C, 1) = (λ2(C) − λ1(C))/2 = 1, and the lower bound in this example (see Theorem 2.8) is ℓ(ε) = ε/4 − 2ε²;

According to Proposition C.1, the exact minimum value m(ε) can be expressed as:

m(ε) = max{1 + εr − εr² − √(1 + ε²r²) : r ∈ [0, 1]}.   (20)

For some values of ε > 0 the maximum value in (20) can be obtained with the help of the software Maple (command maximize). We present the values of m(ε) and the corresponding lower and upper bounds in the following table:

Table 1 shows that for ε > 0 the upper bound u(ε) is always strictly larger than the exact value m(ε). We prove this fact rigorously in Proposition C.2.
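The comparison reported in Table 1 can be reproduced numerically (an illustrative sketch; the sample values of ε and the grid resolution are our own choices): m(ε) is evaluated by maximizing the expression of Proposition C.1 on a fine grid and compared with u(ε) = ε/4 and ℓ(ε) = ε/4 − 2ε²:

```python
import numpy as np

r = np.linspace(0.0, 1.0, 200001)
for eps in [0.01, 0.05, 0.1, 0.2]:
    # Exact value via the max-formula of Proposition C.1 (grid approximation).
    m = np.max(1 + eps*r - eps*r**2 - np.sqrt(1 + (eps*r)**2))
    u = eps / 4                  # u(eps) = m(0) + eps*fbar with fbar = 1/4
    ell = eps / 4 - 2 * eps**2   # Theorem 2.8 with L = sqrt(2), alpha = 1
    assert ell - 1e-9 <= m <= u
    assert m < u                 # the upper bound is strict for eps > 0
```

The strict inequality m(ε) < u(ε) observed here is exactly the content of Proposition C.2.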

Example 2: Consider the diagonal matrix C = [[0, 0], [0, 2]], n = 2, k = 1, r = 0, s = 2 and the same nonlinear function ƒ(X) = (X1,1)² as in Example 1. In this case the minimization problem (2) is given by:

m(ε) = min{2X2,2 + ε(X1,1)² : X1,1 + X2,2 = 1, (X1,2)² ≤ X1,1X2,2, X1,1 ≥ 0 and X2,2 ≥ 0}.

Since the variable X1,2 does not appear in the above objective function, we can rewrite this minimization problem as:

m(ε) = min{2x2 + εx1² : x1 + x2 = 1, x1 ≥ 0 and x2 ≥ 0}.

This shows that Example 2 of the perturbed matrix minimization problem is nothing more than a perturbed LP.

In this example we have m(0) = λ1(C) = 0, λ2(C) = 2; r = 0, s = 2; v1 = (1, 0)ᵀ, v2 = (0, 1)ᵀ; f̄ = ƒ(v1v1ᵀ) = 1 and L = √2 (see Example 1). Therefore u(ε) = ε (see Proposition 2.5) and ℓ(ε) = ε − 2ε² (see Theorem 2.8). On the other hand, we can easily compute the exact minimum value as:

m(ε) = ε if 0 ≤ ε ≤ 1, and m(ε) = 2 − 1/ε if ε > 1.   (21)

Contrary to Example 1, note that u(ε) = m(ε) for all 0 ≤ ε ≤ 1. That is, there is no error in replacing the upper bound by the exact minimum value if we require that ε be small enough. This is a property of all linear programs perturbed by a Lipschitz function (see Section 4).
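A brute-force check of the closed form above (an illustrative sketch; the grid and the sample values of ε are our own choices) confirms it and shows that the upper bound is exact for ε ≤ 1:

```python
import numpy as np

def m_exact(eps):
    # Closed form (21): for C = diag(0, 2) and k = 1, minimize
    # 2*(1 - x1) + eps*x1^2 over x1 in [0, 1].
    return eps if eps <= 1 else 2 - 1/eps

x1 = np.linspace(0.0, 1.0, 100001)
for eps in [0.3, 0.7, 1.0, 2.5, 10.0]:
    brute = float(np.min(2*(1 - x1) + eps*x1**2))
    assert abs(brute - m_exact(eps)) < 1e-6
    if eps <= 1:
        assert m_exact(eps) == eps   # u(eps) = m(eps): no error for small eps
```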

4 Note about LP perturbed by a Lipschitz function

In the context of Linear Programming (LP) the minimization problem corresponding to (2) is given by:

mLP(ε) := min{cᵀx + εƒ(x) : x ∈ 𝒦LP}   (22)

for a fixed c ∈ ℝⁿ. Here the domain of the objective function is:

𝒦LP := {x ∈ ℝⁿ : Ax = b and x ≥ 0},   (23)

and ƒ : 𝒦LP → ℝ is a Lipschitz function, that is, there is a 0 < Lmax < ∞ such that:

|ƒ(x) − ƒ(y)| ≤ Lmax ‖x − y‖max for all x, y ∈ 𝒦LP,

where ‖x‖max := maxi |xi| is the max-norm.

We define S(0) := argmin{cᵀx : x ∈ 𝒦LP}, that is, the preimage of the value mLP(0) (the set of minimizers of the unperturbed problem), S(ε) := argmin{cᵀx + εƒ(x) : x ∈ 𝒦LP}, Sƒ(0) := argmin{ƒ(x) : x ∈ S(0)} and f̄ := min{ƒ(x) : x ∈ S(0)}.

The authors of [1] showed that for small enough ε there is no error in replacing the upper bound, uLP(ε) := mLP(0) + ε f̄, by the exact minimum value, mLP(ε), namely:

mLP(ε) = mLP(0) + ε f̄ for all 0 ≤ ε ≤ εo,   (24)

where εo is strictly positive and given by:

εo := αLP / Lmax.   (25)

This means that Sƒ(0) ⊂ S(ε) for all 0 ≤ ε ≤ εo. In [1], αLP can be taken as the largest positive constant that satisfies the inequality

cᵀx − mLP(0) ≥ αLP min{‖x − y‖max : y ∈ S(0)}   (26)

for all x ∈ 𝒦LP. It is interesting to compare (26) with (10) and (9). In fact, in [1] it is only proved that αLP > 0. That is, in [1] there is no explicit formula for αLP in terms of c.

Remark 4.1. In this section we use the max-norm in (23) and (26) in order to follow [1]. We can assure the strict positivity of εo for any choice of norm, since in ℝⁿ all norms are equivalent.

In order to illustrate the result (24)-(25) of [1], note that the minimization problem of Example 2 is of the form (22). To see this, we identify x1 = X1,1 and x2 = X2,2. Hence,

mLP(ε) = min{2x2 + εx1² : x1 + x2 = 1, x1 ≥ 0 and x2 ≥ 0}.

It is easy to show for this example that Lmax = 2 and αLP = 2. So, by (25), εo = 1. According to the exact solution, see (21), εo = 1 is the best (the largest) value for which equation (24) holds true.

Appendix A. Proof of Proposition 2.3

In order to prove Proposition 2.3 we first need a result (see Corollary A.2 below), which is a direct consequence of the following lemma:

Lemma A.1 (Cauchy-Schwarz-Bunjakowski inequality for the entries of positive semidefinite matrices). If W ∈ Mn is a symmetric positive semidefinite matrix then its entries satisfy the following inequality: |Wi,j|² ≤ Wi,iWj,j for all i, j = 1, 2, ..., n.

Proof. See [4], page 398.

Corollary A.2.

Consider W ∈ Mn with I ⪰ W ⪰ 0. In this case we have the following implications:

a) If Wℓ,ℓ = 1 for some ℓ ∈ {1, ..., n} then Wℓ,j = 0 for all 1 ≤ j ≤ n (j ≠ ℓ) and Wi,ℓ = 0 for all 1 ≤ i ≤ n (i ≠ ℓ);

b) If Wℓ,ℓ = 0 for some ℓ ∈ {1, ..., n} then Wℓ,j = 0 for all 1 ≤ j ≤ n and Wi,ℓ = 0 for all 1 ≤ i ≤ n.

Proof. To prove a) we use that I ⪰ W, so I − W ⪰ 0. It follows by Lemma A.1 that |δi,j − Wi,j|² ≤ (1 − Wi,i)(1 − Wj,j). Hence, if Wℓ,ℓ = 1 then Wℓ,j = 0 for j ≠ ℓ and Wi,ℓ = 0 for i ≠ ℓ. The proof of b) uses that W ⪰ 0 and Lemma A.1.
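Lemma A.1 and Corollary A.2 are easy to test numerically (an illustrative sketch; the random instances below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
for _ in range(100):
    B = rng.standard_normal((n, n))
    W = B @ B.T                      # random positive semidefinite matrix
    # Lemma A.1: |W_ij|^2 <= W_ii * W_jj for PSD matrices.
    for i in range(n):
        for j in range(n):
            assert W[i, j]**2 <= W[i, i]*W[j, j] + 1e-9

# Corollary A.2 b), illustrated: if a diagonal entry of a PSD matrix is 0,
# its whole row and column vanish.  Force W_00 = 0 by construction.
B = rng.standard_normal((n, n)); B[0, :] = 0.0
W = B @ B.T
assert np.allclose(W[0, :], 0) and np.allclose(W[:, 0], 0)
```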

Proof of Proposition 2.3.

Proof. Since C is symmetric there is an orthonormal basis of eigenvectors. We denote it by v1, v2, ..., vn ∈ ℝⁿ. Since this basis is orthonormal, the trace Tr{CX} is expressed as

Tr{CX} = Σi=1,...,n λi(C) viᵀXvi.   (28)

We define Wi,j(V, X) := viᵀXvj for i, j = 1, 2, ..., n. That is, the matrix W is given by W := VᵀXV. Recall that V is the matrix whose columns are the eigenvectors vi. Note that for all X ∈ 𝒦 we have:

0 ≤ Wi,i ≤ 1 and Σi=1,...,n Wi,i = k,   (29)

because I ⪰ X ⪰ 0 and Tr{X} = k respectively. Combining (28) with (29) we obtain:

min{Tr{CX} : X ∈ 𝒦} ≥ min{Σi=1,...,n λi(C)xi : x ∈ 𝒞},   (30)

where the set 𝒞 := {x = (x1, x2, ..., xn) ∈ [0, 1]ⁿ : x1 + x2 + ... + xn = k} is convex and compact.

We claim that we have equality in (30). To see this, let y ∈ 𝒞 be a minimizer of the right-hand side (RHS) of (30). Further, take X(y) := VD(y)Vᵀ, where D(y) is the diagonal matrix whose diagonal entries are the yi and V is the matrix whose columns are the eigenvectors vi. We will show that X(y) ∈ 𝒦 and that Tr{CX(y)} equals the minimum value of the RHS of (30). Indeed, X(y) ∈ 𝒦 since Tr{VD(y)Vᵀ} = Tr{D(y)} = y1 + ... + yn = k and I ⪰ VD(y)Vᵀ ⪰ 0 since 1 ≥ yi ≥ 0. Moreover, Tr{CX(y)} = Tr{CVD(y)Vᵀ} = Tr{VᵀCVD(y)} (the trace is cyclic). On the other hand, VᵀCV = D(λ) (the spectral decomposition of C). Hence, Tr{CX(y)} = Tr{D(λ)D(y)} = Σi=1,...,n λi(C)yi, which is the minimum value of the RHS of (30). Therefore,

min{Tr{CX} : X ∈ 𝒦} ≤ min{Σi=1,...,n λi(C)xi : x ∈ 𝒞}.   (31)

Comparing (30) with (31) we obtain the claim, that is,

min{Tr{CX} : X ∈ 𝒦} = min{Σi=1,...,n λi(C)xi : x ∈ 𝒞}.   (32)

The fundamental theorem of linear optimization (FTLP, see [2]) states that

min{Σi λi(C)xi : x ∈ 𝒞} = min{Σi λi(C)xi : x ∈ Ext(𝒞)}.   (33)

Here Ext(𝒞) is the set of the extreme points of 𝒞. It is well known that

Ext(𝒞) = {x ∈ {0, 1}ⁿ : x1 + x2 + ... + xn = k}.   (34)

Note that #Ext(𝒞) = (n choose k). Hence:

min{Σi λi(C)xi : x ∈ Ext(𝒞)} = min{Σi∈S λi(C) : S ⊂ {1, ..., n} with #S = k}.   (35)

The minimization problem on the right side of (35) is easy to solve. More precisely:

min{Σi∈S λi(C) : S ⊂ {1, ..., n} with #S = k} = λ1(C) + ... + λk(C).   (36)

In the last equality we used the fact that λ1(C) ≤ λ2(C) ≤ ... ≤ λn(C) (increasing order).

Combining (32), (33), (35) and (36) it follows that λ1(C) + ... + λk(C) is the minimum value of problem (1).

Next, we will characterize the set of minimizers of problem (1). We denote it by X*(0).

According to the definitions of r(C, k), s(C, k) and d(C, k) (see Definition 2.2), the eigenvalues of C are ordered as

λ1 ≤ ... ≤ λr < λr+1 = λr+2 = ... = λr+d < λs ≤ ... ≤ λn.

The FTLP also states that

argmin{Σi λi(C)xi : x ∈ 𝒞} = conv(Ext*(𝒞)),   (37)

where

Ext*(𝒞) := argmin{Σi λi(C)xi : x ∈ Ext(𝒞)}.   (38)

Note that #Ext*(𝒞) = (d choose k−r). The set on the RHS of (37) is the smallest convex subset of 𝒞 that contains Ext*(𝒞). Recalling (38) and the fact that s = r + d + 1, it follows that the convex hull of Ext*(𝒞) is given by:

conv(Ext*(𝒞)) = {x ∈ [0, 1]ⁿ : xi = 1 for i ≤ r, xi = 0 for i ≥ s, and xr+1 + ... + xs−1 = k − r}.   (39)

We define the following three sets:

1) 𝒦d := {Z ∈ Md : Tr{Z} = k − r and I ⪰ Z ⪰ 0},

2) 𝒲r,d,s := {Ir ⊕ Z ⊕ 0 : Z ∈ 𝒦d},

and 3) V𝒲r,d,sVᵀ := {VWVᵀ : W ∈ 𝒲r,d,s}. We mean by Ir ⊕ Z ⊕ 0 the n×n block-diagonal matrix with diagonal blocks Ir, Z and 0. In this block form, Ir is the r×r identity matrix, Z is a d×d matrix and, by the block 0 in the diagonal, we mean the (n − s + 1)×(n − s + 1) zero matrix.

We claim that if X* is a minimizer of problem (1) then the corresponding matrix W* = W(V, X*) := VᵀX*V is an element of 𝒲r,d,s. To see this, note that if X* is a minimizer then by (28) one has Tr{CX*} = Σi=1,...,n λi(C)W*i,i, and by (37) and (39) the vector (W*1,1, W*2,2, ..., W*n,n) must satisfy:

a) W*i,i = 1 for i = 1, ..., r; b) W*i,i = 0 for i = s, ..., n; and c) W*r+1,r+1 + ... + W*s−1,s−1 = k − r.   (40)

Now, combining (40) a) and (40) b) with the corollary A.2 we conclude that:

a) For any i = 1, 2,..., r and j = 1, 2,..., n we have W*i,j = δi,j,

a') For any j = 1,2,..., r and i = 1, 2,..., n we have W*i,j = δi,j,

b) For any i = s, s+1,..., n and j = 1, 2,..., n we have W*i,j = 0 and

b') For any j = s, s+1,..., n and i = 1, 2, ..., n we also have W*i,j = 0.

This proves that the n×n matrix W* is of the form

W* = Ir ⊕ Ẑ ⊕ 0.

Here Ẑ is a d×d submatrix (note that d = s − r − 1) which satisfies I ⪰ Ẑ ⪰ 0, because W* = VᵀX*V and X* also satisfies I ⪰ X* ⪰ 0. Due to (40) c) one has Tr{Ẑ} = k − r. Therefore W* ∈ 𝒲r,d,s. Since Vᵀ = V⁻¹ (because V is orthogonal), we obtain that X*(0) ⊂ V𝒲r,d,sVᵀ.

On the other hand, the set V𝒲r,d,sVᵀ is a subset of X*(0), because if Z ∈ 𝒦d then

Tr{C V(Ir ⊕ Z ⊕ 0)Vᵀ} = λ1(C) + ... + λr(C) + λk(C)(k − r) = λ1(C) + ... + λk(C).

Therefore V𝒲r,d,sVᵀ ⊂ X*(0).

In the particular case d = 1 (⇒ r + 1 = k) we have Z = [1]. Hence, the only element of the set X*(0) is X = V(Ir ⊕ [1] ⊕ 0)Vᵀ = V(Ir+1 ⊕ 0)Vᵀ = V(Ik ⊕ 0)Vᵀ = v1v1ᵀ + ... + vkvkᵀ. This is an orthogonal projection of rank k.

Recall that if the matrix C has degenerate eigenvalues then there are many choices for the orthogonal matrix V. We now prove that the set of minimizers does not depend on the choice of V. Consider the eigenvalues of C without counting multiplicity, that is, µ1(C) < µ2(C) < ... < µp(C). We denote by ν1 the multiplicity of µ1(C), by ν2 the multiplicity of µ2(C), and so on. The corresponding eigenspaces Ei ⊂ ℝⁿ, i = 1, ..., p, are pairwise orthogonal and dim Ei = νi. Note that ν1 + ... + νp = n. The many choices for V come from the fact that there are many orthonormal bases for each eigenspace. However, two different orthonormal bases of an eigenspace are related to each other by an orthogonal matrix. Suppose that Ṽ were another choice; then V and Ṽ are related by Ṽ = V(U1 ⊕ U2 ⊕ ... ⊕ Up), where Ui is a νi×νi orthogonal matrix. We claim that the solution sets X*(0) := {V(Ir ⊕ Z ⊕ 0)Vᵀ : Z ∈ 𝒦d} and X̃*(0) := {Ṽ(Ir ⊕ Z ⊕ 0)Ṽᵀ : Z ∈ 𝒦d} are equal. To see this, note that Ṽ(Ir ⊕ Z ⊕ 0)Ṽᵀ = V(U1 ⊕ ... ⊕ Up)(Ir ⊕ Z ⊕ 0)(U1 ⊕ ... ⊕ Up)ᵀVᵀ = V(Ĩr ⊕ Z̃ ⊕ 0)Vᵀ, where Z̃ = UZUᵀ (with U the νj×νj block of U1 ⊕ ... ⊕ Up acting on the eigenspace of λk) and Ĩr is the conjugation of Ir by the blocks acting on the eigenspaces of the eigenvalues below λk. But it is easy to see that Ĩr = Ir. Therefore, Ṽ(Ir ⊕ Z ⊕ 0)Ṽᵀ = V(Ir ⊕ Z̃ ⊕ 0)Vᵀ and, since Z̃ ∈ 𝒦d whenever Z ∈ 𝒦d, the claim follows.

Appendix B. Proof of Lemma B.1

Lemma B.1. Let C ∈ Mn be a symmetric matrix. For each k = 1, 2, ..., n, the strictly positive constant α(C, k) defined by

α(C, k) := min{ (λk(C) − λr(C)) / (2(s − (k+1))), (λs(C) − λk(C)) / (2k) }   (41)

satisfies the inequality

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ α(C, k) min{‖X − Y‖² : Y ∈ X*(0)}   (42)

for all X ∈ 𝒦. In (41) the values r = r(C, k) and s = s(C, k) are given by Definition 2.2. Moreover, we adopt the conventions λ0(C) := −∞ in the case r = 0 and λn+1(C) := +∞ in the case s = n+1, and a term with an infinite numerator or a zero denominator is understood to be +∞. In (42) we denote by X*(0) the set argmin{Tr{CX} : X ∈ 𝒦}.

The proof of Lemma B.1 is based on the three propositions below, namely Propositions B.2, B.3 and B.6.

Note that the matrix function Tr{·} (and consequently also ‖·‖²) is invariant under conjugation by an orthogonal matrix, that is, Tr{SASᵀ} = Tr{A} for all A ∈ Mn and all orthogonal S. Due to this fact, we can reduce the proof of (42) to the case where C is diagonal. More precisely, we have:

Proposition B.2. Let C = VΛVᵀ be a spectral decomposition of the symmetric matrix C, with diagonal matrix Λ := diag(λ1(C), ..., λn(C)) and a corresponding orthogonal eigenvector matrix V. If X ∈ 𝒦 then for W = W(X) := VᵀXV it holds that:

Tr{CX} = Tr{ΛW}   (43)

and

min{‖X − Y‖² : Y ∈ X*(0)} = min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d}.   (44)

Here

𝒦d := {Z ∈ Md : Tr{Z} = k − r and I ⪰ Z ⪰ 0}.

Proof. Since the trace is orthogonal-invariant we have

Tr{CX} = Tr{VΛVᵀVWVᵀ} = Tr{ΛW}.

This proves (43). To prove (44) take Z** ∈ argmin{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d}. Due to Proposition 2.3, Y** := V(Ir ⊕ Z** ⊕ 0)Vᵀ ∈ X*(0). From the orthogonal invariance of ‖·‖² it follows that

min{‖X − Y‖² : Y ∈ X*(0)} = min{‖VᵀXV − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} = ‖W − Ir ⊕ Z** ⊕ 0‖².

This proves (44).

In order to prove (42) we establish a lower bound for Tr{ΛW} − (λ1(C) + ... + λk(C)) (see Proposition B.3) and an upper bound for min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} (see Proposition B.6). Namely:

Proposition B.3. Let n ≥ 2. For all W ∈ 𝒦 we have

Tr{ΛW} − (λ1(C) + ... + λk(C)) ≥ (λk(C) − λr(C))(r − (W1,1 + ... + Wr,r)) + (λs(C) − λk(C))(Ws,s + ... + Wn,n).   (45)

In the cases r = 0 and s = n+1 we have

Tr{ΛW} − (λ1(C) + ... + λk(C)) ≥ (λs(C) − λk(C))(Ws,s + ... + Wn,n)   (46)

and

Tr{ΛW} − (λ1(C) + ... + λk(C)) ≥ (λk(C) − λr(C))(r − (W1,1 + ... + Wr,r)),   (47)

respectively.

Proof. Recall that W ∈ 𝒦 means that Tr{W} = k and I ⪰ W ⪰ 0. Consequently Wi,i − 1 ≤ 0, Wi,i ≥ 0 and W1,1 + ... + Wn,n = k. In order to simplify the notation we identify Wi,i with wi and λi(C) with λi. Note that

Tr{ΛW} − (λ1 + ... + λk) = Σi=1,...,n λiwi − (λ1 + ... + λk).

From the following observations:

a) combining wi − 1 ≤ 0 for i = 1, ..., k with λr ≥ λi for i = 1, ..., r and λk ≤ λj for j = r+1, ..., k;

b) combining wi ≥ 0 for i = s, ..., n with λs ≤ λi for i = s, ..., n and λk ≥ λj for j = r+1, ..., k;

c) λk = λr+i for i = 1, ..., d and λk = λk+j for j = 1, ..., s − 1 − k;

d) w1 + ... + wn = k,

we conclude that

Σi=1,...,n λiwi − (λ1 + ... + λk) ≥ (λk − λr)(r − (w1 + ... + wr)) + (λs − λk)(ws + ... + wn),

which proves Proposition B.3.

In order to establish an upper bound for min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} we first need the following two lemmas:

Lemma B.4. For any positive semidefinite matrix A ∈ Mp the inequality

‖A‖² ≤ Tr²{A}   (48)

holds.

Proof. To prove (48) note that if A ⪰ 0 then 0 ≤ λi(A). Hence,

‖A‖² = λ1(A)² + ... + λp(A)² ≤ (λ1(A) + ... + λp(A))² = Tr²{A}.
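A quick numerical check of (48) on random positive semidefinite matrices (an illustrative sketch; the instances are our own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    p = int(rng.integers(1, 6))
    B = rng.standard_normal((p, p))
    A = B @ B.T                           # A is positive semidefinite
    fro2 = np.sum(A**2)                   # ||A||^2 (squared Frobenius norm)
    assert fro2 <= np.trace(A)**2 + 1e-9  # Lemma B.4: ||A||^2 <= Tr^2{A}
```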

Lemma B.5. For any symmetric matrix A ∈ Md with I ⪰ A ⪰ 0 we have the following inequality:

min{‖Z − A‖² : Z ∈ 𝒦d} ≤ ((k − r) − Tr{A})².   (50)

Proof. Case 1: A ≠ 0. Since A ⪰ 0, we have in this case that Tr{A} > 0. So we can take Zo := ((k − r)/Tr{A}) A as a trial element of the domain of the minimization problem (50). Note that Zo satisfies I ⪰ Zo ⪰ 0, since A also does, and satisfies Tr{Zo} = k − r. Hence, the left-hand side (LHS) of (50) has the following upper bound:

LHS ≤ ‖Zo − A‖² = ((k − r) − Tr{A})² ‖A‖²/Tr²{A}.

On the other hand, since A ⪰ 0 we have by (48) that ‖A‖²/Tr²{A} ≤ 1. This closes the proof in the first case.

Case 2: A = 0. In this case Tr{A} = 0. Using (48) again, we have ‖Z‖² ≤ Tr²{Z} for every Z ∈ 𝒦d. Moreover, Tr²{Z} = (k − r)² = ((k − r) − 0)² = ((k − r) − Tr{A})². Hence, LHS ≤ ((k − r) − Tr{A})². This proves Lemma B.5.

Proposition B.6. The inequality

min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} ≤ 2(s − (k+1))(r − Tr{R(W)}) + 2k Tr{S(W)}   (52)

holds for all W ∈ 𝒦, where the blocks R(W) and S(W) are defined in (55) below. In the cases r = 0 and s = n+1 the inequality (52) becomes

min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} ≤ 2k Tr{S(W)}   (53)

and

min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} ≤ 2(s − (k+1))(r − Tr{R(W)}),   (54)

respectively.

Proof. We write W ∈ 𝒦 in the block form:

W = [ R(W)  E(W)  F(W) ; Eᵀ(W)  A(W)  G(W) ; Fᵀ(W)  Gᵀ(W)  S(W) ].   (55)

Here R(W) ∈ Mr, A(W) ∈ Md, S(W) ∈ Mn−(d+r), E(W) ∈ Mr×d, F(W) ∈ Mr×(n−(r+d)) and G(W) ∈ Md×(n−(r+d)). In the particular case r = 0 it is understood that the blocks R(W), E(W), Eᵀ(W), F(W) and Fᵀ(W) do not appear in (55). Similarly, if s = n+1, that is, r + d = n, then the blocks S(W), F(W), Fᵀ(W), G(W) and Gᵀ(W) do not appear in (55). The square of the Frobenius norm of the matrix Ir ⊕ Z ⊕ 0 − W splits as

‖Ir ⊕ Z ⊕ 0 − W‖² = ‖Ir − R(W)‖² + ‖Z − A(W)‖² + ‖S(W)‖² + 2‖E(W)‖² + 2‖F(W)‖² + 2‖G(W)‖².   (56)

Since I ⪰ W, we also have Ir ⪰ R(W), that is, Ir − R(W) ⪰ 0. Using (48) we have:

‖Ir − R(W)‖² ≤ Tr²{Ir − R(W)} = (r − Tr{R(W)})².   (57)

Since I ⪰ W ⪰ 0, we also have I ⪰ A(W) ⪰ 0. Hence, from Lemma B.5 it follows that

min{‖Z − A(W)‖² : Z ∈ 𝒦d} ≤ ((k − r) − Tr{A(W)})²,   (58)

since Tr{Z} = k − r.

Recall that k = Tr{W} = Tr{R(W)} + Tr{A(W)} + Tr{S(W)}; consequently

(k − r) − Tr{A(W)} = Tr{S(W)} − (r − Tr{R(W)}).   (59)

Combining (58) with (59) we have

min{‖Z − A(W)‖² : Z ∈ 𝒦d} ≤ (Tr{S(W)} − (r − Tr{R(W)}))².   (60)

On the other hand, since W ⪰ 0 we also have S(W) ⪰ 0. Combining this with (48) we have:

‖S(W)‖² ≤ Tr²{S(W)}.   (61)

Now we claim that the following three inequalities hold:

2‖E(W)‖² ≤ 2(r − Tr{R(W)})(d − Tr{A(W)}),   (62)

2‖F(W)‖² ≤ 2 Tr{R(W)} Tr{S(W)}   (63)

and

2‖G(W)‖² ≤ 2 Tr{A(W)} Tr{S(W)}.   (64)

To see (62), we recall that I − W ⪰ 0 and use Lemma A.1. Hence, for i = 1, ..., r and j = r+1, ..., s−1,

|Wi,j|² ≤ (1 − Wi,i)(1 − Wj,j),

and summing over these indices gives ‖E(W)‖² ≤ (r − Tr{R(W)})(d − Tr{A(W)}). Combining this with (59) and d = s − (r+1), we prove (62).

To see (63), we recall that W ⪰ 0 and use Lemma A.1. Hence, for i = 1, ..., r and j = s, ..., n,

|Wi,j|² ≤ Wi,iWj,j,

and summing gives ‖F(W)‖² ≤ Tr{R(W)} Tr{S(W)}. The proof of (64) is analogous to that of (63).

Now, plugging (57), (60), (61), (62), (63) and (64) into (56), writing x := r − Tr{R(W)} and y := Tr{S(W)}, and using d − Tr{A(W)} = (s − 1 − k) − x + y (which follows from (59) and d = s − (r+1)), we obtain:

min{‖W − Ir ⊕ Z ⊕ 0‖² : Z ∈ 𝒦d} ≤ x² + (y − x)² + y² + 2x((s − 1 − k) − x + y) + 2(r − x)y + 2((k − r) + x − y)y = 2(s − (k+1))x + 2ky,   (65)

which proves Proposition B.6.

Proof of Lemma B.1

Proof. Recall that the eigenvalues of C are ordered as λ1 ≤ ... ≤ λr < λr+1 = ... = λk = ... = λr+d < λs ≤ ... ≤ λn. According to the values of r(C, k) and s(C, k) we divide the proof into the following four cases:

Case 1: r > 0 and s < n+1. In order to treat this case, we first need the following lemma, whose proof is trivial.

Lemma B.7. Let a ≥ 0, b ≥ 0, c ≥ 0 and d > 0 be constants. For all x, y ≥ 0 we have the inequality

ax + by ≥ min{ac⁻¹, bd⁻¹}(cx + dy).

In the case c = 0 it is to be understood that min{ac⁻¹, bd⁻¹} = bd⁻¹.

From (43) and (45) we obtain:

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ (λk(C) − λr(C))(r − Tr{R(W)}) + (λs(C) − λk(C)) Tr{S(W)}.   (66)

Now we use Lemma B.7 with x := r − Tr{R(W)}, y := Tr{S(W)}, a := λk(C) − λr(C), b := λs(C) − λk(C), c := 2(s − (k+1)) and d := 2k. Note that Tr{R(W)} ≤ r, since R(W) ∈ Mr and I ⪰ R(W) ⪰ 0. Hence, x ≥ 0. Combining Lemma B.7 with (66) we obtain:

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ min{a/c, b/d} (2(s − (k+1))(r − Tr{R(W)}) + 2k Tr{S(W)}).

Now using (52) and (44) we have:

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ min{a/c, b/d} min{‖X − Y‖² : Y ∈ X*(0)}.

This means that we can take α(C, k) := min{ (λk(C) − λr(C)) / (2(s − (k+1))), (λs(C) − λk(C)) / (2k) }.

Case 2: r = 0 and s < n+1. This means that the block R(W) does not appear. In this case we use (43), (46), (53) and (44) to obtain:

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ ((λs(C) − λk(C))/(2k)) min{‖X − Y‖² : Y ∈ X*(0)}.

This means that we can take α(C, k) := (λs(C) − λk(C))/(2k). Note that this α satisfies (41), since (λk(C) − λr(C))/(2(s − (k+1))) = +∞ (recall λr = λ0 = −∞) and (λs(C) − λk(C))/(2k) < +∞. This closes the proof in the second case.

Case 3: s = n+1 and r > 0. This means that the block S(W) does not appear. We have two subcases:

Subcase 3.1: k = n. Note that if X ∈ Mn with I ⪰ X ⪰ 0 and Tr{X} = n, then X = In. Hence, 𝒦 = {In}, and so α can be taken as +∞, because

Tr{CX} − (λ1(C) + ... + λn(C)) = 0

for all X ∈ 𝒦 and min{‖X − Y‖² : Y ∈ X*(0)} = 0, since X*(0) = 𝒦 = {I}. Note that this α satisfies (41). To see this, note that s = n+1 = k+1 and λs = +∞. Hence, both terms in (41) are infinite.

Subcase 3.2: k < n. Since s = n+1, we obtain s − (k+1) > 0. In this subcase we use (43), (47), (54) and (44) to obtain:

Tr{CX} − (λ1(C) + ... + λk(C)) ≥ ((λk(C) − λr(C))/(2(s − (k+1)))) min{‖X − Y‖² : Y ∈ X*(0)}.

This means that we can take α(C, k) := (λk(C) − λr(C))/(2(s − (k+1))). Note that this α satisfies (41), since λs = +∞ and (λk(C) − λr(C))/(2(s − (k+1))) < +∞.

Case 4: r = 0 and s = n+1. This means that λ1(C) = ... = λn(C), that is, C = λ1(C) I (a multiple of the identity). Hence, X*(0) = 𝒦, because Tr{CX} − (λ1(C) + ... + λk(C)) = λ1(C) Tr{X} − λ1(C)k = λ1(C)k − kλ1(C) = 0 for any X ∈ 𝒦. Therefore, min{‖X − Y‖² : Y ∈ X*(0)} = 0 for all X ∈ 𝒦. This means that we can take α = +∞. Hence, α satisfies (41), since λr = −∞ and λs = +∞.

6 The matrix minimization problem as a counterexample

For perturbed linear programming (LP) the authors of [1] proved that there exists an ε0 > 0 such that mLP(ε) = mLP(0) + εƒ̄ for all 0 < ε < ε0. In contrast, we prove that in Example 1 of the matrix minimization problem (2) we have m(0) + εƒ̄ > m(ε) for all ε > 0. This is proved in Proposition C.2. To prove this proposition we first need the following result:
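To make the contrast concrete, the following sketch (a toy one-dimensional construction of mine, not the example of [1]) exhibits the LP behaviour mLP(ε) = mLP(0) + εƒ̄ for small ε, and its failure beyond a threshold:

```python
import numpy as np

# Toy 1-D illustration (my construction, not the example of [1]):
# minimize -x + eps*f(x) over the polytope [0, 1] with f(x) = x**2.
# The unperturbed solution set is {1}, so fbar = f(1) = 1 and m_LP(0) = -1.
xs = np.linspace(0.0, 1.0, 200_001)

def m_lp(eps):
    return np.min(-xs + eps * xs**2)

fbar, m0 = 1.0, -1.0
for eps in (0.1, 0.2, 0.4):            # below the threshold eps0 = 1/2
    assert abs(m_lp(eps) - (m0 + eps * fbar)) < 1e-9
assert m_lp(2.0) < m0 + 2.0 * fbar     # the equality breaks once eps > eps0
print("m_LP(eps) = m_LP(0) + eps*fbar for all eps below eps0")
```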

Proposition C.1. Consider the case n = 2 and k = 1. For the matrix C = and the nonlinear function ƒ(X) = X1,1·X1,1, the minimum value of Tr{CX} + εƒ(X) on can be expressed as:

Proof. Note that feasibility of X implies X1,1 ≥ 0, X2,2 ≥ 0 and X1,1 + X2,2 = 1. Therefore, X1,1 ∈ [0, 1] for every feasible X. For such X we can express the nonlinear part of the objective function, X1,1·X1,1, as:

With the help of the identity (67) we obtain a linear expression in the variable X (up to a maximization problem) for εƒ(X), namely:
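The identity (67) assumed here is the convex-duality formula (X1,1)² = max{2rX1,1 - r² : r ∈ [0, 1]}, valid because X1,1 ∈ [0, 1], with maximizer r = X1,1. A grid check of this reading:

```python
import numpy as np

# Assumed identity (67): for x = X_{1,1} in [0, 1],
#   x**2 = max over r in [0, 1] of (2*r*x - r**2),   attained at r = x.
rs = np.linspace(0.0, 1.0, 2001)
for x in np.linspace(0.0, 1.0, 101):
    assert abs(x**2 - np.max(2.0 * rs * x - rs**2)) < 1e-6
print("identity (67) verified on a grid")
```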

Now, we rewrite the minimum value m(ε) as

This means that m(ε) is a minimax value of the two-variable function g(X, r) := Tr{ X} - εr2. Note that the domains and [0,1] are convex, the function g is convex in the variable X and concave in the variable r. Under these conditions we can change the order of minimization and maximization (see Corollary 37.3.2 of [3]). Therefore,

We use Proposition 2.3 to solve the minimization problem (in the variable X) in the above equation. The result is an expression involving the first eigenvalue of the matrix , namely

On the other hand, it is easy to compute λ1 (), which equals

This proves Proposition C.1.
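The matrix C did not survive in this version; the surviving data (λ1(C) = 0 with eigenvector proportional to (1, -1), a(r, ε) = 1 + εr - εr², and b(0, ε) = -1 below) are consistent with the reconstruction C = [[1, 1], [1, 1]] and b(r, ε) = -√(1 + ε²r²). Under that assumption, the maximin formula of Proposition C.1 can be checked against a direct minimization over the feasible set:

```python
import numpy as np

eps = 1.0
C = np.array([[1.0, 1.0], [1.0, 1.0]])   # assumed reconstruction of C

# Direct route: a feasible X = [[x, z], [z, 1-x]] (0 <= X <= I, Tr X = 1)
# requires x in [0, 1] and z**2 <= x*(1-x).  Since the off-diagonal entry
# of C is positive, Tr{CX} is minimized by the most negative admissible z.
xs = np.linspace(0.0, 1.0, 100_001)
z = -np.sqrt(xs * (1.0 - xs))
tr_cx = C[0, 0] * xs + C[1, 1] * (1.0 - xs) + 2.0 * C[0, 1] * z
direct = np.min(tr_cx + eps * xs**2)     # objective Tr{CX} + eps*(X11)^2

# Maximin route (Proposition C.1): max over r in [0, 1] of a(r) + b(r),
# a(r) = 1 + eps*r - eps*r**2, and the assumed b(r) = -sqrt(1 + (eps*r)**2).
rs = np.linspace(0.0, 1.0, 100_001)
maximin = np.max(1.0 + eps * rs - eps * rs**2 - np.sqrt(1.0 + (eps * rs)**2))

assert abs(direct - maximin) < 1e-4
print(f"m({eps}) = {maximin:.4f} by both routes")
```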

Proposition C.2. Consider the case n = 2 and k = 1. For the matrix C = and the nonlinear function ƒ(X) = X1,1·X1,1, the minimum value of (2) satisfies

m(ε) < m(0) + εƒ̄

for all ε > 0.

Proof. Recall that ƒ(X) := X1,1·X1,1. Here *(0) = {v1v1^T} with v1 = (1/√2)(1, -1)^T, hence ƒ̄ = 1/4. Moreover, m(0) = λ1(C) = 0.

By Proposition C.1, we only need to show that for all ε > 0 the strict inequality

holds. Here a(r, ε) := 1 + εr - εr² and b(r, ε) := . In order to prove (71) we consider two cases:

Case 1: 0 ∈ argmax{a(r, ε) + b(r, ε) : r ∈ [0, 1]}. In this case, m(ε) = a(0, ε) + b(0, ε) = 1 + (-1) = 0, and so 0 < εƒ̄ since ε > 0 and ƒ̄ = 1/4. This proves (71) in the first case.

Case 2: there is an r*(ε) ∈ argmax{a(r, ε) + b(r, ε) : r ∈ [0, 1]} with r*(ε) ≠ 0. In this case we have:

Since the function b(r, ε) is strictly decreasing in the variable r for ε > 0, the following strict inequality,

holds for ε > 0. Substituting (73) and (72) into (71), we obtain for all ε > 0 that

This proves (71) in the second case.
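Under the same reconstruction (C = [[1, 1], [1, 1]] and b(r, ε) = -√(1 + ε²r²), both assumptions), Proposition C.2 can also be checked numerically: with m(0) = 0 and ƒ̄ = 1/4, the strict inequality m(ε) < m(0) + εƒ̄ holds at every sampled ε > 0, with a gap that grows quadratically in ε as the abstract claims:

```python
import numpy as np

rs = np.linspace(0.0, 1.0, 100_001)

def m(eps):
    # Maximin formula of Proposition C.1 under the assumed reconstruction:
    # a(r, eps) = 1 + eps*r - eps*r**2, b(r, eps) = -sqrt(1 + (eps*r)**2).
    return np.max(1.0 + eps * rs - eps * rs**2 - np.sqrt(1.0 + (eps * rs)**2))

m0, fbar = 0.0, 0.25   # m(0) = lambda_1(C) = 0 and fbar = 1/4 from the proof
for eps in (0.1, 0.5, 1.0, 2.0, 10.0):
    assert m(eps) < m0 + eps * fbar    # strict gap, unlike the LP case [1]
print("m(eps) < m(0) + eps*fbar for every sampled eps > 0")
```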

Acknowledgements. The author thanks the referees for their suggestions: a) generalization of the case k = 1 to any k = 1, 2, ..., n; b) some important current references about SDP; c) pointing out mistakes in the manuscript. The author would also like to thank João Xavier da Cruz Neto from the Federal University of Piauí (UFPI) for discussions and Sissy da S. Souza from UFPI for indicating some reviews about SDP.

Received 28/VII/09.

Accepted: 07/XII/09.

#CAM-120/09.

  • [1] O.L. Mangasarian and R.R. Meyer, Nonlinear perturbation of linear programs. SIAM Journal on Control and Optimization, 17(6) (1979), 745-752.
  • [2] V. Chvátal, Linear Programming. W.H. Freeman (1983).
  • [3] R.T. Rockafellar, Convex Analysis. Princeton University Press (1970).
  • [4] R.A. Horn and C.R. Johnson, Matrix Analysis. Cambridge University Press (1985).
  • [5] L. Vandenberghe and S. Boyd, Semidefinite programming. SIAM Review, 38 (1996), 49-95.
  • [6] A.S. Lewis and M.L. Overton, Eigenvalue optimization. Acta Numerica, 5 (1996), 149-190.
  • [7] B. Kulis, S. Sra, S. Jegelka and I.S. Dhillon, Scalable semidefinite programming using convex perturbations. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP, 5 (2009), 296-303.
  • [8] M.C. Ferris and O.L. Mangasarian, Finite perturbation of convex programs. Applied Mathematics and Optimization, 23 (1991), 221-230.
  • [9] P. Tseng, Convergence and error bounds for perturbation of linear programs. Computational Optimization and Applications, 13 (1999), 221-230.
  • [10] F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5 (1995), 13-51.

Publication Dates

  • Publication in this collection
    23 July 2010
  • Date of issue
    June 2010
