

An interior point method for constrained saddle point problems

Alfredo N. Iusem^I,* ; Markku Kallio^II
(* Research of this author was partially supported by CNPq grant No. 301280/86.)

^I Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, 22460 Rio de Janeiro, RJ, Brazil

^II Helsinki School of Economics, Runeberginkatu 14-16, 00100 Helsinki, Finland. E-mail: iusp@impa.br / kallio@hkkk.fi

ABSTRACT

We present an algorithm for the constrained saddle point problem with a convex-concave function L and convex sets with nonempty interior. The method consists of moving away from the current iterate by choosing certain perturbed vectors. The values of the gradients of L at these vectors provide an appropriate direction. Bregman functions allow us to define a curve which starts at the current iterate with this direction and is fully contained in the interior of the feasible set. The next iterate is obtained by moving along such a curve with a certain stepsize. We establish convergence to a solution under minimal conditions on the function L, weaker, for instance, than Lipschitz continuity of the gradient of L, and including cases where the solution need not be unique. We also consider the case in which the perturbed vectors lie on certain specific curves starting at the current iterate, in which case another convergence proof is provided. In the case of linear programming, we obtain a family of interior point methods where all the iterates and perturbed vectors are computed with very simple formulae, without factorization of matrices or solution of linear systems, which makes the method attractive for very large and sparse matrices. The method may be of interest for massively parallel computing. Numerical examples for the linear programming case are given.

Mathematical subject classification: 90C25, 90C30.

Key words: saddle point problems, variational inequalities, linear programming, interior point methods, Bregman distances.

1 Introduction

In this paper we discuss methods for solving constrained saddle point problems. Given closed convex sets X, Y contained in ℝ^n, ℝ^m respectively, and a function L : X × Y → ℝ, convex in the first variable and concave in the second one, the saddle point problem SPP(L, X, Y) consists of finding (x*, y*) ∈ X × Y such that

L(x*, y) ≤ L(x*, y*) ≤ L(x, y*)     (1)

for all (x, y) ∈ X × Y. SPP(L, X, Y) is a particular case of the variational inequality problem, which we describe next. Given a closed convex set C ⊂ ℝ^p and a maximal monotone operator T : ℝ^p → P(ℝ^p), VIP(T, C) consists of finding z ∈ C such that there exists u ∈ T(z) satisfying

u^t(z′ − z) ≥ 0     (2)

for all z′ ∈ C. If ∂_x L, ∂_y L denote the subdifferential and superdifferential of L in each variable (i.e. the sets of subgradients and supergradients, respectively), and we define T : ℝ^n × ℝ^m → P(ℝ^n × ℝ^m) as T = (∂_x L, −∂_y L), then VIP(T, C) coincides with SPP(L, X, Y) if we take p = m + n and C = X × Y. It is easy to check that this T is maximal monotone.

It is well known that when T = ∂f for a convex function f, VIP(T, C) reduces to min f(x) s.t. x ∈ C. This suggests the extension of optimization methods to variational inequalities. In the unconstrained case C = ℝ^p, for which VIP(T, C) consists just of finding a zero of T, i.e. a z* such that 0 ∈ T(z*), one could attempt the natural extension of the steepest descent method, i.e.

z^{k+1} = z^k − αu^k

with u^k ∈ T(z^k) and α > 0. This simple approach works only under very restrictive assumptions on T (strong monotonicity and Lipschitz continuity, see [1]). An alternative one, which works under weaker assumptions, is the following: move in a direction contained in −T(z^k), finding an auxiliary (or perturbed) point w^k, and then move from z^k in a direction contained in −T(w^k), i.e.

w^k = z^k − α_k u^k,     (3)
z^{k+1} = z^k − β_k d^k,     (4)

with u^k ∈ T(z^k), d^k ∈ T(w^k), and α_k, β_k > 0. In the constrained case, one must project onto C in order to preserve feasibility, i.e., one takes

w^k = P_C(z^k − α_k u^k),     (5)
z^{k+1} = P_C(z^k − β_k d^k),     (6)

where P_C is the orthogonal projection onto C. If T is Lipschitz continuous with constant p, then convergence of {z^k} to a solution of VIP(T, C) is guaranteed by taking α_k = β_k = α ∈ (0, 1/p). This result can be found in [16], where the method was first introduced. In the absence of Lipschitz continuity (or when the Lipschitz constant exists but is not known beforehand) it has been proven in [11] that the algorithm still converges to a solution of VIP(T, C) if α_k, β_k are determined through a certain finite bracketing procedure. A somewhat similar method has been studied in [15].
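As an illustration, the projected scheme (5)-(6) with a single-valued T and a fixed stepsize can be sketched as follows; this is our own minimal sketch, not part of the paper: the function names and the bilinear test operator are ours, and the fixed stepsize presumes a known Lipschitz constant.

```python
import numpy as np

def extragradient(T, project, z0, alpha, iters=1000):
    """Sketch of the projected scheme (5)-(6): a predictor step to the
    perturbed point w^k, then a corrector step from z^k using T(w^k)."""
    z = z0.copy()
    for _ in range(iters):
        w = project(z - alpha * T(z))   # (5): w^k = P_C(z^k - alpha*u^k)
        z = project(z - alpha * T(w))   # (6): z^{k+1} = P_C(z^k - alpha*d^k)
    return z

# Toy saddle point L(x, y) = x^t A y on C = [0,1]^4, so T(z) = (Ay, -A^t x);
# every point with x = 0 is a solution, and the iterates drive x to zero.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
def T(z):
    x, y = z[:2], z[2:]
    return np.concatenate([A @ y, -A.T @ x])

project = lambda z: np.clip(z, 0.0, 1.0)
z = extragradient(T, project, np.ones(4), alpha=0.1)
```

Here alpha = 0.1 is below 1/||A||, as the fixed-stepsize variant requires.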

When C has nonempty interior C°, it is interesting to consider interior point methods, i.e. methods such that {w^k} and {z^k} are contained in the interior of C, thus doing away with the projection P_C. This can be achieved if a Bregman function g with zone C° is available. Loosely speaking, such a g is a strictly convex function which is continuous in C, differentiable in C° and such that its gradient diverges at the boundary of C. With g we construct a distance D_g on C × C° as

D_g(w, z) = g(w) − g(z) − ∇g(z)^t(w − z),     (7)

and instead of the half line {z − tu : t ∈ ℝ_+} we consider the curve {s_g(z, u, t) : t ∈ ℝ_+}, where s_g(z, u, t) is the solution of min_{w∈C}{u^t w + (1/t)D_g(w, z)} for t > 0, or equivalently s_g(z, u, t) is the solution w of

∇g(w) = ∇g(z) − tu,     (8)

which allows us to extend the curve to t = 0. Under suitable assumptions on g (see Section 2), s_g(z, u, 0) = z, the curve is uniquely defined for t ∈ [0, ∞) and it is fully contained in C°. The idea is to replace (5) and (6) by

w^k = s_g(z^k, u^k, α_k),     (9)
z^{k+1} = s_g(z^k, d^k, β_k),     (10)

with u^k, d^k, α_k and β_k as in (3) and (4). This algorithm was proposed in [3], where it is proved that convergence of {z^k} to a solution of VIP(T, C) is guaranteed if α_k is chosen through a given finite bracketing procedure and β_k solves a certain nonlinear equation in one variable. A slight improvement is presented in [12], where it is shown that convergence is preserved when β_k is also found through another finite bracketing search. The main advantage of these interior point methods with Bregman functions as compared e.g. to Korpelevich's algorithm lies in the fact that no orthogonal projections onto the feasible set are needed, since all iterates belong automatically to the interior of this set. A detailed discussion of the computational effects of this advantage in the case in which C is an arbitrary polyhedron with nonempty interior can be found in Section 6 of [3]. For SPP(L, X, Y), when either X or Y is an arbitrary polyhedron with nonempty interior, the method introduced in this paper presents the same advantage over the algorithm in [13], discussed in the next paragraph, which requires orthogonal projections onto X and Y.

Going back to SPP(L, X, Y), another related method is presented in [13]. Given a point (x^k, y^k), a rather general perturbed vector w^k = (ξ^k, η^k) is computed, and then

x^{k+1} = P_X(x^k − t_k u^k),     (11)
y^{k+1} = P_Y(y^k + t_k v^k),     (12)

with u^k ∈ ∂_x L(x^k, η^k) and v^k ∈ ∂_y L(ξ^k, y^k). In this case no search is required for the stepsize t_k, which is given by a simple formula in terms of the gap L(x^k, η^k) − L(ξ^k, y^k). The difficulty is to some extent transferred to the selection of (ξ^k, η^k), which must satisfy certain conditions in order to ensure convergence to a solution of SPP(L, X, Y). One possibility is to adopt (5), so that ξ^k = P_X(x^k − α_k u^k), η^k = P_Y(y^k + α_k v^k), with u^k ∈ ∂_x L(x^k, y^k), v^k ∈ ∂_y L(x^k, y^k). In this case the method is somewhat similar to Korpelevich's (see [16]), but not identical: in linear programming, for instance, a closed expression for α_k is easily available for the method of [13] even when a Lipschitz constant for ∇L is not known beforehand. Other differences between the method in [13] with this choice of (ξ^k, η^k) and Korpelevich's are commented upon right after Example 6 in Section 3.

Besides (5), the method in [13] includes other options for the selection of (ξ^k, η^k). Taking advantage of this fact, we will combine the approaches in [3] and [13], producing an interior point method for the case of nonempty X° × Y°.

We will consider two independent modifications upon the method in [13]. The first one, to be discussed in Section 3, is the following: assuming that Bregman functions g_X, g_Y with zones X°, Y° respectively are available, we will use (10) instead of (11) and (12), setting

x^{k+1} = s_{g_X}(x^k, u^k, t_k),     (13)
y^{k+1} = s_{g_Y}(y^k, −v^k, t_k).     (14)

We will show that if the perturbation pair (ξ^k, η^k) satisfies the conditions required in [13] and the stepsizes t_k are chosen in a suitable way, then convergence to a solution follows.

It is clear that the algorithm becomes practical only when s_{g_X}, s_{g_Y} have explicit formulae, i.e., in view of (8), when ∇g_X, ∇g_Y are easily invertible. Bregman functions with this property are available for the cases in which C is an orthant, a box, or certain polyhedra (in the latter case, inversion of the gradient requires the solution of two linear systems) and are presented in [3]. Examples of realizations of this method are given in Section 4.

The second modification, presented in Section 5, consists of an interior point procedure for the determination of (ξ^k, η^k), namely

ξ^k = argmin_{ξ∈X}{L(ξ, y^k) + (1/λ)D_{g_X}(ξ, x^k)},     (15)
η^k = argmax_{η∈Y}{L(x^k, η) − (1/μ)D_{g_Y}(η, y^k)},     (16)

where λ, μ are positive constants. Definitions (15) and (16) imply

∇g_X(ξ^k) = ∇g_X(x^k) − λu^k,     (17)
∇g_Y(η^k) = ∇g_Y(y^k) + μv^k,     (18)

with u^k ∈ ∂_x L(ξ^k, y^k), v^k ∈ ∂_y L(x^k, η^k). Equivalently, with (17) and (18) we have

ξ^k = s_{g_X}(x^k, u^k, λ),     (19)
η^k = s_{g_Y}(y^k, −v^k, μ).     (20)

Differently from (13) and (14), equations (17) and (18) are implicit in ξ^k, η^k, which appear in both sides of (19) and (20). These equations cannot in general be solved explicitly, even when ∇g_X, ∇g_Y are easily invertible. On the other hand, no search is needed for the perturbation stepsizes λ and μ, even when ∇L is not Lipschitz continuous. This choice of (ξ^k, η^k) does not satisfy the conditions in [13], so that the general convergence result for the sequence defined by (13) and (14) cannot be fully used. A special convergence proof will be given assuming that the Bregman functions are separable and the constraint sets X and Y are the nonnegative orthants.

It is immediate that (19) and (20) become explicit equations (up to the inversion of ∇g_X and ∇g_Y) when ∂_x L(·, y), ∂_y L(x, ·) are constant. This happens in the case of linear programming. For this case, combining both modifications, we obtain an interior point method with explicit updating formulae both for the perturbed points and for the primal and dual iterates. The curvilinear stepsize t_k requires a simple search. We analyze the linear programming case in detail in Section 6, where we also present some numerical illustration of the method.

2 Bregman functions and distances

Let C ⊂ ℝ^p be a closed and convex set and let C° be the interior of C. Consider g : C → ℝ, differentiable in C°. For w ∈ C and z ∈ C°, define D_g : C × C° → ℝ by (7). D_g is said to be the Bregman distance associated to g.

We say that g is a Bregman function with zone C if the following conditions hold:

B1. g is continuous and strictly convex on C.

B2. g is twice differentiable on C°, and on any bounded subset of C° the eigenvalues of ∇²g(z) are bounded below by a strictly positive number.

B3. The level sets of D_g(w, ·) and D_g(·, z) are bounded, for all w ∈ C, z ∈ C°, respectively.

B4. If {z^k} ⊂ C° converges to z*, then lim_{k→∞} D_g(z*, z^k) = 0.

B5. For all v ∈ ℝ^p there exists z ∈ C° such that ∇g(z) = v.

The definition of Bregman functions and distances originates in [2]. They have been widely used in convex optimization, e.g. [4], [5], [6], [7], [8]. Condition B2 in these references is weaker than here; only differentiability of g in C° is required. On the other hand, such references include an additional condition which is now implied by our stronger condition B2, namely the result of Proposition 1 below. Condition B5 is called zone coerciveness in [3]. From (7) and B1 it is immediate that, for all w ∈ C, z ∈ C°, D_g(w, z) ≥ 0 and D_g(w, z) = 0 if and only if w = z.

We present next some examples of Bregman functions.

Example 1. C = ℝ^p, g(x) = ||x||². Then D_g(w, z) = ||w − z||².

Example 2. C = ℝ^p_+, g(z) = Σ_j z_j log z_j, continuously extended to the boundary of C with the convention that 0 log 0 = 0. Then

D_g(w, z) = Σ_j [w_j log(w_j/z_j) + z_j − w_j].

D_g is the Kullback-Leibler information divergence, widely used in statistics.
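A quick numerical check (our own illustration, not part of the paper) confirms that the Bregman distance (7) induced by the entropy kernel of Example 2 coincides with the Kullback-Leibler expression above:

```python
import numpy as np

def bregman(g, grad_g, w, z):
    # Definition (7): D_g(w, z) = g(w) - g(z) - grad g(z)^t (w - z)
    return g(w) - g(z) - grad_g(z) @ (w - z)

# Entropy kernel of Example 2 on the positive orthant
g = lambda z: float(np.sum(z * np.log(z)))
grad_g = lambda z: 1.0 + np.log(z)

w = np.array([0.2, 0.5, 0.3])
z = np.array([0.4, 0.4, 0.2])

kl = float(np.sum(w * np.log(w / z) + z - w))   # closed-form KL divergence
d = bregman(g, grad_g, w, z)
assert np.isclose(d, kl) and d > 0.0            # D_g > 0 since w != z
```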

Example 3. C = ℝ^p_+, g(z) = Σ_j (z_j^a − z_j^b), with a > 1, b ∈ (0, 1).

Examples of Bregman functions satisfying these conditions for the cases of C being a box or a polyhedron can be found in [3]. We now proceed to a result which will be employed in the convergence analysis.

Proposition 1. If g is a Bregman function with zone C, and {z^k} and {w^k} are sequences in C° such that {z^k} is bounded and lim_{k→∞} D_g(w^k, z^k) = 0, then lim_{k→∞} ||w^k − z^k|| = 0.

Proof. Let B ⊂ ℝ^p be a closed ball containing {z^k} such that, for any z on the boundary of B, ||z^k − z|| ≥ 1 for all k. For ν ∈ [0, 1] define z^k(ν) = νw^k + (1 − ν)z^k. Let ν_k be the largest ν ∈ [0, 1] such that z^k(ν) ∈ B. For each k, define u^k = z^k(ν_k). Since C° is convex and ν_k ∈ [0, 1], we get that z^k(ν_k) ∈ C°. By the convexity of D_g(·, z^k),

0 ≤ D_g(u^k, z^k) ≤ D_g(w^k, z^k).

Because D_g(w^k, z^k) converges to zero, so does D_g(u^k, z^k). We shall show that ||z^k − u^k|| converges to zero. Then, from the definition of B, it follows that there exists k′ such that, for all k ≥ k′, u^k is in the interior of B, so that ν_k = 1 and u^k = w^k; hence the assertion follows.

Define u^k(ω) = ωu^k + (1 − ω)z^k, for ω ∈ [0, 1]. By Taylor's theorem,

g(u^k) = g(z^k) + ∇g(z^k)^t(u^k − z^k) + (1/2)(u^k − z^k)^t ∇²g(u^k(ω))(u^k − z^k)     (22)

for some ω ∈ [0, 1]. The quadratic term in (22) is equal to D_g(u^k, z^k), which converges to zero. Because of the boundedness of {z^k} and {u^k}, B2 implies that D_g(u^k, z^k) is bounded below by θ||u^k − z^k||² for some θ > 0 independent of k. Therefore ||u^k − z^k|| converges to zero.

3 An interior point method for saddle point computation

In this section we assume that X and Y have nonempty interiors and that g_X and g_Y are Bregman functions with zones X and Y, respectively. L : ℝ^n × ℝ^m → ℝ is continuous on X × Y, convex in the first variable and concave in the second one.

Following [13] we introduce perturbation sets D : X × Y → P(X), G : X × Y → P(Y). We consider the following properties of D, G.

A1. If (x, y) ∈ X° × Y° and (ξ, η) ∈ D(x, y) × G(x, y), then L(x, η) − L(ξ, y) ≥ 0, with strict inequality if (x, y) is not a saddle point.

A2. If E ⊂ X° × Y° is bounded, then ∪_{(x,y)∈E} D(x, y) and ∪_{(x,y)∈E} G(x, y) are bounded.

A3. If {(u^ℓ, v^ℓ)} ⊂ X × Y converges to (u, v) as ℓ goes to ∞, ξ^ℓ ∈ D(u^ℓ, v^ℓ), η^ℓ ∈ G(u^ℓ, v^ℓ) and lim_{ℓ→∞}[L(u^ℓ, η^ℓ) − L(ξ^ℓ, v^ℓ)] = 0, then (u, v) solves SPP(L, X, Y).

We give next a few examples of sets D, G, taken from [13].

Example 4. Assume that X and Y are compact and define D(x, y) = argmin_{ξ∈X} L(ξ, y) and G(x, y) = argmax_{η∈Y} L(x, η).

Example 5. Let λ > 0, μ > 0 and define D(x, y) = argmin_{ξ∈X}[L(ξ, y) + λ||ξ − x||²] and G(x, y) = argmax_{η∈Y}[L(x, η) − μ||η − y||²], for x ∈ X, y ∈ Y. Due to the strict convexity (concavity) of the minimand (maximand), D(x, y) (respectively G(x, y)) is uniquely determined.

Example 6. Assume that ∇_x L and ∇_y L are Lipschitz continuous with constant p and let 0 < α < 1/p. Define D(x, y) = {P_X(x − α∇_x L(x, y))} and G(x, y) = {P_Y(y + α∇_y L(x, y))}. This corresponds to the perturbed vector w^k of Korpelevich's method ([16]) in (5). But we must point out that when we use the method in [13] (i.e. (11)-(12)) with this perturbation we obtain something quite different from Korpelevich's method, because Korpelevich's method, with T = (∇_x L, −∇_y L), would then choose u^k = ∇_x L(ξ^k, η^k) and v^k = ∇_y L(ξ^k, η^k), while the method in [13] takes u^k = ∇_x L(x^k, η^k) and v^k = ∇_y L(ξ^k, y^k). This choice of u^k, v^k is possible only in the case of a saddle point problem, where we have two variables x and y, while Korpelevich's method is devised for a general variational inequality problem, where we only have the vector w^k instead of the pair (ξ^k, η^k).

It is not difficult to prove that the choices of D and G made in Examples 4-6 satisfy conditions A1-A3.

Next we present the algorithm generating the sequence {(x^k, y^k)}. We denote z^k = (x^k, y^k), C = X × Y and C° = X° × Y°, and define D_g : C × C° → ℝ employing

g(z) = g(x, y) = g_X(x) + g_Y(y).     (23)

It is a matter of routine to check that g is a Bregman function with zone C. The algorithm of this section is defined as follows:

Step 0: Initialization. Choose parameters β and γ, 1 > β > γ > 0, and take

z^0 = (x^0, y^0) ∈ X° × Y°.     (24)

Step 1: Perturbation. Begin iteration k with z^k = (x^k, y^k). Choose perturbation vectors (ξ^k, η^k) ∈ D(x^k, y^k) × G(x^k, y^k) and define

s_k = L(x^k, η^k) − L(ξ^k, y^k).     (25)

Step 2: Stopping test. If s_k = 0, then stop.

Step 3: Direction. Choose u^k ∈ ∂_x L(x^k, η^k) and v^k ∈ ∂_y L(ξ^k, y^k), and denote d^k = (u^k, −v^k)^t.

Step 4: Stepsize. For t ≥ 0 define the functions

z(t) = s_g(z^k, d^k, t)     (26)

and

f_k(t) = t s_k − D_g(z^k, z(t)).     (27)

Search for a stepsize t_k > 0 satisfying

γ t_k s_k ≤ f_k(t_k) ≤ β t_k s_k.     (28)

Step 5: Update. Set

z^{k+1} = (x^{k+1}, y^{k+1}) = z(t_k),     (29)

increment k by one and return to Step 1.

Theorem 1 below shows that f_k(t) is a concave function with f_k(0) = 0 and f_k′(0) = s_k > 0. For iteration k and a potential stepsize t, this function provides a lower bound for the decrease in the Bregman distance to a solution. Thus the left inequality in (28) aims to guarantee that such a distance is suitably decreased, while the right inequality aims to bound the stepsize from below. To shorten the search for an acceptable stepsize, one may begin with the stepsize employed in the preceding iteration. Also, we will show that the stopping criterion of Step 2 is appropriate, in the sense that when termination occurs z^k is a solution of the problem. The convergence properties of the algorithm are formalized as follows.
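The acceptance test (28) can be implemented by a simple doubling/bisection search. The sketch below is ours and assumes f_k is given as a callable f with f(0) = 0, f concave, and s = s_k > 0; since f(t)/t is nonincreasing for a concave f vanishing at 0, a valid bracket is easy to maintain.

```python
def stepsize_search(f, s, beta=0.5, gamma=0.1, t0=1.0, max_iter=200):
    """Find t > 0 with gamma*t*s <= f(t) <= beta*t*s, as required by (28).
    ratio(t) = f(t)/(t*s) is nonincreasing, tends to 1 as t -> 0+ and
    eventually drops below gamma, so halving/doubling brackets a valid t."""
    ratio = lambda t: f(t) / (t * s)
    t = t0
    while ratio(t) < gamma:          # too far out along the curve: pull back
        t *= 0.5
    while ratio(t) > beta:           # too close to the iterate: push out
        t *= 2.0
    if ratio(t) >= gamma:            # landed inside [gamma, beta]
        return t
    lo, hi = 0.5 * t, t              # ratio(lo) > beta, ratio(hi) < gamma
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        r = ratio(mid)
        if gamma <= r <= beta:
            return mid
        lo, hi = (mid, hi) if r > beta else (lo, mid)
    return 0.5 * (lo + hi)

# Example with f(t) = t - t^2/2 (concave, f(0) = 0, slope s = 1 at 0):
t = stepsize_search(lambda t: t - 0.5 * t * t, s=1.0)
```

For this f, any t in [1, 1.8] satisfies (28) with beta = 0.5, gamma = 0.1.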

Theorem 1. Assume that SPP(L, X, Y) has solutions. Let g be a Bregman function with zone C and let {z^k} be a sequence generated by the algorithm of this section, where ξ^k ∈ D(z^k) and η^k ∈ G(z^k), with D and G satisfying A1 and A2. If the algorithm stops at iteration k, then z^k solves SPP(L, X, Y). If the sequence {z^k} is infinite, then

i) z^k is well defined and belongs to C°, for all k;

ii) for any solution z* of SPP(L, X, Y), the Bregman distance D_g(z*, z^k) is monotonically decreasing;

iii) {z^k} is bounded;

iv) lim_{k→∞} s_k = 0;

v) lim_{k→∞} ||z^{k+1} − z^k|| = 0;

vi) if additionally A3 is satisfied, then the sequence {z^k} converges to a solution of SPP(L, X, Y).

Proof. We prove first (i) by induction. z^0 belongs to C° by (24). Assuming that z^k belongs to C°, note that (26) is equivalent to

∇g(z(t)) = ∇g(z^k) − t d^k,     (30)

so that by B1 and B5, z(t) is uniquely defined by (30) and belongs to C° for all t ≥ 0. By (29), z^{k+1} belongs to C° and (i) holds. Now we consider the case of finite termination, i.e. when, according to Step 2, s_k = 0. In such a case, since z^k belongs to C° by (i), it follows from A1 that z^k is a saddle point. From now on we assume that {z^k} is infinite. We proceed to prove (ii). Since z(t) ∈ C° for all t ≥ 0, using (7) and (30) we get

D_g(z*, z(t)) = D_g(z*, z^k) − D_g(z(t), z^k) + t(d^k)^t(z* − z(t)).     (31)
By the convexity/concavity of L, since (x*, y*) is a saddle point,

(d^k)^t(z* − z^k) ≤ −s_k.     (32)

Combining (27), (31) and (32) yields

D_g(z*, z(t)) ≤ D_g(z*, z^k) − f_k(t).     (33)
Differentiating (30) yields

z′(t) = −[∇²g(z(t))]⁻¹ d^k,     (34)

so that the first two derivatives of f_k(t) in (27) are

f_k′(t) = s_k + (d^k)^t(z(t) − z^k)     (35)

and

f_k″(t) = −φ_k(t),     (36)

with φ_k defined as

φ_k(t) = (d^k)^t [∇²g(z(t))]⁻¹ d^k.
Since z(0) = z^k, f_k(0) = 0. By A1 and (35), f_k′(0) = s_k > 0, in view of the stopping criterion, and so (32) implies d^k ≠ 0. Consequently, by B2, (35) and (36), f_k″(t) < 0. Therefore f_k(t) is a concave function of t ∈ [0, ∞) with a positive slope at t = 0. Inequality (33) implies that f_k(t) is bounded above. Hence there exists a stepsize t_k satisfying (28), so that z^{k+1} is well defined, and therefore item (i) follows from (30). Furthermore, for some γ_k ∈ [γ, β], f_k(t_k) = γ_k t_k s_k, which together with (33) implies item (ii). Denote by B° the set {z ∈ C° : D_g(z*, z) ≤ D_g(z*, z^0)}. By B3, B° is bounded. Item (ii) implies that z^{k+1} ∈ B° if z^k ∈ B°. Item (iii) then follows by induction from (24).

Since {z^k} is bounded, we get from A2 that {(ξ^k, η^k)} is bounded. The operator (∂_x L, −∂_y L) is maximal monotone, and is therefore bounded over bounded sets contained in the interior of its domain, which in this case is ℝ^n × ℝ^m (e.g. [18]). It follows that {d^k} is bounded.

By Taylor's theorem and (28), for some τ ∈ [0, t_k], f_k(t_k) = t_k s_k − (t_k²/2)φ_k(τ) ≤ β t_k s_k. Consequently, and because B2, A2 and item (iii) imply φ_k(τ) ≤ θ for some θ > 0, we have

t_k ≥ (2(1 − β)/θ) s_k.     (37)

By (28) and (33),

D_g(z*, z^{k+1}) ≤ D_g(z*, z^k) − γ t_k s_k.     (38)

Since {D_g(z*, z^k)} is nonnegative and nonincreasing, (38) implies lim_{k→∞} t_k s_k = 0, and then, multiplying both sides of (37) by s_k, we conclude that lim_{k→∞} s_k = 0, i.e. (iv) holds. From (27) and (28) we have 0 ≤ D_g(z^k, z^{k+1}) ≤ t_k s_k, where the right side converges to zero. Therefore, item (v) follows from item (iii) and Proposition 1. By item (iii) the sequence {z^k} has an accumulation point z̄. By item (iv) and A3, z̄ solves SPP(L, X, Y). By item (ii) the Bregman distance D_g(z̄, z^k) is nonincreasing. If {z^{j_k}} is a subsequence of {z^k} which converges to z̄, then by B4 lim_{k→∞} D_g(z̄, z^{j_k}) = 0. Since {D_g(z̄, z^k)} is a nonincreasing and nonnegative sequence with a subsequence which converges to 0, we conclude that the whole sequence converges to 0. Now we apply Proposition 1 with w^k = z̄, and conclude that lim_{k→∞} z^k = z̄.

4 Particular realizations of the algorithm

In the unrestricted case (X = ℝ^n, Y = ℝ^m) we can take g as in Example 1, so that (30) gives

z(t) = z^k − (t/2) d^k.

In this case, with β = γ = 0.5, we may choose t_k to maximize f_k(t), yielding

t_k = 2 s_k / ||d^k||²,     z^{k+1} = z^k − (s_k / ||d^k||²) d^k,

which is a special case of the method considered in [13].

Next, consider the case in which X, Y are the nonnegative orthants. For g as in Example 2 we have

z(t)_j = z^k_j exp(−t d^k_j),  j = 1, ..., n + m.

In this case, a search is needed for determining the stepsize t_k in Step 4 of the algorithm. For further illustration, see the case of linear programming in Section 6.
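As an illustration (ours, not the experiments of Section 6), the following sketch runs the algorithm of Section 3 on the Lagrangian L(x, y) = c^t x + y^t(b − Ax) of a tiny linear program min{c^t x : Ax ≥ b, x ≥ 0}, with the entropy Bregman function and the interior perturbation (19)-(20); for bilinear L the perturbed points are explicit, and for the entropy kernel both the curve z(t) and f_k(t) have closed forms.

```python
import numpy as np

# min c^t x s.t. Ax >= b, x >= 0; Lagrangian L(x, y) = c^t x + y^t (b - Ax)
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
L = lambda x, y: c @ x + y @ (b - A @ x)
lam = mu = 1.0; beta, gamma = 0.5, 0.1

x, y = np.array([0.5, 0.5]), np.array([0.5])     # interior starting point
for k in range(2000):
    xi  = x * np.exp(-lam * (c - A.T @ y))       # (19): explicit, grad_x L constant in xi
    eta = y * np.exp( mu * (b - A @ x))          # (20): explicit, grad_y L constant in eta
    u, v = c - A.T @ eta, b - A @ xi             # Step 3 directions
    s = L(x, eta) - L(xi, y)                     # gap s_k of (25)
    if s <= 1e-14:
        break
    z, d = np.concatenate([x, y]), np.concatenate([u, -v])
    # f_k(t) = t*s_k - D_g(z^k, z(t)); the entropy kernel and
    # z_j(t) = z_j e^{-t d_j} give the closed form below
    f = lambda t: t * s - np.sum(z * (t * d + np.exp(-t * d) - 1.0))
    r = lambda t: f(t) / (t * s)
    t = 1.0
    while r(t) < gamma: t *= 0.5                 # curvilinear stepsize search for (28)
    while r(t) > beta:  t *= 2.0
    if r(t) < gamma:                             # overshot: bisect back into [gamma, beta]
        lo, hi = 0.5 * t, t
        for _ in range(200):
            t = 0.5 * (lo + hi)
            if gamma <= r(t) <= beta: break
            lo, hi = (t, hi) if r(t) > beta else (lo, t)
    znew = z * np.exp(-t * d)                    # move along the Bregman curve
    x, y = znew[:2], znew[2:]
# the gap s_k tends to zero and the iterates stay strictly positive (Theorems 1-2)
```

Note that no matrix factorizations or linear systems are needed: every update is componentwise, which is the feature exploited in Section 6.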

5 An interior point procedure for perturbation

In this section we analyze a selection rule for specifying ξ^k and η^k which extends the proposal of Example 6 in Section 3. Let g_X and g_Y be Bregman functions with zones X° and Y°, respectively. Take λ > 0 and μ > 0, and for (x, y) ∈ X° × Y° define

ξ(x, y) = argmin_{u∈X}{L(u, y) + (1/λ)D_{g_X}(u, x)},     (43)
η(x, y) = argmax_{v∈Y}{L(x, v) − (1/μ)D_{g_Y}(v, y)}.     (44)

We shall restrict the discussion to the following special case of SPP(L, X, Y): X = ℝ^n_+, Y = ℝ^m_+, L twice continuously differentiable on ℝ^n × ℝ^m and g separable, i.e.

g_X(x) = Σ_{j=1}^n g_j(x_j), and similarly for g_Y.
We will show that the perturbation sets defined by (43)-(44) satisfy A1 and A2. A3 does not hold in general for this perturbation, but we will establish convergence of the method under a nondegeneracy assumption on the problem and, in the case of linear programming, without such an assumption. Our proof can be extended to nondifferentiable L and to box, rather than positivity, constraints, at the cost of some minor technical complications.

Proposition 2. Assume that L is continuously differentiable, X × Y = ℝ^n_+ × ℝ^m_+, and that g is a separable Bregman function with zone X° × Y°. Consider (x, y) ∈ X° × Y° and (ξ, η) as in (43)-(44). Then

i) (ξ, η) is uniquely determined and (ξ, η) ∈ X° × Y°;

ii) if (x, y) is not a saddle point, then L(x, η) − L(ξ, y) > 0;

iii) if B ⊂ X° × Y° is nonempty and bounded, then the set {(ξ(x, y), η(x, y)) : (x, y) ∈ B} is bounded.

Proof. Let f(u) = L(u, y) + (1/λ)D_{g_X}(u, x). Take w = ∇_x L(x, y) and define f̃(u) = L(x, y) + w^t(u − x) + (1/λ)D_{g_X}(u, x). By the convexity of L(·, y) we have

f̃(u) ≤ f(u)     (45)

for all u ∈ X. Consider the unrestricted minimization problem min f̃(u), whose optimality condition is ∇g_X(u) = ∇g_X(x) − λw. By B5, this equation in u has a unique solution ũ, which belongs to X°. It follows easily from (7) that D_{g_X}(·, x) is strictly convex, and the same holds for f̃, so that ũ minimizes f̃ in ℝ^n. A strictly convex function which attains its minimum has bounded level sets, and it follows from (45) that f also has bounded level sets (see [19, Corollary 8.7.1]), and therefore it attains its minimum in X. Being the sum of a convex function and a strictly convex one, f is strictly convex, and so the minimizer is unique, by the convexity of X. The fact that this minimizer belongs to X° follows from B5 and has been established in [10, Theorem 4.1]. By the definition of f, this minimizer is ξ(x, y). A similar proof holds for η(x, y), so that the proof of item (i) is complete.

For item (ii), assume that (x, y) is not a saddle point. Then ∇_x L(x, y) ≠ 0 or ∇_y L(x, y) ≠ 0. Assume that the former case holds; the latter one can be analyzed similarly. Observe that

∇f(u) = ∇_x L(u, y) + (1/λ)(∇g_X(u) − ∇g_X(x)),     (46)

so that ∇f(x) = ∇_x L(x, y) ≠ 0. Hence x is not the solution of (43) and consequently

L(ξ, y) + (1/λ)D_{g_X}(ξ, x) = f(ξ) < f(x) = L(x, y).

This implies L(x, y) − L(ξ, y) > 0. Similarly, L(x, η) − L(x, y) > 0, and therefore L(x, η) − L(ξ, y) > 0. This completes the proof of item (ii).

To prove item (iii), we will prove the boundedness of {ξ(x, y) : (x, y) ∈ B}. A similar argument holds for η(x, y). The proof relies on successive relaxations of optimization problems, whose optimal objective function values are upper bounds for ||ξ||². Since ∇f(ξ) = 0 by (46) and ξ is unique by item (i), we have

||ξ||² = max{||u||² | u ≥ 0, ∇g_X(u) − ∇g_X(x) = −λ∇_x L(u, y)}.

Note that

{u ≥ 0 | ∇g_X(u) − ∇g_X(x) = −λ∇_x L(u, y)}
⊂ {u ≥ 0 | u^t(∇g_X(u) − ∇g_X(x)) = −λu^t ∇_x L(u, y)}
⊂ {u ≥ 0 | g_X(u) − g_X(0) ≤ u^t(∇g_X(x) − λ∇_x L(0, y))},     (47)

using the facts that u^t(∇_x L(u, y) − ∇_x L(0, y)) ≥ 0 and g_X(u) − g_X(0) ≤ u^t ∇g_X(u), which result from the convexity of g_X and L(·, y), in the last inclusion of (47).

Let now r = max_j{sup_{(x, y)∈B}[g′_j(x_j) − λ∇_x L(0, y)_j]}. We claim that r < ∞. It suffices to show that sup_{(x, y)∈B}[g′_j(x_j) − λ∇_x L(0, y)_j] < ∞ for all j, which follows from the facts that ∇_x L(0, y) is continuous as a function of y, g′_j is continuous on ℝ_{++}, lim_{t→0+} g′_j(t) = −∞ because of B5, and B is bounded. The claim is established.

It follows from (47) that

||ξ||² ≤ max{||u||² | u ≥ 0, g_X(u) − r s^t u ≤ g_X(0)} < ∞,     (48)

where s^t = (1, 1, ..., 1). We claim that the set {u ≥ 0 | g_X(u) − r s^t u ≤ g_X(0)} is bounded. Note that this set is a level set of the convex function ḡ_X(u) = g_X(u) − r s^t u and that ∇ḡ_X(u) = ∇g_X(u) − r s. By B5, there exists z > 0 such that ∇g_X(z) = r s, i.e. ∇ḡ_X(z) = 0, so that z is an unrestricted minimizer of ḡ_X. Since ∇²ḡ_X = ∇²g_X, we get from B2 that ∇²ḡ_X(u) is positive definite for all u > 0, and hence ḡ_X is strictly convex and z is its unique minimizer. Thus, the level set of ḡ_X corresponding to the value ḡ_X(z) is the singleton {z}. It is well known that if a convex function has a bounded nonempty level set, then all its level sets are bounded, which establishes the claim. Therefore the feasible set of the maximization problem in (48) is bounded, and since its objective is continuous, the last inequality in (48) holds. This completes the proof of item (iii).

We now use (43) and (44) to define perturbation sets D and G consisting of the single elements ξ and η, respectively. In view of Proposition 2, this rule of perturbation satisfies conditions A1 and A2. However, the following example demonstrates that A3 may not hold.

Take m = n = 1, λ = μ = 1, g as in Example 2 and L(x, y) = (x − 1)² − (y − 1)², whose only saddle point is (1, 1). In this case the optimality conditions for (43)-(44) become

log ξ^k + 2(ξ^k − 1) = log x^k,     (49)
log η^k + 2(η^k − 1) = log y^k.     (50)

Take a sequence {(x^k, y^k)} convergent to (0, 0). Then the right hand sides of (49)-(50) diverge to −∞, implying that log ξ^k and log η^k also diverge to −∞; i.e. {(ξ^k, η^k)} also converges to (0, 0). It follows that

lim_{k→∞}[L(x^k, η^k) − L(ξ^k, y^k)] = 0,

but (0, 0) is not a saddle point of L.
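This degenerate behaviour is easy to observe numerically; the sketch below (ours, for illustration only) solves the scalar optimality condition (49) by bisection and shows the gap shrinking as (x, y) approaches (0, 0).

```python
import math

def perturb(w):
    """Solve log(t) + 2(t - 1) = log(w) for t by bisection; this is the
    optimality condition (49) (and (50)) for L(x, y) = (x-1)^2 - (y-1)^2
    with the entropy kernel and lambda = mu = 1. The left side is
    increasing in t, and the root always lies between w and 1."""
    lo, hi = sorted((w, 1.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) + 2.0 * (mid - 1.0) < math.log(w):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

L = lambda x, y: (x - 1.0) ** 2 - (y - 1.0) ** 2

# approach (0, 0) along the diagonal: the gap L(x, eta) - L(xi, y) tends
# to 0 even though (0, 0) is not a saddle point, so A3 fails here
gaps = []
for x in (1e-2, 1e-4, 1e-6):
    y = x
    xi, eta = perturb(x), perturb(y)
    gaps.append(L(x, eta) - L(xi, y))
```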

This example shows that the convergence argument of Theorem 1 is no longer valid for our algorithm with perturbation sets chosen according to (43)-(44). Instead, we provide another convergence argument, which is more easily formulated in terms of variational inequalities and nonlinear complementarity problems. We need two results on these problems. The first one is well known, while the second one is, to our knowledge, new, and of some interest in its own right.

Proposition 3.

i) If T : ℝ^p → ℝ^p is monotone and continuous, and C ⊂ ℝ^p is closed and convex, then the solution set of VIP(T, C), when nonempty, is closed and convex.

ii) If C = ℝ^p_+, then VIP(T, C) becomes the nonlinear complementarity problem NCP(T), consisting of finding z ∈ ℝ^p such that

T(z) ≥ 0,     (51)
z ≥ 0,     (52)
z_j T(z)_j = 0  (1 ≤ j ≤ p).     (53)

Proof. Elementary (see, e.g., [9]).

Corollary 1. If X × Y = ℝ^n_+ × ℝ^m_+, then (x̄, ȳ) is a solution of SPP(L, X, Y) if and only if

x̄ ≥ 0, ȳ ≥ 0,     (54)
x̄_j ∇_x L(x̄, ȳ)_j = 0  (1 ≤ j ≤ n),     (55)
∇_x L(x̄, ȳ) ≥ 0,     (56)
ȳ_i ∇_y L(x̄, ȳ)_i = 0  (1 ≤ i ≤ m),     (57)
∇_y L(x̄, ȳ) ≤ 0.     (58)

Proof. Follows from Proposition 3 with T = (∇_x L, −∇_y L).

Before presenting our new result, we need some notation.

Definition. We say that NCP(T) satisfies the strict complementarity assumption (SCA from now on) if for all solutions z̄ of NCP(T) and for all j between 1 and p it holds that either z̄_j > 0 or T(z̄)_j > 0.

For z ∈ ℝ^p, let J(z) = {j ∈ {1, ..., p} : z_j = 0} and I(z) = {1, ..., p} \ J(z).

Proposition 4. Assume that T is monotone and continuous and that NCP(T) satisfies SCA. If z̄ ∈ ℝ^p satisfies (52) and (53), and J(z̄) ⊂ J(z*) for some solution z* of NCP(T), then z̄ solves NCP(T).

Proof. Let z(α) = z̄ + α(z* − z̄). We will prove that z(α) satisfies (52)-(53) for all α ∈ [0, 1], and also (51) for α close to 1; then we will show that if (51) were violated for some α ∈ [0, 1] then SCA would be violated too. In order to prove that z(α) satisfies (53) for all α ∈ [0, 1] we must make a detour. Let U = {z ∈ ℝ^p_+ : z_j = 0 for all j ∈ J(z̄)}. A vector z̃ solves VIP(T, U) if and only if z̃ ∈ U and

Σ_j T(z̃)_j(z_j − z̃_j) ≥ 0     (59)

for all z ∈ U. Note that z̄ ∈ U. Since J(z̄) ⊂ J(z*), z* also belongs to U. We claim that both z̄ and z* satisfy (59). This is immediate for z*, which satisfies (59) for all z ∈ ℝ^p_+, since it solves NCP(T), equivalent, by Proposition 3(ii), to VIP(T, ℝ^p_+). Regarding z̄, we have z̄_j = z_j = 0 for all z ∈ U and all j ∈ J(z̄), so that the terms with j ∈ J(z̄) in the summation of (59) vanish. For j ∉ J(z̄), we have T(z̄)_j = 0 because z̄ satisfies (53), so that the terms with j ∉ J(z̄) in the summation of (59) also vanish. Therefore both z̄ and z* solve VIP(T, U), and so z(α) solves VIP(T, U) for all α ∈ [0, 1] by Proposition 3(i). In particular z(α) satisfies (52) for all α ∈ [0, 1]. We look now at (53). Since z(α) ∈ U, we have z(α)_j = 0 for j ∈ J(z̄), i.e. (53) holds for such j's. If j ∈ I(z̄), we take z¹ and z² defined as z¹_ℓ = z²_ℓ = z(α)_ℓ for ℓ ≠ j, z¹_j = 2z(α)_j, z²_j = (1/2)z(α)_j, so that z¹ and z² belong to U. Looking at (59) with z̃ = z(α) and z = z¹ or z², we get T(z(α))_j z(α)_j ≥ 0 and −(1/2)T(z(α))_j z(α)_j ≥ 0, so that z(α) satisfies (53) for all α ∈ [0, 1]. Finally we look at (51). For j ∈ I(z̄), we have z(α)_j > 0 for α ∈ [0, 1), and therefore, since z(α) satisfies (53), T(z(α))_j = 0, i.e. (51) holds for j ∈ I(z̄), α ∈ [0, 1). If j ∈ J(z̄) then j ∈ J(z*), i.e. z*_j = 0. By SCA, 0 < T(z*)_j = T(z(1))_j. By the continuity of T, T(z(α))_j > 0 for j ∈ J(z̄) and α close to 1; i.e., defining A = {α ∈ [0, 1) : T(z(α))_j > 0 for all j ∈ J(z̄)}, we get that A ≠ ∅. It follows that for α ∈ A, z(α) satisfies (51)-(53), i.e. it solves NCP(T). Let ᾱ = inf A. By the continuity of T, z(ᾱ) also solves NCP(T). We claim now that ᾱ = 0. If ᾱ ≠ 0 then, by the infimum property of ᾱ, there exists j ∈ J(z̄) such that T(z(ᾱ))_j = 0, and also z(ᾱ)_j = 0 because z(ᾱ) ∈ U, in which case SCA is violated at z(ᾱ). The claim is established, and therefore ᾱ = 0 and z̄ = z(0) solves NCP(T).

We will also use the following elementary result for bilinear L.

Proposition 5. For a bilinear function L(x, y) and for any (x, y) and (x′, y′) in ℝ^n × ℝ^m, it holds that

(∇_x L(x′, y′) − ∇_x L(x, y))^t(x′ − x) − (∇_y L(x′, y′) − ∇_y L(x, y))^t(y′ − y) = 0.

Proof. Elementary.

We will consider SCA for SPP(L, ℝ^n_+, ℝ^m_+), which is just NCP(T) with T = (∇_x L, −∇_y L). It is clear that in this case SCA means that, for all solutions (x̄, ȳ) of SPP(L, ℝ^n_+, ℝ^m_+), x̄_j = 0 implies ∂L(x̄, ȳ)/∂x_j > 0 and ȳ_i = 0 implies ∂L(x̄, ȳ)/∂y_i < 0.

Theorem 2. Assume that SPP(L, X, Y) has solutions, L is continuously differentiable, X × Y = ℝ^n_+ × ℝ^m_+, and g is a separable Bregman function with zone X° × Y°, satisfying B1-B5. Consider the algorithm of Section 3 with the perturbation vectors ξ^k and η^k defined according to (43) and (44). If either SCA is satisfied or L is bilinear, then the sequence {(x^k, y^k)} generated by the algorithm converges to a solution (x̄, ȳ) of SPP(L, X, Y).

Proof. Proposition 2 implies that our choice of (ξ^k, η^k) satisfies A1 and A2, so that the results of Theorem 1(i)-(v) hold. Therefore {(x^k, y^k)} has an accumulation point z̄ = (x̄, ȳ) ∈ X × Y, i.e. (54) holds for (x̄, ȳ). We prove next that (x̄, ȳ) also satisfies (55) and (57). Under SCA, we then show that (x̄, ȳ) solves SPP(L, X, Y). For the bilinear case we show directly that {z^k} converges to (x̄, ȳ).

First we show that ||ξ^k − x^k|| and ||η^k − y^k|| converge to zero. Using (7), (19) and (20), we have

Adding (60) and (61), we get

By Theorem 1(iv), the rightmost expression in (62) tends to 0 as k goes to ∞, so that

Since {(x^k, y^k)} is bounded, by Proposition 1 and (63) we obtain (64) and (65), i.e. lim_{k→∞} ||ξ^k − x^k|| = 0 and lim_{k→∞} ||η^k − y^k|| = 0.

We check now that (55) holds at (x̄, ȳ). In the case of separable g_X and differentiable L, (43) with (x, y) = (x^k, y^k) and x = ξ^k implies

Consider now j such that x̄_j > 0. Let {ξ_j^{k_ℓ}}, {x_j^{k_ℓ}} be subsequences of {ξ_j^k}, {x_j^k}, respectively, such that lim_{ℓ→∞} ξ_j^{k_ℓ} = lim_{ℓ→∞} x_j^{k_ℓ} = x̄_j > 0. Since g_j and L are continuously differentiable for x_j > 0, we can take limits in (66) along the subsequence and obtain

Hence (55) follows. A similar argument shows that (57) holds. In terms of NCP(T), we have proved that z̄ = (x̄, ȳ) satisfies (52) and (53). It remains to be proved that z̄ satisfies (51), i.e. that (x̄, ȳ) satisfies (56) and (58).

From now on we will consider separately the case of general L under SCA, for which we will use Proposition 4, and the case of bilinear L, which will result from Proposition 5.

Consider first the case in which SCA holds. In view of Proposition 4, it is enough to check that J(z̄) ⊂ J(z*) for some solution z* of SPP(L, X, Y). Take any solution z* of SPP(L, X, Y). If the result does not hold, there exists j such that z̄_j = 0 and z*_j > 0. By Theorem 1(ii),

Taking limits in (68) along a subsequence {z^{k_ℓ}} converging to z̄, we get

Since z*_j > 0 and lim_{k→∞} z_j^k = z̄_j = 0, (69) implies that lim_{t→0+} g′_j(t) > −∞. On the other hand, B5 in the case of separable g means that g′_j((0, ∞)) = (−∞, +∞), which in turn implies, since g′_j is increasing by B1, that lim_{t→0+} g′_j(t) = −∞. This contradiction establishes that J(z̄) ⊂ J(z*). Thus, all the hypotheses of Proposition 4 hold for z̄, and so z̄ solves SPP(L, X, Y).

Now we proceed exactly as at the end of the proof of Theorem 1. Since z̄ is a solution, {D_g(z̄, z^k)} is nonincreasing by Theorem 1(ii). By B4 this sequence has a subsequence (corresponding to the subsequence of {z^k} which converges to z̄) which converges to 0, so that the whole sequence converges to 0, and therefore lim_{k→∞} z^k = z̄ by Proposition 1.

Finally, consider a bilinear L, i.e., L(x, y) = c^t x + b^t y − y^t Ax with c ∈ ℝ^n, b ∈ ℝ^m and A ∈ ℝ^{m×n}. In this case we invert the procedure of the previous case: we first establish convergence of the whole sequence to a unique cluster point, and then use this uniqueness to prove optimality. Here L is the Lagrangian of the linear programming problem min_x {c^t x | Ax ≥ b, x ≥ 0}. It is well known that the saddle points of L and the optimal primal-dual pairs (x, y) of the linear programming problem coincide. Denote the gradient components of L by

d_x(y) = ∇_xL(x, y) = c − A^t y,    d_y(x) = ∇_yL(x, y) = b − Ax,

and d(z) = (d_x(y), −d_y(x))^t with z = (x, y).
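As a concrete illustration, the combined operator d can be sketched as follows; the data c, b, A below are arbitrary illustrative values, not taken from the paper:

```python
import numpy as np

def d(z, c, b, A):
    """Combined gradient operator d(z) = (c - A^t y, -(b - A x)) for the
    bilinear Lagrangian L(x, y) = c^t x + b^t y - y^t A x, with z = (x, y)."""
    n = len(c)
    x, y = z[:n], z[n:]
    dx = c - A.T @ y      # gradient of L with respect to x
    dy = b - A @ x        # gradient of L with respect to y
    return np.concatenate([dx, -dy])

# Tiny illustrative data (assumed, not from the paper).
c = np.array([1.0, 2.0])
b = np.array([1.0])
A = np.array([[1.0, 1.0]])
z = np.array([0.5, 0.5, 1.0])   # x = (0.5, 0.5), y = (1.0,)
print(d(z, c, b, A))            # equals (0, 1, 0) for this data
```

The sign flip on the y-block is what makes d the monotone operator associated with the saddle point problem.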

Let I_0 be the set of indices j such that z_j = 0 for all saddle points z. If a saddle point exists, Tucker [20] shows that for all j ∈ I_0 there exists a saddle point z such that d_j(z) > 0. Now for each index j ∈ {1, ..., m + n} we pick a saddle point ẑ^j in the following way: if j belongs to I_0, we choose, using Tucker's result, ẑ^j so that d_j(ẑ^j) > 0; if j does not belong to I_0, we choose, using the definition of I_0, ẑ^j so that ẑ^j_j > 0. Let z* be a convex combination of ẑ^1, ..., ẑ^{m+n} with strictly positive coefficients. It follows that z* = (x*, y*) is a saddle point with the following additional property:

z*_j = 0 ⇒ d_j(z*) > 0

(note that z*_j > 0 for j ∉ I_0). From (27), (32) and (37) we obtain

where w^k = (ξ^k, η^k). For a bilinear L,

and

so that

and, by Propositions 5 and 3,

−(d^k)^t(z* − w^k) = −d(z*)^t(z* − w^k) = d(z*)^t w^k.

Combining the equations above we obtain

By Theorem 1(iii), {z^k} is bounded, which easily implies that the series on the left-hand side of (73) is finite. Consequently, by nonnegativity of all terms in (73),

for all j such that d_j(z*) > 0. Let z̄ be a cluster point of {z^k}. Next, we use (74) to show that the absolute values of the increments of the sequence {D_g(z̄, z^k)} have a finite sum.

As in (73), using (27) and (37), we obtain

so that

Consider j such that d_j(z̄) ≠ 0. Since we have already proved that z̄ satisfies (54), (55) and (57), we get z̄_j = 0. {D_g(z*, z^k)} is bounded by Theorem 1(ii), so that {D_j(z*_j, z_j^k)} is bounded. By B5 in the one-dimensional case, lim_{t→0+} g′_j(t) = −∞. Since there exists a subsequence of {z_j^k} which converges to z̄_j = 0, boundedness of {D_j(z*_j, z_j^k)} implies that z*_j = 0. It follows from the special choice of z* that d_j(z*) > 0, and then from (74) that the corresponding series is finite. By (38), Σ_k t_k s_k < ∞. Hence we have proved that the rightmost expression of (75) is summable in k, and therefore {D_g(z̄, z^k)} is a Cauchy sequence, which converges to 0 by B4. Hence lim_{k→∞} z^k = z̄ by Proposition 1 with w^k = z̄.

We have established that the whole sequence {z^k} converges to a unique point z̄. In view of Corollary 1, since we have shown above that z̄ satisfies (54), (55) and (57), it only remains to prove that z̄ = (x̄, ȳ) satisfies (56) and (58). We start with (56). Take j such that x̄_j = 0. Suppose that (56) does not hold, i.e. ∇_xL(x̄, ȳ)_j < 0. By (64), lim_{k→∞}(ξ^k, η^k) = (x̄, ȳ), so that by continuous differentiability of L there exists k̄ such that ∇_xL(ξ^k, η^k)_j < 0 for all k > k̄. In the separable case (30) becomes

and Step 3, under the differentiability assumption, asserts that the x-component of d^k equals ∇_xL(ξ^k, η^k), so that its j-th component is negative for k > k̄. Since t_k > 0, it follows from (76) that g′_j(x_j^{k+1}) > g′_j(x_j^k) and, since g′_j is strictly increasing by B2, we get x_j^{k+1} > x_j^k for all k > k̄. Therefore x̄_j = lim_{k→∞} x_j^k > x_j^{k̄+1} > 0. This is a contradiction, so that (56) holds for j. The proof of (58) is virtually identical.

6 An application to linear programming

In this section we present the iterative formulae of the algorithm in the case of linear programming problems, with ξ^k and η^k chosen as in Section 5, and with g chosen as in Example 2. Consider the problem in the form

with c ∈ ℝ^n, b ∈ ℝ^m and A ∈ ℝ^{m×n}, so that the Lagrangian is L(x, y) = c^t x + b^t y − y^t Ax and the constraint sets are X = ℝ^n_+, Y = ℝ^m_+.

The Bregman functions are defined as

Note that in this case the perturbation vectors ξ and η defined by (43) and (44) can be solved explicitly, and, given the stepsize t_k, the updated primal and dual solutions can also be stated explicitly:

With z^k = (x^k, y^k), employing f_k(t) in (42) and

s_k = c^t(x^k − ξ^k) + b^t(η^k − y^k) − (η^k)^t Ax^k + (y^k)^t Aξ^k,

the stepsize t_k satisfying (28) is obtained via Armijo search.
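A generic Armijo backtracking loop of the kind used here can be sketched as follows; f_k (the one-dimensional function from (42)) and s_k are passed in, and the parameter names gamma and beta mirror the test parameters γ and β of (28). This is a sketch of a standard Armijo rule under these assumptions, not a transcription of the paper's exact test:

```python
def armijo_step(fk, sk, gamma=0.3, beta=0.7, t0=1.0, max_trials=100):
    """Backtracking (Armijo) search: find t with fk(t) <= fk(0) - gamma * t * sk.

    fk    : one-dimensional function of the stepsize t (cf. (42));
    sk    : positive scalar measuring the predicted decrease (cf. (28));
    gamma : sufficient-decrease parameter in (0, 1);
    beta  : stepsize reduction factor in (0, 1).
    """
    t = t0
    f0 = fk(0.0)
    for _ in range(max_trials):
        if fk(t) <= f0 - gamma * t * sk:
            return t
        t *= beta  # shrink the step and try again
    return t

# Illustrative model: fk(t) = (t - 1)^2 with sk = 1; the initial trial
# t = 1 already satisfies the sufficient-decrease test.
t = armijo_step(lambda t: (t - 1.0) ** 2, 1.0)
print(t)  # 1.0
```

With gamma = 0.3 and beta = 0.7 as in the experiments below, each rejected trial shrinks the step by 30%, so the search terminates quickly whenever f_k decreases locally.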

Based on our method, an experimental code called Bregman was written. It is not intended to fully explore the computational capabilities of our method, or its competitiveness with other algorithms, but just to establish its plausibility, showing that its performance is comparable with that of other methods. Thus we chose to test it on a well known and easily accessible collection of LP problems, though we do not claim that our method is competitive with algorithms specific to LP; in fact, our article is intended to address primarily nonlinear saddle point problems.

Computational results are compared with those obtained from the saddle point method of [13]. The latter code is called Saddle and it employs the following iterative scheme:

with t_k proportional to s_k.

As is the case with Saddle, scaling is crucial for the efficiency of Bregman. Our experience indicates that static (initial) data scaling via equilibration, which has traditionally been employed in LP codes, is not helpful for Bregman. That is why the dynamic scaling procedure in Saddle, adopted from Kallio and Salo [14], was further developed for Bregman. In Saddle, auxiliary reference quantity and value vectors in ℝ^m and ℝ^n, respectively, are defined in each iteration so that

where the reference vectors are built from the primal and dual solutions at the beginning of iteration k. If a reference quantity is smaller than a certain safeguard value, then the safeguard value is used instead. These safeguard values are equal to one tenth of the average of the reference quantities over all variables. The scaling heuristic of Saddle employs two phases, as follows. In phase one, each row i of A is divided by the corresponding reference quantity and each column j by the corresponding reference value. Let A′ = (a′_ij) be the resulting matrix; for each row i compute the mean of the nonzero values |a′_ij| in that row, and for each column j the corresponding column mean. Then, in the second phase of scaling, each element a′_ij is divided by the geometric mean of its row mean and its column mean.
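The two-phase heuristic just described can be sketched as follows; the reference quantity and value vectors (here `q` and `v`) and the safeguard rule are simplified, so this is an illustration of the structure rather than a transcription of the Saddle/Bregman code:

```python
import numpy as np

def two_phase_scale(A, q, v):
    """Two-phase dynamic scaling heuristic (sketch).

    Phase 1: divide row i of A by the reference quantity q[i] and
             column j by the reference value v[j], with safeguards.
    Phase 2: divide each entry by the geometric mean of its row mean
             and column mean of nonzero absolute values.
    """
    # Safeguards: replace too-small references by one tenth of the average.
    q = np.maximum(q, 0.1 * q.mean())
    v = np.maximum(v, 0.1 * v.mean())

    # Phase 1: row and column equilibration by the reference vectors.
    A1 = A / q[:, None] / v[None, :]

    # Phase 2: means of nonzero |entries| per row and per column.
    absA = np.abs(A1)
    nz = absA != 0
    row_mean = np.where(nz.sum(1) > 0, absA.sum(1) / np.maximum(nz.sum(1), 1), 1.0)
    col_mean = np.where(nz.sum(0) > 0, absA.sum(0) / np.maximum(nz.sum(0), 1), 1.0)
    return A1 / np.sqrt(row_mean[:, None] * col_mean[None, :])

A = np.array([[2.0, 0.0], [1.0, 4.0]])
print(two_phase_scale(A, np.array([1.0, 2.0]), np.array([1.0, 1.0])))
```

The geometric-mean step in phase two is a standard equilibration device: it drives the typical magnitude of each row and column of the scaled matrix toward one.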

This procedure defines row and column scaling factors for the rows and columns of A, as well as for the rows of b and the columns of c^t. Primal and dual variables are scaled by the inverses of these factors. Formulae (80)-(83) are applied in the scaled space. For Bregman, the scaling factors of Saddle are adjusted in such a way that the update direction at (x^k, y^k) > 0 (obtained by differentiating (82)-(83) with respect to t_k, at t_k = 0) is the same as the update direction of Saddle obtained from (84)-(85). For example, if D_i is the scaling factor of row i obtained for Saddle and we denote by G_i the scaling factor for that row to be applied in Bregman, then (83) yields y_i^{k+1} = y_i^k exp[t_k G_i(b − Ax^k)_i] and (85) yields y_i^{k+1} = max[0, y_i^k + t_k D_i(b − Ax^k)_i]. Differentiating these at (x^k, y^k) > 0 with respect to t_k, and evaluating the derivatives at t_k = 0, yields directional components, which we require to be equal. This implies that G_i = D_i/y_i^k.
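The matching of the two update directions can be checked numerically; the sketch below uses plausible reconstructions of the dual updates (exponential for Bregman, projected additive for Saddle), with illustrative data, so the exact constants are assumptions:

```python
import numpy as np

def bregman_dual_update(y, r, t, G):
    """Multiplicative (interior) dual update, cf. (83): y stays positive."""
    return y * np.exp(t * G * r)

def saddle_dual_update(y, r, t, D):
    """Projected additive dual update, cf. (85)."""
    return np.maximum(0.0, y + t * D * r)

y = np.array([0.5, 2.0])          # current dual iterate, strictly positive
r = np.array([1.0, -3.0])         # residual (b - A x^k), illustrative values
D = np.array([1.0, 1.0])          # Saddle scaling factors (assumed)
G = D / y                         # matching choice G_i = D_i / y_i^k

# For small t the two updates move at the same rate in the same direction:
t = 1e-6
db = (bregman_dual_update(y, r, t, G) - y) / t
ds = (saddle_dual_update(y, r, t, D) - y) / t
print(db, ds)   # both approximately equal to D * r
```

The multiplicative update keeps every dual variable strictly positive, which is exactly the interior-point property that the Bregman machinery requires, while reproducing Saddle's first-order behavior.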

As a termination test for Bregman and Saddle, we use V(x^k, y^k) < φ|c^t x^k|, where

is an error measure, and φ > 0 is a stopping parameter. If (b − Ax^k)_i > 0, then y_i^k|(b − Ax^k)_i| is the value (in units of the objective function) of the primal error; otherwise it is the value of a complementarity violation. The terms x_j^k|(c − A^t y^k)_j| have a similar interpretation, so that V(x^k, y^k) is interpreted as the total value of the primal, dual and complementarity violations. Our experience indicates that, if φ = 10^{-k}, then we may expect k + 1 significant digits in the optimal objective function value. Using our termination test, the final infeasibilities are usually small, although they do not enter the stopping criterion directly (see Table 4 below).
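Under this reading of V (primal residuals weighted by dual variables, dual residuals weighted by primal variables; a reconstruction, since the displayed formula is not reproduced here), the termination test can be sketched as:

```python
import numpy as np

def error_measure(x, y, c, b, A):
    """Weighted total of primal, dual and complementarity violations
    (reconstructed form of the error measure V)."""
    primal = np.abs(b - A @ x) @ y     # sum_i y_i |(b - A x)_i|
    dual = np.abs(c - A.T @ y) @ x     # sum_j x_j |(c - A^t y)_j|
    return primal + dual

def terminated(x, y, c, b, A, phi=1e-6):
    """Stopping test V(x, y) < phi * |c^t x|."""
    return error_measure(x, y, c, b, A) < phi * abs(c @ x)

# At an exact primal-dual optimum V vanishes; illustrative data:
c = np.array([1.0, 1.0]); b = np.array([1.0]); A = np.array([[1.0, 1.0]])
x_opt = np.array([1.0, 0.0]); y_opt = np.array([1.0])
print(error_measure(x_opt, y_opt, c, b, A))   # 0.0
```

Because all three violation types are expressed in objective-function units, a single relative threshold φ controls them jointly.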

For computational illustration, Bregman was tested against Saddle on ten problems from the Netlib library [17]. For the purpose of this comparison, each problem was first cast in the form given in (77)-(79). Problem names and dimensions are given in Table 1.

All primal and dual variables are initially set to one. We set the perturbation stepsize parameters in (43)-(44) to λ = μ = 0.5, and the stepsize test parameters in (28) to γ = 0.3 and β = 0.7. Two values of the stopping parameter were applied: φ = 10^{-4} and φ = 10^{-6}.

Table 2 shows the iteration counts for φ = 10^{-4} and φ = 10^{-6}, both for Bregman and Saddle. Table 3 shows the corresponding relative objective errors |c^t x̂ − c^t x*|/|c^t x*|, where x̂ is the primal solution obtained at the end of the iterations and x* is a true optimal solution. We observe that the iteration counts for both methods are of the same order of magnitude. The precision in both cases is also approximately the same (five and seven significant digits, as predicted, for φ = 10^{-4} and φ = 10^{-6}, respectively). Due to function evaluations (for the exponentials) and to the search for stepsizes, the work per iteration for Bregman is somewhat larger than for Saddle. The computational tests show that the performance of Bregman is roughly equivalent to that of Saddle. One may speculate that a better scaling method for Bregman could improve its performance.

For each iteration k, define relative primal and dual infeasibilities as follows:

Table 4 reports the final infeasibilities, measured by the maximum of the relative primal and dual infeasibilities, for Bregman and Saddle, with φ = 10^{-6}.

To see the importance of scaling, we ran Bregman without scaling on a dozen small Netlib problems with φ = 10^{-4}. Five of our twelve test problems failed to converge within one million iterations, while for the others the number of iterations increased by a factor ranging from 3 to 22 as compared with the scaled version.

Our convergence results apply if the scaling factors are updated only during a prespecified number of iterations. To demonstrate that it pays off to update the scaling until the end, we ran our smallest problem, sctap1, with the scaling fixed after 5000 iterations. As a result, the number of iterations is 26088 for φ = 10^{-4} and 63631 for φ = 10^{-6}; the iteration count thus almost doubles compared with the results for sctap1 in Table 2. In conclusion, our theoretical results apply if the scaling is fixed after a given number of iterations; however, it is plainly more efficient to update the scaling factors until the termination criterion is met.

Finally, it is important to note that for small problems both of these approaches often turn out to be inefficient. Their comparative advantage is expected to appear for large-scale problems, and in particular, our method may be of interest for massively parallel computing.

Received: 13/V/03. Accepted: 09/I/04.

#581/03.

  • [1] D. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, New Jersey (1989).
  • [2] L.M. Bregman, The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7 (3) (1967), 200-217.
  • [3] Y. Censor, A.N. Iusem and S.A. Zenios, An interior point method with Bregman functions for the variational inequality problem with paramonotone operators. Mathematical Programming, 81 (1998), 373-400.
  • [4] Y. Censor and A. Lent, An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34 (1981), 321-353.
  • [5] Y. Censor and S.A. Zenios, The proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73 (1992), 451-464.
  • [6] G. Chen and M. Teboulle, Convergence analysis of a proximal-like optimization algorithm using Bregman functions. SIAM Journal on Optimization, 3 (1993) 538-543.
  • [7] A.R. De Pierro and A.N. Iusem, A relaxed version of Bregman's method for convex programming. Journal of Optimization Theory and Applications, 51 (1986), 421-440.
  • [8] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Mathematics of Operations Research, 18 (1993), 202-226.
  • [9] P.T. Harker and J.S. Pang, Finite dimensional variational inequalities and nonlinear complementarity problems: a survey of theory, algorithms and applications. Mathematical Programming, 48 (1990), 161-220.
  • [10] A.N. Iusem, On some properties of generalized proximal point methods for quadratic and linear programming. Journal of Optimization Theory and Applications, 85 (1995), 593-612.
  • [11] A.N. Iusem, An iterative algorithm for the variational inequality problem. Computational and Applied Mathematics, 13 (1994), 103-114.
  • [12] A.N. Iusem, An interior point method for the nonlinear complementarity problem. Applied Numerical Mathematics, 24 (1997), 469-482.
  • [13] M. Kallio and A. Ruszczynski, Perturbation methods for saddle point computation. Working Paper 94-38, International Institute for Applied Systems Analysis, Laxenburg, Austria (1994).
  • [14] M. Kallio and S. Salo, Tatonnement procedures for linearly constrained convex optimization. Management Science, 40 (1994), 788-797.
  • [15] E.N. Khobotov, Modifications of the extragradient method for solving variational inequalities and certain optimization problems. USSR Computational Mathematics and Mathematical Physics, 27 (5) (1987), 120-127.
  • [16] G.M. Korpelevich, The extragradient method for finding saddle points and other problems. Ekonomika i Matematicheskie Metody, 12 (1976), 747-756.
  • [17] LP Netlib test problems, Bell Laboratories.
  • [18] R.T. Rockafellar, Local boundedness of nonlinear monotone operators. Michigan Mathematical Journal, 16 (1969), 397-407.
  • [19] R.T. Rockafellar, Convex Analysis. Princeton University Press, New Jersey (1970).
  • [20] A.W. Tucker, Dual systems of homogeneous linear relations. In Linear Inequalities and Related Systems (H.W. Kuhn and A.W. Tucker, editors). Annals of Mathematics Studies , Princeton University Press, Princeton 38 (1956), 3-18.
  • * Research of this author was partially supported by CNPq grant No. 301280/86.
  • Publication Dates

    • Publication in this collection
      26 Nov 2004
    • Date of issue
      2004

    Sociedade Brasileira de Matemática Aplicada e Computacional - SBMAC, Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163, 13561-120 São Carlos - SP, Brazil, Tel./Fax: 55 16 3412-9752
    E-mail: sbmac@sbmac.org.br