

An interior point method for constrained saddle point problems

Alfredo N. Iusem^I,* ; Markku Kallio^II
(* Research of this author was partially supported by CNPq grant No. 301280/86.)

^I Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, 22460 Rio de Janeiro, RJ, Brazil

^II Helsinki School of Economics, Runeberginkatu 14-16, 00100 Helsinki, Finland. E-mail: iusp@impa.br / kallio@hkkk.fi

ABSTRACT

We present an algorithm for the constrained saddle point problem with a convex-concave function L and convex sets with nonempty interior. The method consists of moving away from the current iterate by choosing certain perturbed vectors. The values of the gradients of L at these vectors provide an appropriate direction. Bregman functions allow us to define a curve which starts at the current iterate with this direction and is fully contained in the interior of the feasible set. The next iterate is obtained by moving along such a curve with a certain stepsize. We establish convergence to a solution under minimal conditions on the function L, weaker, for instance, than Lipschitz continuity of the gradient of L, and including cases where the solution need not be unique. We also consider the case in which the perturbed vectors lie on certain specific curves starting at the current iterate, in which case another convergence proof is provided. In the case of linear programming, we obtain a family of interior point methods where all the iterates and perturbed vectors are computed with very simple formulae, without factorization of matrices or solution of linear systems, which makes the method attractive for very large and sparse matrices. The method may be of interest for massively parallel computing. Numerical examples for the linear programming case are given.

Mathematical subject classification: 90C25, 90C30.

Key words: saddle point problems, variational inequalities, linear programming, interior point methods, Bregman distances.

1 Introduction

In this paper we discuss methods for solving constrained saddle point problems. Given closed convex sets X, Y contained in ℝ^n, ℝ^m respectively, and a function L : X × Y → ℝ, convex in the first variable and concave in the second one, the saddle point problem SPP(L, X, Y) consists of finding (x*, y*) ∈ X × Y such that

L(x*, y) ≤ L(x*, y*) ≤ L(x, y*)     (1)

for all (x, y) ∈ X × Y. SPP(L, X, Y) is a particular case of the variational inequality problem, which we describe next. Given a closed convex set C ⊂ ℝ^p and a maximal monotone operator T : ℝ^p → P(ℝ^p), VIP(T, C) consists of finding z ∈ C such that there exists u ∈ T(z) satisfying

u^t(z′ − z) ≥ 0     (2)

for all z′ ∈ C. If ∂_x L, ∂_y L denote the subdifferential and superdifferential of L in each variable (i.e. the sets of subgradients and supergradients, respectively), and we define T : ℝ^n × ℝ^m → P(ℝ^n × ℝ^m) as T = (∂_x L, −∂_y L), then VIP(T, C) coincides with SPP(L, X, Y) if we take p = m + n and C = X × Y. It is easy to check that this T is maximal monotone.

It is well known that when T = ∂f for a convex function f, VIP(T, C) reduces to min f(x) s.t. x ∈ C. This suggests the extension of optimization methods to variational inequalities. In the unconstrained case C = ℝ^p, for which VIP(T, C) consists just of finding a zero of T, i.e. a z* such that 0 ∈ T(z*), one could attempt the natural extension of the steepest descent method, i.e.

z^{k+1} = z^k − αu^k

with u^k ∈ T(z^k) and α > 0. This simple approach works only under very restrictive assumptions on T (strong monotonicity and Lipschitz continuity, see [1]). An alternative one, which works under weaker assumptions, is the following: move in a direction contained in −T(z^k), finding an auxiliary (or perturbed) point w^k, and then move from z^k in a direction contained in −T(w^k), i.e.

w^k = z^k − α_k u^k,     (3)
z^{k+1} = z^k − β_k d^k,     (4)

with u^k ∈ T(z^k), d^k ∈ T(w^k), and α_k, β_k > 0. In the constrained case, one must project onto C in order to preserve feasibility, i.e., one takes

w^k = P_C(z^k − α_k u^k),     (5)
z^{k+1} = P_C(z^k − β_k d^k),     (6)

where P_C is the orthogonal projection onto C. If T is Lipschitz continuous with constant p, then convergence of {z^k} to a solution of VIP(T, C) is guaranteed by taking α_k = β_k = α ∈ (0, 1/p). This result can be found in [16], where the method was first introduced. In the absence of Lipschitz continuity (or when the Lipschitz constant exists but is not known beforehand) it has been proven in [11] that the algorithm still converges to a solution of VIP(T, C) if α_k, β_k are determined through a certain finite bracketing procedure. A somewhat similar method has been studied in [15].
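As an illustration, the projected scheme (5)-(6) with a single-valued T and a fixed stepsize can be sketched as follows; this is our own minimal sketch, not part of the paper: the function names and the bilinear test operator are ours, and the fixed stepsize presumes a known Lipschitz constant.

```python
import numpy as np

def extragradient(T, project, z0, alpha, iters=1000):
    """Sketch of the projected scheme (5)-(6): a predictor step to the
    perturbed point w^k, then a corrector step from z^k using T(w^k)."""
    z = z0.copy()
    for _ in range(iters):
        w = project(z - alpha * T(z))   # (5): w^k = P_C(z^k - alpha*u^k)
        z = project(z - alpha * T(w))   # (6): z^{k+1} = P_C(z^k - alpha*d^k)
    return z

# Toy saddle point L(x, y) = x^t A y on C = [0,1]^4, so T(z) = (Ay, -A^t x);
# every point with x = 0 is a solution, and the iterates drive x to zero.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
def T(z):
    x, y = z[:2], z[2:]
    return np.concatenate([A @ y, -A.T @ x])

project = lambda z: np.clip(z, 0.0, 1.0)
z = extragradient(T, project, np.ones(4), alpha=0.1)
```

Here alpha = 0.1 is below 1/||A||, as the fixed-stepsize variant requires.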

When C has nonempty interior C°, it is interesting to consider interior point methods, i.e. methods such that {w^k} and {z^k} are contained in the interior of C, thus doing away with the projection P_C. This can be achieved if a Bregman function g with zone C° is available. Loosely speaking, such a g is a strictly convex function which is continuous in C, differentiable in C° and such that its gradient diverges at the boundary of C. With g we construct a distance D_g on C × C° as

D_g(w, z) = g(w) − g(z) − ∇g(z)^t(w − z),     (7)

and instead of the half line {z − tu : t ∈ ℝ_+} we consider the curve {s_g(z, u, t) : t ∈ ℝ_+}, where s_g(z, u, t) is the solution of min_{w∈C}{u^t w + (1/t)D_g(w, z)} for t > 0, or equivalently s_g(z, u, t) is the solution w of

∇g(w) = ∇g(z) − tu,     (8)

which allows us to extend the curve to t = 0. Under suitable assumptions on g (see Section 2), s_g(z, u, 0) = z, the curve is uniquely defined for t ∈ [0, ∞) and it is fully contained in C°. The idea is to replace (5) and (6) by

w^k = s_g(z^k, u^k, α_k),     (9)
z^{k+1} = s_g(z^k, d^k, β_k),     (10)

with u^k, d^k, α_k and β_k as in (3) and (4). This algorithm was proposed in [3], where it is proved that convergence of {z^k} to a solution of VIP(T, C) is guaranteed if α_k is chosen through a given finite bracketing procedure and β_k solves a certain nonlinear equation in one variable. A slight improvement is presented in [12], where it is shown that convergence is preserved when β_k is also found through another finite bracketing search. The main advantage of these interior point methods with Bregman functions as compared e.g. to Korpelevich's algorithm lies in the fact that no orthogonal projections onto the feasible set are needed, since all iterates belong automatically to the interior of this set. A detailed discussion of the computational effects of this advantage in the case in which C is an arbitrary polyhedron with nonempty interior can be found in Section 6 of [3]. For SPP(L, X, Y), when either X or Y is an arbitrary polyhedron with nonempty interior, the method introduced in this paper presents the same advantage over the algorithm in [13], discussed in the next paragraph, which requires orthogonal projections onto X and Y.

Going back to SPP(L, X, Y), another related method is presented in [13]. Given a point (x^k, y^k), a rather general perturbed vector w^k = (ξ^k, η^k) is computed, and then

x^{k+1} = P_X(x^k − t_k u^k),     (11)
y^{k+1} = P_Y(y^k + t_k v^k),     (12)

with u^k ∈ ∂_x L(x^k, η^k) and v^k ∈ ∂_y L(ξ^k, y^k). In this case no search is required for the stepsize t_k, which is given by a simple formula in terms of the gap L(x^k, η^k) − L(ξ^k, y^k). The difficulty is to some extent transferred to the selection of (ξ^k, η^k), which must satisfy certain conditions in order to ensure convergence to a solution of SPP(L, X, Y). One possibility is to adopt (5), so that ξ^k = P_X(x^k − α_k u^k), η^k = P_Y(y^k + α_k v^k), with u^k ∈ ∂_x L(x^k, y^k), v^k ∈ ∂_y L(x^k, y^k). In this case the method is somewhat similar to Korpelevich's (see [16]), but not identical: in linear programming, for instance, a closed expression for α_k is easily available for the method of [13] even when a Lipschitz constant for ∇L is not known beforehand. Other differences between the method in [13] with this choice of (ξ^k, η^k) and Korpelevich's are commented upon right after Example 6 in Section 3.

Besides (5), the method in [13] includes other options for the selection of (ξ^k, η^k). Taking advantage of this fact, we will combine the approaches in [3] and [13], producing an interior point method for the case of nonempty X° × Y°.

We will consider two independent modifications upon the method in [13]. The first one, to be discussed in Section 3, is the following: assuming that Bregman functions g_X, g_Y with zones X°, Y° respectively are available, we will use (10) instead of (11) and (12), setting

x^{k+1} = s_{g_X}(x^k, u^k, t_k),     (13)
y^{k+1} = s_{g_Y}(y^k, −v^k, t_k).     (14)

We will show that if the perturbation pair (ξ^k, η^k) satisfies the conditions required in [13] and the stepsizes t_k are chosen in a suitable way, then convergence to a solution follows.

It is clear that the algorithm becomes practical only when s_{g_X}, s_{g_Y} have explicit formulae, i.e., in view of (8), when ∇g_X, ∇g_Y are easily invertible. Bregman functions with this property are available for the cases in which C is an orthant, a box, or certain polyhedra (in the latter case, inversion of the gradient requires the solution of two linear systems) and are presented in [3]. Examples of realizations of this method are given in Section 4.

The second modification, presented in Section 5, consists of an interior point procedure for the determination of (ξ^k, η^k), namely

ξ^k = argmin_{ξ∈X}{L(ξ, y^k) + (1/λ)D_{g_X}(ξ, x^k)},     (15)
η^k = argmax_{η∈Y}{L(x^k, η) − (1/μ)D_{g_Y}(η, y^k)},     (16)

where λ, μ are positive constants. Definitions (15) and (16) imply

∇g_X(ξ^k) = ∇g_X(x^k) − λu^k,     (17)
∇g_Y(η^k) = ∇g_Y(y^k) + μv^k,     (18)

with u^k ∈ ∂_x L(ξ^k, y^k), v^k ∈ ∂_y L(x^k, η^k). Equivalently, with (17) and (18) we have

ξ^k = s_{g_X}(x^k, u^k, λ),     (19)
η^k = s_{g_Y}(y^k, −v^k, μ).     (20)

Differently from (13) and (14), equations (17) and (18) are implicit in ξ^k, η^k, which appear in both sides of (19) and (20). These equations cannot in general be solved explicitly, even when ∇g_X, ∇g_Y are easily invertible. On the other hand, no search is needed for the perturbation stepsizes λ and μ, even when ∇L is not Lipschitz continuous. This choice of (ξ^k, η^k) does not satisfy the conditions in [13], so that the general convergence result for the sequence defined by (13) and (14) cannot be fully used. A special convergence proof will be given assuming that the Bregman functions are separable and the constraint sets X and Y are the nonnegative orthants.

It is immediate that (19) and (20) become explicit equations (up to the inversion of ∇g_X and ∇g_Y) when ∂_x L(·, y), ∂_y L(x, ·) are constant. This happens in the case of linear programming. For this case, combining both modifications, we obtain an interior point method with explicit updating formulae both for the perturbed points and for the primal and dual iterates. The curvilinear stepsize t_k requires a simple search. We analyze the linear programming case in detail in Section 6, where we also present some numerical illustration of the method.

2 Bregman functions and distances

Let C ⊂ ℝ^p be a closed and convex set and let C° be the interior of C. Consider g : C → ℝ, differentiable in C°. For w ∈ C and z ∈ C°, define D_g : C × C° → ℝ by (7). D_g is said to be the Bregman distance associated to g.

We say that g is a Bregman function with zone C if the following conditions hold:

B1. g is continuous and strictly convex on C.

B2. g is twice differentiable on C°, and on any bounded subset of C° the eigenvalues of ∇²g(z) are bounded below by a strictly positive number.

B3. The level sets of D_g(w, ·) and D_g(·, z) are bounded, for all w ∈ C, z ∈ C°, respectively.

B4. If {z^k} ⊂ C° converges to z*, then lim_{k→∞} D_g(z*, z^k) = 0.

B5. For all v ∈ ℝ^p there exists z ∈ C° such that ∇g(z) = v.

The definition of Bregman functions and distances originates in [2]. They have been widely used in convex optimization, e.g. [4], [5], [6], [7], [8]. Condition B2 in these references is weaker than here; only differentiability of g in C° is required. On the other hand, such references include an additional condition which is now implied by our stronger condition B2, namely the result of Proposition 1 below. Condition B5 is called zone coerciveness in [3]. From (7) and B1 it is immediate that, for all w ∈ C, z ∈ C°, D_g(w, z) ≥ 0 and D_g(w, z) = 0 if and only if w = z.

We present next some examples of Bregman functions.

Example 1. C = ℝ^p, g(x) = ||x||². Then D_g(w, z) = ||w − z||².

Example 2. C = ℝ^p_+, g(z) = Σ_j z_j log z_j, continuously extended to the boundary of C with the convention that 0 log 0 = 0. Then

D_g(w, z) = Σ_j [w_j log(w_j/z_j) + z_j − w_j].

D_g is the Kullback-Leibler information divergence, widely used in statistics.
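A quick numerical check (our own illustration, not part of the paper) confirms that the Bregman distance (7) induced by the entropy kernel of Example 2 coincides with the Kullback-Leibler expression above:

```python
import numpy as np

def bregman(g, grad_g, w, z):
    # Definition (7): D_g(w, z) = g(w) - g(z) - grad g(z)^t (w - z)
    return g(w) - g(z) - grad_g(z) @ (w - z)

# Entropy kernel of Example 2 on the positive orthant
g = lambda z: float(np.sum(z * np.log(z)))
grad_g = lambda z: 1.0 + np.log(z)

w = np.array([0.2, 0.5, 0.3])
z = np.array([0.4, 0.4, 0.2])

kl = float(np.sum(w * np.log(w / z) + z - w))   # closed-form KL divergence
d = bregman(g, grad_g, w, z)
assert np.isclose(d, kl) and d > 0.0            # D_g > 0 since w != z
```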

Example 3. C = ℝ^p_+, g(z) = Σ_j (z_j^a − z_j^b), with a > 1, b ∈ (0, 1).

Examples of Bregman functions satisfying these conditions for the cases of C being a box or a polyhedron can be found in [3]. We now proceed to a result which will be employed in the convergence analysis.

Proposition 1. If g is a Bregman function with zone C, and {z^k} and {w^k} are sequences in C° such that {z^k} is bounded and lim_{k→∞} D_g(w^k, z^k) = 0, then lim_{k→∞} ||w^k − z^k|| = 0.

Proof. Let B ⊂ ℝ^p be a closed ball containing {z^k} such that, for any z on the boundary of B, ||z^k − z|| ≥ 1 for all k. For ν ∈ [0, 1] define z^k(ν) = νw^k + (1 − ν)z^k. Let ν_k be the largest ν ∈ [0, 1] such that z^k(ν) ∈ B. For each k, define u^k = z^k(ν_k). Since C° is convex and ν_k ∈ [0, 1], we get that z^k(ν_k) ∈ C°. By the convexity of D_g(·, z^k),

0 ≤ D_g(u^k, z^k) ≤ D_g(w^k, z^k).

Because D_g(w^k, z^k) converges to zero, so does D_g(u^k, z^k). We shall show that ||z^k − u^k|| converges to zero. Then, from the definition of B, it follows that there exists k′ such that, for all k ≥ k′, u^k is in the interior of B, so that ν_k = 1 and u^k = w^k; hence the assertion follows.

Define u^k(ω) = ωu^k + (1 − ω)z^k, for ω ∈ [0, 1]. By Taylor's theorem,

g(u^k) = g(z^k) + ∇g(z^k)^t(u^k − z^k) + (1/2)(u^k − z^k)^t ∇²g(u^k(ω))(u^k − z^k)     (22)

for some ω ∈ [0, 1]. The quadratic term in (22) is equal to D_g(u^k, z^k), which converges to zero. Because of the boundedness of {z^k} and {u^k}, B2 implies that D_g(u^k, z^k) is bounded below by θ||u^k − z^k||² for some θ > 0 independent of k. Therefore ||u^k − z^k|| converges to zero.

3 An interior point method for saddle point computation

In this section we assume that X and Y have nonempty interiors and that g_X and g_Y are Bregman functions with zones X and Y, respectively. L : ℝ^n × ℝ^m → ℝ is continuous on X × Y, convex in the first variable and concave in the second one.

Following [13] we introduce perturbation sets D : X × Y → P(X), G : X × Y → P(Y). We consider the following properties of D, G.

A1. If (x, y) ∈ X° × Y° and (ξ, η) ∈ D(x, y) × G(x, y), then L(x, η) − L(ξ, y) ≥ 0, with strict inequality if (x, y) is not a saddle point.

A2. If E ⊂ X° × Y° is bounded, then ∪_{(x,y)∈E} D(x, y) and ∪_{(x,y)∈E} G(x, y) are bounded.

A3. If {(u^ℓ, v^ℓ)} ⊂ X × Y converges to (u, v) as ℓ goes to ∞, ξ^ℓ ∈ D(u^ℓ, v^ℓ), η^ℓ ∈ G(u^ℓ, v^ℓ) and lim_{ℓ→∞}[L(u^ℓ, η^ℓ) − L(ξ^ℓ, v^ℓ)] = 0, then (u, v) solves SPP(L, X, Y).

We give next a few examples of sets D, G, taken from [13].

Example 4. Assume that X and Y are compact and define D(x, y) = argmin_{ξ∈X} L(ξ, y) and G(x, y) = argmax_{η∈Y} L(x, η).

Example 5. Let λ > 0, μ > 0 and define D(x, y) = argmin_{ξ∈X}[L(ξ, y) + λ||ξ − x||²] and G(x, y) = argmax_{η∈Y}[L(x, η) − μ||η − y||²], for x ∈ X, y ∈ Y. Due to the strict convexity (concavity) of the minimand (maximand), D(x, y) (respectively G(x, y)) is uniquely determined.

Example 6. Assume that ∇_x L and ∇_y L are Lipschitz continuous with constant p and let 0 < α < 1/p. Define D(x, y) = {P_X(x − α∇_x L(x, y))} and G(x, y) = {P_Y(y + α∇_y L(x, y))}. This corresponds to the perturbed vector w^k of Korpelevich's method ([16]) in (5). But we must point out that when we use the method in [13] (i.e. (11)-(12)) with this perturbation we obtain something quite different from Korpelevich's method, because Korpelevich's method, with T = (∇_x L, −∇_y L), would then choose u^k = ∇_x L(ξ^k, η^k) and v^k = ∇_y L(ξ^k, η^k), while the method in [13] takes u^k = ∇_x L(x^k, η^k) and v^k = ∇_y L(ξ^k, y^k). This choice of u^k, v^k is possible only in the case of a saddle point problem, where we have two variables x and y, while Korpelevich's method is devised for a general variational inequality problem, where we only have the vector w^k instead of the pair (ξ^k, η^k).

It is not difficult to prove that the choices of D and G made in Examples 4-6 satisfy conditions A1-A3.

Next we present the algorithm generating the sequence {(x^k, y^k)}. We denote z^k = (x^k, y^k), C = X × Y and C° = X° × Y°, and define D_g : C × C° → ℝ employing

g(z) = g(x, y) = g_X(x) + g_Y(y).     (23)

It is a matter of routine to check that g is a Bregman function with zone C. The algorithm of this section is defined as follows:

Step 0: Initialization. Choose parameters β and γ, 1 > β > γ > 0, and take

z^0 = (x^0, y^0) ∈ X° × Y°.     (24)

Step 1: Perturbation. Begin iteration k with z^k = (x^k, y^k). Choose perturbation vectors (ξ^k, η^k) ∈ D(x^k, y^k) × G(x^k, y^k) and define

s_k = L(x^k, η^k) − L(ξ^k, y^k).     (25)

Step 2: Stopping test. If s_k = 0, then stop.

Step 3: Direction. Choose u^k ∈ ∂_x L(x^k, η^k) and v^k ∈ ∂_y L(ξ^k, y^k), and denote d^k = (u^k, −v^k)^t.

Step 4: Stepsize. For t ≥ 0 define the functions

z(t) = s_g(z^k, d^k, t)     (26)

and

f_k(t) = t s_k − D_g(z^k, z(t)).     (27)

Search for a stepsize t_k > 0 satisfying

γ t_k s_k ≤ f_k(t_k) ≤ β t_k s_k.     (28)

Step 5: Update. Set

z^{k+1} = (x^{k+1}, y^{k+1}) = z(t_k),     (29)

increment k by one and return to Step 1.

Theorem 1 below shows that f_k(t) is a concave function with f_k(0) = 0 and f_k′(0) = s_k > 0. For iteration k and a potential stepsize t, this function provides a lower bound for the decrease in the Bregman distance to a solution. Thus the left inequality in (28) aims to guarantee that such a distance is suitably decreased, while the right inequality aims to bound the stepsize from below. To shorten the search for an acceptable stepsize, one may begin with the stepsize employed in the preceding iteration. Also, we will show that the stopping criterion of Step 2 is appropriate, in the sense that when termination occurs z^k is a solution of the problem. The convergence properties of the algorithm are formalized as follows.
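The acceptance test (28) can be implemented by a simple doubling/bisection search. The sketch below is ours and assumes f_k is given as a callable f with f(0) = 0, f concave, and s = s_k > 0; since f(t)/t is nonincreasing for a concave f vanishing at 0, a valid bracket is easy to maintain.

```python
def stepsize_search(f, s, beta=0.5, gamma=0.1, t0=1.0, max_iter=200):
    """Find t > 0 with gamma*t*s <= f(t) <= beta*t*s, as required by (28).
    ratio(t) = f(t)/(t*s) is nonincreasing, tends to 1 as t -> 0+ and
    eventually drops below gamma, so halving/doubling brackets a valid t."""
    ratio = lambda t: f(t) / (t * s)
    t = t0
    while ratio(t) < gamma:          # too far out along the curve: pull back
        t *= 0.5
    while ratio(t) > beta:           # too close to the iterate: push out
        t *= 2.0
    if ratio(t) >= gamma:            # landed inside [gamma, beta]
        return t
    lo, hi = 0.5 * t, t              # ratio(lo) > beta, ratio(hi) < gamma
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        r = ratio(mid)
        if gamma <= r <= beta:
            return mid
        lo, hi = (mid, hi) if r > beta else (lo, mid)
    return 0.5 * (lo + hi)

# Example with f(t) = t - t^2/2 (concave, f(0) = 0, slope s = 1 at 0):
t = stepsize_search(lambda t: t - 0.5 * t * t, s=1.0)
```

For this f, any t in [1, 1.8] satisfies (28) with beta = 0.5, gamma = 0.1.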

Theorem 1. Assume that SPP(L, X, Y) has solutions. Let g be a Bregman function with zone C and let {z^k} be a sequence generated by the algorithm of this section, where ξ^k ∈ D(z^k) and η^k ∈ G(z^k), with D and G satisfying A1 and A2. If the algorithm stops at iteration k, then z^k solves SPP(L, X, Y). If the sequence {z^k} is infinite, then

i) z^k is well defined and belongs to C°, for all k;

ii) for any solution z* of SPP(L, X, Y), the Bregman distance D_g(z*, z^k) is monotonically decreasing;

iii) {z^k} is bounded;

iv) lim_{k→∞} s_k = 0;

v) lim_{k→∞} ||z^{k+1} − z^k|| = 0;

vi) if additionally A3 is satisfied, then the sequence {z^k} converges to a solution of SPP(L, X, Y).

Proof. We prove first (i) by induction. z^0 belongs to C° by (24). Assuming that z^k belongs to C°, note that (26) is equivalent to

∇g(z(t)) = ∇g(z^k) − t d^k,     (30)

so that by B1 and B5, z(t) is uniquely defined by (30) and belongs to C° for all t ≥ 0. By (29), z^{k+1} belongs to C° and (i) holds. Now we consider the case of finite termination, i.e. when, according to Step 2, s_k = 0. In such a case, since z^k belongs to C° by (i), it follows from A1 that z^k is a saddle point. From now on we assume that {z^k} is infinite. We proceed to prove (ii). Since z(t) ∈ C° for all t ≥ 0, using (7) and (30) we get

D_g(z*, z(t)) = D_g(z*, z^k) − D_g(z(t), z^k) + t(d^k)^t(z* − z(t)).     (31)
By the convexity/concavity of L, since (x*, y*) is a saddle point,

(d^k)^t(z* − z^k) ≤ −s_k.     (32)

Combining (27), (31) and (32) yields

D_g(z*, z(t)) ≤ D_g(z*, z^k) − f_k(t).     (33)
Differentiating (30) yields

z′(t) = −[∇²g(z(t))]⁻¹ d^k,     (34)

so that the first two derivatives of f_k(t) in (27) are

f_k′(t) = s_k + (d^k)^t(z(t) − z^k)     (35)

and

f_k″(t) = −φ_k(t),     (36)

with φ_k defined as

φ_k(t) = (d^k)^t [∇²g(z(t))]⁻¹ d^k.
Since z(0) = z^k, f_k(0) = 0. By A1 and (35), f_k′(0) = s_k > 0, in view of the stopping criterion, and so (32) implies d^k ≠ 0. Consequently, by B2, (35) and (36), f_k″(t) < 0. Therefore f_k(t) is a concave function of t ∈ [0, ∞) with a positive slope at t = 0. Inequality (33) implies that f_k(t) is bounded above. Hence there exists a stepsize t_k satisfying (28), so that z^{k+1} is well defined, and therefore item (i) follows from (30). Furthermore, for some γ_k ∈ [γ, β], f_k(t_k) = γ_k t_k s_k, which together with (33) implies item (ii). Denote by B° the set {z ∈ C° : D_g(z*, z) ≤ D_g(z*, z^0)}. By B3, B° is bounded. Item (ii) implies that z^{k+1} ∈ B° if z^k ∈ B°. Item (iii) then follows by induction from (24).

Since {z^k} is bounded, we get from A2 that {(ξ^k, η^k)} is bounded. The operator (∂_x L, −∂_y L) is maximal monotone, and is therefore bounded over bounded sets contained in the interior of its domain, which in this case is ℝ^n × ℝ^m (e.g. [18]). It follows that {d^k} is bounded.

By Taylor's theorem and (28), for some τ ∈ [0, t_k], f_k(t_k) = t_k s_k − (t_k²/2)φ_k(τ) ≤ β t_k s_k. Consequently, and because B2, A2 and item (iii) imply φ_k(τ) ≤ θ for some θ > 0, we have

t_k ≥ (2(1 − β)/θ) s_k.     (37)

By (28) and (33),

D_g(z*, z^{k+1}) ≤ D_g(z*, z^k) − γ t_k s_k.     (38)

Since {D_g(z*, z^k)} is nonnegative and nonincreasing, (38) implies lim_{k→∞} t_k s_k = 0, and then, multiplying both sides of (37) by s_k, we conclude that lim_{k→∞} s_k = 0, i.e. (iv) holds. From (27) and (28) we have 0 ≤ D_g(z^k, z^{k+1}) ≤ t_k s_k, where the right side converges to zero. Therefore, item (v) follows from item (iii) and Proposition 1. By item (iii) the sequence {z^k} has an accumulation point z̄. By item (iv) and A3, z̄ solves SPP(L, X, Y). By item (ii) the Bregman distance D_g(z̄, z^k) is nonincreasing. If {z^{j_k}} is a subsequence of {z^k} which converges to z̄, then by B4 lim_{k→∞} D_g(z̄, z^{j_k}) = 0. Since {D_g(z̄, z^k)} is a nonincreasing and nonnegative sequence with a subsequence which converges to 0, we conclude that the whole sequence converges to 0. Now we apply Proposition 1 with w^k = z̄, and conclude that lim_{k→∞} z^k = z̄.

4 Particular realizations of the algorithm

In the unrestricted case (X = ℝ^n, Y = ℝ^m) we can take g as in Example 1, so that (30) gives

z(t) = z^k − (t/2) d^k.

In this case, with β = γ = 0.5, we may choose t_k to maximize f_k(t), yielding

t_k = 2 s_k / ||d^k||²,     z^{k+1} = z^k − (s_k / ||d^k||²) d^k,

which is a special case of the method considered in [13].

Next, consider the case in which X, Y are the nonnegative orthants. For g as in Example 2 we have

z(t)_j = z^k_j exp(−t d^k_j),  j = 1, ..., n + m.

In this case, a search is needed for determining the stepsize t_k in Step 4 of the algorithm. For further illustration, see the case of linear programming in Section 6.
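As an illustration (ours, not the experiments of Section 6), the following sketch runs the algorithm of Section 3 on the Lagrangian L(x, y) = c^t x + y^t(b − Ax) of a tiny linear program min{c^t x : Ax ≥ b, x ≥ 0}, with the entropy Bregman function and the interior perturbation (19)-(20); for bilinear L the perturbed points are explicit, and for the entropy kernel both the curve z(t) and f_k(t) have closed forms.

```python
import numpy as np

# min c^t x s.t. Ax >= b, x >= 0; Lagrangian L(x, y) = c^t x + y^t (b - Ax)
c = np.array([1.0, 2.0]); A = np.array([[1.0, 1.0]]); b = np.array([1.0])
L = lambda x, y: c @ x + y @ (b - A @ x)
lam = mu = 1.0; beta, gamma = 0.5, 0.1

x, y = np.array([0.5, 0.5]), np.array([0.5])     # interior starting point
for k in range(2000):
    xi  = x * np.exp(-lam * (c - A.T @ y))       # (19): explicit, grad_x L constant in xi
    eta = y * np.exp( mu * (b - A @ x))          # (20): explicit, grad_y L constant in eta
    u, v = c - A.T @ eta, b - A @ xi             # Step 3 directions
    s = L(x, eta) - L(xi, y)                     # gap s_k of (25)
    if s <= 1e-14:
        break
    z, d = np.concatenate([x, y]), np.concatenate([u, -v])
    # f_k(t) = t*s_k - D_g(z^k, z(t)); the entropy kernel and
    # z_j(t) = z_j e^{-t d_j} give the closed form below
    f = lambda t: t * s - np.sum(z * (t * d + np.exp(-t * d) - 1.0))
    r = lambda t: f(t) / (t * s)
    t = 1.0
    while r(t) < gamma: t *= 0.5                 # curvilinear stepsize search for (28)
    while r(t) > beta:  t *= 2.0
    if r(t) < gamma:                             # overshot: bisect back into [gamma, beta]
        lo, hi = 0.5 * t, t
        for _ in range(200):
            t = 0.5 * (lo + hi)
            if gamma <= r(t) <= beta: break
            lo, hi = (t, hi) if r(t) > beta else (lo, t)
    znew = z * np.exp(-t * d)                    # move along the Bregman curve
    x, y = znew[:2], znew[2:]
# the gap s_k tends to zero and the iterates stay strictly positive (Theorems 1-2)
```

Note that no matrix factorizations or linear systems are needed: every update is componentwise, which is the feature exploited in Section 6.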

5 An interior point procedure for perturbation

In this section we analyze a selection rule for specifying ξ^k and η^k which extends the proposal of Example 6 in Section 3. Let g_X and g_Y be Bregman functions with zones X° and Y°, respectively. Take λ > 0 and μ > 0, and for (x, y) ∈ X° × Y° define

ξ(x, y) = argmin_{u∈X}{L(u, y) + (1/λ)D_{g_X}(u, x)},     (43)
η(x, y) = argmax_{v∈Y}{L(x, v) − (1/μ)D_{g_Y}(v, y)}.     (44)

We shall restrict the discussion to the following special case of SPP(L, X, Y): X = ℝ^n_+, Y = ℝ^m_+, L twice continuously differentiable on ℝ^n × ℝ^m and g separable, i.e.

g_X(x) = Σ_{j=1}^n g_j(x_j), and similarly for g_Y.
We will show that the perturbation sets defined by (43)-(44) satisfy A1 and A2. A3 does not hold in general for this perturbation, but we will establish convergence of the method under a nondegeneracy assumption on the problem and, in the case of linear programming, without such an assumption. Our proof can be extended to nondifferentiable L and to box, rather than positivity, constraints, at the cost of some minor technical complications.

Proposition 2. Assume that L is continuously differentiable, X × Y = ℝ^n_+ × ℝ^m_+, and that g is a separable Bregman function with zone X° × Y°. Consider (x, y) ∈ X° × Y° and (ξ, η) as in (43)-(44). Then

i) (ξ, η) is uniquely determined and (ξ, η) ∈ X° × Y°;

ii) if (x, y) is not a saddle point, then L(x, η) − L(ξ, y) > 0;

iii) if B ⊂ X° × Y° is nonempty and bounded, then the set {(ξ(x, y), η(x, y)) : (x, y) ∈ B} is bounded.

Proof. Let f(u) = L(u, y) + (1/λ)D_{g_X}(u, x). Take w = ∇_x L(x, y) and define f̃(u) = L(x, y) + w^t(u − x) + (1/λ)D_{g_X}(u, x). By the convexity of L(·, y) we have

f̃(u) ≤ f(u)     (45)

for all u ∈ X. Consider the unrestricted minimization problem min f̃(u), whose optimality condition is ∇g_X(u) = ∇g_X(x) − λw. By B5, this equation in u has a unique solution ũ, which belongs to X°. It follows easily from (7) that D_{g_X}(·, x) is strictly convex, and the same holds for f̃, so that ũ minimizes f̃ in ℝ^n. A strictly convex function which attains its minimum has bounded level sets, and it follows from (45) that f also has bounded level sets (see [19, Corollary 8.7.1]), and therefore it attains its minimum in X. Being the sum of a convex function and a strictly convex one, f is strictly convex, and so the minimizer is unique, by the convexity of X. The fact that this minimizer belongs to X° follows from B5 and has been established in [10, Theorem 4.1]. By the definition of f, this minimizer is ξ(x, y). A similar proof holds for η(x, y), so that the proof of item (i) is complete.

For item (ii), assume that (x, y) is not a saddle point. Then ∇_x L(x, y) ≠ 0 or ∇_y L(x, y) ≠ 0. Assume that the former case holds; the latter one can be analyzed similarly. Observe that

∇f(u) = ∇_x L(u, y) + (1/λ)(∇g_X(u) − ∇g_X(x)),     (46)

so that ∇f(x) = ∇_x L(x, y) ≠ 0. Hence x is not the solution of (43) and consequently

L(ξ, y) + (1/λ)D_{g_X}(ξ, x) = f(ξ) < f(x) = L(x, y).

This implies L(x, y) − L(ξ, y) > 0. Similarly, L(x, η) − L(x, y) > 0, and therefore L(x, η) − L(ξ, y) > 0. This completes the proof of item (ii).

To prove item (iii), we will prove the boundedness of {ξ(x, y) : (x, y) ∈ B}. A similar argument holds for η(x, y). The proof relies on successive relaxations of optimization problems, whose optimal objective function values are upper bounds for ||ξ||². Since ∇f(ξ) = 0 by (46) and ξ is unique by item (i), we have

||ξ||² = max{||u||² | u ≥ 0, ∇g_X(u) − ∇g_X(x) = −λ∇_x L(u, y)}.

Note that

{u ≥ 0 | ∇g_X(u) − ∇g_X(x) = −λ∇_x L(u, y)}
⊂ {u ≥ 0 | u^t(∇g_X(u) − ∇g_X(x)) = −λu^t ∇_x L(u, y)}
⊂ {u ≥ 0 | g_X(u) − g_X(0) ≤ u^t(∇g_X(x) − λ∇_x L(0, y))},     (47)

using the facts that u^t(∇_x L(u, y) − ∇_x L(0, y)) ≥ 0 and g_X(u) − g_X(0) ≤ u^t ∇g_X(u), which result from the convexity of g_X and L(·, y), in the last inclusion of (47).

Let now r = max_j{sup_{(x, y)∈B}[g′_j(x_j) − λ∇_x L(0, y)_j]}. We claim that r < ∞. It suffices to show that sup_{(x, y)∈B}[g′_j(x_j) − λ∇_x L(0, y)_j] < ∞ for all j, which follows from the facts that ∇_x L(0, y) is continuous as a function of y, g′_j is continuous on ℝ_{++}, lim_{t→0+} g′_j(t) = −∞ because of B5, and B is bounded. The claim is established.

It follows from (47) that

||ξ||² ≤ max{||u||² | u ≥ 0, g_X(u) − r s^t u ≤ g_X(0)} < ∞,     (48)

where s^t = (1, 1, ..., 1). We claim that the set {u ≥ 0 | g_X(u) − r s^t u ≤ g_X(0)} is bounded. Note that this set is a level set of the convex function ḡ_X(u) = g_X(u) − r s^t u and that ∇ḡ_X(u) = ∇g_X(u) − r s. By B5, there exists z > 0 such that ∇g_X(z) = r s, i.e. ∇ḡ_X(z) = 0, so that z is an unrestricted minimizer of ḡ_X. Since ∇²ḡ_X = ∇²g_X, we get from B2 that ∇²ḡ_X(u) is positive definite for all u > 0, and hence ḡ_X is strictly convex and z is its unique minimizer. Thus, the level set of ḡ_X corresponding to the value ḡ_X(z) is the singleton {z}. It is well known that if a convex function has a bounded nonempty level set, then all its level sets are bounded, which establishes the claim. Therefore the feasible set of the maximization problem in (48) is bounded, and since its objective is continuous, the last inequality in (48) holds. This completes the proof of item (iii).

We now use (43) and (44) to define perturbation sets D and G consisting of the single elements ξ and η, respectively. In view of Proposition 2, this rule of perturbation satisfies conditions A1 and A2. However, the following example demonstrates that A3 may not hold.

Take m = n = 1, λ = μ = 1, g as in Example 2 and L(x, y) = (x − 1)² − (y − 1)², whose only saddle point is (1, 1). In this case the optimality conditions for (43)-(44) become

log ξ^k + 2(ξ^k − 1) = log x^k,     (49)
log η^k + 2(η^k − 1) = log y^k.     (50)

Take a sequence {(x^k, y^k)} convergent to (0, 0). Then the right hand sides of (49)-(50) diverge to −∞, implying that log ξ^k and log η^k also diverge to −∞; i.e. {(ξ^k, η^k)} also converges to (0, 0). It follows that

lim_{k→∞}[L(x^k, η^k) − L(ξ^k, y^k)] = 0,

but (0, 0) is not a saddle point of L.
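This degenerate behaviour is easy to observe numerically; the sketch below (ours, for illustration only) solves the scalar optimality condition (49) by bisection and shows the gap shrinking as (x, y) approaches (0, 0).

```python
import math

def perturb(w):
    """Solve log(t) + 2(t - 1) = log(w) for t by bisection; this is the
    optimality condition (49) (and (50)) for L(x, y) = (x-1)^2 - (y-1)^2
    with the entropy kernel and lambda = mu = 1. The left side is
    increasing in t, and the root always lies between w and 1."""
    lo, hi = sorted((w, 1.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.log(mid) + 2.0 * (mid - 1.0) < math.log(w):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

L = lambda x, y: (x - 1.0) ** 2 - (y - 1.0) ** 2

# approach (0, 0) along the diagonal: the gap L(x, eta) - L(xi, y) tends
# to 0 even though (0, 0) is not a saddle point, so A3 fails here
gaps = []
for x in (1e-2, 1e-4, 1e-6):
    y = x
    xi, eta = perturb(x), perturb(y)
    gaps.append(L(x, eta) - L(xi, y))
```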

This example shows that the convergence argument of Theorem 1 is no longer valid for our algorithm with perturbation sets chosen according to (43)-(44). Instead, we provide another convergence argument, which is more easily formulated in terms of variational inequalities and nonlinear complementarity problems. We need two results on these problems. The first one is well known, while the second one is, to our knowledge, new, and of some interest in its own right.

Proposition 3.

i) If T : ℝ^p → ℝ^p is monotone and continuous, and C ⊂ ℝ^p is closed and convex, then the solution set of VIP(T, C), when nonempty, is closed and convex.

ii) If C = ℝ^p_+, then VIP(T, C) becomes the nonlinear complementarity problem NCP(T), consisting of finding z ∈ ℝ^p such that

T(z) ≥ 0,     (51)
z ≥ 0,     (52)
z_j T(z)_j = 0  (1 ≤ j ≤ p).     (53)

Proof. Elementary (see, e.g., [9]).

Corollary 1. If X × Y = ℝ^n_+ × ℝ^m_+, then (x̄, ȳ) is a solution of SPP(L, X, Y) if and only if

x̄ ≥ 0, ȳ ≥ 0,     (54)
x̄_j ∇_x L(x̄, ȳ)_j = 0  (1 ≤ j ≤ n),     (55)
∇_x L(x̄, ȳ) ≥ 0,     (56)
ȳ_i ∇_y L(x̄, ȳ)_i = 0  (1 ≤ i ≤ m),     (57)
∇_y L(x̄, ȳ) ≤ 0.     (58)

Proof. Follows from Proposition 3 with T = (∇_x L, −∇_y L).

Before presenting our new result, we need some notation.

Definition. We say that NCP(T) satisfies the strict complementarity assumption (SCA from now on) if for all solutions z̄ of NCP(T) and for all j between 1 and p it holds that either z̄_j > 0 or T(z̄)_j > 0.

For z ∈ ℝ^p, let J(z) = {j ∈ {1, ..., p} : z_j = 0} and I(z) = {1, ..., p} \ J(z).

Proposition 4. Assume that T is monotone and continuous and that NCP(T) satisfies SCA. If z̄ ∈ ℝ^p satisfies (52) and (53), and J(z̄) ⊂ J(z*) for some solution z* of NCP(T), then z̄ solves NCP(T).

Proof. Let z(α) = z̄ + α(z* − z̄). We will prove that z(α) satisfies (52)-(53) for all α ∈ [0, 1], and also (51) for α close to 1; then we will show that if (51) were violated for some α ∈ [0, 1] then SCA would be violated too. In order to prove that z(α) satisfies (53) for all α ∈ [0, 1] we must make a detour. Let U = {z ∈ ℝ^p_+ : z_j = 0 for all j ∈ J(z̄)}. A vector z̃ solves VIP(T, U) if and only if z̃ ∈ U and

Σ_j T(z̃)_j(z_j − z̃_j) ≥ 0     (59)

for all z ∈ U. Note that z̄ ∈ U. Since J(z̄) ⊂ J(z*), z* also belongs to U. We claim that both z̄ and z* satisfy (59). This is immediate for z*, which satisfies (59) for all z ∈ ℝ^p_+, since it solves NCP(T), equivalent, by Proposition 3(ii), to VIP(T, ℝ^p_+). Regarding z̄, we have z̄_j = z_j = 0 for all z ∈ U and all j ∈ J(z̄), so that the terms with j ∈ J(z̄) in the summation of (59) vanish. For j ∉ J(z̄), we have T(z̄)_j = 0 because z̄ satisfies (53), so that the terms with j ∉ J(z̄) in the summation of (59) also vanish. Therefore both z̄ and z* solve VIP(T, U), and so z(α) solves VIP(T, U) for all α ∈ [0, 1] by Proposition 3(i). In particular z(α) satisfies (52) for all α ∈ [0, 1]. We look now at (53). Since z(α) ∈ U, we have z(α)_j = 0 for j ∈ J(z̄), i.e. (53) holds for such j's. If j ∈ I(z̄), we take z¹ and z² defined as z¹_ℓ = z²_ℓ = z(α)_ℓ for ℓ ≠ j, z¹_j = 2z(α)_j, z²_j = (1/2)z(α)_j, so that z¹ and z² belong to U. Looking at (59) with z̃ = z(α) and z = z¹ or z², we get T(z(α))_j z(α)_j ≥ 0 and −(1/2)T(z(α))_j z(α)_j ≥ 0, so that z(α) satisfies (53) for all α ∈ [0, 1]. Finally we look at (51). For j ∈ I(z̄), we have z(α)_j > 0 for α ∈ [0, 1), and therefore, since z(α) satisfies (53), T(z(α))_j = 0, i.e. (51) holds for j ∈ I(z̄), α ∈ [0, 1). If j ∈ J(z̄) then j ∈ J(z*), i.e. z*_j = 0. By SCA, 0 < T(z*)_j = T(z(1))_j. By the continuity of T, T(z(α))_j > 0 for j ∈ J(z̄) and α close to 1; i.e., defining A = {α ∈ [0, 1) : T(z(α))_j > 0 for all j ∈ J(z̄)}, we get that A ≠ ∅. It follows that for α ∈ A, z(α) satisfies (51)-(53), i.e. it solves NCP(T). Let ᾱ = inf A. By the continuity of T, z(ᾱ) also solves NCP(T). We claim now that ᾱ = 0. If ᾱ ≠ 0 then, by the infimum property of ᾱ, there exists j ∈ J(z̄) such that T(z(ᾱ))_j = 0, and also z(ᾱ)_j = 0 because z(ᾱ) ∈ U, in which case SCA is violated at z(ᾱ). The claim is established, and therefore ᾱ = 0 and z̄ = z(0) solves NCP(T).

We will also use the following elementary result for bilinear L.

Proposition 5. For a bilinear function L(x, y) and for any (x, y) and (x′, y′) in ℝ^n × ℝ^m, it holds that

(∇_x L(x′, y′) − ∇_x L(x, y))^t(x′ − x) − (∇_y L(x′, y′) − ∇_y L(x, y))^t(y′ − y) = 0.

Proof. Elementary.

We will consider SCA for SPP(L, ℝ^n_+, ℝ^m_+), which is just NCP(T) with T = (∇_x L, −∇_y L). It is clear that in this case SCA means that, for all solutions (x̄, ȳ) of SPP(L, ℝ^n_+, ℝ^m_+), x̄_j = 0 implies ∂L(x̄, ȳ)/∂x_j > 0 and ȳ_i = 0 implies ∂L(x̄, ȳ)/∂y_i < 0.

Theorem 2. Assume that SPP(L, X, Y) has solutions, L is continuously differentiable, X × Y = ℝ^n_+ × ℝ^m_+, and g is a separable Bregman function with zone X° × Y°, satisfying B1-B5. Consider the algorithm of Section 3 with the perturbation vectors ξ^k and η^k defined according to (43) and (44). If either SCA is satisfied or L is bilinear, then the sequence {(x^k, y^k)} generated by the algorithm converges to a solution (x̄, ȳ) of SPP(L, X, Y).

Proof. Proposition 2 implies that our choice of (ξ^k, η^k) satisfies A1 and A2, so that the results of Theorem 1(i)-(v) hold. Therefore {(x^k, y^k)} has an accumulation point z̄ = (x̄, ȳ) ∈ X × Y, i.e. (54) holds for (x̄, ȳ). We prove next that (x̄, ȳ) also satisfies (55) and (57). Under SCA, we then show that (x̄, ȳ) solves SPP(L, X, Y). For the bilinear case we show directly that {z^k} converges to (x̄, ȳ).

First we show that ||ξ^k − x^k|| and ||η^k − y^k|| converge to zero. Using (7), (19) and (20), we have

Adding (60) and (61), we get

By Theorem 1(iv), the rightmost expression in (62) tends to 0 as k goes to ∞, so that

Since {(x^k, y^k)} is bounded, by Proposition 1 and (63) we obtain (64) and (65), i.e. lim_{k→∞} ||ξ^k − x^k|| = 0 and lim_{k→∞} ||η^k − y^k|| = 0.

We check now that (55) holds at (x̄, ȳ). In the case of separable g_X and differentiable L, (43) with (x, y) = (x^k, y^k) and x = ξ^k implies

Consider now j such that x̄_j > 0. Let {ξ_j^{k_ℓ}}, {x_j^{k_ℓ}} be subsequences of {ξ_j^k}, {x_j^k}, respectively, such that lim_{ℓ→∞} ξ_j^{k_ℓ} = lim_{ℓ→∞} x_j^{k_ℓ} = x̄_j > 0. Since g_j and L are continuously differentiable for x_j > 0, we can take limits in (66) along the subsequence and obtain

Hence (55) follows. A similar argument shows that (57) holds. In terms of NCP(T), we have proved that z̄ = (x̄, ȳ) satisfies (52) and (53). It remains to be proved that z̄ satisfies (51), i.e. that (x̄, ȳ) satisfies (56) and (58).

From now on we will consider separately the case of general L under SCA, for which we will use Proposition 4, and the case of bilinear L, which will result from Proposition 5.

Consider first the case in which SCA holds. In view of Proposition 4, it is enough to check that J(z̄) ⊂ J(z*) for some solution z* of SPP(L, X, Y). Take any solution z* of SPP(L, X, Y). If the result does not hold, there exists j such that z̄_j = 0 and z*_j > 0. By Theorem 1(ii),

Taking limits in (68) along a subsequence {z^{k_ℓ}} converging to z̄, we get

Since z*_j > 0 and lim_{k→∞} z_j^k = z̄_j = 0, (69) implies that lim_{t→0+} g′_j(t) > −∞. On the other hand, B5 in the case of separable g means that g′_j((0, ∞)) = (−∞, +∞), which in turn implies, since g′_j is increasing by B1, that lim_{t→0+} g′_j(t) = −∞. This contradiction establishes that J(z̄) ⊂ J(z*). Thus, all the hypotheses of Proposition 4 hold for z̄, and so z̄ solves SPP(L, X, Y).

Now we proceed exactly as at the end of the proof of Theorem 1. Since z̄ is a solution, {D_g(z̄, z^k)} is nonincreasing by Theorem 1(ii). By B4 this sequence has a subsequence (corresponding to the subsequence of {z^k} which converges to z̄) which converges to 0, so that the whole sequence converges to 0, and therefore lim_{k→∞} z^k = z̄ by Proposition 1.

Finally, consider a bilinear L, i.e., L(x, y) = c^t x + b^t y − y^t Ax with c ∈ ℝ^n, b ∈ ℝ^m and A ∈ ℝ^{m×n}. In this case we invert the procedure of the previous case: we first establish convergence of the whole sequence to a unique cluster point, and then use this uniqueness to prove optimality. Here L is the Lagrangian of the linear programming problem min_x {c^t x | Ax ≥ b, x ≥ 0}. It is well known that the saddle points of L and the optimal primal-dual pairs (x, y) of the linear programming problem coincide. Denote the gradient components of L by

d_x(y) = ∇_xL(x, y) = c − A^t y,    d_y(x) = ∇_yL(x, y) = b − Ax,

and d(z) = (d_x(y), −d_y(x))^t with z = (x, y).
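As a concrete illustration, the combined operator d can be sketched as follows; the data c, b, A below are arbitrary illustrative values, not taken from the paper:

```python
import numpy as np

def d(z, c, b, A):
    """Combined gradient operator d(z) = (c - A^t y, -(b - A x)) for the
    bilinear Lagrangian L(x, y) = c^t x + b^t y - y^t A x, with z = (x, y)."""
    n = len(c)
    x, y = z[:n], z[n:]
    dx = c - A.T @ y      # gradient of L with respect to x
    dy = b - A @ x        # gradient of L with respect to y
    return np.concatenate([dx, -dy])

# Tiny illustrative data (assumed, not from the paper).
c = np.array([1.0, 2.0])
b = np.array([1.0])
A = np.array([[1.0, 1.0]])
z = np.array([0.5, 0.5, 1.0])   # x = (0.5, 0.5), y = (1.0,)
print(d(z, c, b, A))            # equals (0, 1, 0) for this data
```

The sign flip on the y-block is what makes d the monotone operator associated with the saddle point problem.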

Let I_0 be the set of indices j such that z_j = 0 for all saddle points z. If a saddle point exists, Tucker [20] shows that for all j ∈ I_0 there exists a saddle point z such that d_j(z) > 0. Now for each index j ∈ {1, ..., m + n} we pick a saddle point ẑ^j in the following way: if j belongs to I_0, we choose, using Tucker's result, ẑ^j so that d_j(ẑ^j) > 0; if j does not belong to I_0, we choose, using the definition of I_0, ẑ^j so that ẑ^j_j > 0. Let z* be a convex combination of ẑ^1, ..., ẑ^{m+n} with strictly positive coefficients. It follows that z* = (x*, y*) is a saddle point with the following additional property:

z*_j = 0 ⇒ d_j(z*) > 0

(note that z*_j > 0 for j ∉ I_0). From (27), (32) and (37) we obtain

where w^k = (ξ^k, η^k). For a bilinear L,

and

so that

and, by Propositions 5 and 3,

−(d^k)^t(z* − w^k) = −d(z*)^t(z* − w^k) = d(z*)^t w^k.

Combining the equations above we obtain

By Theorem 1(iii), {z^k} is bounded, which easily implies that the series on the left-hand side of (73) is finite. Consequently, by nonnegativity of all terms in (73),

for all j such that d_j(z*) > 0. Let z̄ be a cluster point of {z^k}. Next, we use (74) to show that the absolute values of the increments of the sequence {D_g(z̄, z^k)} have a finite sum.

As in (73), using (27) and (37), we obtain

so that

Consider j such that d_j(z̄) ≠ 0. Since we have already proved that z̄ satisfies (54), (55) and (57), we get z̄_j = 0. {D_g(z*, z^k)} is bounded by Theorem 1(ii), so that {D_j(z*_j, z_j^k)} is bounded. By B5 in the one-dimensional case, lim_{t→0+} g′_j(t) = −∞. Since there exists a subsequence of {z_j^k} which converges to z̄_j = 0, boundedness of {D_j(z*_j, z_j^k)} implies that z*_j = 0. It follows from the special choice of z* that d_j(z*) > 0, and then from (74) that the corresponding series is finite. By (38), Σ_k t_k s_k < ∞. Hence we have proved that the rightmost expression of (75) is summable in k, and therefore {D_g(z̄, z^k)} is a Cauchy sequence, which converges to 0 by B4. Hence lim_{k→∞} z^k = z̄ by Proposition 1 with w^k = z̄.

We have established that the whole sequence {z^k} converges to a unique point z̄. In view of Corollary 1, since we have shown above that z̄ satisfies (54), (55) and (57), it only remains to prove that z̄ = (x̄, ȳ) satisfies (56) and (58). We start with (56). Take j such that x̄_j = 0. Suppose that (56) does not hold, i.e. ∇_xL(x̄, ȳ)_j < 0. By (64), lim_{k→∞}(ξ^k, η^k) = (x̄, ȳ), so that by continuous differentiability of L there exists k̄ such that ∇_xL(ξ^k, η^k)_j < 0 for all k > k̄. In the separable case (30) becomes

and Step 3, under the differentiability assumption, asserts that the x-component of d^k equals ∇_xL(ξ^k, η^k), so that its j-th component is negative for k > k̄. Since t_k > 0, it follows from (76) that g′_j(x_j^{k+1}) > g′_j(x_j^k) and, since g′_j is strictly increasing by B2, we get x_j^{k+1} > x_j^k for all k > k̄. Therefore x̄_j = lim_{k→∞} x_j^k > x_j^{k̄+1} > 0. This is a contradiction, so that (56) holds for j. The proof of (58) is virtually identical.

6 An application to linear programming

In this section we present the iterative formulae of the algorithm in the case of linear programming problems, with ξ^k and η^k chosen as in Section 5, and with g chosen as in Example 2. Consider the problem in the form

with c ∈ ℝ^n, b ∈ ℝ^m and A ∈ ℝ^{m×n}, so that the Lagrangian is L(x, y) = c^t x + b^t y − y^t Ax and the constraint sets are X = ℝ^n_+, Y = ℝ^m_+.

The Bregman functions are defined as

Note that in this case the perturbation vectors ξ and η defined by (43) and (44) can be solved explicitly, and, given the stepsize t_k, the updated primal and dual solutions can also be stated explicitly:

With z^k = (x^k, y^k), employing f_k(t) in (42) and

s_k = c^t(x^k − ξ^k) + b^t(η^k − y^k) − (η^k)^t Ax^k + (y^k)^t Aξ^k,

the stepsize t_k satisfying (28) is obtained via Armijo search.
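A generic Armijo backtracking loop of the kind used here can be sketched as follows; f_k (the one-dimensional function from (42)) and s_k are passed in, and the parameter names gamma and beta mirror the test parameters γ and β of (28). This is a sketch of a standard Armijo rule under these assumptions, not a transcription of the paper's exact test:

```python
def armijo_step(fk, sk, gamma=0.3, beta=0.7, t0=1.0, max_trials=100):
    """Backtracking (Armijo) search: find t with fk(t) <= fk(0) - gamma * t * sk.

    fk    : one-dimensional function of the stepsize t (cf. (42));
    sk    : positive scalar measuring the predicted decrease (cf. (28));
    gamma : sufficient-decrease parameter in (0, 1);
    beta  : stepsize reduction factor in (0, 1).
    """
    t = t0
    f0 = fk(0.0)
    for _ in range(max_trials):
        if fk(t) <= f0 - gamma * t * sk:
            return t
        t *= beta  # shrink the step and try again
    return t

# Illustrative model: fk(t) = (t - 1)^2 with sk = 1; the initial trial
# t = 1 already satisfies the sufficient-decrease test.
t = armijo_step(lambda t: (t - 1.0) ** 2, 1.0)
print(t)  # 1.0
```

With gamma = 0.3 and beta = 0.7 as in the experiments below, each rejected trial shrinks the step by 30%, so the search terminates quickly whenever f_k decreases locally.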

Based on our method, an experimental code called Bregman was written. It is not intended to fully explore the computational capabilities of our method, or its competitiveness with other algorithms, but just to establish its plausibility, showing that its performance is comparable with that of other methods. Thus we chose to test it on a well known and easily accessible collection of LP problems, though we do not claim that our method is competitive with algorithms specific to LP; in fact, our article is intended to address primarily nonlinear saddle point problems.

Computational results are compared with those obtained from the saddle point method of [13]. The latter code is called Saddle and it employs the following iterative scheme:

with t_k proportional to s_k.

As is the case with Saddle, scaling is crucial for the efficiency of Bregman. Our experience indicates that static (initial) data scaling via equilibration, which has traditionally been employed in LP codes, is not helpful for Bregman. That is why the dynamic scaling procedure in Saddle, adopted from Kallio and Salo [14], was further developed for Bregman. In Saddle, auxiliary reference quantity and value vectors in ℝ^m and ℝ^n, respectively, are defined in each iteration so that

where the reference vectors are built from the primal and dual solutions at the beginning of iteration k. If a reference quantity is smaller than a certain safeguard value, then the safeguard value is used instead. These safeguard values are equal to one tenth of the average of the reference quantities over all variables. The scaling heuristic of Saddle employs two phases, as follows. In phase one, each row i of A is divided by the corresponding reference quantity and each column j by the corresponding reference value. Let A′ = (a′_ij) be the resulting matrix; for each row i compute the mean of the nonzero values |a′_ij| in that row, and for each column j the corresponding column mean. Then, in the second phase of scaling, each element a′_ij is divided by the geometric mean of its row mean and its column mean.
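The two-phase heuristic just described can be sketched as follows; the reference quantity and value vectors (here `q` and `v`) and the safeguard rule are simplified, so this is an illustration of the structure rather than a transcription of the Saddle/Bregman code:

```python
import numpy as np

def two_phase_scale(A, q, v):
    """Two-phase dynamic scaling heuristic (sketch).

    Phase 1: divide row i of A by the reference quantity q[i] and
             column j by the reference value v[j], with safeguards.
    Phase 2: divide each entry by the geometric mean of its row mean
             and column mean of nonzero absolute values.
    """
    # Safeguards: replace too-small references by one tenth of the average.
    q = np.maximum(q, 0.1 * q.mean())
    v = np.maximum(v, 0.1 * v.mean())

    # Phase 1: row and column equilibration by the reference vectors.
    A1 = A / q[:, None] / v[None, :]

    # Phase 2: means of nonzero |entries| per row and per column.
    absA = np.abs(A1)
    nz = absA != 0
    row_mean = np.where(nz.sum(1) > 0, absA.sum(1) / np.maximum(nz.sum(1), 1), 1.0)
    col_mean = np.where(nz.sum(0) > 0, absA.sum(0) / np.maximum(nz.sum(0), 1), 1.0)
    return A1 / np.sqrt(row_mean[:, None] * col_mean[None, :])

A = np.array([[2.0, 0.0], [1.0, 4.0]])
print(two_phase_scale(A, np.array([1.0, 2.0]), np.array([1.0, 1.0])))
```

The geometric-mean step in phase two is a standard equilibration device: it drives the typical magnitude of each row and column of the scaled matrix toward one.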

This procedure defines row and column scaling factors for the rows and columns of A, as well as for the rows of b and the columns of c^t. Primal and dual variables are scaled by the inverses of these factors. Formulae (80)-(83) are applied in the scaled space. For Bregman, the scaling factors of Saddle are adjusted in such a way that the update direction at (x^k, y^k) > 0 (obtained by differentiating (82)-(83) with respect to t_k, at t_k = 0) is the same as the update direction of Saddle obtained from (84)-(85). For example, if D_i is the scaling factor of row i obtained for Saddle and we denote by G_i the scaling factor for that row to be applied in Bregman, then (83) yields y_i^{k+1} = y_i^k exp[t_k G_i(b − Ax^k)_i] and (85) yields y_i^{k+1} = max[0, y_i^k + t_k D_i(b − Ax^k)_i]. Differentiating these at (x^k, y^k) > 0 with respect to t_k, and evaluating the derivatives at t_k = 0, yields directional components, which we require to be equal. This implies that G_i = D_i/y_i^k.
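The matching of the two update directions can be checked numerically; the sketch below uses plausible reconstructions of the dual updates (exponential for Bregman, projected additive for Saddle), with illustrative data, so the exact constants are assumptions:

```python
import numpy as np

def bregman_dual_update(y, r, t, G):
    """Multiplicative (interior) dual update, cf. (83): y stays positive."""
    return y * np.exp(t * G * r)

def saddle_dual_update(y, r, t, D):
    """Projected additive dual update, cf. (85)."""
    return np.maximum(0.0, y + t * D * r)

y = np.array([0.5, 2.0])          # current dual iterate, strictly positive
r = np.array([1.0, -3.0])         # residual (b - A x^k), illustrative values
D = np.array([1.0, 1.0])          # Saddle scaling factors (assumed)
G = D / y                         # matching choice G_i = D_i / y_i^k

# For small t the two updates move at the same rate in the same direction:
t = 1e-6
db = (bregman_dual_update(y, r, t, G) - y) / t
ds = (saddle_dual_update(y, r, t, D) - y) / t
print(db, ds)   # both approximately equal to D * r
```

The multiplicative update keeps every dual variable strictly positive, which is exactly the interior-point property that the Bregman machinery requires, while reproducing Saddle's first-order behavior.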

As a termination test for Bregman and Saddle, we use V(x^k, y^k) < φ|c^t x^k|, where

is an error measure, and φ > 0 is a stopping parameter. If (b − Ax^k)_i > 0, then y_i^k|(b − Ax^k)_i| is the value (in units of the objective function) of the primal error; otherwise it is the value of a complementarity violation. The terms x_j^k|(c − A^t y^k)_j| have a similar interpretation, so that V(x^k, y^k) is interpreted as the total value of the primal, dual and complementarity violations. Our experience indicates that, if φ = 10^{-k}, then we may expect k + 1 significant digits in the optimal objective function value. Using our termination test, the final infeasibilities are usually small, although they do not enter the stopping criterion directly (see Table 4 below).
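Under this reading of V (primal residuals weighted by dual variables, dual residuals weighted by primal variables; a reconstruction, since the displayed formula is not reproduced here), the termination test can be sketched as:

```python
import numpy as np

def error_measure(x, y, c, b, A):
    """Weighted total of primal, dual and complementarity violations
    (reconstructed form of the error measure V)."""
    primal = np.abs(b - A @ x) @ y     # sum_i y_i |(b - A x)_i|
    dual = np.abs(c - A.T @ y) @ x     # sum_j x_j |(c - A^t y)_j|
    return primal + dual

def terminated(x, y, c, b, A, phi=1e-6):
    """Stopping test V(x, y) < phi * |c^t x|."""
    return error_measure(x, y, c, b, A) < phi * abs(c @ x)

# At an exact primal-dual optimum V vanishes; illustrative data:
c = np.array([1.0, 1.0]); b = np.array([1.0]); A = np.array([[1.0, 1.0]])
x_opt = np.array([1.0, 0.0]); y_opt = np.array([1.0])
print(error_measure(x_opt, y_opt, c, b, A))   # 0.0
```

Because all three violation types are expressed in objective-function units, a single relative threshold φ controls them jointly.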

For computational illustration, Bregman was tested against Saddle on ten problems from the Netlib library [17]. For the purpose of this comparison, each problem was first cast in the form given in (77)-(79). Problem names and dimensions are given in Table 1.

All primal and dual variables are initially set to one. We set the perturbation stepsize parameters in (43)-(44) to λ = μ = 0.5, and the stepsize test parameters in (28) to γ = 0.3 and β = 0.7. Two values of the stopping parameter were applied: φ = 10^{-4} and φ = 10^{-6}.

Table 2 shows the iteration counts for φ = 10^{-4} and φ = 10^{-6}, both for Bregman and Saddle. Table 3 shows the corresponding relative objective errors |c^t x̂ − c^t x*|/|c^t x*|, where x̂ is the primal solution obtained at the end of the iterations and x* is a true optimal solution. We observe that the iteration counts for both methods are of the same order of magnitude. The precision in both cases is also approximately the same (five and seven significant digits, as predicted, for φ = 10^{-4} and φ = 10^{-6}, respectively). Due to function evaluations (for the exponentials) and to the search for stepsizes, the work per iteration for Bregman is somewhat larger than for Saddle. The computational tests show that the performance of Bregman is roughly equivalent to that of Saddle. One may speculate that a better scaling method for Bregman could improve its performance.

For each iteration k, define relative primal and dual infeasibilities as follows:

Table 4 reports the final infeasibilities, measured by the maximum of the relative primal and dual infeasibilities, for Bregman and Saddle, with φ = 10^{-6}.

To see the importance of scaling, we ran Bregman without scaling on a dozen small Netlib problems with φ = 10^{-4}. Five of our twelve test problems failed to converge within one million iterations, while for the others the number of iterations increased by a factor ranging from 3 to 22 as compared with the scaled version.

Our convergence results apply if the scaling factors are updated only during a prespecified number of iterations. To demonstrate that it pays off to update the scaling until the end, we ran our smallest problem, sctap1, with the scaling fixed after 5000 iterations. As a result, the number of iterations is 26088 for φ = 10^{-4} and 63631 for φ = 10^{-6}; the iteration count thus almost doubles compared with the results for sctap1 in Table 2. In conclusion, our theoretical results apply if the scaling is fixed after a given number of iterations; however, it is plainly more efficient to update the scaling factors until the termination criterion is met.

Finally, it is important to note that for small problems both of these approaches often turn out to be inefficient. Their comparative advantage is expected to appear for large-scale problems, and in particular, our method may be of interest for massively parallel computing.

Received: 13/V/03. Accepted: 09/I/04.

#581/03.

  • [1] D. Bertsekas and J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice Hall, New Jersey (1989).
  • [2] L.M. Bregman, The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7 (3) (1967), 200-217.
  • [3] Y. Censor, A.N. Iusem and S.A. Zenios, An interior point method with Bregman functions for the variational inequality problem with paramonotone operators. Mathematical Programming, 81 (1998), 373-400.
  • [4] Y. Censor and A. Lent, An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34 (1981), 321-353.
  • [5] Y. Censor and S.A. Zenios, The proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73 (1992), 451-464.
  • [6] G. Chen and M. Teboulle, Convergence analysis of a proximal-like optimization algorithm using Bregman functions. SIAM Journal on Optimization, 3 (1993) 538-543.
  • [7] A.R. De Pierro and A.N. Iusem, A relaxed version of Bregman's method for convex programming. Journal of Optimization Theory and Applications, 51 (1986), 421-440.
  • [8] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Mathematics of Operations Research, 18 (1993), 202-226.
  • [9] P.T. Harker and J.S. Pang, Finite dimensional variational inequalities and nonlinear complementarity problems: a survey of theory, algorithms and applications. Mathematical Programming, 48 (1990), 161-220.
  • [10] A.N. Iusem, On some properties of generalized proximal point methods for quadratic and linear programming. Journal of Optimization Theory and Applications, 85 (1995), 593-612.
  • [11] A.N. Iusem, An iterative algorithm for the variational inequality problem. Computational and Applied Mathematics, 13 (1994), 103-114.
  • [12] A.N. Iusem, An interior point method for the nonlinear complementarity problem. Applied Numerical Mathematics, 24 (1997), 469-482.
  • [13] M. Kallio and A. Ruszczynski, Perturbation methods for saddle point computation. Working Paper 94-38, International Institute for Applied Systems Analysis, Laxenburg, Austria (1994).
  • [14] M. Kallio and S. Salo, Tatonnement procedures for linearly constrained convex optimization. Management Science, 40 (1994), 788-797.
  • [15] E.N. Khobotov, Modifications of the extragradient method for solving variational inequalities and certain optimization problems. USSR Computational Mathematics and Mathematical Physics, 27 (5) (1987), 120-127.
  • [16] G.M. Korpelevich, The extragradient method for finding saddle points and other problems. Ekonomika i Matematicheskie Metody, 12 (1976), 747-756.
  • [17] LP Netlib test problems, Bell Laboratories.
  • [18] R.T. Rockafellar, Local boundedness of nonlinear monotone operators. Michigan Mathematical Journal, 16 (1969), 397-407.
  • [19] R.T. Rockafellar, Convex Analysis. Princeton University Press, New Jersey (1970).
  • [20] A.W. Tucker, Dual systems of homogeneous linear relations. In Linear Inequalities and Related Systems (H.W. Kuhn and A.W. Tucker, editors). Annals of Mathematics Studies , Princeton University Press, Princeton 38 (1956), 3-18.
  • * Research of this author was partially supported by CNPq grant No. 301280/86.
  • Publication Dates

    • Publication in this collection
      26 Nov 2004
    • Date of issue
      2004

    Sociedade Brasileira de Matemática Aplicada e Computacional - SBMAC, Rua Maestro João Seppe, nº. 900, 16º. andar - Sala 163, 13561-120 São Carlos - SP, Brazil, Tel./Fax: 55 16 3412-9752
    E-mail: sbmac@sbmac.org.br