Derivative-free methods for nonlinear programming with general lower-level constraints

Diniz-Ehrhardt, M. A.; Martínez, J. M.; Pedroso, L. G.

doi:10.1590/S1807-03022011000100003

Abstract

Augmented Lagrangian methods for derivative-free continuous optimization with constraints are introduced in this paper. The algorithms inherit the convergence results obtained by Andreani, Birgin, Martínez and Schuverdt for the case in which analytic derivatives exist and are available. In particular, feasible limit points satisfy KKT conditions under the Constant Positive Linear Dependence (CPLD) constraint qualification. The form of our main algorithm allows us to employ well established derivative-free subalgorithms for solving lower-level constrained subproblems. Numerical experiments are presented.

nonlinear programming; Augmented Lagrangian; global convergence; optimality conditions; derivative-free optimization; constraint qualifications

Derivative-free methods for nonlinear programming with general lower-level constraints^* * This work was supported by PRONEX-Optimization (PRONEX - CNPq/ FAPERJ E-26/ 171.510/2006 -APQ1), FAPESP (Grants 2006/53768-0 and 2004/15635-2) and CNPq.

M. A. Diniz-Ehrhardt^I; J. M. Martínez^I; L. G. Pedroso^II

^IDepartment of Applied Mathematics, IMECC-UNICAMP University of Campinas, 13083-859 Campinas, SP, Brazil

^IIDepartment of Mathematics, Federal University of Paraná 81531-980 Curitiba, PR, Brazil E-mails: {cheti|martinez}@ime.unicamp.br, lucaspedroso@ufpr.br

ABSTRACT

Augmented Lagrangian methods for derivative-free continuous optimization with constraints are introduced in this paper. The algorithms inherit the convergence results obtained by Andreani, Birgin, Martínez and Schuverdt for the case in which analytic derivatives exist and are available. In particular, feasible limit points satisfy KKT conditions under the Constant Positive Linear Dependence (CPLD) constraint qualification. The form of our main algorithm allows us to employ well established derivative-free subalgorithms for solving lower-level constrained subproblems. Numerical experiments are presented. Mathematical subject classification: 90C30, 90C56.

Key words: nonlinear programming, Augmented Lagrangian, global convergence, optimality conditions, derivative-free optimization, constraint qualifications.

1 Introduction

In many applications one needs to solve finite-dimensional optimization problems in which derivatives of the objective functions or of the constraints are not available. A comprehensive book describing many of these situations has been recently published by Conn, Scheinberg and Vicente [9]. In this paper we introduce algorithms for solving constrained minimization problems of this class, in which constraints are divided in two levels, as in [1]. The shifted-penalty approach that characterizes Augmented Lagrangian methods is applied only with respect to the upper-level constraints whereas the lower level constraints are explicitly included in all the subproblems solved at the main algorithm. Lower-level constraints are not restricted to the ones that define boxes or polytopes. In general we will assume that derivatives with respect to lower-level constraints are available, but derivatives with respect to upper-level constraints and objective function are not.

The form of the problems addressed in this paper is the following:

The set Ω represents lower-level constraints of the form

In the most simple case, Ω will take the form of an n-dimensional box:

We will assume that the functions f : Rⁿ R, h : Rⁿ R^m, g : Rⁿ R^p, h : Rⁿ R^m, g : Rⁿ R^p are continuous. First derivatives will be assumed to exist for some convergence results but they will not be used in the algorithmic calculations at all. It is important to stress that the algorithms presented here do not rely on derivative discretizations, albeit some theoretical convergence results evoke discrete derivative ideas.

We aim to solve (1) using a derivative-free Augmented Lagrangian approach inspired in [1]. Given ρ > 0, λ ∈ R^m, μ ∈ R++, x ∈ Rⁿ, we define the Augmented Lagrangian L _ρ(x, λ, μ by:

At each (outer) iteration the main algorithm will minimize (approximately) L_ρ(x, λ, μ) subject to x ∈Ω using some derivative-free method and, between outer iterations, the Lagrange multipliers λ, μ and the penalty parameter ρ will be conveniently updated. The definition (4) corresponds to the well-known PHR (Powell-Hestenes-Rockafellar) [12, 24, 27] classical Augmented Lagran-gian formulation. See, also, [7, 8].

In the global optimization framework one finds an approximate global mini-mizer of L_ρ(x, λ, μ) on Ω at each outer iteration. In the limit, it may be proved that a global minimizer of the original problem up to an arbitrary precision is found [5]. Global minimization subalgorithms are usually expensive, but the Augmented Lagrangian approach exhibits a practical property called "preference for global minimizers" [1], thanks to which one usually finds local minimizers with suitable small objective function values if efficient local minimization algorithms are used for solving the subproblems. In other words, even if we do not find global minimizers of the subproblems, the global results [5] tend to explain why the algorithms work with the sole assumption of continuity of objective function and constraints.

We conjecture that the effectivity of Augmented Lagrangian tools in derivative-free optimization is associated with the main motivation of the general Augmented Lagrangian approach. Augmented Lagrangian methods are not motivated by Newtonian ideas (which involve iterative resolution of linearized problems associated with derivatives). Instead, these methods are based on the idea of minimizing penalized problems with shifted constraints. The most natural updating rule for the multipliers does not use derivatives at all, as its motivation comes from a modified-displacement strategy [24]. Continuity is the only smoothness property of the involved functions that supports the plausibility of Augmented Lagrangian procedures for constrained optimization.

The division of the constraints between upper-level and lower-level ones may be due to different reasons. In our algorithms, we will adopt the point of view that the upper level contains "difficult" constraints whose derivatives are not available and that the lower level contains "easier" constraints whose explicit derivatives could be employed. Derivative-free methods for minimization with (only) lower-level constraints are supposed to exist (see [9, 23]) albeit this assumption is irrelevant from the theoretical point of view. However it is important to put in relief that we could use many known derivative-free algorithms to minimize L_ρ(x, λ, μ) subject to x ∈ Ω, including directional direct search methods, simplex methods or algorithms based on polynomial interpolation [9]. Sometimes lower-level constraints are easy in the sense that we can deal with them using direct directional search methods. For instance, we can use the MADS algorithm [3, 4], which has been proven to be a very good choice for problems where Ω = Int(Ω). Although most frequently lower-level constraints define boxes or polytopes, more general situations are not excluded at all. Sometimes lower-level constraints cannot be violated, because they involve fundamental definitions or because the objective function may not be defined when they are not fulfilled.

The choice of the algorithm that should be used to solve the subproblems depends on the nature of the lower-level constraints. The best known case is when Ω is an n-dimensional box. We will develop a special algorithm for this case. The algorithm employs an arbitrary derivative-free box-constraint (or even unconstrained) minimization solver which we complement with a local coordinate search in order to ensure convergence. When Ω is defined by linear (equality or inequality) constraints, the technique of positive generators on cones ([9], Chapter 13) maybe employed. See [15, 17, 20].

Assume that x is a feasible point of a smooth nonlinear programming problem whose constraints are h_i(x) = 0, i ∈ I, g_j (x), j ∈ J and that the active constraints at x are (besides the equalities) g_j (x) < 0, j ∈ J₀. Let I₁⊆I,J₁ ⊆ J₀. We say that the gradients ∇h_i(x )(i ∈ I₁) ∇g_j (x )(j ∈ J₁) are positively linearly dependent if

where

The CPLD condition says that, when a subset of gradients of active constraints is positively linearly dependent at x , then the same gradients remain linearly dependent for all x (feasible or not) in a neighborhood of x . Therefore, CPLD is strictly weaker than the Mangasarian-Fromovitz (MFCQ) constraint qualification [21, 28].

The CPLD condition was introduced in [26] and its status as a constraint qualification was elucidated in [2]. In [1] an Augmented Lagrangian algorithm for minimization with arbitrary lower-level constraints was introduced and it was proved that feasible limit points that satisfy CPLD necessarily fulfill the KKT optimality conditions. The derivative-free algorithms introduced in this paper recover the convergence properties of[1].

In [14], Kolda, Lewis and Torczon proposed a derivative-free Augmented Lagrangian algorithm for nonlinear programming problems with a combination of general and linear constraints. In terms of (1), the set Q is defined by a polytope in [14]. The authors use the approach of [7], by means of which inequality constraints are transformed into equality constraints with the addition of slack non-negative variables. Employing smoothness assumptions, they reproduce the convergence results of [7]. This means that feasible limit points satisfy KKT conditions, provided that the Linear Independence Constraint Qualification (LICQ) holds. In a more recent technical report [19], Lewis and Torczon used the framework of [1] for dealing with linear constraints in the lower level and employed their derivative-free approach for solving the sub-problems. The proposed algorithm inherits the theoretical properties of [1], which means that the results in [19] are based on the CPLD constraint qualification. However, the authors don't consider nonlinear constraints in the lower level set, which is possible in our approach.

This paper is organized as follows. In Section 2 we recall the method and theoretical results found in [1]. In Section 3 we introduce a derivative-free version of [1] for the case in which the lower-level set is a box. In Section 4 we introduce a derivative-free version of the Augmented Lagrangian method for arbitrary lower-level constraints. Section 5 describes some numerical results. Finally, we make some comments and present our conclusions about this work in Section 6.

Notation

The symbol || || denotes an arbitrary vector norm.

The canonical basis of Rⁿ will be denoted {e¹, ..., eⁿ}.

For all y ∈ Rⁿ, y+ = (max{0, y₁}, ..., max{0, y_n})^T.

N = j0, 1, ...}.

If v is a vector and t is a scalar, the statement v < t means that v_i < t for all the coordinates i .

[a, b]^m = [a, b] x [a, b] x ... x [a, b] m times.

2 Preliminary results

In this section we recall the main algorithm presented in [1] for solving (1) with Ω. defined by (2). We will also mention the convergence theorems that are relevant for our present research.

Algorithm 1 (Derivative-based Augmented Lagrangian)

The parameters that define the algorithm are: τ ∈ [0, 1), y > 1, λ_min < λ_max, μ_max > 0. At the first outer iteration we use a penalty parameter ρ₁ > 0 and safeguarded Lagrange multipliers estimates λ¹∈ R^m and μ¹∈ R^ρ such that

Finally, {ε_k} is a sequence of positive numbers that satisfies

Step 1. Initialization.

Set k - 1.

Step 2. Solve the subproblem.

Compute x^k ∈ Rⁿ such that there exist v^k ∈ R^m-, w^k ∈ R^p satisfying

Step 3. Estimate multipliers.

For all i = 1 , . . . , m, compute

For all i = 1 , . . . , p, compute

Step 4. Update penalty parameter.

If k > 1 and

define

Else, define

Update k k + 1 and go to Step 2.

Lemma 1 below shows that the points x^k generated by Algorithm 1 are, approximately, KKT points of the problem, which is a consequence of requirements (5-8). Lemma 2 shows that the complementarity conditions respect to constraints g(x) < 0 are satisfied for k large enough.

Lemma 1.Assume that {x^k} is a sequence generated by Algorithm 1. Then, for all k = 1, 2, ... we have:

where

Proof. The proof follows from (5-8) using the definitions (9) and (11). □

Lemma 2.Assume that the sequence {x^k} is generated by Algorithm 1 and that K is an infinite sequence of indices such that

Suppose that g_i (x *) < 0. Then, there exists k₀∈ K such that

Proof. See the proof of formula (4.10) in Theorem 4.2 of [1]. □

Theorem 1.Assume that x * is a limit point of a sequence generated by Algo-rith 1 . Then:

1. Ifthe sequence ofpenalty paraeters {ρ_k} is bounded, x^* is a feasible point of (1). Otherwise, at least one of the following two possibilities holds:

The point x * satisfies the KKT conditions of the problem Minimize
The CPLD constraint qualification corresponding to the lower-level constraints

does not hold at x*.

2. Ifx * is a feasible point of (1) then at least one of the following two possibilities holds:

The KKT conditions of (1)

are fulfilled at x*.
The CPLD constraint qualification corresponding to all the constraints of

does not hold at x

^*.

Proof. See Theorems 4.1 and 4.2 of [1]. □

Theorem 1 represents the type of global convergence result that we want to prove for the algorithms presented in this paper. In [1], under additional assumptions, it is proved that the penalty parameters remain bounded when one applies Algorithm 1 to (1). The first part of Theorem 1 says that the algorithm behaves in the best possible way in the process of finding feasible points. This result cannot be improved since the feasible region could be empty and, on the other hand, the algorithm is not equipped for finding global minimizers. It must be observed that, since the lower-level set is generally simple, the CPLD condition related to Q is usually satisfied at all the lower-level feasible points. The second part of the theorem (which corresponds to Theorem 4.2 of [1]) says that, under the CPLD constraint qualification, every feasible limit point is KKT. Since CPLD is weaker than popular constraint qualifications like LICQ (regularity) or MFCQ (Mangasarian-Fromovitz), this result is stronger than properties that are based on LICQ or MFCQ. The consequence is that, roughly speaking, Algorithm 1 "works" when it generates a bounded sequence (so that limit points necessarily exist). The first requirement for this is, of course, that the conditions (5-8) must be effectively satisfied at every outer iteration, otherwise the algorithm would not be well defined. This requirement must be analyzed in every particular case. A sufficient condition for the boundedness of the sequence is the boundedness of the set

This condition holds, for example, if the lower-level constraints include box constraints on all the variables.

3 Derivative-free method for box lower-level constraints

In this section we define a derivative-free method that applies to problem (1) when Ω is a box defined by (3).

Algorithm 2 (Derivative-free Augmented Lagrangian for box constraints in the lower level)

Let τ ∈ [0, 1), γ > 1, λ_mim< λ_max, μ_max> 0, ρ1 > 0, -1, ^-¹ and { εk} be as in Algorithm 1. Steps 1, 3 and 4 of Algorithm 2 are identical to those of Algorithm 1. Step 2 is replaced by the following:

Step 2. Solve the subproblem.

Choose a positive tolerance δ_k < (u_i l_t)/2, i = 1, ..., n, such that

Find x^k ∈ Ω such that, for all i = 1, ..., n,

Since the algorithm found in [19] handles linear constrained problems, it could also be applied to a box constrained problem. The main difference between this algorithm and Algorithm 2 relies on the choice of the solver for the sub-problem. We encourage the use of any derivative-free technique able to find points satisfying (14), whereas in [19] the authors use Generating Set Search methods for solving the subproblems, taking advantage of the geometry of the linear constraints.

Theorem 2.Assume that ∇ f, ∇h, ∇g satisfy Lipschitz conditions on the box Ω . Then Algorith 2 is a particular case of Algorith 1 . Moreover:

The sequence {x^k} is well defined for all k and admits limit points.
If the sequence ofpenalty paraeters {

ρ_k} is bounded, x^* is feasible. Otherise, every limit point x^* is a stationary point of the box constrained problem

.
Ifx^* is a feasible liit point, then at least one ofthe following possibilities holds:

1. The point x* fulfills the KKT conditions of

Minimize f (x), subject to h(x) = 0, g(x) < 0, x ∈ Ω.

2. The CPLD constraint qualification is not satisfied at x* for the constraints h(x) = 0, g(x) < 0, x ∈ Ω.

Proof. By (4), (10), (12) and the Lipschitz assumption, there exists M > 0 such that for all x, y ∈ Ω we have:

Assume that x^k is computed at Step 2 of Algorithm 2. Given i ∈ {1, ..., n}, since δ_k < (u_i l_i)/2, we may consider three possible cases:

Consider the first case. By (14) and the Mean Value Theorem, there exists ∈ [ δ_k, δ_k] such that

Then by (15)

Now consider the second case. By (14) and the Mean Value Theorem, there exists ∈ [0, δ_k] such that

So there exists

> 0 such that

Therefore, by (15),

Analogously, in the third case we obtain that there exists

> 0 such that

From (16), (17) and (18), we deduce that there exists w^k > 0 such that

where

The sign takes place in (19) if

while the sign + takes place in (19) if

Then, by (13), x^k satisfies (5-8) if one redefines ε_k « max{ε_k, M ε_k}.

To complete the proof of the theorem, observe first that it is always possible to satisfy (14) defining a δ_k-grid on Ω and taking x^k as a global minimizer of L_pk (x, ^k, ^k) on that grid. Since Ω is bounded, the sequence is in a compact set and limit points exist. Moreover, the box constraints that define Q satisfy CPLD, therefore, by Theorem 1, limit points are stationary points of . The last part of the thesis is a direct consequence of Theorem 1. □

4 Derivative-free method for arbitrary lower-level constraints

In this section we define a derivative-free method of Augmented Lagrangian type for solving (1) with arbitrary lower-level constraints. Recall that in Section 3 the case in which the lower-level constraints define a box has been addressed.

In order to minimize the Augmented Lagrangian subject to the lower-level constraints, we need to use an "admissible" derivative-free algorithm, having appropriate theoretical properties. Roughly speaking, when objective function and constraints are differentiable, an admissible algorithm should compute KKT points of the subproblem up to any required precision.

Let be continuously differentiable. We say that an iterative algorithm is admissible with respect to the problem

if it does not make use of analytical derivatives of F and, independently of x⁰, it computes a sequence {x^v} with the following properties:

{x ^v} is bounded;

Given ε > 0 and v₁∈ N, there exist v > v₁, x^v ∈ Rⁿ, v^v∈ R^m, w^v∈ R, satisfying:

In this case, we say that x ^v is an ε-KKT point of the problem (20).

When we talk about lower-level constraints, we have in mind not only boxes and polytopes, as [14, 15], but also more general constraints that can be handled using, for example, the extreme barrier approach (see [9], Chapter 13). Barrier methods can be useful for solving subproblems when easy inequality constraints define feasible sets Ω such that

In these cases, incorporating the constraints which define Ω in the upper-level set may be inconvenient. A reasonable algorithm for solving the problem of minimizing a function subject to this type of constraints can be found in [3, 4].

We will assume that G_p(x, λ, μ, δ) approximates the gradient of the Augmented Lagrangian, in the sense that there exists M > 0 such that, for all

Many possible definitions of G can be given satisfying (25) if ∇f, ∇h, ∇g satisfy Lipschitz conditions. An appealing possibility is to define G as a simplex gradient [9]. It may not be necessary to use many auxiliary points to compute G because this approximation can be built using previous iterates of the sub-algorithm.

Algorithm 3, defined below, applies to the general problem (1). For this algorithm we will prove that the convergence properties of Algorithm 1 can be recovered under suitable assumptions on the optimization problem. For the application of Algorithm 3 we will assume that the gradients of the lower-level constraints ∇h and ∇g are available.

For each outer iteration k we will denote by {x^k,v} a sequence generated by an admissible algorithm applied to the problem

We assume that the computational form of the admissible algorithm includes some internal stopping criterion that is going to be tested for each v. When this stopping criterion is fulfilled we check the approximate fulfillment of (21-24) without using analytic derivatives of f, h, g. Of course, some admissible algorithms may ensure that the approximate KKT conditions are satisfied when their convergence criteria are met, in which case it is not necessary to check the approximate KKT conditions.

Algorithm 3 (Derivative-free Augmented Lagrangian for general constraints in the lower level)

Let τ ∈ [0, 1), y > 1, λ_mim< λ_max, μ_max> 0, ρ1 > 0, ¹, ¹ and {εk} be as in Algorithm 1. Steps 1, 3 and 4 of Algorithm 3 are identical to those of Algorithm 1. Step 2 is replaced by the following:

Step 2. Solve the subproble.

Choose a tolerance δk such that

Let {δ_k,v} be such that δ_k,v ∈ (0, δ_k ] for all v and

Step 2.1. Set v - 1.

Step 2.2. Compute x^k,v.

Step 2.3. If x^k,v does not satisfy the intrinsic stopping criterion of the admissible subalgorithm, set v -- v + 1 and go to Step 2.2.

Step 2.4. If ||h(x^k,v) || > ε_k or there exists i ∈ {1, ..., p} such that g_i(x^k,v) > ε_k, set v -- v + 1 and go to Step 2.2.

Step 2.5. Compute

Step 2.6. Define

Solve, with respect to (v, w), the following problem:

Step 2.7. If, at a solution (v, w) of (28) we have that

define v*(k) = v, x^k = x^k,v*^(k), v^k = v, w^k = w and go to Step 3. Else, set v v + 1 and go to Step 2.2.

The purpose of Step 2 above is to find a point x^k,v*(k) and multipliers v^k, w^k that verify (5-8), but with the actual gradient of ∇L_pk replaced by the simplex gradient G_pk. At (28) we compute approximate Lagrange multipliers v, w with respect to lower-level constraints. In some cases, the basic admissible algorithm may provide these multipliers so that the step (28) may not be necessary. If || || is the Euclidean norm, (28) involves the minimization of a convex quadratic with non-negative constraints. So, its resolution is affordable in the context of derivative-free optimization, where functions evaluations are generally very expensive, since neither f, g nor h are computed at (28).

In the following theorem we prove that Algorithm 3 is well defined. This amounts to show that the loop defined by Steps 2.1-2.7 necessarily finishes for some finite v.

Theorem 3.Assue that, for all k = 1 , 2, . . . an adissible algorith with respect to (26) is available. Assume, moreover, that the first derivatives of f, h, g, h, g exist and are Lipschitz on a sufficiently large set. Then, Algorithm 3 is well defined.

Proof. For fixed k, assume that the admissible algorithm, whose existence is postulated in the hypothesis, computes a sequence {x^k,v}. Since f, h, g have Lipschitz gradients, there exists a Lipschitz constant p_kM for the function ∇L_pk (x^k,v, ^k, ^k) that is valid for all x in a sufficiently large ball that contains the whole sequence and (25) is satisfied. Let v₁ be such that

for all v > v1 (note that it is not necessary to know the value of M at all). By the definition of admissible algorithm, there exists v > v₁, v^k,v ∈ R^m, w^k,v ∈ R+ such that:

Now, by (25) and (30),

Thus, by (31),

therefore, by (32-34), the solution of problem (28) necessarily satisfies (29). This means that the loop at Step 2 of Algorithm 3 necessarily finishes in finite time, so the algorithm is well defined.

Theorem 4.In addition to the hypotheses of Theorem 3, assue that there exists ε > 0 such that the set defined by ||h(x)|| < ε, g(x) < ε is bounded. Then, the sequence {x^k} defined by Algorithm 3 is well defined and bounded. oreover, ifx^* is a limit point of this sequence, we have:

1. The lower-level constraints are satisfied at x* (h(x*) = 0, g(x*) < 0).

2. If the sequence ofpenalty paraeters {p_k} is bounded, x^* is a feasible point of (1). Otherwise, at least one of the following two possibilities holds:

The point x * satisfies the KKT conditions of the problem
The CPLD constraint qualification corresponding to the lower-level constraints h(x) = 0,

g(x) < 0

does not hold at x*.

3. If x * is a feasible point of (1) then at least one of the following possibilities hold:
The KKT conditions of (1)

are fulfilled at x *.
The CPLD constraint qualification corresponding to all the constraints of (1) (h(x) = 0, g(x) < 0, h(x) = 0, g(x) < 0) does not hold at x^*.

Proof. By Theorem 3, the sequence {x^k} is well defined. Since ε_k < εfor k large enough, all the iterates of the method belong to a compact set from some iteration on. This implies that the whole sequence is bounded, so limit points exist.

By (25) we have that:

Therefore, since, by (27), δ_k,v*(k) < δ_k and pk δ_k< εk,

Then, by (29),

Moreover, by Step 2 of Algorithm 3,

and

Therefore, redefining ε_k « (1 + M)ε_k, we have that Algorithm 3 can be seen as a particular case of Algorithm 1. So the thesis of Theorem 1 holds for Algorithm 3 and the theorem is proved.

In Theorem 4 we proved that the only requirement for Algorithm 3 to be a satisfactory algorithm for solving (1) with arbitrary lower level constraints is the availability of an admissible algorithm for solving the subproblems. Now we are going to show that, for many reasonable lower-level constraints, Algorithm 2 may be such admissible algorithm. As we mentioned before, reasonable admissible algorithms has been already introduced by several authors for particular classes of lower-level sets. Here we wish to show that, as a last resource, even Algorithm 2 may also be used as admissible algorithm for solving subproblems.

The special case of lower-level constraints that we wish to consider is almost as general as it could be. Essentially, we are going to consider arbitrary lower level constraints with bounds on the variables. More precisely, we define:

where h and g are as defined in the Introduction. A very important particular case of (35) is when Ω is described by linear (equality and inequality) constraints and bounds.

For proving the main admissibility result we will need the following assumption.

Assumption A.

If x is a stationary (KKT) point of the problem

then

Theorem 5.Assume that F: Rⁿ ®R is continuously differentiable and suppose that h, g, l, u satisfy Assumption A. Then Algorithm 2 is an admissible algorith for the problem

Proof. Assume that we apply Algorithm 2 to the problem above. Therefore, F plays the role of f, h and g stand for h and g, respectively.

First observe that, as required by the definition of admissibility, the generated sequence is well defined and bounded.

Without loss of generality, let us identify {x^v} as a convergent subsequence of the sequence generated by Algorithm 2. By Theorem 2, Algorithm 2 is a particular case of Algorithm 1. This implies that, for a given ε > 0 and v large enough, condition (21) necessarily holds.

Also by Theorem 2, every limit point is a stationary point of the infeasibility measure subject to the bounds. Therefore, by Assumption A, (22) and (24) also hold for v large enough.

Finally, let us prove that (23) also takes place for v large enough. Suppose, by contradiction, that this is not true. Therefore, there exists S > 0 such that, for all v large enough, there exits i = i(v) satisfying

Without loss of generality, since the number of different indices of i is finite, we may assume that i = i (v) for all v. Hence, in the limit, one has g_i. (x*) < 0. Then, by Lemma 2, w_i^v = 0 for v large enough. This is a contradiction, which ends the proof of the theorem.

Theorem 5 shows that Algorithm 2 is an admissible algorithm. Hence, by Theorem 3, it is possible to use Algorithm 2 to generate a sequence {x^k,v} that can be employed in Step 2 of Algorithm 3, when one is attempting to solve (1) with Ω as in (35).

5 Numerical experiments

In this section, we present some computational results obtained with a Fortran 77 implementation of Algorithm 2. Since there is freedom on the choice of the algorithm for the box constrained subproblems, we tested three derivative-free solvers: Coordinate Search [16], the BOBYQA software [25], based on quadratic approximations and trust region techniques, and the well known Nelder-Mead algorithm [22]. In order to satisfy the condition (14), the points returned by BOBYQA or Nelder-Mead are taken as knots of meshes with density 8k and, if necessary, other points on these meshes are visited. Eventually, a point satisfying (14) is found. This means that BOBYQA and Nelder-Mead may be followed by some iterations of Coordinate Search. On the other hand, Coordinate Search satisfies (14) naturally, if we set δ_k as the step of the search on the exit of the algorith. hen using the elder-ead algorith for the subproblems, we force f (x) = ∞ for x ∉ Ω, to make sure that all generated points lie within the box. We will call the Augmented Lagrangian with BOBYQA, Nelder-Mead and Coordinate Search as subsolvers, respectively, as AL-BOBYQA, AL-NMead and AL-CSearch.

We say that the Augmented Lagrangian found a solution of a problem if (14) is achieved with tolerance δ_opt at a point that is sufficiently feasible, in the sense that, for a tolerance εf_eas, max{||h()||, || V() ||} < εf_eas. We consider that the algorithm fails in the attempt of solving the problem in the following cases:

Failure 1: when it cannot find the solution in up to 50 outer iterations,

Failure 2: when it performs 9 outer iterations without improving the feasibility,

Failure 3: when it evaluates the Augmented Lagrangian function more than 10⁶ times in a single call of the subsolver.

5.1 Hock-Schittkowski problems

The first set of problems we considered was the Hock-Schittkowski collection [13], which is composed by 119 differentiable problems. We solved only the 47 problems that have general and box constraints simultaneously. The dimension of the problems varies between 2 and 16, while the number of constraints are between 1 and 38, exceeding 10 in only 5 cases. We used δopt = εf_eas = 10⁵. Table 1 shows the performance of each algorithm for all problems, with respect to the objective function on the solution, f(x^*), and the number of function evaluations, feval, while Table 2 lists the feasibility reached in each case. It is worth noticing that it was required as a stopping criterion for the Augmented Lagrangian algorithm that max{||h(

)||, || V(

) ||} < 10⁵, and for this reason, in all cases where the algorithm succeeded the measure of feasibility is small. But on the other hand, in the cases where the feasibility is substantially small than the tolerance, we have the impression that the respective algorithm is not having problem on achieving or maintaining the feasibility.

Thumbnail

The first line of Table 3 shows the percentage of problems that took 10 to 100 function evaluations to be solved, the second line report those that used 100 to 1000 function evaluations and so on. The last three lines sho the amount of failures.

Thumbnail

Table 4 shows which method has the best performance for each problem. The first line shows the percentage of problems for which each method evaluated less times the Augmented Lagrangian function, the second and third lines consider the occasions where each method was the second best and worst, respectively, while the last line informs the total amount of failures.

Thumbnail

We can see from the tables that BOBYQA was the most economic subproblem solver. On the other hand, even with the greatest number of function evaluations in many cases, AL-NMead was the most robust. Recall that we do not used the "pure" Nelder-Mead method as a subsolver, but the Nelder-Mead method plus Coordinate Search; the same comment can be made for BOBYQA. AL-CSearch exhibited an intermediate performance when we consider computational effort, but was the least robust among the three algorithms.

5.2 Minimum volume problems

The volume of complex regions defined by inequalities in R^d is very hard to compute. A popular technique, when one needs to compute a volume approximation, consists of using a Monte Carlo approach [6, 11]. We will use some small-dimensional examples (d = 2) in order to illustrate the behavior of derivative-free methods in problems in which volumes need to be optimized. In the following examples, one needs to minimize the Monte Carlo approximation of the area of a figure subject to constraints.

Suppose that we want to find the minimal Monte Carlo approximation area of the figure defined by the intersection between two circles that contains n_p given points. With this purpose, we draw a tight rectangle containing the intersection of the circles. Then, a large number of points is generated with uniform distribution inside the rectangle. The area of the intersection is, approximately, the area of the rectangle multiplied by the ratio of random points that dropped inside the intersection of the circles. This is the objective function, the simulated area, that has to be minimized. The constraints arise from the fact that the n_p points have to be inside the intersection. It is worth noticing that it is a different rectangle, which contains the intersection, at each iteration. Our intention is to achieve a good precision for the simulated area with a small amount of computational effort. We could put the figure of interest inside a very large rectangle, and keep this rectangle fixed within iterations. But, in this case, the computational effort would be possibly bigger.

We considered unions and intersections between three types of figures:

rectangles, defined by 4 parameters: the coordinates (x, y) of the bottom left vertex, the height and the width;

circles, defined by 3 parameters: the coordinates of the center and the radius;

ellipses, defined by 6 parameters, a, b, c, d, e and f, by the formulae ax² + 2bxy + cy² + dx + ey + f < 0.

Suppose that we want the point y to be inside some figure A (or B). This is achieved imposing a constraint of type c_A (y) < 0 (or c_B (y) < 0). If we want y to lie inside the intersection between figures A and B, we ask that c_A (y) < 0 and c_B (y) < 0, so that each point defines two inequality constraints, 2n_p at all. On the other hand, the number of constraints defined by unions are n_p, since each point must be inside figure A or figure B. This is achieved imposing that c(x) = minjc_A(x), c_B(x)} < 0.

We firstly considered the problem of minimizing the

1. intersection of two circles,

2. intersection between a rectangle and a ellipse,

3. union of a rectangle and a ellipse and

4. union of two ellipses.

In the four cases the problem consists of finding the smallest intersection (cases 1 and 2) or union (cases 3 and 4) area that contains a given set of points. In all the problems we considered the same set of n_p = 10 points given in Table 5.

Thumbnail

We adopted εf_eas = δ_opt = 10^-4. The initial guesses for circles and ellipses were chosen as circles of radius one centered at the origin and for rectangles, squares of side 2 centered at the origin. The value of n_p in our tests is 10, and the points are

The density of points generated in the simulation is 10⁵ per unit of area, but the maximum of points allowed is 10⁷. This means that, if the rectangle that contains the desired figure (used in the simulation of the area) has dimensions a x b, then the number of generated points is min{ 10⁵ ab, 10⁷}. The initial seeds used in the random points generator are the same at each call of the simulator, so the simulated area is always the same for the same parameters.

It is simple to find the smallest rectangle that contains a specific rectangle, circle or ellipse. The smallest rectangle that contains a given rectangle is the proper rectangle; for a circle or an ellipse it is the one that is tangent to these figures on four points that can be explicitly found. On the other hand, the area A of a figure F can be simulated generating n_g random points inside a rectangle with area A_R that contains F and counting the amount of points that lie inside the figure, n_i, so we have that A ≈ A_Rn_i/n_g. We use these two ideas to compute our objective function according to the following rules:

In the case of intersection between figures

F and

G, where

F and

G can be a rectangle, a circle or an ellipse, we compute the smallest rectangles

R_F and

R_G that contains each figure, and use the rectangle with smallest area between them to simulate the area of the intersection, generating random points inside it and counting the amount of points that lie on

F and

G. We are using the fact that the intersection of figures

F and

G is inside

R_F and is also inside

R_G.
In the case of union between two rectangles, a rectangle and a circle or two circles, we add the areas of the figures, since the expression to these areas is well known, and subtract the area of the intersection, which is computed by the method above. We are using the fact that the area of the union is the sum of the area of the figures minus the area of the intersection.
In the case of union between a circle and an ellipse, we simulate the area of the figure formed by the ellipse minus the circle, making the simulation inside the smallest rectangle that contains the ellipse. We count how many of the random points lie inside the ellipse but outside the circle, so the desired area is approximately the area of the circle plus the simulated area. The same method is used to simulate the area between a rectangle and an ellipse.
In the case of union between two ellipses, we find the smallest rectangle that contains both ellipses, and simulate the area of the union counting the amount of random points that lie inside the first or the second (or both) ellipse.

Of course, there are many methods to simulate the considered areas, and we are not claiming that the one we chose is the best. More efficient techniques to simulate these areas is a topic under consideration, and can be the scope of future works.

Figures 1-12 show the results obtained in some instances of area problems. The A value is the area of the figure, while feval is the number of function evaluations performed to find the solution.

Table 6 shows the objective function (the area) found and the number of function evaluations performed for each of the subalgorithms in the attempt of minimizing the area of different types of figures containing the 10 points of Table 5. I and U stand for intersection and union, while R, C and E mean rectangle, circle and ellipse, respectively. The last line shows the average area and average number of function calls among all problems.

Thumbnail

The simulated area is (slightly) discontinuous with respect to the parameters that define the figures. The reason is that this area can take only a finite number of values, depending on the number of points that belong to the region.

The impact of the discontinuities decreases with the number of points used in the simulation. (Note that the computer time of function evaluations is proportional to that number). Besides, the function c(x) defined for the union between figures has discontinuous derivatives. By these reasons, these problems do not satisfy the smoothness requirements of the convergence theorems presented in this paper. Despite of this fact, in all the cases our algorithm was able to find what seems to be a local minimizer. Since there are several local minimizers, the results show us how each subalgorithm behaved with respect to finding the smallest possible area and computer work (function evaluations) required for achieving a solution. The figures with the smallest and the largest simulated area were found by the Augmented Lagrangian algorithm with, respectively, Coordinate Search and BOBYQA plus Coordinate Search for the subproblems, and are shown in Figures 7 and 2. In terms of the average area among all figures, shown in the last line of Table 6, AL-CSearch had the best performance, followed by AL-NMead and AL-BOBYQA. On the other hand, AL-BOBYQA required less function evaluations on average, while AL-NMead was the most expensive option.

6 Final remarks

In this paper we defined new derivative-free algorithms for minimizing finitedimensional functions with two levels of constraints, following the Augmented Lagrangian approach of [1], in such a way that theoretical convergence results are preserved. We introduced an algorithm for the case in which lower-level constraints are defined by bounds on the variables, allowing us to compare the relative efficiency of different box-constraints derivative-free solvers in the context of Augmented Lagrangian subproblems. Then, we defined a new algorithm for dealing with arbitrary lower-level constraints. This algorithm generates subproblems that may be solved by procedures exhibiting an admissibility property. Admissible algorithms for the subproblems may include many of the recently introduced methods for constrained optimization when the constraints, generally speaking, admit extreme penalty approaches. Here, we showed that, besides the existing admissible algorithms for solving subproblems, the Augmented Lagrangian algorithm with box constraints considered first in this paper (Algorithm 2) may also be considered an admissible algorithm. The practical consequences of this observation remain to be exploited in future research.

From the computational point of view, we introduced a family of Volume Optimization problems, in which we optimize Monte Carlo approximations of volumes subject to constraints. We solved several toy problems of this class, which allowed us to compare different alternatives for solving box-constraint minimization problems in the context of Augmented Lagrangian subproblems without derivatives. We believe that these problems emulate some of the common characteristics of real-life derivative-free optimization problems, especially those originated in simulations.

Acknowledgements. We are indebted to the anonymous referees whose comments helped us to improve the first version of this paper.

Received: 15/08/10.

Accepted: 05/01/11.

[1] R. Andreani, E.G. Birgin, J. M. Martinez and M. L. Schuverdt, On Augmented Lagrangian Methods with general lower-level constraints. SIAM Journal on Optimization, 18 (2007), 1286-1309.
[2] R. Andreani, J.M. Martinez and M.L. Schuverdt, On the relation between the Constant Positive Linear Dependence condition and quasinormality constraint qualification. Journal of Optimization Theory and Applications, 125 (2005), 473485.
[3] C. Audet and J.E. ennis Jr, esh adaptive direct search algoriths for constrained optimization. SIAM Journal on Optimization, 17 (2006), 188-217.
[4] C. Audet, J.E. Dennis Jr. and S. Le Digabel, Globalization strategies for Mesh Adaptive Direct Search. Computational Optimization and Applications, 46 (2010), 193-215.
[5] E.G. Birgin, C. Floudas and J.M. Martinez, Global minimization using an Augmented Lagrangian method with variable lower-level constraints Mathematical Programming, 125 (2010), 139-162.
[6] R.E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numerica, Cambridge University Press, 7 (1998), 1-49.
[7] A.R. Conn, N.I.M. Gould and Ph.L. Toint, A globally convergent Augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis, 28 (1991), 545-572.
[8] A.R. Conn, N.I.M. Gould and Ph.L. Toint.,Trust Region Methods. MPS/ SIAM Series on Optimization, SIAM, Philadelphia (2000).
[9] A.R. Conn, K. Scheinberg and L. N. Vicente, Introduction to Derivative-Free Optimization. MPS-SIAM Series on Optimization, SIAM, Philadelphia (2009).
[10] A.L. Custódio and L.N. Vicente, Using sampling and simplex derivatives in pattern search methods. SIAM Journal on Optimization, 18 (2007), 537-555.
[11] B.P. Demidovich and I.A. Maron, Computational Mathematics. Mir Publishers, Moscow, (1987).
[12] M.R. Hestenes, Multiplier and gradient ethods. Journal of ptiization Theory and Applications, 4 (1969), 303-320.
[13] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, Springer, 187 (1981).
[14] T.G. Kolda, R.M. Lewis and V. Torczon, A generating set direct search augmented Lagrangian algorithm for optimization with a combination of general and linear constraints. Technical Report, SAND2006-5315, Sandia National Laboratories, (2006).
[15] T.G. Kolda, R.M. Lewis and V. Torczon, Stationarity results for generating set search for linearly constrained optimization. SIAM Journal on Optimization, 17 (2006), 943-968.
[16] R.M.. Lewis and . Torczon, Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization, 9 (1999), 1082-1099.
[17] R.M.. Lewis and . Torczon, Pattern search algorithms for linearly constrained minimization. SIAM Journal on Optimization, 10 (2000), 917-941.
[18] R.M. LewisandV. Torczon, A globally convergent Augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM Journal on Optimization, 12 (2002), 1075-1089.
[19] R.M. Lewis and V. Torczon, A direct search approach to nonlinear programming problems using an Augmented Lagrangian method with explicit treatment of linear constraints. Technical Report WM-CS-2010-01, College of William & Mary, Department of Computer Sciences, (2010).
[20] S. Lucidi, M. Sciandrone and P. Tseng, Objective derivative-free methods for constrained optimization Mathematical Programming, 92 (2002), 37-59.
[21] O.L. Mangasarian and S. Fromovitz, The Fritz-John necessary optimality conditions in presence of equality and inequality constraints. Journal of Mathematical Analysis and Applications, 17 (1967), 37-47.
[22] J.A. Nelder and R. Mead, A simplex method for function minimization. The Computer Journal, 7 (1965), 308-313.
[23] L.G. Pedroso, Programação não linear sem derivadas. PhD thesis, State University of Campinas, Brazil, (2009).
[24] M.J.D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, R. Fletcher (ed.). Academic Press, New York, NY, pp. 283-298, (1969).
[25] M.J.D. Powell, The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, (2009).
[26] L. Qi and Z. Wei, On the constant positive linear dependence condition and its application to SQP methods. SIAM Journal on Optimization, 10 (2000), 963-981.
[27] R.T. Rockafellar, A dual approach for solving nonlinear programming problems by unconstrained optimization Mathematical Programming, 5 (1973), 354-373.
[28] R.T. Rockafellar, Lagrange multipliers and optimality. SIAM Review, 35 (1993), 183-238.

*

This work was supported by PRONEX-Optimization (PRONEX - CNPq/ FAPERJ E-26/ 171.510/2006 -APQ1), FAPESP (Grants 2006/53768-0 and 2004/15635-2) and CNPq.

Publication Dates

Publication in this collection
22 Mar 2011
Date of issue
2011

History

Received
2010
Accepted
05 Jan 2011

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

[1] [1] R. Andreani, E.G. Birgin, J. M. Martinez and M. L. Schuverdt, On Augmented Lagrangian Methods with general lower-level constraints. SIAM Journal on Optimization, 18 (2007), 1286-1309.

[2] [2] R. Andreani, J.M. Martinez and M.L. Schuverdt, On the relation between the Constant Positive Linear Dependence condition and quasinormality constraint qualification. Journal of Optimization Theory and Applications, 125 (2005), 473485.

[3] [3] C. Audet and J.E. ennis Jr, esh adaptive direct search algoriths for constrained optimization. SIAM Journal on Optimization, 17 (2006), 188-217.

[4] [4] C. Audet, J.E. Dennis Jr. and S. Le Digabel, Globalization strategies for Mesh Adaptive Direct Search. Computational Optimization and Applications, 46 (2010), 193-215.

[5] [5] E.G. Birgin, C. Floudas and J.M. Martinez, Global minimization using an Augmented Lagrangian method with variable lower-level constraints Mathematical Programming, 125 (2010), 139-162.

[6] [6] R.E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numerica, Cambridge University Press, 7 (1998), 1-49.

[7] [7] A.R. Conn, N.I.M. Gould and Ph.L. Toint, A globally convergent Augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM Journal on Numerical Analysis, 28 (1991), 545-572.

[8] [8] A.R. Conn, N.I.M. Gould and Ph.L. Toint.,Trust Region Methods. MPS/ SIAM Series on Optimization, SIAM, Philadelphia (2000).

[9] [9] A.R. Conn, K. Scheinberg and L. N. Vicente, Introduction to Derivative-Free Optimization. MPS-SIAM Series on Optimization, SIAM, Philadelphia (2009).

[10] [10] A.L. Custódio and L.N. Vicente, Using sampling and simplex derivatives in pattern search methods. SIAM Journal on Optimization, 18 (2007), 537-555.

[11] [11] B.P. Demidovich and I.A. Maron, Computational Mathematics. Mir Publishers, Moscow, (1987).

[12] [12] M.R. Hestenes, Multiplier and gradient ethods. Journal of ptiization Theory and Applications, 4 (1969), 303-320.

[13] [13] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, Springer, 187 (1981).

[14] [14] T.G. Kolda, R.M. Lewis and V. Torczon, A generating set direct search augmented Lagrangian algorithm for optimization with a combination of general and linear constraints. Technical Report, SAND2006-5315, Sandia National Laboratories, (2006).

[15] [15] T.G. Kolda, R.M. Lewis and V. Torczon, Stationarity results for generating set search for linearly constrained optimization. SIAM Journal on Optimization, 17 (2006), 943-968.

[16] [16] R.M.. Lewis and . Torczon, Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization, 9 (1999), 1082-1099.

[17] [17] R.M.. Lewis and . Torczon, Pattern search algorithms for linearly constrained minimization. SIAM Journal on Optimization, 10 (2000), 917-941.

[18] [18] R.M. LewisandV. Torczon, A globally convergent Augmented Lagrangian pattern search algorithm for optimization with general constraints and simple bounds. SIAM Journal on Optimization, 12 (2002), 1075-1089.

[19] [19] R.M. Lewis and V. Torczon, A direct search approach to nonlinear programming problems using an Augmented Lagrangian method with explicit treatment of linear constraints. Technical Report WM-CS-2010-01, College of William & Mary, Department of Computer Sciences, (2010).

[20] [20] S. Lucidi, M. Sciandrone and P. Tseng, Objective derivative-free methods for constrained optimization Mathematical Programming, 92 (2002), 37-59.

[21] [21] O.L. Mangasarian and S. Fromovitz, The Fritz-John necessary optimality conditions in presence of equality and inequality constraints. Journal of Mathematical Analysis and Applications, 17 (1967), 37-47.

[22] [22] J.A. Nelder and R. Mead, A simplex method for function minimization. The Computer Journal, 7 (1965), 308-313.

[23] [23] L.G. Pedroso, Programação não linear sem derivadas. PhD thesis, State University of Campinas, Brazil, (2009).

[24] [24] M.J.D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, R. Fletcher (ed.). Academic Press, New York, NY, pp. 283-298, (1969).

[25] [25] M.J.D. Powell, The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge, (2009).

[26] [26] L. Qi and Z. Wei, On the constant positive linear dependence condition and its application to SQP methods. SIAM Journal on Optimization, 10 (2000), 963-981.

[27] [27] R.T. Rockafellar, A dual approach for solving nonlinear programming problems by unconstrained optimization Mathematical Programming, 5 (1973), 354-373.

[28] [28] R.T. Rockafellar, Lagrange multipliers and optimality. SIAM Review, 35 (1993), 183-238.

Brasil

Brasil

Derivative-free methods for nonlinear programming with general lower-level constraints

Abstract

Publication Dates

History