Brazilian Journal of Chemical Engineering
Print version ISSN 0104-6632
Braz. J. Chem. Eng. vol. 14 no. 1 São Paulo Mar. 1997
http://dx.doi.org/10.1590/S0104-66321997000100006
DEALING WITH INCONSISTENT QUADRATIC PROGRAMS IN A SQP BASED ALGORITHM
M.T. de Gouvêa and D. Odloak
Departamento de Engenharia Química - Cx.Postal 61548 - 05424-970 - São Paulo, SP - Brazil
e-mail: odloak@usp.br
(Received: July 10, 1996; Accepted: January 29, 1997)
Abstract - In this paper we present a new sequential quadratic programming SQP algorithm that does not need any preprocessing phase. Its main features consist of not enforcing the Hessian to be positive definite and of dealing with infeasible QP problems in a novel way. The linearized constraints are analyzed and all those that do not belong to the minimal representation of the feasible region are eliminated. In the approach used the convergence rate of the algorithm may be adjusted by the user by properly selecting some tuning parameters that are also used to improve the robustness of the algorithm. The SQP code presented here is also able to deal with bound constraints that may be linearly dependent on the linearized equality or original constraints. The algorithm is shown to be robust and to perform well for small to medium-sized problems.
Keywords: Nonlinear programming, optimization, real-time optimization, sequential quadratic programming algorithm
INTRODUCTION
There is a great interest in solving a real-time optimization problem. That is so as savings in the production can be obtained so the process will become more competitive. One approach to solving the real-time optimization problem is to perform an economical optimization and generate set-points to the controllers. This can be mathematically stated as problem (P1).
(P1)
The function f(x) is the economical objective and h(x) and g(x) are process constraints. Problem (P1) is a general nonlinear programming problem (NLP) and for most of the real applications it will be both highly nonlinear and nonconvex. Thus it will be difficult to solve.
Several codes for solving the general nonlinear programming problem exist. However, there is still not a definite standard code. At present the sequential quadratic programming (SQP) method is considered to be the most efficient method for solving it, at least for small to medium-sized problems, as cited by many different authors (Heinz and Spelluci, 1994; Tamura and
Kobayashi, 1991; Bartholomew-Biggs and Hernandez, 1995). For large problems several attempts have been made to make this algorithm not only reliable but efficient (Schmid and Biegler, 1994; Bartholomew-Biggs and Hernandez, 1995).
In order to perform a real-time optimization, problem (P1) must be solved on-line. This requires that the optimization method used must be both efficient and robust. By efficient we mean that the problem must be solved in less time than the sampling/intervention period. By robust we mean that a local optimal solution must always be obtained or at least the algorithm must converge to a feasible point so that the system is maintained under control.
The SQP algorithm is a sequence of steps. So in order to achieve the desired characteristics, each of the steps must be solved efficiently and robustly. Differences in the codes can therefore be viewed in regard to how these steps are being solved.
Given the Lagrangian of problem (P1) (equation 1), the general SQP algorithm can be stated as follows: "starting from (x^{o}, l ^{0} , m ^{o}) obtain a sequence of values {x^{k}, l ^{k}, m ^{k}} that converges to an optimal solution (local or global) of problem (P1)."
(1)
where l , m are the Lagrange multipliers of the restrictions of the equalities and inequalities.
The sequence of values {x^{k}, l ^{k}, m ^{k}} is formed by the following series of steps:
1. Initialize x^{o} , l ^{o} and m ^{o}.
2. Form a quadratic programming QP subproblem in the form of (P2) and solve it, yielding search direction d^{k}
(P2)
where H is an approximation of the Hessian of the Lagrangian function (1) at the point (x^{k}, l ^{k}, m ^{k}), l^{d }= l - x^{k} and u^{d} = u - x^{k} .
3. Define a merit function (F ) and a penalty parameter r and, using the solution obtained in step 2, solve a line-search problem in the form of (P3) to obtain a ^{k} and form x^{k+1 }= x^{k }+ a ^{k}d^{k}
(P3)
4. Check the optimality conditions. If they are satisfied stop, otherwise update the approximation of the Hessian of the Lagrangian and return to step 2.
Most of the published SQP algorithms need a preprocessing phase to form the starting points which can be costly. It is generally acknowledged that the critical step in the algorithm is the second, i.e., the formulation and resolution of the QP subproblem. This is so as the QP problem formed at any intermediate iteration may have no feasible solution. Also the QP subproblem is normally solved by an active-set algorithm, which is exponential in time.
When there are certain types of inconsistencies in the QP subproblem, the SQP algorithm is obliged to return to the preprocessing phase. Thus, it is interesting to have an algorithm that does not need any such phase. This is the main characteristic of the SQP algorithm presented in this paper. Its main difference from other published algorithms is that it treats inconsistencies in the linearized constraints by deleting them when this is possible instead of making use of artificial penalty variables.
In section II of this paper we briefly discuss some recently published SQP methods and make more comments on the general steps of the SQP algorithm outlined above. In section III we describe our present version of the algorithm. In section IV we present some numerical tests and we conclude the paper in section V.
THE SEQUENTIAL QUADRATIC PROGRAMMING ALGORITHMS
Observing the general steps of an SQP algorithm some major aspects must be taken into consideration, mainly:
- How is the Hessian of the Lagrangian function calculated?
- How must we define the merit function and how is the line search problem solved?
- How can we formulate and solve the QP subproblem?
The discussion that follows will try to answer these three questions.
The great majority of the existing SQP codes utilize the BFGS formula to obtain the approximation of the Hessian of the Lagrangian function. By doing so they force the Hessian to be positive definite (PD). This is a strong limitation for some kinds of problems, as what is really necessary is that the Hessian be PD in the null space of the active restrictions. For some problems forcing the Hessian to be PD can create some problems (Bartholomew-Biggs and Hernandez, 1995). Bartholomew-Biggs and Hernandez (1995) claim that the use of the analytical Hessian should be preferred and that during intermediate iterations this Hessian should only be forced to be PD or PSD, when it is not PSD in the null space of the active constraints. The use of analytical expressions of the Hessian is only possible for simple problems, which is not the case with, for example, chemical engineering problems. So instead of this the method of finite differences can be used to update the Hessian, as local R-superlinear convergence of the SQP algorithm may be guaranteed (Palomares and Mangasarian, 1976). While this approach is suitable for small to medium-sized problems, it can be very CPU costly for large problems. Also the finite difference method may yield severe rounding errors for very badly scaled problems. So some care must be taken when using this approach. How to adequately enforce positive definiteness is still an open research area. As this particular point is not the main focus of this paper no other comments will be here made. The reader is referred to Bartholomew-Biggs and Hernandez (1995) to obtain more details on this particular topic.
The second question, namely how to choose the merit function and how to solve the line-search problem will be only briefly discussed. All of the existing codes use a modified augmented Lagrangian function as a merit function. However, the number of existing functions used is enormous as practically each author seems to modify it a bit. Lately little attention has been paid to the best choice of the merit function, possibly because of the fact that practically all of the functions used perform quite well for the majority of problems tested. However, as pointed out by Stoer (1985) "any merit function if used with sufficiently nasty problems may have local minima" independently of the choice of the penalty parameter r . These may not be optimal points of (P1) and can even be unfeasible ones! The solution of the line-search problem is made by applying the Armijo test (Armijo, 1966). Although the procedure is quite simple, it can consume a significant amount of time. So the suggestions of Tamura and Kobayashi (1991) can improve the performance of this step.
Now we come to the third question which is the most delicate. The majority of the present SQP codes use an active-set strategy to solve the QP problem, as stated before. So a first commentary can be made in regard to this. The interior-point algorithms are often cited as a promising alternative, though they are very seldom used in the SQP algorithms. Obuchowska and Caron (1995) also claim that having a minimal representation of the feasible set of the problem to be solved is crucial for applying the interior-point methods. So here we come to the main focus of this paper, namely what this minimal representation is and when and how it can be obtained.
Boneh et al. (1993) classify the nonlinear constraints, dividing them into some major groups according to their main characteristics. The two major groups are the ones of the so-called redundant and necessary constraints. Necessary constraints are all those that must exist so that the description of the feasible region is guaranteed. Redundant constraints are all those that can be eliminated without changing the feasible region. The minimal representation of the feasible region is thus described by the necessary constraints and the optimal solution of the NLP problem must be within it.
When the nonlinear constraints are linearized, inconsistent or redundant constraints may appear as linear dependent ones. For example the linearized equality constraints may form a rank deficient matrix and this may cause the SQP algorithm to fail or to return to a preprocessing phase if these constraints are not properly analyzed. Equality constraints or bound constraints may be linear dependent on the remaining constraints. When this happens several algorithms presented in the literature will fail or will have to return to a preprocessing phase such as the one of Schmid and Biegler (1994). This is so as these algorithms make use of a decomposition scheme and linear dependent constraints may be transformed into null vectors.
So we see that there is an interest in knowing whether a constraint is redundant or inconsistent. Also if an algorithm eliminates all redundant constraints it will always find the optimal solution as long as the minimal representation of the constraint space is not empty. This is the general idea of the SQP code presented here, which is an extension of the one presented in Gouvêa and Odloak (1996).
If there are equality constraints one might think that a decomposition could be performed so as to reduce the size of the problem as proposed by Schmid and Biegler (1994) or Coleman (1984), for example. The reduced QP problem would have a reduced dimension and the complexity of the QP solver would be smaller. For chemical engineering problems this is of great interest as these problems often have a small number of degrees of freedom. When we think of that, then it seems quite surprising that so few codes presented in the literature perform a reduction of variables. One reason for that is that performing a decomposition may worsen problems arising from inconsistent linearizations. Some authors (Heinz and Spelucci, 1994) do the opposite, i.e., they transform equality constraints into inequalities. This is certainly not suitable for large problems. There are several procedures for decomposing the variables, the main ones are the orthogonal and orthonormal approaches. Schmid and Biegler (1994) make use of the first while we make use of the later. The orthonormal approach uses the QR factorization which may be time consuming and therefore is regarded by some researches with dislike. In spite of this the QR factorization possesses some very interesting properties. One of these is the fact that inconsistencies in the linearized constraints are easily detected.
Schmid and Biegler (1994) also transform inequality constraints into equality ones by adding slack variables. One reason for this behavior, is the fact that the decomposition algorithm is polynomial in time and the reduced problem will only have bounded variables and so it should be easily solved. Though it will be simpler in form it will still be exponential and for the first iterations the augmented dimension of the problem may slow the convergence of the QP solver. So we think that the user should be left to decide whether constraints should be transformed into other types or not. Also when we perform a decomposition on the variables, the Lagrangian multipliers of the equality constraints must be calculated or estimated as their values will be needed by the SQP algorithm to update the Hessian of the Lagrangian. The decomposition in the variables is done according to equation (2):
(2)
where y is the reduced decision variable, Z is a base of the kernel of , [Z T] is a base for the set formed by all the solutions of the linearized equality constraints and and p correspond to a particular solution of the equality linearized constraint, i.e., . The dimension of y is n_{r} which is equal to n-m_{r}; m_{r} is the rank of . For example, the particular solution can be obtained by solving the least-squares problem associated with the linearized equality constraint equation. It is noteworthy that the QR factorization obtained in the orthonormal decomposition approach can be used to yield the desired least-squares solution. Also note that some simplifications can be made in the decomposition procedure. For example, the decomposition may not need to be evaluated for each iteration of the SQP algorithm.
With the definition of Z and T as given before, the Lagrange multipliers associated with the equality constraints can be obtained as:
(3)
Schmid and Biegler (1993) claim that since exact values of the multipliers are only necessary at the optimal solution, the following approximation can be used:
(4)
Another interesting modification of the standard problem (P1) to be solved is the introduction of scaling factors such as those in Heinz and Spelucci (1994) or Rijckaert and Walraven (1985), which can be stated as problem (P4).
(P4)
The advantage of the formulation of problem (P4) is that numerical errors may be reduced and good and reliable results may be still obtained by simple precision arithmetic. The choice of the scaling factors is dependent on the problem to be solved and their values can be chosen by analyzing the condition number of the gradients of the nonlinear function at some nominal points. Note that (P4) can be transformed into problem (P1) by simply defining the following relations:
(P5)
Throughout paper whenever we refer to problem (P1) it is assumed that problem (P1) is defined by the relations (P5) applied in the formulation (P4). This is made to simplify the notation.
Also it may be strongly recommended to perform a scaling in the decision variables. Papalambros and Wilde (1988) suggest that a scaling as in equation (5) may be performed, where x represents the scaled variable and is the original decision variable.
(5)
The use of equation (5) is quite natural for chemical processes and can be interpreted as including the variables in the [0,1] interval. Throughout the paper we also assume that whenever necessary the variables are scaled according to equation (5).
So the reduced QP problem becomes:
(P6)
One last comment ought to be made now. When the base of the kernel of the equality constraints is also a base for the kernel of the bound constraints, these may be ignored by the SQP algorithm. Gouvêa and Odloak (1996) propose that for this case when the bounded variables are violated then the violated bound should be enforced. This procedure however may affect the convergence of the SQP algorithm. Here we show how the procedure presented there can be improved to avoid convergence problems. If we think of the real-time optimization problem, then bounded variables shall not be violated. At this point, the suggestion of Tamura and Kobayashi (1991) to evaluate the functions only when the variables do not exceed their bounds, appears interesting. However, how to actually implement this may be tricky, since the reevaluation of the variables whose bounds are violated should be carefully undertaken. According to our experience, for some problems this enforcement can slow down the convergence of the algorithm and may be unnecessary since for intermediate iterations it is not always bad to have unfeasible points due to their bounds.
THE PROPOSED ALGORITHM
In this section we present our implementation of the SQP algorithm. Basically the main steps are those of the general algorithm presented in section I though some other steps are introduced. These are presented next, yielding the algorithm A1.
Algorithm A1: The general SQP algorithm.
1. Add one variable to the scaled problem (P1) yielding problem (P7):
(P7)
2. Choose any starting point {x^{o},l ^{o},m ^{o}}. Also choose a parameter m ^{I}. This parameter will be used to enforce the values on the Lagrange inequality multipliers for unfeasible reduced QP problems (see algorithm A3).
3. Calculate . Use the finite difference approach to calculate the Hessian.
4. If is not of full row rank, apply algorithm A2 (presented later in this section).
5. Form the basis [Z T] by means of the orthonormal approach using the QR factorization obtained in step 4.
6. Form the reduced problem (P6).
7. Apply algorithm A3 (presented later in this section) to solve (P6).
8. Update l _{k} according to equation (4).
9. Use the merit function of Biegler and Cuthrell (1985) to obtain a _{k}.
10. Form x_{k+1}=x_{k}+a _{k}d_{k}.
11. Check the optimality conditions and stop or return to step 3.
The main differences between our algorithm and the others published are:
- We introduced step 1, where one variable was added to the problem. Introducing one single variable does not affect the complexity of the algorithm nor is the optimal solution changed. At the same time, introducing one single variable is sufficient for our algorithm to obtain the minimal representation of the feasible region or a so-called lesser representation. The term "lesser" refers to the minimal representation of the feasible reduced QP problem, as our algorithm possesses convergence properties for infeasible QP problems due to equality infeasible linearized constraints as shown in Gouvêa and Odloak (1996)
- Our algorithm does not need any preprocessing phase. That is, the QP solver will always obtain a solution. When the QP subproblem is feasible, the solution will be optimal. Infeasible QP reduced subproblems will be discussed later.
- In step 4 we check if the linearization of the equality constraints is consistent. Two major inconsistencies may be found: there may be null rows in the linearized constraints and there may be linear dependent ones. Both situations are treated. For the first case if there were null linearized constraints, they would be ignored which could lead to a failure of the SQP method. This happens with some codes presented in the literature. Our algorithm deals with them in a way that convergence is guaranteed. If there are linear dependent constraints, they can be either redundant or infeasible. In both cases, the constraints can be reduced in number and convergence of the algorithm can still be insured. So the treatment in step 4 avoids the need for a preprocessing phase and constitutes one of the advantages of our algorithm over others. Algorithm A2 presented here differs slightly from the one presented in Gouvêa and Odloak (1996). In this paper the algorithm is slightly more conservative so we avoid the need to study the nonlinear equality constraints to see if they are separable or if the Newton method will converge to a solution. On the other hand, by introducing small coefficients in the linearized constraints we may be introducing numerical problems if the numerical implementation of the algorithm is not adequate. Also the algorithm presented here is a bit slower as further inspections must be made. Mainly when some coefficients have small values, we may be introducing linear dependent constraints on the non-null constraints and these will not have to be considered.
- The way we solve the QP reduced problem is novel. No variables are introduced at this point. If the reduced QP problem is feasible, the minimal representation of the feasible region will be obtained even if there are redundant constraints. For infeasible QP problems two situations exist. If there are bound constraints whose kernel can be obtained in terms of the basis Z, the algorithm presented here will be able to converge to the optimal solution of problem (P1). The remaining infeasible QP subproblems are discussed in the next item.
- The way we treat infeasible QP reduced problems is also new. We do not introduce any artificial variables. So as a result we will have a QP problem that will cycle among some different search directions. Later we will make some further comments on these. What is important to notice here is that the Lagrange multipliers associated with these directions may have no physical meaning. Therefore instead of accepting the values calculated in the QP step, we enforce a value on these, which is supplied by the user in step 2 of algorithm A1 (m ^{I}). This procedure is not heuristic. An interpretation of it may be stated as follows. In the QP step some inequality constraints were activated and therefore a search direction is generated which corresponds to the Newton direction leading to the solution of the nonlinear activated set of equations. If these are really to be activated at the optimal solution then somewhere there will be an iteration of the SQP method for which the QP problem must cease to be infeasible. Its advantage over the existing methods is that the rate of convergence can be increased depending on how the parameter m ^{I} is chosen. This is an important result as far as we have achieved convergence of our SQP algorithm for some problems that are cited as not converging for other SQP based algorithms and for Augmented Lagrangian Algorithms (these last methods being referred to as being more robust)! Some of these problems will be presented in section IV.
- In our algorithm we allow a _{k} to be negative since this may enhance convergence for some problems. We only impose the following limit , where e is a small number dependent on the problem.
Now we will present algorithms A2 and A3.
Algorithm A2: Eliminating inconsistent linearizations from the equality set of constraints.
Hypothesis: is of rank nk £ m
1. Form .
2. Form the QR factorization of A, i.e. , , where R is of order nu and r_{i} are its diagonal elements. Set i=0.
3. Set i=i+1.
4. If r_{i} = 0 then permute the i^{th} column of A, so that it now occupies the last position. Store the index of the column permuted and also update the QR factorization.
5. If i < nu-1 return to 3 or else go to 6.
6. By the end of this procedure we will have the following structure: where R_{11} is of order and rank nk. The columns permuted correspond to rows that are linearly dependent on the others and therefore form an inconsistent linearization.
7. Eliminate all the inconsistent constraints (i.e., all the identified linear dependent rows in the matrix of equality linearized constraints, A^{T}).
8. For every null row of attribute a small number to any coefficient. This is necessary so that the algorithm will not ignore the constraint. At this point check if the imposed non-null constraints are linearly dependent on the other constraints in A^{T}. This can be done by introducing these constraints after row k of A^{T} and by introducing non-null coefficients in proper directions so as to have a linear independent set of equations that would cause a perturbation in the null constraints.
9. Update the QR factorization if this is the case and obtain the basis Z and T and the particular solution () using this factorization.
Algorithm A3: Solving the reduced QP problem.
1. Check if there are null inequality constraints. Introduce a small perturbation in one of their coefficients and store their indexes.
2. Form the reduced problem (P6) using the matrices Z and T obtained at the end of algorithm A2.
3. Store the indexes of the null reduced inequality constraints and of the null reduced bound constraints.
4. If there is not any null reduced constraint or if satisfies the bound constraints, go to step 5, otherwise perform the following:
- if then
- if then
5. Eliminate from problem (P4) all the null reduced constraints. The QP problem to be solved is of the following general structure:
where, ; A_{ineq} Î IR^{p}^{´}^{ nr} includes the bound constraints.
6. If the QP problem is solved for the first time, choose an active set by activating the first a constraints of A_{ineq}, where a is given as follows:
- if p > n_{r} then a = n_{r}
- if p £ n_{r} then a = p
If the QP problem is not being solved for the first time then a is the number of the active constraints obtained for the optimal solution of the last iteration of the SQP algorithm A1 and A contains the last optimal set of activated inequality constraints.
7. Obtain the following factorization .
8. Store all indexes of the null diagonal elements of R and attribute to r the number of null elements.
9. If there are null diagonal elements in the first n-r rows of R then permute rows and columns of so that the following partition is formed:
;
where is of full rank. Based on that, partition c as follows:
10. Update the QR factorization of if that is the case.
11. According to the partitioning obtained in step 9, obtain the correspondent partition of A and x, primarily:
12. Obtain the following matrices and vectors:
13. Perform the following steps to yield :
14. Form: and
15. Obtain the following factorization: M^{T} = Q_{2}R_{2}. The order of R_{2} is assumed to be r_{2} = r+a, and its diagonal elements are given by the coefficients rr_{i}. Set j=0 and e=0.
16. Set e=e+1
17. If rr_{e}=0 then j=j+1 and z_{j}=e and forms a unitary vector m_{e} Î IR^{r2} where its e^{th} element is the non-zero term.
18. If e<r_{2} return to 16
19. Set nj=j. If j¹ 0 then form , where each column of corresponds to the vector m_{e} and go to 20, otherwise perform the following:
Obtain: and go to step 25.
20. Update the QR factorization yielding
21. Perform the following operations: and set j=0 and s=0.
22. Set j=j+1. If and , do s=s+1 and store the index z_{j}.
23. If j<nj return to 22.
24. Apply the Pengrose theorem (Stewart and Sun, 1990) to all the s components of given by the indices z_{j} stored in step 22 to force them to be negative.
25. Form .
26. Obtain:
27. Permute the rows of x if step 9 was undertaken so that the original system is recuperated.
28. If all the elements of m are positive go to step 30; otherwise apply the procedure of Lucia and Xu (1990) to update the active set A and the number of active constraints a.
29. If the number of iterations exceeds a predefined value stop, the QP problem is infeasible and the optimal solution will be {x,m }; m =m ^{I}. Otherwise go to step 11.
30. The optimal solution of the QP problem is x and m .
Now some observation will be made in regard to the algorithm presented above, for instance:
1. If for any iteration of the QP problem there are no active constraints then a system of the form H x = -c^{T} is to be calculated. Solve it by the least-squares method and obtain the solution of minimal norm. Skip steps 11 to 26 and go directly to step 27.
2. In Gouvêa and Odloak (1996) some convergence properties are shown which are valid for the algorithm presented here in the following cases:
- feasible QP problems
- feasible QP reduced problems, i.e., there may be inconsistencies in the linearizations of the equality constraints. In this paper we extended algorithm A3 to be valid also for the problems in which the matrix Z is also a basis for the kernel of the bound constraints.
3. A special case of infeasible QP problems is the one having null linearized or null reduced inequality constraints. This kind of inconsistency is different from the others (where the intersection of the set formed by all constraints is empty) as when there are null constraints these will actually be ignored. If they are ignored the algorithm may move in a direction that will always be infeasible for the original NLP problem. So the procedure for overcoming this would be to make the algorithm force the constraints to not be ignored, i.e., a direction of search must be generated that will not produce a null gradient for the constraints. So this kind of problem may be overcome by introducing small perturbations in the null constraints (see step 1 of for example algorithm A3). Some care must be taken when the perturbation is introduced as numerical problems may arise if there is any ill-conditioning in the NLP problem to be solved.
4. So only one last group of problems remains to be analyzed, the problems which possess reduced infeasible non-null constraints. For these problems we have not yet established a convergence theorem. This does not make the algorithm presented here worse than others, since practically all existing codes may have convergence problems for sufficiently "nasty" problems. That is, convergence is not guaranteed for other published algorithms for "all" possible NLP problems either. Next we will make more comments on the procedure adopted in algorithm A3 to "treat" these infeasible problems. Recall that the solution adopted at the QP step was simply the last one obtained. This means that it will be dependent on the maximum allowable number of iterations. We have observed that by properly combining all the directions that will cause the present algorithm to cycle for infeasible problems, similar directions of search are obtained as in the different algorithms from the literature. The only hint is that convergence is not accelerated by doing so. Nor does convergence seem to be affected at all. So we do not see any reason for combining the cycling directions with the exception of the following two. First we could want to reduce the complexity of the QP algorithm so that the CPU time would not be too excessive. Second we would want the algorithm to have a behavior which was as similar as possible to that of the others so that they could be easily compared. The last aspect is not significant while the first must be regarded only when sufficiently large problems are of concern. As the complexity of the algorithm may be calculated a priori, the problem could be overcome by estimating the maximum number of iterations. This can be done as we actually know what it will be like since we know the number of constraints. As we use "warm" active sets this number can also be varied in the SQP algorithm. That is, the active constraints will not vary significantly from one iteration to the other and therefore there is no need (except in the first iterations of the SQP algorithm) to have a large number of maximum allowable iterations in the QP step.
5. Notice that step 6 of algorithm A3 is slightly different from the other approaches in the literature. We did this so as to limit the complexity of the algorithm to the "worst" case. The procedure adopted here seems to work very well in practice. Note that any other number of active constraints could be adopted and the convergence of the algorithm is not dependent on the given choice. One remark ought to be made though. If there are relatively few constraints and not all the variables are bounded the present procedure could lead to a case where both of the bounds on a same variable would be active. Although it is absurd to want to activate both bounds of a same variable, the present algorithm will deal with this problem naturally. That means that one or both bound constraints will automatically be deactivated.
SOME TEST PROBLEMS
In this section we will present some results obtained using a different sets of problems. Basically we used some problems presented in Dembo (1976), Hock and Schittkowski (1981), Biegler and Cuthrell, (1985), Edgar and Himmelblau, (1989), Floudas and Pardalos (1990), Schmid and Biegler, (1994), Psiaki and Park, (1995) and Ryoo and Sahinidis (1995). The selection was made so as to incorporate problems of different natures, that is with different characteristics. For example we included geometric problems and examples for which the Hessian is not PD at the optimal solution. We also included in the selection some of the so-called benchmark problems, generally used for testing NLP solvers. Most of them are nonconvex optimization problems. Some of them are very difficult in nature and convergence problems are cited in the literature. For all of them our solver converged. Table 1 shows the results of the problems selected in this paper which are presented in appendix A. In this appendix we make further comments on the problems including the reported failure of convergence of other NLP packages (including reduced gradient based methods, SQP algorithms and Augmented Lagrangian algorithms). The values in table 1 were obtained by simple precision arithmetic.
Table 1: Selection of test problems
Problem | Starting point | Optimal solution obtained by our | Number |
1a | [0 0 0]^{ T}; | [0 1 0]^{ T} | 10 |
1b | [24 5 0]^{ T}; | [0 1 0]^{ T} | 7 |
1c | [1 1 0]^{ T}; | [0 1 0]^{ T} | 3 |
2 | [0 0 0]^{ T}; | [-3.536 3.536 0]^{ T} | 21 |
3 | [0 0 0 0]^{ T}; | [4.2417 1.8471 -1.8964 0]^{ T} | 6 |
4 | [0 0 0]^{ T}; | [6 0.667 0]^{ T} | 4 |
5 | [0 0 0 0]^{ T}; | [6.2934 3.8218 201.16 0]^{ T} | 7 |
6a | [0 0 0]^{ T}; | [-1.414214 -1.414214 0]^{ T} | 9 |
6b | [2 2 0]^{ T}; | [1 1 0]^{ T} | 9 |
7a | [9700 18125 5000 200 25 0]^{ T}; m^{o},m ^{I}=0 | [579.31 1360.0 5110.0 182.02 | 9 |
7b | [0 0 0 100 100 0]^{ T}; m ^{o},m ^{I}=0 | [579.31 1360.0 5110.0 182.02 | 21 |
8a | [0 0 0 0 0]^{ T} ; | [1.33 4 0 0 0]^{T} | 8 |
8b | [1.5 2 1 0.5 0]^{ T} ; | [1.33 4 0 0 0]^{ T} | 3 |
8c | [3 4 2 1 0] ^{T}; | [1.33 4 0 0 0]^{ T} | 3 |
9a | [0 0.1 0] ^{T}; | [8.1706 7.5602 0]^{ T} | 21 |
9b | [57.9 15 0]^{ T} ; | [8.1706 7.5602 0]^{ T} | 14 |
10a | [0 0 0 0 0.1 0.1 0]^{ T} ; | [0.40787 0.070757 0.37401 | 28 |
10b | [0.5 0.5 0.5 0.5 8. 8. 0]^{ T}; | [0.96261 0.45920 0.036813 | 14 |
11a | [0 0.99 0]^{T}; | does not converge and was aborted in | |
11b | [0 0.99 0]^{T}; | [0.9687 -0.2481 0]^{T} | 77 |
12a | [-2 -3 6 6 6 6 6 0.7 0]^{T}; | [-1 -1 2.41 2.41 0.707 0.707 | 18 |
12b | [-2 -3 6 6 6 6 6 6 0]^{T}; | [-1 -1 2.41 2.41 0.707 0.707 | 27 |
13 | [3 4 2 1 0]^{T}; | [1.33 4 0 0 0]^{T} | 3 |
14 | [78.62 33.44 31.077 44.18 35.22 0]^{T}; | [78.000 33.000 29.996 45.004 | 8 |
15 | [0.87246 0.91597 0.60952 0.5 0.4 | [0.8490 0.4426 0.6062 0.6387 | 17 |
16a | [0.49494 0.44444 0.44444 0.19192 | [0.048148 0.039997 0.45666 | 10 |
16b | [0.49494 0.44444 0.44444 0.19192 | [0.048148 0.039997 0.45666 | 18 |
17a | [100 1000 1000 10 10 10 10 10 0]^{ T}; | [579.33 1360.1 5109.8 182.02 | 36 |
17b | [10000 10000 10000 1000 1000 1000 | [579.33 1360.1 5109.8 182.02 | 34 |
Next we will make some comments on the above results. In the collection of problems, we only scaled problems 15 to 17, as indicated in appendix A. Note that by properly adjusting the Lagrange multipliers our algorithm converged for all problems. This indicates that the algorithm is quite robust as convergence problems exist for other NLP codes for at least 8 of the above 17 problems, as indicated in appendix A, where further considerations are taken into account.
In problem 16 we were interested in showing the difference in the behavior of the algorithm for different problem formulations. Notice that when equality constraints were used the convergence was slowed down a bit. So Schmid and Bieglers (1994) idea of not using inequality constraints must be considered with some care.
Of the test problems selected some are very difficult. Two of them deserve special attention, mainly problems 11 and 12.
Problem 11 is very simple in nature so one might think that it could be easily solved. This is not so in reality since the gradient of the constraint possesses a strong curvature making this problem a very difficult one (see figure 1 in appendix A). If we look at what happens for each iteration we see that for either case 11a or 11b the algorithm yields an intermediate solution by iteration 20 very close to the optimal solution. In spite of this the algorithm is not able to change the search direction and the optimal solution is skipped and for the case 11a it will always be ignored while for case 11b it will be eventually obtained (starting at iteration 71). Also the majority of the iterations performed have a feasible QP problem. This means that global convergence is not guaranteed. This might mean that the merit function is not very adequately chosen for this problem. Also note how strongly this problem is dependent on the parameter m ^{I}. The number of iterations required to achieve convergence is very similar to the one needed by the augmented Lagrangian algorithm of Psiaki and Park (1995). So in spite of the large number obtained the result is indeed excellent.
Problem 12 is also a very difficult one though its form is also very simple. Note that Psiaki and Park (1995) only insured convergence for case 12a. The number of iterations needed for our method is of the same order as theirs. For case 12b our algorithm succeeded for the parameter m ^{I} given in table 1. However this parameter is dependent on the scaling performed. If for example we would like to scale the objective function by 0.1 and keep the value of m ^{I} then the number of iterations needed for the algorithm to converge to the optimal solution would be 82. If other values were chosen for m ^{I}, such as 0.1 or 10 or higher, we would start to have convergence problems. So we see that the convergence is not so easily established, though no exact values of m ^{I} are needed. That is, we use values which are multiples of 10 besides the null value (as we can see from table 1).
So we see that the algorithm presented here is quite robust and at the same time its iterations are not very costly in terms of the CPU time required. The results are encouraging and make it seem possible that the version presented here is suitable as a basis for constructing a solver for larger problems.
CONCLUSIONS
In this paper we presented a new SQP algorithm that showed to be robust and presented no severe convergence problems for all the sets of problems tested. The algorithm presented here treats inconsistent QP problems without the need of a preprocessing phase. This result is quite encouraging as preprocessing phases are usually very time consuming. The algorithm also has some parameters that can be properly chosen to better condition a problem like the scaling factors of the variables and nonlinear functions. Furthermore we introduced another way of dealing with infeasible QP problems and by doing so introduced a new parameter that can be adjusted to ensure both convergence and the rate of convergence of the SQP algorithm. Also it is not necessary that the Hessian be either PD or PSD. This is a nice feature for some kinds of problems. So the method presented here seems quite promising though it could be significantly improved, mainly by establishing a more efficient way of dealing with null linearized constraints and a better way of updating the Hessian. This last point is of great interest as the finite difference approach may lead to severe numerical problems.
ACKNOWLEDMENTS
Support from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) under grants 93/3184-0 and 96/4482-2 is gratefully acknowledged.
NOMENCLATURE
d^{k} Search direction
Particular solution of the set of equality constraints
g(x) Inequality constraints
h(x) Equality constraints
H(x,l ,m ) Hessian of the Lagrangian
f(x) Objective function
l Lower bound on variables
l^{d} Lower bound for the QP problem
L(x,l ,m ) Lagrangian
m Number of equality constraints
n Number of decision variables
p Number of inequality constraints
u Upper bound
u^{d} Upper bound for the QP problem
T Basis for the complementary space of the kernel of the linearized equality constraint
x Decision variables
y Decision reduced variables
Z Basis for the kernel of the equality linearized constraint
Greek Letters
a ^{k} Step length obtained in the line-search step
l ^{k} Lagrange multipliers of the equality constraints
m ^{k} Lagrange multipliers of the inequality constraints
r Penalty parameter used in the merit function
s _{f} Scaling factor for the objective function
s _{g} Scaling factor for the inequality constraints
s _{h} Scaling factor for the equality constraints
s _{o} Scaling factor for the decision variables
s _{x} Scaling factor for the decision variables
F Merit function used in the line-search step
Abbreviations
L.D. Linear dependent
L.I. Linear independent
NLP Nnonlinear programming
PD Positive definite
PSD Positive semidefinite
QP Quadratic programming
SQP Sequential quadratic programming
REFERENCES
Armijo, L., Minimization of functions having Lipschitz continuous first derivatives. Pacif. J. of Math., v. 16, n.1, pp. 1-3 (1966). [ Links ]
Bartholomew-Biggs, M.C.; Hernandez, F.G., Using the KKT Matrix in an Augmented Lagrangian SQP Method for Sparse Constrained Optimization. J.O.T.A., v. 85, n.1, pp. 201-220 (1995). [ Links ]
Biegler, L.T.; Cuthrell, J.E., Improved Infeasible Path Optimization for Sequential Modular Simulations - II: the Optimization Algorithm. Comp. Chem. Engng., v. 9, n. 3, pp. 257-267 (1985). [ Links ]
Boneh, A; Boneh, S.; Caron, R., Constraint Classification in Mathematical Programming. Math. Progr., v. 61, n.1, pp. 61-73 (1993). [ Links ]
Chamberlain, R.M., Some examples of cycling in variable metric methods for constrained optimization. Math Progr., v. 16, pp. 378-384 (1979). [ Links ]
Coleman, T.F., Large Sparse Numerical Optimization. Lecture Notes in Computer Science, 165, Springer Verlag, Berlin (1984). [ Links ]
Dembo, R.S., A set of geometric programming test problems and their solutions. Math. Progr., v. 10, pp. 192-213 (1976). [ Links ]
Edgar, T.F.; Himmelblau, D.M., Optimization of Chemical Processes. MacGraw-Hill, Singapore (1989). [ Links ]
Floudas, C.A.; Pardalos, P.M., A Collection of test problems for constrained global optimization algorithms. Volume 455 of Lecture Notes in Computer Science. Springer Verlag (1990). [ Links ]
Gill, P.E.; Murray, W., In: Computational Mathematical Programming, K. Schittkowski, editor, 23/07 - 02/08, 1984, Bad Windsheim, Germany. NATO ASI Series. Springer-Verlag, pp.209-248 (1985).
Gouvêa, M.T.; Odloak, D., A new treatment of inconsistent quadratic programs in a SQP based algorithm. Submitted to Comput. and Chem. Eng. (1996). [ Links ]
Heinz, J.; Spelucci, P., A Successful Implementation of the Pantoja-Mayne SQP Method. Optimization Methods and Software, v. 4, n. 1, pp. 1-28 (1994). [ Links ]
Hock, W.; Schittkowski, K., Test examples for nonlinear programming codes. Lect. Notes in Economics and Mathematical Systems 187, Berlin-Heidelberg-New York, Springler (1981). [ Links ]
Obuchowska, W.T.; Caron, R., Minimal Representation of quadratically constrained convex feasible regions. Math. Progr., v. 68, n.2, pp. 169-186 (1995). [ Links ]
Palomares, U.R.G.; Mangasarian, O.L., Superlinearly convergent quasi-Newton algorithms for nonlinearly constrained optimization problems. Math. Programming., v. 11, pp. 1-3 (1976). [ Links ]
Pantoja, J.F.A.; Mayne, D.Q., Exact penalty function algorithm with simple updating of the penalty parameter. J.O.T.A., v. 69, pp. 441-467 (1991). [ Links ]
Papalambros, P.Y.; Wilde, D.J., Principles of optimal design: modeling and computation. Cambridge Academic Press, Cambridge (1988). [ Links ]
Psiaki, M.L.; Park, K., Augmented Lagrangian Nonlinear Programming Algorithm That Uses SQP and Trust Region Techniques. J.O.T.A., v. 86, n.2, pp. 311-325 (1995). [ Links ]
Rijckaert, M.J.; Walraven, J.C., Reflections on geometric programming. In: Computational Mathematical Programming, K. Schittkowski, editor, 23/07 - 02/08, 1984, Bad Windsheim, Germany. NATO ASI Series. Springer-Verlag, pp. 141-164 (1985).
Ryoo, H.S; Sahinidis, N.V., Global Optimization of nonconvex NLPs and MINLPs with applications in process design. Comput. and Chem. Engng., v. 19, n. 5, pp. 551-566 (1995). [ Links ]
Shittowski, K., More test examples for nonlinear programming codes. Lect. Notes in Ecomomics and Math. Systems, 282, Berlin-Heidelberg- New York, Springer (1987). [ Links ]
Schmid, C.; Biegler, L.T., Acceleration of Reduced Hessian methods for large-scale nonlinear programming. Comput. Chem. Engng., v. 17, n. 5/6, pp. 451-463 (1993). [ Links ]
Schmid, C.; Biegler, L.T., Quadratic Programming Methods for Reduced Hessian SQP. Computers. Chem. Engng., v. 18, n. 9, pp. 817-832 (1994). [ Links ]
Stewart, G.W.; Sun, Ji-Guang, Matrix Perturbation Theory. Academic Press, Inc. San Diego (1990). [ Links ]
Stoer, J., Principles of Sequential Quadratic Programming Methods for Solving Nonlinear Programs. In: Computational Mathematical Programming, K. Schittkowski, editor, 23/07 - 02/08, 1984, Bad Windsheim, Germany. NATO ASI Series. Springer-Verlag, pp. 165-208 (1985).
Tamura, M.; Kobayashi, Y., Application of Sequential Quadratic Programming Software Program to an Actual Problem. Math. Progr., v. 52, n. 1, pp. 19-27 (1991). [ Links ]
APPENDIX A - Sample problems
Problem 1 (Biegler and Cuthrell, 1985)
The starting points considered are all infeasible. The starting point [0 0 0]^{T} is considered to be a very poor one and convergence problems for it are found in the literature. The second starting point was chosen as it is very far from the optimal solution.
Problem 2 (Edgar and Himmelblau, 1989)
The starting point [0 0 0] was chosen so that the gradient of the equality constraint, is a null vector at this point and the inequality constraint is satisfied. So there are NLP codes in the literature that will have convergence problems for this particular starting point!
Problem 3 (Schmid and Biegler, 1994)
The starting point [0 0 0 0] was chosen as the first linearized equality constraint, is a null vector at this point. For this particular problem there is no need to introduce a perturbation on the first linearized constraint as constraint 2 will force at least one of the search directions to be different from zero.
Problem 4 (Ryoo and Sahinidis, 1995)
The authors suggest three starting points. We presented the results corresponding to only one case as this problem is not very difficult. The chosen point was the null vector as the first constraint will be a null vector for this point. The optimal solution presented in table 1 of section IV is the global optimal solution. Also noteworthy is the fact that the MINOS package is cited as not converging for this particular starting point!
Problem 5 (Ryoo and Sahinidis, 1995)
Again the null vector was chosen as the starting point as convergence problems may exist for this particular point. Note that this is a very interesting problem as the number of equality constraints is actually the same as the number of" real" variables, i.e., x_{4} is actually an artificial variable. This problem was selected as the procedure described in Schmid and Biegler (1994) will fail to converge. Also in our previous version of the SQP algorithm (Gouvêa and Odloak, 1996) we had convergence problems for this particular starting point.
Problem 6 ( Ryoo and Sahinidis, 1995)
The starting points were considered the same as the ones proposed by the authors and are presented in table 1 of section IV. The solution of case 6a is the best-known global optimal point, while the solution of case 6b is a local optimal point.
Problem 7 (Ryoo and Sahinidis, 1995)
The equality constraints were scaled as:
The starting points were considered the same as the ones proposed by the authors and are presented in table 1 of section IV.
Problem 8 (Ryoo and Sahinidis, 1995; Floudas and Pardalos, 1990)
The starting points were considered the same as the ones proposed by the first authors and are presented in table 1 of section IV. The solution presented in table 1 corresponds to the best-known global optimal solution.
Problem 9 (Ryoo and Sahinidis, 1995)
where a is a constant that can be used to improve numerical robustness. In the original paper a=0. When a=0.1 convergence occurs. The starting points were considered the same as the ones proposed by the first authors and are presented in table 1 of section IV. The MINOS package is cited as not converging for this problem.
Problem 10 (Ryoo and Sahinidis, 1995)
The starting points were considered the same as the ones proposed by the first authors and are presented in table 1 of section IV. This problem possesses many local optimal solutions. The global optimal solution presented by the authors is beyond the simple precision arithmetic used in our simulations. Also the solution obtained by them has no physical meaning as such a precision will never be able to be measured by any real sensors.
Problem 11 (Psiaki and Park, 1995)
The authors suggest the starting point of [0 0.99 0] as this is a very "bad" one. That is they chose a bad starting point on purpose to highlight what may happen if the gradients of the constraints possess strong curvatures. They also report that the NPSOL algorithm does not converge for this problem. The required number of iterations of their method is assumed to be 73. This is the number used for comparison with the SQP algorithm NPSOL. Note that this number is similar to ours. However some care must be taken as the algorithms are actually different, as the cost of each iteration is. Figure 1 shows the problem 11.
Figure 1: Representation of the constraint function of problem 11.
Problem 12 (Psiaki and Park, 1995)
The starting points were considered the same as the ones proposed by the authors and are presented in table 1 of section IV. The authors cited only the convergence obtained for the starting point of case 12a and the required number of iterations was 14.
Problem 13 (Psiaki and Park, 1995)
The starting point was considered the same as the one proposed by the authors. Note that for this starting point the linearized equality constraint will be a null vector and therefore the NPSOL solver will not converge.
Problem 14 (Dembo, 1976)
The values of the coefficients and of the bounds can be found in the reference cited. The starting point was considered the same as the one proposed by the authors and is presented in table 1 of section IV. For this problem no scaling was performed.
Problem 15 (Dembo, 1976)
Here we considered of Dembos (1976) problem 3 with the introduction of one artificial variable. The decision variables were scaled according to equation (7).
The objective function and the constraints were then scaled as follows:
The bound constraints were chosen to be 0 and 1. The same starting point as that cited in the reference was taken and is presented in its dimensionless form in table 1 of section IV. The optimal solution presented by the authors is a bit different from the one presented in table 1; the difference lies in the 5^{th} decimal digit and is due to the fact that we worked under simple precision arithmetic.
Problem 16 (Dembo, 1976)
Here we considered of Dembos (1976) problem 5 with the introduction of one artificial variable. The decision variables were scaled according to equation (7).
The artificial variable x_{9} was further scaled by 0.1. The objective function and the constraints were then scaled as follows:
The bound constraints on the scaled variables were chosen to be 0 and 1. The same starting point as cited in the reference was taken and is presented in dimensionless form in table 1 of section IV. The optimal solution presented by the authors is a bit different from the one presented in table 1; the difference lies in the 5^{th} decimal digit and is due to the fact that we used simple precision arithmetic. We considered two variants here. Case 16a corresponds to the original problem with bounds and in case 16b the inequality constraints were transformed into equality ones. The purpose of this was to compare the performance of the algorithm for two different formulations.
Problem 17 (Floudas and Pardalos, 1990)
Here we considered Floudas and Pardaloss (1990) problem 3.1.1 with the introduction of one artificial variable. The decision variables and the objective function were not scales. The inequality constraints (with the exception of the bound constraints) were scales as follows:
The authors do not suggest any starting point. So we chose the results corresponding to a starting point equal to the lower (case 17a) or upper (case 17b) bounds. The number of iterations used to obtain convergence does not change significantly from one starting point to another. The solution reported in table 1 is the best-known global optimal solution for the precision used.