SciELO - Scientific Electronic Library Online

 
Pesquisa Operacional

Print version ISSN 0101-7438

Pesqui. Oper. vol.34 no.3 Rio de Janeiro Sept./Dec. 2014

http://dx.doi.org/10.1590/0101-7438.2014.034.03.0585 

Articles

A SURVEY ON MULTIOBJECTIVE DESCENT METHODS

Ellen H. Fukuda1 

Luis Mauricio Graña Drummond2  * 

1Graduate School of Informatics, Kyoto University, 606-8501, Kyoto, Japan. E-mail: ellen@i.kyoto-u.ac.jp

2Faculty of Business and Administration, Federal University of Rio de Janeiro, 22290-240 Rio de Janeiro, RJ, Brazil. E-mail: bolsigeno@gmail.com

ABSTRACT

We present a rigorous and comprehensive survey on extensions to the multicriteria setting of three well-known scalar optimization algorithms. Multiobjective versions of the steepest descent, the projected gradient and the Newton methods are analyzed in detail. At each iteration, the search directions of these methods are computed by solving real-valued optimization problems and, in order to guarantee an adequate objective value decrease, Armijo-like rules are implemented by means of a backtracking procedure. Under standard assumptions, convergence to Pareto (weak Pareto) optima is established. For the Newton method, superlinear convergence is proved and, assuming Lipschitz continuity of the objectives' second derivatives, it is shown that the rate is quadratic.

Key words: multiobjective optimization; Newton method; nonlinear optimization; projected gradient method; steepest descent method

1 INTRODUCTION

In multiobjective (or multicriteria) optimization, finitely many objective functions have to be minimized simultaneously. A single point will hardly ever minimize all of them at once, so we need another notion of optimality. Here, we will use the concepts of Pareto and weak Pareto optimality. A point is called Pareto optimal or efficient if there does not exist a different point with the same or smaller objective function values, such that there is a strict decrease in at least one objective function value. A point is called weakly Pareto optimal or weakly efficient if there does not exist a different point with strict decrease in all its objective values. Applications for this type of problem can be found in engineering [14] (especially truss optimization [12,32], design [20,31] and space exploration [39,41]), statistics [10], management science [15,40,42], environmental analysis [16,34], etc.

One of the main solution strategies for multiobjective optimization problems is the scalarization approach [23,24,43]: we obtain the optimal solutions of the vector-valued problem by solving one or several parametrized single-objective (i.e., scalar-valued) optimization problems. Among the scalarization techniques, we have the so-called weighting method, where one minimizes a nonnegative linear combination of the objective functions [26,29,30,37]. The parameters are not known in advance and the modeler or decision-maker has to choose them. For some problems, this choice can be problematic, as shown in the example provided in [18, Section 7], where almost all choices of the parameters lead to unbounded scalar problems. Only recently, adaptive scalarization techniques, where scalarization parameters are chosen automatically during the course of the algorithm such that a certain quality of approximation is maintained, have been proposed [13,17]. Still other techniques, working only in the bicriteria case, can be viewed as choosing a fixed grid in a particular parameter space [11,12].

Multicriteria optimization algorithms that do not scalarize have recently been developed (see, e.g., [5,6] for an overview on the subject). Some of these techniques are extensions of scalar optimization algorithms (notably the steepest descent algorithm [19] with at most linear convergence), while others borrow heavily from ideas developed in heuristic optimization [33,38]. For the latter, no convergence proofs are known, and empirical results show that convergence generally is, as in the scalar case, quite slow [44]. Other parameter-free multicriteria optimization techniques use an ordering of the different criteria, i.e., an ordering of importance of the components of the objective function vector. In this case, the ordering has to be prespecified. Moreover, the corresponding optimization process is usually augmented by an interactive procedure [36].

Following the research line opened in 2001 with [19], other classical (scalar) optimization procedures were extended in recent years, not just for the multicriteria setting, but also for vector optimization problems, i.e., when other underlying ordering cones are used instead of the nonnegative orthant. In [28], a steepest descent method is proposed, while in [1,21,22,25], several versions of the projected gradient method are studied. Proximal point type methods are analyzed in [4]. Trust region strategies for multicriteria were developed in [8,9]. Even a steepest descent method for multicriteria optimization on Riemannian manifolds was developed in [2]. In 2009, Fliege et al. [18] came up with a multiobjective version of the Newton method, while later on, Graña Drummond et al. [27] proposed a Newton method for vector optimization.

These extensions share an essential feature: they are all descent methods, i.e., the vector objective value decreases at each iteration in the partial order induced by the underlying cones. Under reasonable assumptions, full convergence of sequences produced by these algorithms is established in all those works. As expected, for the vector-valued Newton methods, convergence is local and fast, while for the others it is global (and not so fast).

In this survey we study just three of these procedures: the steepest descent [19,28], the projected gradient [21,22,25] and the Newton [18,27] methods. For the sake of simplicity, we just analyze their multiobjective versions and, in order to emphasize the underlying ideas and avoid technicalities, we do not present the results in their maximum degree of generality. For instance, we assume that the objectives are continuously differentiable in the whole space, while, sometimes, we just need local differentiability, etc.

The outline of this paper is as follows. In Section 2 we establish the notation, as well as the basic concepts, and define the (smooth) unconstrained multiobjective problem. We also state and prove a simple but important result on descent directions. In Section 3, we study the multiobjective steepest descent method. First, we introduce the search direction and analyze its properties. Then we describe the algorithm and, finally, we perform the convergence analysis of the method. Basically, we establish that, under convexity of the objectives and another natural assumption, any sequence produced by the method is globally convergent to a weak Pareto optimal point. In Section 4 we define the smooth constrained multicriteria problem, as well as the multiobjective projected gradient algorithm and we present its convergence analysis, which is quite similar to the previous one. In Section 5, for twice continuously differentiable strongly convex objectives, we define the multiobjective Newton method and, assuming similar conditions as those of the previous cases, we establish its convergence to Pareto optimal points with superlinear rate. Under the additional hypothesis of Lipschitz continuity of the objectives' second derivatives, we prove quadratic convergence. In Section 6 we make some comments on how to obtain (strong) Pareto optima with the first two algorithms and on how to approach the whole efficient frontier with the three methods.

2 PRELIMINARIES

Throughout this work, ⟨·,·⟩ will stand for the usual canonical inner product in ℝp (here, we will just deal with p = n or p = m), i.e., ⟨u, v⟩ = ∑_{i=1}^{p} u_i v_i = uTv. Likewise, ||·|| will always be the Euclidean norm, i.e., ||u|| = √⟨u, u⟩. For an m × n real matrix A, ||A|| will denote its operator (Euclidean) norm. As usual, ℝ+ and ℝ++ designate the sets of nonnegative and positive real numbers, respectively.

Let ℝm+ := ℝ+ ×···× ℝ+ and ℝm++ := ℝ++ ×···× ℝ++ be the nonnegative orthant or Paretian cone and the positive orthant of ℝm, respectively. Consider the partial order induced by ℝm+: for u, v ∈ ℝm, u ≤ v (alternatively, v ≥ u) if v - u ∈ ℝm+, as well as the following stronger relation: u < v (alternatively, v > u) if v - u ∈ ℝm++. In other words, u ≤ v stands for ui ≤ vi for all i = 1,..., m, and u < v should also be understood as a (strict) componentwise inequality.

Let

$$F : \mathbb{R}^n \to \mathbb{R}^m$$

be a continuously differentiable mapping. Our problem consists of finding Pareto or weak Pareto points for F in ℝn and we denote it by

$$\min_{x \in \mathbb{R}^n} F(x). \tag{1}$$

Recall that x* ∈ ℝn is a Pareto optimal (or efficient) point for F, if

there is no x ∈ ℝn such that F(x) ≤ F(x*) and F(x) ≠ F(x*),

that is to say, there does not exist x ∈ ℝn such that Fi (x) ≤ Fi (x*) for all i = 1, ..., m and Fi0(x) < Fi0(x*) for at least one i0. A point x* ∈ ℝn is a weakly Pareto optimal (or weakly efficient) point for F, if

there is no x ∈ ℝn such that F(x) < F(x*),

i.e., there does not exist x ∈ ℝn such that Fi(x) < Fi (x*) for all i = 1, ..., m. Clearly, if x* ∈ ℝn is a Pareto optimal point, then x* is a weak Pareto point, and the converse is not always true.
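The two optimality notions can be illustrated on a finite set of objective vectors. The following is only a sketch: it compares a handful of sampled values F(x) with each other, whereas the definitions above quantify over all x ∈ ℝn. The helper names (`leq`, `less`, `pareto_indices`, `weak_pareto_indices`) and the sample data are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]

def leq(u: Vector, v: Vector) -> bool:
    """u <= v in the partial order induced by the nonnegative orthant."""
    return all(a <= b for a, b in zip(u, v))

def less(u: Vector, v: Vector) -> bool:
    """u < v, i.e. strict inequality in every component."""
    return all(a < b for a, b in zip(u, v))

def pareto_indices(values: List[Vector]) -> List[int]:
    """Indices i for which no sampled u satisfies u <= values[i] and u != values[i]."""
    return [i for i, v in enumerate(values)
            if not any(leq(u, v) and tuple(u) != tuple(v) for u in values)]

def weak_pareto_indices(values: List[Vector]) -> List[int]:
    """Indices i for which no sampled u satisfies u < values[i]."""
    return [i for i, v in enumerate(values)
            if not any(less(u, v) for u in values)]

# Hypothetical objective vectors F(x) for four sampled points x.
sample_values = [(1.0, 3.0), (2.0, 2.0), (1.0, 4.0), (3.0, 3.0)]
pareto = pareto_indices(sample_values)
weak = weak_pareto_indices(sample_values)
```

In this sample, the value (1, 4) is weakly Pareto but not Pareto within the sample: (1, 3) is no worse in both components and better in one, yet no sampled value is strictly better in both. This mirrors the remark that the converse implication does not hold in general.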

As we will see in our first lemma, a necessary condition for optimality of a point x ∈ ℝn is stationarity (or criticality):

$$JF(x)(\mathbb{R}^n) \cap (-\mathbb{R}^m_{++}) = \emptyset,$$

where J F (x) stands for the Jacobian matrix of F at x (the m×n matrix with entries (J F(x))i,j = ∂Fi(x)/∂xj), J F(x)(ℝn) := {J F(x)v: v ∈ ℝn} and -ℝm++ := {-u : u ∈ ℝm++}. So x ∈ ℝn is stationary for F if, and only if, for all v ∈ ℝn we have J F(x)v ∉ -ℝm++, that is to say for all v there exists j = j(v) such that

$$\nabla F_j(x)^T v \ge 0, \tag{2}$$

and this condition implies

$$\max_{i=1,\dots,m} \nabla F_i(x)^T v \ge 0 \quad \text{for all } v \in \mathbb{R}^n.$$

Note that for m = 1, we retrieve the classical stationarity condition for unconstrained scalar-valued optimization: ∇ F(x) = 0.

By definition, a point x ∈ ℝn is nonstationary if there exists a direction v ∈ ℝn such that J F(x)v < 0. The next result shows not only that such v is a ℝm+-descent direction for F at x (there exists ε > 0 such that F(x + tv) < F(x) for all t ∈ (0,ε]), but it also estimates the ℝm+- function decrease. As we will see later on, this result allows us to implement, by a backtracking procedure, an Armijo-like rule on all the multiobjective (descent) methods we will study here.

Lemma 2.1. Let F : ℝn → ℝm be continuously differentiable, v ∈ ℝn such that J F(x)v < 0 and σ ∈ (0,1). Then there exists some ε > 0 (which may depend on x, v and σ) such that

$$F(x + tv) < F(x) + \sigma t\, JF(x)v \quad \text{for all } t \in (0, \varepsilon]. \tag{3}$$

In particular, v is a descent direction for F at x.

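Proof. Since F is differentiable, ∇Fi(x)Tv < 0 and σ ∈ (0,1), we have

$$\lim_{t \to 0^+} \frac{F_i(x + tv) - F_i(x)}{t} = \nabla F_i(x)^T v < \sigma \nabla F_i(x)^T v, \quad i = 1, \dots, m.$$

So there exists ε > 0 such that

$$\frac{F_i(x + tv) - F_i(x)}{t} < \sigma \nabla F_i(x)^T v \quad \text{for all } t \in (0, \varepsilon] \text{ and all } i = 1, \dots, m,$$

which is (3) written componentwise.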

3 THE MULTIOBJECTIVE STEEPEST DESCENT METHOD

In this section we present an extension of the classical steepest descent method. First we state and prove some results that will allow us to define the algorithm. Then we present its convergence analysis and, finally, we briefly describe a numerically robust version of the method which admits relative errors in the computation of the search directions at each iteration.

3.1 The search direction for the multiobjective steepest descent method

For a given point x ∈ ℝn, define the function φx : ℝn → ℝ by

$$\varphi_x(v) := \max_{i=1,\dots,m} \nabla F_i(x)^T v. \tag{4}$$

Being the maximum of linear functions, φx is convex and positively homogeneous (φx(λv) = λφx(v) for all λ > 0 and all v ∈ ℝn). Moreover, since u → maxi=1, ..., m ui is Lipschitz continuous with constant 1, we also have

$$|\varphi_x(v) - \varphi_x(w)| \le \|JF(x)(v - w)\| \le \|JF(x)\|\,\|v - w\| \tag{5}$$

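for all v, w ∈ ℝn. Consider now the unconstrained scalar-valued minimization problem

$$\min_{v \in \mathbb{R}^n} \; \varphi_x(v) + \tfrac{1}{2}\|v\|^2. \tag{6}$$

Since the objective function is proper, closed and strongly convex, this problem always has a (unique) optimal solution v(x), which we call the steepest descent direction, and which is therefore given by

$$v(x) := \operatorname*{argmin}_{v \in \mathbb{R}^n} \; \varphi_x(v) + \tfrac{1}{2}\|v\|^2. \tag{7}$$

Let us call θ(x) the optimal value of (6), that is to say,

$$\theta(x) := \min_{v \in \mathbb{R}^n} \left\{ \varphi_x(v) + \tfrac{1}{2}\|v\|^2 \right\} = \varphi_x(v(x)) + \tfrac{1}{2}\|v(x)\|^2. \tag{8}$$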

If m = 1, we fall back to scalar minimization and φx(v) = ∇F(x)T v,so v(x) = -∇ F(x) and we retrieve the classical steepest descent direction at x ∈ ℝn.

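Note that, in order to get rid of the possible nondifferentiability of (6), we can reformulate the problem as

$$\min \; \tau + \tfrac{1}{2}\|v\|^2 \quad \text{s.t.} \quad \nabla F_i(x)^T v - \tau \le 0, \quad i = 1, \dots, m,$$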

a convex problem with variables (τ, v) ∈ ℝ × ℝn for which there are efficient numerical solvers.
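For two objectives (m = 2) no numerical solver is needed. The sketch below relies on a standard duality fact that is not derived at this point of the text: the minimizer of max_i ∇Fi(x)Tv + (1/2)||v||² is minus the minimum-norm convex combination of the gradients. For m = 2 that combination has a closed form. The function name `steepest_descent_direction` and its return convention are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def steepest_descent_direction(g1, g2):
    """Return (v, theta, lam) for gradients g1, g2 of two objectives.

    v minimizes max(g1.v, g2.v) + 0.5*||v||^2, theta is the optimal value,
    and lam in [0, 1] is the weight of g1 in v = -(lam*g1 + (1 - lam)*g2).
    Assumption: m = 2, so lam minimizes ||lam*g1 + (1 - lam)*g2||^2 on [0, 1].
    """
    d = [p - q for p, q in zip(g1, g2)]
    dd = dot(d, d)
    if dd == 0.0:
        lam = 0.5  # g1 == g2: every weight gives the same direction
    else:
        # Unconstrained minimizer of the quadratic in lam, clipped to [0, 1].
        lam = min(1.0, max(0.0, -dot(g2, d) / dd))
    v = [-(lam * p + (1.0 - lam) * q) for p, q in zip(g1, g2)]
    theta = max(dot(g1, v), dot(g2, v)) + 0.5 * dot(v, v)
    return v, theta, lam

# Conflicting but not opposite gradients: a common descent direction exists.
v_a, theta_a, lam_a = steepest_descent_direction((1.0, 0.0), (0.0, 1.0))
# Opposite gradients: the point is stationary, so v = 0 and theta = 0.
v_b, theta_b, lam_b = steepest_descent_direction((1.0, 0.0), (-1.0, 0.0))
# Identical gradients: the classical steepest descent direction -grad F.
v_c, theta_c, lam_c = steepest_descent_direction((3.0, 4.0), (3.0, 4.0))
```

For general m the same dual problem is a quadratic program over the unit simplex, and the reformulation with variables (τ, v) can be passed to any convex QP solver.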

Let us now give a characterization of stationarity in terms of θ(·) and v(·) and study some features of these functions.

Proposition 3.1. Let v: ℝn → ℝn and θ : ℝn → ℝ be given by (7) and (8), respectively. Then, the following statements hold.

  1. θ(x) ≤ 0 for all x ∈ ℝn.

  2. The function θ(·) is continuous.

  3. The following conditions are equivalent:

    (a) The point x is nonstationary.

    (b) θ(x) < 0.

    (c) v(x) ≠ 0.

    In particular, x is stationary if and only if θ(x) = 0.

Proof. 1. From (7) and (8), for any x ∈ ℝn, we have θ(x) = minv {φx(v) + (1/2)||v||²} ≤ φx(0) + (1/2)||0||² = 0.

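2. Let us prove the continuity of θ at an arbitrary but fixed xº ∈ ℝn. First let us show that the function x ↦ v(x) is bounded on compact subsets of ℝn. Let C ⊂ ℝn be compact. Note that for all x ∈ C and all i = 1, ..., m, we have

$$-\|\nabla F_i(x)\|\,\|v(x)\| + \tfrac{1}{2}\|v(x)\|^2 \le \nabla F_i(x)^T v(x) + \tfrac{1}{2}\|v(x)\|^2 \le \varphi_x(v(x)) + \tfrac{1}{2}\|v(x)\|^2 = \theta(x) \le 0,$$

where we used the Cauchy-Schwarz inequality, the definitions of φx and θ, as well as item 1. Hence ||v(x)|| ≤ 2||∇Fi(x)|| and, since F is continuously differentiable, we conclude that there exists κ = κ(C) > 0 such that ||v(x)|| ≤ κ for all x ∈ C. In particular, there exists κ > 0 such that

$$\|v(x)\| \le \kappa \quad \text{for all } x \in B(x^0, 1), \tag{9}$$

where B(xº, 1) is the closed ball with center at xº and radius 1.

Now, for x ∈ B(xº, 1) and i ∈ {1, ..., m}, define

$$\phi_{x,i} : B(x^0, 1) \to \mathbb{R}$$

by

$$\phi_{x,i}(z) := \nabla F_i(z)^T v(x) + \tfrac{1}{2}\|v(x)\|^2.$$

From (9), the family {ϕx,i : x ∈ B(xº, 1), i = 1, ..., m} is equicontinuous. Consider now the family {Φx : x ∈ B(xº, 1)}, where Φx = maxi=1, ..., m ϕx,i. Since u → maxi ui is Lipschitz continuous with constant 1, {Φx} is also an equicontinuous family of functions. Then for any ε > 0 there exists 0 < δ < 1 such that, for all x ∈ B(xº, 1),

$$\|z - x^0\| < \delta \;\Longrightarrow\; |\Phi_x(z) - \Phi_x(x^0)| < \varepsilon.$$

Hence for ||z - xº|| < δ, from the definitions of functions θ, v, Φx0 and the fact that Φx0(xº) = θ(xº), we have

$$\theta(z) \le \varphi_z(v(x^0)) + \tfrac{1}{2}\|v(x^0)\|^2 = \Phi_{x^0}(z) < \Phi_{x^0}(x^0) + \varepsilon = \theta(x^0) + \varepsilon,$$

so θ(z) - θ(xº) < ε. Interchanging the roles of z and xº, we conclude that, for ||z - xº|| < δ, |θ(z) - θ(xº)| < ε.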

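3. (a) ⇒ (b). Suppose that x is nonstationary. Then, there exists v ∈ ℝn such that φx(v) = maxi ∇Fi(x)Tv < 0. Therefore, from the optimality of v(x) and the fact that φx is positively homogeneous,

$$\theta(x) \le \varphi_x(\tau v) + \tfrac{1}{2}\|\tau v\|^2 = \tau \left( \varphi_x(v) + \tfrac{\tau}{2}\|v\|^2 \right)$$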

for all τ > 0. Taking 0 < τ < -2φx (v)/||v||2, we see that θ(x) < 0.

(b) ⇒ (c). Since, by item 1, θ(x) ≤ 0, it is enough to notice that v(x) = 0 implies θ(x) = 0.

(c) ⇒ (a). If v(x) ≠ 0, then maxi ∇Fi(x)Tv(x) = φx(v(x)) < θ(x) ≤ 0, so J F(x)v(x) ∈ J F(x)(ℝn) ∩ [-ℝm++] and x is nonstationary.

Note that for a nonstationary point x, using the above characterization, we have θ(x) < 0 and so, from (8), we get φx(v(x)) < -(1/2)||v(x)||² < 0, which in turn implies that J F(x)v(x) < 0. Whence, by Lemma 2.1, there exists ε > 0 such that

$$F(x + t\,v(x)) < F(x) + \sigma t\, JF(x)v(x) \quad \text{for all } t \in (0, \varepsilon]$$

and, in particular, v(x) is a descent direction, i.e., F(x + t v(x)) < F(x) ∀t ∈ (0,ε].

Finally, observe that, while proving Proposition 3.1, we saw that v: ℝn → ℝn, defined by (7), is bounded on compact sets of ℝn. Actually, something stronger holds, namely that v(·) is a continuous function [28, Lemma 3.3].

3.2 The multiobjective steepest descent algorithm

We are now in a position to define the extension of the classical steepest descent method for the unconstrained multiobjective optimization problem (1).

Algorithm 3.2. The multiobjective steepest descent method (MSDM)

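1. Choose σ ∈ (0, 1) and x0 ∈ ℝn. Set k := 0.

2. Compute vk := v(xk) = argmin_{v ∈ ℝn} { φxk(v) + (1/2)||v||² }.

3. Compute θ(xk) = φxk(vk) + (1/2)||vk||². If θ(xk) = 0, then stop.

4. Choose tk as the largest t ∈ {1/2^j : j = 0, 1, 2,...} such that F(xk + t vk) ≤ F(xk) + σ t J F(xk)vk.

5. Set xk+1 := xk + tk vk, k := k + 1 and go to Step 2.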
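The steps of the algorithm can be sketched in a few lines for the bicriteria case. This is an illustration under stated assumptions, not the authors' implementation: m = 2, the direction subproblem is solved in closed form through its dual (minimum-norm convex combination of the two gradients), the exact stopping test θ(xk) = 0 is replaced by a small tolerance `tol`, and an iteration cap `max_iter` is added. The names `msdm`, `direction`, `F_demo` and `grad_demo` are hypothetical, and the test problem F1(x) = (1/2)||x - a||², F2(x) = (1/2)||x - b||² was chosen only because its stationary points (the segment joining a and b) are known.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def direction(g1, g2):
    """Step 2-3: direction v and optimal value theta for two gradients."""
    d = [p - q for p, q in zip(g1, g2)]
    dd = dot(d, d)
    lam = 0.5 if dd == 0.0 else min(1.0, max(0.0, -dot(g2, d) / dd))
    v = [-(lam * p + (1.0 - lam) * q) for p, q in zip(g1, g2)]
    theta = max(dot(g1, v), dot(g2, v)) + 0.5 * dot(v, v)
    return v, theta

def msdm(F, grad, x0, sigma=0.1, tol=1e-10, max_iter=200):
    """Sketch of the multiobjective steepest descent method for m = 2."""
    x = list(x0)
    for _ in range(max_iter):
        g1, g2 = grad(x)
        v, theta = direction(g1, g2)
        if abs(theta) <= tol:          # Step 3, with a numerical tolerance
            break
        fx = F(x)
        slopes = (dot(g1, v), dot(g2, v))   # components of JF(x) v
        t = 1.0
        while True:                    # Step 4: backtracking by halving
            trial = [xi + t * vi for xi, vi in zip(x, v)]
            ft = F(trial)
            if all(ft[i] <= fx[i] + sigma * t * slopes[i] for i in range(2)):
                break
            t *= 0.5
        x = trial                      # Step 5
    return x

# Hypothetical test problem: squared distances to two anchor points.
A, B = (0.0, 0.0), (2.0, 0.0)

def F_demo(x):
    return (0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, A)),
            0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, B)))

def grad_demo(x):
    return ([xi - ai for xi, ai in zip(x, A)],
            [xi - bi for xi, bi in zip(x, B)])

x_from_above = msdm(F_demo, grad_demo, (1.0, 3.0))
x_from_right = msdm(F_demo, grad_demo, (5.0, 2.0))
```

Different starting points end at different points of the segment between a and b, which illustrates that the method returns one stationary point per run rather than the whole efficient set.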

Some comments are in order. Since the function v ↦ φxk(v) + (1/2)||v||² is strongly convex, vk exists and is unique. Note that, by Proposition 3.1, the stopping rule θ(xk) = 0 can be replaced by vk = 0. Also note that, if at iteration k the algorithm does not reach Step 4, i.e., if it stops at Step 3, then, by Proposition 3.1, xk is a stationary point for F.

If Step 4 is reached at iteration k, by our comments after Proposition 3.1 and Lemma 2.1, the computation of the step length in Step 4 ends after a finite number of halvings. Observe that instead of the factor 1/2 in the reductions of Step 4 we could have used any other scalar ν ∈ (0, 1). If the algorithm does not stop, from Step 4 and the fact that J F(xk)vk < 0, we have that the objective values sequence {F(xk)} is ℝm+-decreasing, i.e.,

$$F(x^{k+1}) < F(x^k) \quad \text{for all } k.$$

If the algorithm stops at iteration k0, the above inequality holds for k = 0, 1, ..., k0 - 1.

3.3 Convergence analysis of the MSDM: the general case

Since Algorithm 3.2 ends up with a stationary point or produces infinitely many iterates, from now on, we assume that an infinite sequence {xk} of nonstationary points is generated. We begin by presenting a simple application of the standard convergence argument for the scalar-valued steepest descent method.

Theorem 3.3. Let {xk} be a sequence produced by Algorithm 3.2. Every accumulation point of {xk}, if any, is a stationary point.

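Proof. Let x̄ be an accumulation point of the sequence {xk}. Consider v(x̄) and θ(x̄), given by (7) and (8), respectively, i.e., the solution and the optimal value of (6) at x = x̄. According to Proposition 3.1, it is enough to prove that θ(x̄) = 0.

We know that the sequence {F(xk)} is ℝm+-decreasing (i.e., componentwise strictly decreasing), so we have that

$$\lim_{k \to \infty} F(x^k) = F(\bar{x}).$$

Therefore,

$$\lim_{k \to \infty} \|F(x^k) - F(x^{k+1})\| = 0.$$

But, by Steps 2-5 of Algorithm 3.2,

$$F(x^k) - F(x^{k+1}) \ge -\sigma t_k\, JF(x^k)v^k \ge 0,$$

and therefore

$$\lim_{k \to \infty} t_k\, JF(x^k)v^k = 0,$$

which, componentwise, can be written as

$$\lim_{k \to \infty} t_k \nabla F_i(x^k)^T v^k = 0, \quad i = 1, \dots, m. \tag{10}$$

Considering that tk ∈ (0,1] for all k, we have the following two possibilities:

$$\limsup_{k \to \infty} t_k > 0 \tag{11}$$

or

$$\lim_{k \to \infty} t_k = 0. \tag{12}$$

First assume that (11) holds. Then there exists a subsequence {xkj}j converging to x̄ and t̄ > 0 such that limj→∞ tkj = t̄. So, using (10) and the inequalities φxk(vk) ≤ θ(xk) ≤ 0, we get

$$0 = \lim_{j \to \infty} \varphi_{x^{k_j}}(v^{k_j}) = \lim_{j \to \infty} \theta(x^{k_j}) = \theta(\bar{x}),$$

where the last equality is a consequence of Proposition 3.1, 2. So θ(x̄) = 0 and, by Proposition 3.1, 3, x̄ is stationary.

Now assume that (12) holds. Due to the fact that v(·) is bounded on compact sets, vk = v(xk) for all k and {xk} has a convergent subsequence, it follows that the sequence {vk} also has a bounded subsequence. Therefore there exist v̄ ∈ ℝn and a subsequence {vkj}j such that limj→∞ xkj = x̄, limj→∞ vkj = v̄ and limj→∞ tkj = 0. Note that we have

$$\max_{i=1,\dots,m} \nabla F_i(x^{k_j})^T v^{k_j} \le \theta(x^{k_j}) < 0,$$

so letting j → ∞, we get

$$\max_{i=1,\dots,m} \nabla F_i(\bar{x})^T \bar{v} \le \theta(\bar{x}) \le 0. \tag{13}$$

Take now a fixed but arbitrary positive integer q. Since tkj → 0, for j large enough, we have

$$t_{k_j} < \frac{1}{2^q},$$

which means that the Armijo-like condition at xkj in Step 4 of Algorithm 3.2 is not satisfied for t = 1/2^q, i.e.,

$$F\!\left(x^{k_j} + \tfrac{1}{2^q} v^{k_j}\right) \not\le F(x^{k_j}) + \sigma \tfrac{1}{2^q} JF(x^{k_j}) v^{k_j}.$$

So for all j there exists i = i(kj) ∈ {1,...,m} such that

$$F_i\!\left(x^{k_j} + \tfrac{1}{2^q} v^{k_j}\right) > F_i(x^{k_j}) + \sigma \tfrac{1}{2^q} \nabla F_i(x^{k_j})^T v^{k_j}.$$

Since {i(kj)}j ⊂ {1,...,m}, there exist a subsequence {kjℓ}ℓ and an index i0 such that i0 = i(kjℓ) for all ℓ = 1, 2, ... and

$$F_{i_0}\!\left(x^{k_{j_\ell}} + \tfrac{1}{2^q} v^{k_{j_\ell}}\right) > F_{i_0}(x^{k_{j_\ell}}) + \sigma \tfrac{1}{2^q} \nabla F_{i_0}(x^{k_{j_\ell}})^T v^{k_{j_\ell}}.$$

Taking the limit ℓ → ∞ in the above inequality, we obtain

$$F_{i_0}\!\left(\bar{x} + \tfrac{1}{2^q} \bar{v}\right) \ge F_{i_0}(\bar{x}) + \sigma \tfrac{1}{2^q} \nabla F_{i_0}(\bar{x})^T \bar{v}.$$

Since this inequality holds for any positive integer q and for i0 (depending on q), by Lemma 2.1 it follows that J F(x̄)v̄ ≮ 0, so

$$\max_{i=1,\dots,m} \nabla F_i(\bar{x})^T \bar{v} \ge 0,$$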

which, together with (13), implies θ(x) = 0. Therefore, we conclude that x is a stationary point.

Assume now that the level set {x ∈ ℝn : F(x) ≤ F(x0)} is bounded. Since {F(xk)} is ℝm+-decreasing, the whole sequence {xk} is contained in this set, so it is bounded and has at least one accumulation point, which, by the previous theorem, is stationary for F.

3.4 Full convergence of the MSDM: the ℝm+-convex case

In order to characterize v(x), the optimal solution of (6), we begin this section by recalling a well-known result on nonsmooth unconstrained optimization, which establishes optimality conditions for max-type real-valued objective functions.

Proposition 3.4. Let hi : ℝn → ℝ be a differentiable function for i = 1, ..., m. If x̄ ∈ ℝn is an unconstrained minimizer of maxi=1, ..., m hi(x), then there exists α = α(x̄) ∈ ℝm, with αi ≥ 0 for all i and ∑_{i=1}^{m} αi = 1, such that, for τ̄ := maxi hi(x̄), it holds

$$\sum_{i=1}^{m} \alpha_i \nabla h_i(\bar{x}) = 0, \qquad \alpha_i \left( h_i(\bar{x}) - \bar{\tau} \right) = 0, \quad i = 1, \dots, m.$$

If hi is convex for all i, the above conditions are also sufficient for the optimality of x̄.

Proof. First note that the problem minx∈ℝn maxi=1,...,m hi(x) can be reformulated as

$$\min \; \tau \quad \text{s.t.} \quad h_i(x) - \tau \le 0, \quad i = 1, \dots, m, \tag{14}$$

with optimal solution (τ̄, x̄) ∈ ℝ × ℝn. Since the Lagrangian of this problem is given by L((τ,x),α) := τ + ∑_{i=1}^{m} αi(hi(x) - τ), its Karush-Kuhn-Tucker (KKT) conditions at (τ̄, x̄) are just those stated in this proposition. Note now that (14) has a Slater point (e.g., (τ̂, 0) ∈ ℝ × ℝn, where τ̂ > hi(0) for all i). So there exists a vector of KKT multipliers α = α(x̄) such that those conditions are satisfied.

For the converse, just note that as (14) is a convex problem on the variables (τ, x), the KKT conditions are sufficient for the optimality of (τ̄, x̄).

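By (6)-(7), v(x) = argminv maxi {∇Fi(x)Tv + (1/2)||v||²}. Therefore, in view of the above proposition applied to hi(v) := ∇Fi(x)Tv + (1/2)||v||², there exists ω(x) := α(v(x)) ∈ ℝm+ with ∑_{i=1}^{m} ωi(x) = 1 such that

$$\sum_{i=1}^{m} \omega_i(x) \left( \nabla F_i(x) + v(x) \right) = 0,$$

and so

$$v(x) = -\sum_{i=1}^{m} \omega_i(x) \nabla F_i(x). \tag{15}$$

The above equality can be rewritten as

$$v(x) = -JF(x)^T \omega(x). \tag{16}$$

Note that (15) tells us that the steepest descent direction v(x) is a classical steepest descent direction for the scalar-valued problem minx ∑_{i=1}^{m} ωi Fi(x), whose weights ωi := ωi(x) ≥ 0 are the a priori unknown KKT multipliers for problem (14), with hi(v) = ∇Fi(x)Tv + (1/2)||v||² for all i, at (θ(x), v(x)).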
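As a small worked illustration of this weighted-gradient representation, consider the hypothetical bicriteria problem with objectives equal to half the squared distances to two given points a ≠ b in ℝn. The computation below only uses the fact that v(x) is minus a convex combination of the gradients; the example itself is not taken from the survey.

```latex
% Assumed example: F_1(x) = (1/2)||x - a||^2,  F_2(x) = (1/2)||x - b||^2.
\begin{align*}
\nabla F_1(x) &= x - a, \qquad \nabla F_2(x) = x - b, \\
v(x) &= -\bigl[\omega_1(x)\,(x - a) + \omega_2(x)\,(x - b)\bigr]
      = \bigl[\omega_1(x)\,a + \omega_2(x)\,b\bigr] - x,
      \qquad \omega_1(x) + \omega_2(x) = 1,\ \omega_i(x) \ge 0 .
\end{align*}
% Hence x + v(x) = \omega_1(x) a + \omega_2(x) b lies on the segment [a, b],
% and v(x) = 0 forces x itself to lie on [a, b].
```

So in this example the steepest descent direction always points from x to some point of the segment joining a and b, and a point with v(x) = 0 must belong to that segment, which is consistent with the segment being the set of compromise solutions between the two distance objectives.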

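Let us now introduce an important concept. We say that the mapping F : ℝn → ℝm is ℝm+-convex if

$$F(\lambda u + (1 - \lambda) w) \le \lambda F(u) + (1 - \lambda) F(w) \quad \text{for all } u, w \in \mathbb{R}^n \text{ and all } \lambda \in [0, 1]. \tag{17}$$

Clearly, F : ℝn → ℝm is ℝm+-convex if and only if its components Fi : ℝn → ℝ are all convex. Under this additional assumption, we have the following extension of the well-known scalar inequality which establishes that a convex differentiable function always overestimates its linear approximation:

$$F(u) \ge F(w) + JF(w)(u - w) \quad \text{for all } u, w \in \mathbb{R}^n. \tag{18}$$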

In fact, this vector inequality is equivalent to the following m scalar inequalities Fi(u) ≥ Fi (w) + ∇Fi (w)T(u - w) for all u, w ∈ ℝn and all i.

Let us now state and prove a simple result relating Pareto optimality and stationarity.

Lemma 3.5. Let x̄ be a weak Pareto optimum for F. Then x̄ is a stationary point. If, in addition, F is ℝm+-convex, this condition is also sufficient for weak Pareto optimality. So, under ℝm+-convexity, the concepts of weak efficiency and stationarity are equivalent.

Proof. First, assume that x̄ is weak Pareto. As we know, if x̄ is not a stationary point, there exists v ∈ ℝn such that J F(x̄)v < 0, so (3) holds for x = x̄, in contradiction with the weak Pareto optimality of x̄.

Now suppose that x̄ is stationary for the ℝm+-convex mapping F and take any x ∈ ℝn. Since x̄ is stationary, (2) holds for x = x̄, v := x - x̄ and some j = j(x). Therefore, by (18) for u = x and w = x̄, Fj(x) ≥ Fj(x̄) + ∇Fj(x̄)T(x - x̄) ≥ Fj(x̄), and so x̄ is a weak Pareto optimum for F.

We will keep on studying the case in which the algorithm does not have finite termination and therefore it produces infinite sequences {xk}, {vk} and {tk}. Let us now state the additional assumption under which, in the ℝm+-convex case, we will prove full convergence of {xk} to a stationary point or, in view of the above lemma, to a weak unconstrained optimum of F.

Assumption 3.6. Every ℝm+-decreasing sequence {yk} ⊆ F(ℝn) := {F(x): x ∈ ℝn} is ℝm+-bounded from below by a point in F(ℝn), i.e., for any {yk} contained in F(ℝn) with yk+1 < yk for all k, there exists x̄ ∈ ℝn such that F(x̄) ≤ yk for all k.

Some comments concerning the generality/restrictiveness of this assumption are in order. In the classical unconstrained (convex) optimization case, this condition is equivalent to the existence of solutions of the optimization problem. This assumption, known as ℝm+-completeness, is standard for ensuring existence of efficient points for vector optimization problems [35, Section 3].

We will need the following technical result in order to prove that the MSDM is convergent.

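Lemma 3.7. Suppose that F is ℝm+-convex and let x̂ ∈ ℝn be such that F(x̂) ≤ F(xk) for some k. Then, we have

$$\|x^{k+1} - \hat{x}\|^2 \le \|x^k - \hat{x}\|^2 + \|x^{k+1} - x^k\|^2.$$

Proof. By (16), there exists wk := ω(xk) ∈ ℝm+ such that

$$v^k = -JF(x^k)^T w^k. \tag{19}$$

Using the ℝm+-convexity of F, we have that (18) holds, so F(xk) + J F(xk)(x̂ - xk) ≤ F(x̂). Since F(x̂) ≤ F(xk), we get

$$JF(x^k)(\hat{x} - x^k) \le 0.$$

Taking into account that wk ∈ ℝm+ and using the above vector inequality as well as (19), we get

$$\langle v^k, x^k - \hat{x} \rangle = \langle w^k, JF(x^k)(\hat{x} - x^k) \rangle \le 0.$$

Recall that xk+1 = xk + tkvk, with tk > 0. Therefore, from the above inequality we get

$$\langle x^{k+1} - x^k, x^k - \hat{x} \rangle \le 0,$$

which implies the desired inequality, because

$$\|x^{k+1} - \hat{x}\|^2 = \|x^k - \hat{x}\|^2 + \|x^{k+1} - x^k\|^2 + 2 \langle x^{k+1} - x^k, x^k - \hat{x} \rangle.$$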

We still need a couple of technical results. The first one, basically, establishes that any accumulation point of a sequence produced by the MSDM furnishes an ℝm+-lower bound for the whole objective values sequence. We state it for an arbitrary sequence of nonstationary points, because we will also need it for the convergence analysis of another method.

Lemma 3.8. Let {x(k)} be any infinite sequence of nonstationary points in ℝn such that F(x(k+1)) ≤ F(x(k)) for all k.

  1. If x̄ is an accumulation point of {x(k)}, then

$$F(\bar{x}) \le F(x^{(k)}) \quad \text{for all } k \tag{20}$$

    and

$$\lim_{k \to \infty} F(x^{(k)}) = F(\bar{x}).$$

    Also, F is constant in the set of accumulation points of {x(k)}.

  2. If {xk}, generated by Algorithm 3.2, has an accumulation point, then all results of item 1 hold for this sequence.

Proof. 1. Let {x(kj)}j be a subsequence that converges to x̄. Take a fixed but arbitrary k. For j large enough, we have kj > k and F(x(kj)) ≤ F(x(k)). So letting j → ∞, we get

$$F(\bar{x}) \le F(x^{(k)}).$$

Since the above inequality holds componentwise, each sequence {Fi(x(k))}k is nonincreasing and Fi(x(kj)) → Fi(x̄) as j → ∞ for all i, we have limk→∞ F(x(k)) = F(x̄).

Now let x̂ be another accumulation point of {x(k)}. Then there exists a subsequence {x(kℓ)} that converges to x̂. By (20), F(x̄) ≤ F(x(kℓ)) for all ℓ, so letting ℓ → ∞, we get F(x̄) ≤ F(x̂). Interchanging the roles of x̄ and x̂, by the same reasoning, we get F(x̂) ≤ F(x̄). These two ℝm+-inequalities imply that F(x̂) = F(x̄).

2. It suffices to note that any (infinite) sequence {xk} generated by Algorithm 3.2 satisfies F(xk+1) < F(xk) for all k.

We now study the speed of convergence to zero of the sequences {vk} and {θ(xk)}.

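Lemma 3.9. Suppose that {F(xk)} is ℝm+-bounded from below by ŷ ∈ ℝm. Then

$$\sum_{k=0}^{\infty} t_k |\theta(x^k)| < \infty \quad \text{and} \quad \sum_{k=0}^{\infty} t_k \|v^k\|^2 < \infty.$$

Proof. We know that Fi(xk+1) ≤ Fi(xk) + σtk∇Fi(xk)Tvk for any i and all k. Adding up from k = 0 to k = N, we get

$$F_i(x^{N+1}) \le F_i(x^0) + \sigma \sum_{k=0}^{N} t_k \nabla F_i(x^k)^T v^k \le F_i(x^0) + \sigma \sum_{k=0}^{N} t_k \left( \theta(x^k) - \tfrac{1}{2}\|v^k\|^2 \right),$$

where we used (4) and (8) for x = xk. As ŷi ≤ Fi(xN+1) and θ(xk) < 0 for all i, k (Proposition 3.1, 3), we obtain

$$\sigma \sum_{k=0}^{N} t_k \left( |\theta(x^k)| + \tfrac{1}{2}\|v^k\|^2 \right) \le F_i(x^0) - \hat{y}_i.$$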

Since the last inequality holds for any nonnegative integer N, the conclusions follow immediately.

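Before stating and proving the last theorem of this section, let us introduce the main tool for the convergence analysis. Recall that a sequence {uk} ⊂ ℝn is quasi-Fejér convergent to a set U ⊂ ℝn if for every u ∈ U there exists a sequence {εk} ⊂ ℝ, εk ≥ 0 for all k and such that

$$\|u^{k+1} - u\|^2 \le \|u^k - u\|^2 + \varepsilon_k \quad \text{for all } k, \qquad \text{with} \quad \sum_{k=0}^{\infty} \varepsilon_k < \infty.$$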

We will need the following well-known result concerning quasi-Fejér convergent sequences, whose proof can be found in [7].

Theorem 3.10. If a sequence {uk} is quasi-Fejér convergent to a nonempty set U ⊂ ℝn, then {uk} is bounded. If, furthermore, {uk} has a cluster point u which belongs to U, then limk→∞uk = u.

In [7] it is proved that the steepest descent method for smooth (scalar) convex minimization, with step size obtained using the Armijo rule implemented by a backtracking procedure, is globally convergent to a solution (essentially, under the sole assumption of existence of optima). Using the same techniques, we extend this result to the multiobjective setting, that is to say, we show that any sequence produced by the MSDM converges to an optimum of problem (1), no matter how poor the initial guess might be.

Theorem 3.11. Suppose that F is ℝm+-convex and that Assumption 3.6 holds. Then any sequence {xk} produced by Algorithm 3.2 converges to a weak Pareto optimal point x* ∈ ℝn.

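Proof. As {F(xk)} is an ℝm+-decreasing sequence, by Assumption 3.6, there exists x̂ ∈ ℝn such that

$$F(\hat{x}) \le F(x^k) \quad \text{for all } k. \tag{21}$$

Now observe that 0 < tk ≤ 1 for all k, so

$$\|x^{k+1} - x^k\|^2 = t_k^2 \|v^k\|^2 \le t_k \|v^k\|^2.$$

Therefore, from (21), Lemma 3.9 and the above inequality, it follows that

$$\sum_{k=0}^{\infty} \|x^{k+1} - x^k\|^2 < \infty. \tag{22}$$

Define

$$L := \{ x \in \mathbb{R}^n : F(x) \le F(x^k) \text{ for all } k \}.$$

Using the ℝm+-convexity of F and (21), from Lemma 3.7, we see that for any x ∈ L

$$\|x^{k+1} - x\|^2 \le \|x^k - x\|^2 + \|x^{k+1} - x^k\|^2 \quad \text{for all } k.$$

As L is nonempty, because x̂ ∈ L, from (22) and the above inequality, it follows that {xk} is quasi-Fejér convergent to the set L. Therefore, by Theorem 3.10, {xk} is bounded and so it has accumulation points. Let x* be one of them. By Lemma 3.8, x* ∈ L. Then, using Theorem 3.10, we see that the whole sequence {xk} converges to x*. We finish the proof by observing that Theorem 3.3 guarantees that x* is stationary and so, from the ℝm+-convexity of F, by virtue of Lemma 3.5, x* is a weak Pareto optimum.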

3.5 The MSDM with relative errors

Here we briefly sketch an inexact version of the MSDM. Instead of computing vk = v(xk) = argminv {φxk(v) + (1/2)||v||²} in Step 2 of Algorithm 3.2, we accept approximate solutions of the problem with some prespecified tolerance. For a nonstationary point x ∈ ℝn and a given tolerance ε ∈ [0, 1), we say that a direction v ∈ ℝn is an ε-approximate solution of minv∈ℝn {φx(v) + (1/2)||v||²} if

$$\varphi_x(v) + \tfrac{1}{2}\|v\|^2 \le (1 - \varepsilon)\,\theta(x),$$

where φx(v) = maxi=1,...,m ∇Fi(x)Tv and θ(·) is the optimal value of the above optimization problem. Calling hx the objective function in this problem, i.e., hx(v) := φx(v) + (1/2)||v||², we see that v is an ε-approximate solution if, and only if, the relative error of hx at v is at most ε, that is to say,

$$\frac{h_x(v) - \theta(x)}{|\theta(x)|} \le \varepsilon.$$

Clearly, the exact direction v(x) is always ε-approximate at x for any ε ∈ [0, 1). Moreover, v(x) is the unique 0-approximate direction at x. Also note that given a nonstationary point x ∈ ℝn and ε ∈ [0, 1), an ε-approximate direction v is always a descent direction. Indeed, from Proposition 3.1 and the definition of ε-approximate solution, it follows that ∇Fi(x)Tv ≤ φx(v) ≤ hx(v) ≤ (1 - ε)θ(x) < 0 for all i. Whence, J F(x)v < 0 and so, by Lemma 2.1, v is a descent direction for F at x.

We can now define the inexact MSDM in almost the same way as its exact counterpart: in Step 1, there is also the choice of the tolerance parameter ε ∈ [0, 1), in Step 2, vk is taken as an ε-approximate solution of minv {φxk(v) + (1/2)||v||²} and, finally, the stopping criterion in Step 3 becomes hxk(vk) = 0. In [19,28] it is shown that the method is well-defined and that it has the same convergence features as the exact one.
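The relative-error criterion for accepting an approximate direction can be checked numerically once θ(x) is known. The sketch below is illustrative only: the gradients, the candidate direction and the value θ(x) = -0.25 (the optimal value for the two gradients (1, 0) and (0, 1), attained at v(x) = (-0.5, -0.5)) are a hypothetical example, and the function names are not from the survey.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def h(grads, v):
    """h_x(v) = max_i grad_i . v + 0.5*||v||^2 for the given gradients at x."""
    return max(dot(g, v) for g in grads) + 0.5 * dot(v, v)

def relative_error(grads, v, theta):
    """(h_x(v) - theta(x)) / |theta(x)|, for a nonstationary x (theta < 0)."""
    return (h(grads, v) - theta) / abs(theta)

def is_eps_approximate(grads, v, theta, eps):
    """True if v is an eps-approximate direction, i.e. h_x(v) <= (1 - eps)*theta."""
    return h(grads, v) <= (1.0 - eps) * theta

# Hypothetical data: gradients of two objectives at some nonstationary x.
grads_demo = [(1.0, 0.0), (0.0, 1.0)]
theta_demo = -0.25                 # optimal value for these two gradients
v_exact = (-0.5, -0.5)             # exact steepest descent direction
v_rough = (-0.4, -0.4)             # an inexact candidate direction

err_rough = relative_error(grads_demo, v_rough, theta_demo)
```

Here the candidate (-0.4, -0.4) has relative error of about 0.04, so it would be accepted with tolerance 0.05 and rejected with tolerance 0.01, while the exact direction is accepted for every tolerance, including 0.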

 

4 THE MULTIOBJECTIVE PROJECTED GRADIENT METHOD

The scheme of this section is similar to the previous one. Here, for the constrained multiobjective problem, we propose an extension of the classical scalar-valued projected gradient method. First, we prove some technical results. Then, we present the convergence analysis of the method. And, finally, we sketch a numerically robust version of the method which admits relative errors on computing the search directions at each iteration.

Let F : ℝn → ℝm be a continuously differentiable mapping and C ⊆ ℝn be a closed and convex set. We are now interested in the following constrained multiobjective optimization problem:

$$\min \; F(x) \quad \text{s.t.} \quad x \in C. \tag{23}$$

For this problem, we seek a Pareto optimum (or efficient point), i.e., a feasible point (x* ∈ C) such that there does not exist x ∈ C with F(x) ≤ F(x*) and F(x) ≠ F(x*), or we look for a weak Pareto optimum (or weakly efficient point), that is to say, a feasible x* such that there does not exist x ∈ C with F(x) < F(x*).

4.1 The search direction for the multiobjective projected gradient method

For the constrained multiobjective problem, a necessary condition for optimality of a point x ∈ C is stationarity (or criticality):

$$JF(x)(C - x) \cap (-\mathbb{R}^m_{++}) = \emptyset,$$

where J F(x)(C − x) := {J F(x)(x̃ − x) : x̃ ∈ C}. So x ∈ C is stationary for F if, and only if, for all v ∈ C - x we have J F(x)v ≮ 0. Note that when m = 1, we retrieve the classical stationarity condition for constrained scalar-valued optimization: ⟨∇F(x), x̃ − x⟩ ≥ 0 for all x̃ ∈ C.

Let us now develop an extension of the classical (scalar) projected gradient method for the vector-valued problem (23). First, for x ∈ C, we extend the search direction, originally defined for C = ℝn in (7), by

$$v(x) := \operatorname*{argmin}_{v \in C - x} \; \beta \varphi_x(v) + \tfrac{1}{2}\|v\|^2, \tag{24}$$

where β > 0 is a parameter and φx is defined in (4). Note that, in view of the strong convexity of the minimand, for any x ∈ C, the projected gradient direction v(x) is always well-defined.

Now we extend the optimal value function θ(·) (given by (8) in the unconstrained case):

$$\theta(x) := \beta \varphi_x(v(x)) + \tfrac{1}{2}\|v(x)\|^2. \tag{25}$$

We also need the following extension of Proposition 3.1 to the constrained case.

Proposition 4.1. Let v: C → ℝn and θ : C → ℝ be given by (24) and (25), respectively. Then, the following statements hold.

  1. θ(x) ≤ 0 for all x ∈ C.

  2. The function θ(·) is continuous.

  3. The following conditions are equivalent:

    (a) The point x ∈ C is nonstationary.

    (b) θ(x) < 0.

    (c) v(x) ≠ 0.

    In particular, x is stationary if and only if θ(x) = 0.

Proof. 1. Recalling that 0 ∈ C - x, the result follows as in the proof of Proposition 3.1.

2. Let xC and {x(k)} ⊂ C be a sequence such that limk→∞ x(k) = x. Since v(x) ∈ C - x, we have v(x) + x - x(k)C - x(k). Thus, from (24) and (25), we have

Hence, from the first inequality of (5), we obtain

Since F is continuously differentiable and u → maxi{ui} is continuous, taking lim supk→∞ on both sides of the above inequality yields

Now, observe that since v(x(k)) ∈ C − x(k), we have v(x(k)) + x(k) − x ∈ C − x. Once again from (24) and (25), we obtain

Using once again the first inequality of (5), we have

Note that for all i and any z ∈ C, from the Cauchy-Schwarz inequality and item 1, we have that ||v(z)|| ≤ 2β||∇Fi(z)||. Therefore, since F is continuously differentiable and {x(k)} is bounded, because it converges, we see that {||v(x(k))||} is bounded. So, taking lim infk→∞ on both sides of the above inequality, we get

where the second inequality follows from (5). We complete the proof by combining this inequality with (26).

3. All results follow as in the proof of Proposition 3.1. (A formal proof of these equivalences for a more general case can be found in [[25], Proposition 3].)

4.2 The multiobjective projected gradient algorithm

We can now define the extension of the classical (scalar) projected gradient method for the constrained multiobjective optimization problem (23).

Algorithm 4.2. The multiobjective projected gradient method (MPGM)

  1. Choose β > 0, σ ∈ (0,1) and x0 ∈ C. Set k := 0.

  2. Compute vk := v(xk) as in (24).

  3. Compute θ(xk) as in (25). If θ(xk) = 0, then stop.

  4. Choose tk as the largest t ∈ {1/2j : j = 0, 1, 2, ...} such that F(xk + tvk) ≤ F(xk) + σ t J F(xk)vk.

  5. Set xk+1 := xk + tkvk, k := k + 1 and go to Step 2.

Note that, whenever C = ℝn, taking β = 1, we retrieve the MSDM. Observe that, by virtue of Proposition 4.1, an alternative stopping criterion in Step 3 is vk = 0. Note also that Algorithm 4.2 either terminates in a finite number of iterations with a stationary point or it generates an infinite sequence of nonstationary points (Proposition 4.1).

Now observe that if Step 4 is reached then, by virtue of Lemma 2.1, after finitely many half reductions we obtain the step length tk, which is used in Step 5 to define the next iterate. Note that instead of the factor 1/2 in the reductions of Step 4, we could have used any other scalar ν ∈ (0,1). Finally, observe that since θ(xk) < 0 implies J F(xk)vk < 0, from Steps 3-5 it follows that, whenever the method generates an infinite sequence, the objective values are ℝm+-decreasing, i.e.,

F(xk+1) < F(xk) for all k.

If the method stops at iteration k0, this inequality holds for k = 0, 1, ..., k0 - 1.
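The iteration described above can be sketched in a few lines of plain Python. This is only an illustration under strong simplifying assumptions that are not part of the text: there are exactly m = 2 objectives, C is a box, and the direction subproblem is taken to be the minimization of βφ_x(v) + (1/2)||v||^2 over v ∈ C − x. The direction is obtained through a one-dimensional dual search over a weight w ∈ [0, 1], a technique swapped in here for convenience; the names `direction`, `mpgm`, `F`, `JF` and the test problem are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def direction(x, g1, g2, lo, hi, beta=1.0):
    """Return (v, theta) for m = 2 objectives on the box lo <= x <= hi."""
    def v_of(w):
        # Minimizer of beta*<w*g1 + (1-w)*g2, v> + 0.5*||v||^2 over C - x,
        # i.e. a projection (here a clip) of a weighted gradient step.
        z = [xi - beta * (w * a + (1.0 - w) * b) for xi, a, b in zip(x, g1, g2)]
        return [min(max(zi, l), h) - xi for zi, l, h, xi in zip(z, lo, hi, x)]

    def slope(w):
        # Derivative (up to the factor beta) of the concave dual function.
        v = v_of(w)
        return dot(g1, v) - dot(g2, v)

    if slope(0.0) <= 0.0:
        w = 0.0
    elif slope(1.0) >= 0.0:
        w = 1.0
    else:
        a, b = 0.0, 1.0
        for _ in range(100):  # bisection on the nonincreasing slope
            w = 0.5 * (a + b)
            if slope(w) > 0.0:
                a = w
            else:
                b = w
    v = v_of(w)
    theta = beta * max(dot(g1, v), dot(g2, v)) + 0.5 * dot(v, v)
    return v, theta


def mpgm(F, JF, x0, lo, hi, beta=1.0, sigma=0.1, tol=1e-10, max_iter=200):
    x = list(x0)
    for _ in range(max_iter):
        g1, g2 = JF(x)
        v, theta = direction(x, g1, g2, lo, hi, beta)
        if theta > -tol:  # numerically stationary: stop
            break
        fx, slopes, t = F(x), (dot(g1, v), dot(g2, v)), 1.0
        while t > 1e-12:  # Armijo-like backtracking with factor 1/2
            y = [xi + t * vi for xi, vi in zip(x, v)]
            fy = F(y)
            if all(fy[i] <= fx[i] + sigma * t * slopes[i] for i in range(2)):
                break
            t *= 0.5
        x = y
    return x


# Hypothetical test problem: two convex quadratics on the unit square.
F = lambda x: ((x[0] - 2.0) ** 2 + (x[1] - 0.5) ** 2,
               x[0] ** 2 + (x[1] - 0.5) ** 2)
JF = lambda x: ([2.0 * (x[0] - 2.0), 2.0 * (x[1] - 0.5)],
                [2.0 * x[0], 2.0 * (x[1] - 0.5)])
x_star = mpgm(F, JF, [0.2, 0.9], [0.0, 0.0], [1.0, 1.0])
```

Since every trial point is a convex combination of the current iterate and a point of the box, the iterates stay feasible without any extra projection in the line search. The tolerance `tol` and the lower bound on `t` are numerical safeguards added for this sketch.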

4.3 Convergence analysis of the MPGM: the general case

As we know, Algorithm 4.2 has finite termination, ending with a stationary point, or generates an infinite sequence of nonstationary iterates. In the sequel we study the second case. First, we show a simple fact: the algorithm produces feasible sequences.

Lemma 4.3. Let {xk} ⊂ ℝn be a sequence generated by Algorithm 4.2. Then, we have {xk} ⊂ C.

Proof. Let us use an inductive argument. From the initialization of the algorithm, x0 ∈ C. Now assume that xk ∈ C. Observe that, by Step 4 of Algorithm 4.2, tk ∈ (0, 1] and, by Step 2, vk ∈ C − xk. Then, for some zk ∈ C, vk = zk − xk and, from the convexity of C, it follows that xk+1 = xk + tkvk = (1 − tk)xk + tkzk ∈ C, and the proof is complete.

Now we see that whenever a sequence produced by Algorithm 4.2 has accumulation points, they are all feasible stationary points for the problem.

Theorem 4.4. Every accumulation point, if any, of a sequence {xk} generated by Algorithm 4.2 is a feasible stationary point.

Proof. The feasibility follows combining the fact that C is closed with Lemma 4.3. The rest of the proof is similar to the one of Theorem 3.3: it suffices to use the definitions of v(x) and θ(x) given in (24) and (25), respectively.

4.4 Full convergence of the MPGM: the ℝm+-convex case

Let us note that, being the maximum of convex functions, the objective of the minimization problem which defines v(x) in (24) is also convex and, since C − x is a closed convex subset of ℝn, by [[3], Proposition 4.7.2], a necessary condition for the optimality of v(x) is the existence of w(x) ∈ ℝm with

w(x) ≥ 0,  Σ_{i=1}^m w_i(x) = 1,    (27)

such that

This condition can be written as

Now, using a very well-known characterization of orthogonal projections (see, for instance, [[3], Proposition 2.2.1]), we have

where PC(z) := argmin{||u − z|| : u ∈ C} is well-defined because C is closed and convex. We conclude that there exists w(x) ∈ ℝm satisfying (27) such that

v(x) = P_C(x − β J F(x)^T w(x)) − x.    (29)

Observe that in the constrained scalar-valued case (m = 1), we have J F(x)T = ∇F(x) and w(x) = 1, which means that

v(x) = P_C(x − β∇F(x)) − x,

and this is precisely the search direction used in the projected gradient method for real-valued optimization. So this is an extension to the multiobjective setting of the classical projected gradient method.
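As a small numerical illustration of the scalar case, the sketch below evaluates the direction P_C(x − β∇F(x)) − x when C is a box, so that the projection reduces to a componentwise clip. The box, the objective F(x) = (1/2)||x − c||^2 and all function names are assumptions made for this example only.

```python
def project_box(z, lo, hi):
    """Orthogonal projection onto the box lo <= z <= hi (componentwise clip)."""
    return [min(max(zi, l), h) for zi, l, h in zip(z, lo, hi)]


def scalar_pg_direction(x, grad, lo, hi, beta=1.0):
    """Projected gradient direction P_C(x - beta*grad) - x for m = 1."""
    z = project_box([xi - beta * gi for xi, gi in zip(x, grad)], lo, hi)
    return [zi - xi for zi, xi in zip(z, x)]


# Hypothetical objective F(x) = 0.5 * ||x - c||^2, whose gradient is x - c.
c = [2.0, -1.0]
grad_F = lambda x: [xi - ci for xi, ci in zip(x, c)]
lo, hi = [0.0, 0.0], [1.0, 1.0]

v_interior = scalar_pg_direction([0.5, 0.5], grad_F([0.5, 0.5]), lo, hi)
v_corner = scalar_pg_direction([1.0, 0.0], grad_F([1.0, 0.0]), lo, hi)
```

Here `v_interior` points towards the corner (1, 0), which is the minimizer of F over the box, while `v_corner` vanishes, in agreement with the fact that the direction is zero exactly at stationary points.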

Lemma 4.5. Suppose that F is ℝm+-convex. Let {xk} be an infinite sequence generated by Algorithm 4.2. If for x̂ ∈ C and k ≥ 0 we have F(x̂) ≤ F(xk), then

||x^{k+1} − x̂||^2 ≤ ||x^k − x̂||^2 + 2β t_k |〈w^k, J F(x^k)v^k〉|,

where wk := w(xk) ∈ ℝm is such that (27) and (29) hold with x = xk.

Proof. Since xk+1 = xk + tkvk, we have

Let us analyze the rightmost term of the above expression. Since vk = v(xk) and wk = w(xk), by (28) with x = xk, we get

Taking v = x̂ − xk ∈ C − xk in the above inequality, we obtain

Now observe that the convexity of F and (18) yield J F(xk)(x̂ − xk) ≤ F(x̂) − F(xk). This fact, together with wk ≥ 0 and F(x̂) ≤ F(xk), implies

Also, since J F(xk)vk < 0, because xk is nonstationary, we have 〈wk, J F(xk)vk〉 < 0. Thus, recalling that β > 0, from (31) it follows that

The above inequality, together with (30), shows that

And the result follows because tk ≥ 0.

As in the unconstrained case, we need to make an extra assumption in order to prove full convergence of the method.

Assumption 4.6. Every ℝm+-decreasing sequence {yk} ⊂ F(C) := {F(x): x ∈ C} is ℝm+-bounded from below by an element of F(C).

We can now state and prove the main result on the convergence analysis of the MPGM: under conditions similar to those of Theorem 3.11, regardless of the initial (feasible) guess, the method converges to an optimum of problem (23).

Theorem 4.7. Assume that F : ℝn → ℝm is ℝm+-convex and that Assumption 4.6 holds. Then, any sequence {xk} generated by Algorithm 4.2 converges to a weak Pareto optimum.

Proof. Let us define the set

L := {x ∈ C : F(x) ≤ F(xk) for all k}

and take x̂ ∈ L, which exists by Assumption 4.6. Since F is ℝm+-convex, it follows from Lemma 4.5 that

First, we prove that the sequence {xk} is quasi-Fejér convergent to the set L. Let {e1,..., em} be the canonical basis of ℝm. We observe that, for each k, we can write

Let us note that, from (27), 0 ≤ wki ≤ 1 for all i and k. Thus, from (32), we have

Defining εk := 2βtk Σ_{i=1}^m |〈ei, J F(xk)vk〉|, we have that εk ≥ 0 and

Thus, let us prove that Σ_{k=0}^∞ εk < ∞. From the Armijo-like condition in Step 4 of the algorithm, we obtain

Since xk is nonstationary, we also have for all i and k that

which means that -tkei, J F(xk)vk〉 = tk|〈ei, J F(xk)vk〉|. Hence, we obtain

Adding up from k = 0 to k = N at the above inequality, where N is any positive integer, we get

Since N is an arbitrary positive integer, β, σ > 0 and F(x̂) ≤ F(xk) for all k, we conclude that Σ_{k=0}^∞ εk < ∞. Therefore, recalling that x̂ was an arbitrary element of L, we see that {xk} converges quasi-Fejér to L.

By Theorem 3.10, it follows that {xk} is bounded. So, since C is closed, {xk} ⊂ C has at least one accumulation point, say x*, which is feasible and stationary by Theorem 4.4. Using now the ℝm+-convexity of F, from Lemma 3.5, it follows that x* ∈ C is a weak Pareto optimum.

As in the end of the proof of Theorem 3.11, we now see that x* ∈ L. Indeed, since F(xk+1) < F(xk) for all k, by Lemma 3.8, 1, we have F(x*) ≤ F(xk) for all k, and so x* ∈ L. So, once again from Theorem 3.10, we conclude that {xk} converges to x* ∈ C, a weak Pareto optimum.

4.5 The MPGM with relative errors

As we did for the MSDM in Subsection 3.5, we present here a quick overview of an MPGM version implemented with relative errors in the search directions. Given a nonstationary x ∈ C and a tolerance ε ∈ [0, 1), we say that v ∈ C − x is an ε-approximate solution of the problem min_{v ∈ C − x} φx(v) + 1/2||v||2 if

where hx(v) := φx (v) + 1/2||v||2, with φx given by (4).

Comments similar to those made in Subsection 3.5 for ε-approximate solutions of unconstrained problems can be made here for the constrained case. The modifications of the (exact) MPGM are basically the same as those proposed in Subsection 3.5 for the (exact) MSDM: in Step 2 of Algorithm 4.2, instead of defining vk as the exact solution of min_{v ∈ C − xk} φxk(v) + 1/2||v||2, we take vk as an ε-approximate solution of this problem. The stopping rule at Step 3 (θ(xk) = 0) is replaced by hxk(vk) = 0.

For a more general case (vector optimization), a comprehensive study of the inexact version of the MPGM can be found in [22], where, under essentially the same hypotheses made here for the (exact) MPGM, all convergence results are fully proved.

5 THE MULTIOBJECTIVE NEWTON METHOD

We now go back to the unconstrained multiobjective problem (1). In order to define a Newton-type search direction for this kind of problem, we first recall some convexity notions for vector-valued mappings. We say that F : ℝn → ℝm is strictly ℝm+-convex if the inequality (17) is satisfied strictly, that is, if F is componentwise strictly convex. We also say that F is strongly ℝm+-convex if there exists û ∈ ℝm++ such that

F(λx + (1 − λ)y) ≤ λF(x) + (1 − λ)F(y) − (1/2)λ(1 − λ)||x − y||^2 û

for all x, y ∈ ℝn and all λ ∈ (0, 1), which is equivalent to saying that F is componentwise strongly convex (each component Fi has ûi as its modulus of strong convexity). Clearly, as in the scalar case, strong ℝm+-convexity implies strict ℝm+-convexity, which, in turn, implies ℝm+-convexity.

The following result relates Pareto optimality with strict ℝm+-convexity and characterizes strong ℝm+-convexity in terms of the eigenvalues of the objective functions' Hessians.

Lemma 5.1. Assume that F : ℝn → ℝm is twice continuously differentiable. Then, the following properties hold.

  1. If F is strictly ℝm+-convex and x ∈ ℝn is stationary for F, then x is a Pareto optimal point.

  2. The mapping F is strongly ℝm+-convex if and only if there exists ρ > 0 such that

    λmin(∇2Fi(x)) ≥ ρ for all x ∈ ℝn and all i = 1, ..., m,

    where λmin : ℝn×n → ℝ denotes the smallest eigenvalue function.

Proof. It follows from [[27], Corollary 2.2 and Proposition 2.3].
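For objectives with constant Hessians (quadratics), the eigenvalue characterization of strong convexity can be checked directly. The sketch below assumes that the condition in item 2 is the uniform bound λmin(∇2Fi(x)) ≥ ρ and restricts itself to symmetric 2 × 2 matrices, for which the smallest eigenvalue has a closed form; the two Hessians are made-up data for illustration.

```python
import math


def lambda_min_sym2(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)


# Hypothetical constant Hessians of two quadratic objectives F1 and F2.
hessians = [
    [[2.0, 0.0], [0.0, 4.0]],
    [[3.0, 1.0], [1.0, 3.0]],
]

# A valid uniform modulus rho is the smallest of the minimal eigenvalues.
rho = min(lambda_min_sym2(H) for H in hessians)
```

For non-quadratic objectives the Hessians depend on x, so a check of this kind only gives a bound at the sampled points and not the uniform bound over ℝn required by the lemma.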

5.1 The search direction for the multiobjective Newton method

From now on, we will assume that F is strongly ℝm+-convex and twice continuously differentiable. For all i = 1, ..., m, let us define ψi : ℝn × ℝn → ℝ by

ψ_i(x, s) := ∇F_i(x)^T s + (1/2) s^T ∇^2F_i(x) s.    (33)

Hence, for any i, s ↦ Fi(x) + ψi(x, s) is the local quadratic approximation of Fi at x. For x ∈ ℝn, we define the Newton direction for F at x as

s(x) := argmin_{s ∈ ℝ^n} max_{i=1,...,m} ψ_i(x, s).    (34)

Since ψi(x, ·) is strongly convex for all i, s ↦ maxi ψi(x, s) is also strongly convex and it has a unique minimizer in ℝn, and so s(x) is well-defined. Clearly, for m = 1 we retrieve the Newton direction for unconstrained real-valued optimization.

Observe that whenever ∇2Fi(x) is equal to I, the identity matrix of ℝn×n, for all i, we have that maxi=1,...,m ψi (x, s) = φx(s) + (1/2)||s||2, where φ is given by (4). Therefore, in multiobjective optimization, if ∇2Fi(x) = I for all i, then, as it happens in the scalar case, the Newton and the steepest descent directions coincide.

Let us call θ(x) the optimal value of the minimization problem min_{s ∈ ℝn} max_{i=1,...,m} ψi(x, s), that is to say,

θ(x) := min_{s ∈ ℝ^n} max_{i=1,...,m} ψ_i(x, s) = max_{i=1,...,m} ψ_i(x, s(x)).    (35)

In order to compute s(x), we can reformulate the (possibly nonsmooth) min-max problem as:

min τ   subject to   ψ_i(x, s) − τ ≤ 0,  i = 1, ..., m,

a smooth convex problem with variables (τ, s) ∈ ℝ × ℝn for which there are many efficient numerical solvers.

We now present results analogous to those of Propositions 3.1 and 4.1.

Proposition 5.2. Let s : ℝn → ℝn and θ : ℝn → ℝ be given by (34) and (35), respectively. Then, the following statements hold.

  1. θ(x) ≤ 0 for all x ∈ ℝn.

  2. The following conditions are equivalent:

    • (a) The point x is nonstationary.

    • (b) θ(x) < 0.

    • (c) s(x) ≠ 0.

    • In particular, x is stationary if and only if θ(x) = 0.

Proof. 1. By (33) and (35), θ(x) ≤ maxi ψi (x, 0) =0.

2. (a) ⇒ (b). By (a), there exists s̃ ∈ ℝn such that J F(x)s̃ ∈ −ℝm++, which means that ∇Fi(x)Ts̃ < 0 for all i. From (33) and (35), for τ > 0, we get

Therefore, for τ > 0 small enough the right hand side of the above inequality is negative and (b) holds.

(b) ⇒ (c). Note that if θ(x) < 0 then, in view of (33) and (35), we have s(x) ≠ 0.

(c) ⇒ (a). From the strong ℝm+-convexity of F, it follows that ∇2Fi(x) is positive definite for all i, so from (c) and item 1, we get

Hence, J F(x)s(x) < 0 and (a) holds.

In particular, we just saw that, for any nonstationary x ∈ ℝn, s(x) is a descent direction. The next lemma supplies an implicit definition of s(x), which will be used in order to obtain bounds for ||s(x)||. These bounds will allow us to prove continuity of s(·), θ(·) and will be used to establish the rate of convergence of the Newton method.

Lemma 5.3. For any x ∈ ℝn, the Newton direction defined in (34) can be written as

s(x) = −[Σ_{i=1}^m α_i ∇^2F_i(x)]^{-1} Σ_{i=1}^m α_i ∇F_i(x),    (36)

with αi = αi(x) ≥ 0 and Σ_{i=1}^m αi = 1.

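Proof. We know that the convex problem mins maxi ψi(x, s), where ψi is defined in (33), has a unique solution s(x). Applying Proposition 3.4, we see that there exists α := α(x) ∈ ℝm such that the following conditions hold at s = s(x) and τ = θ(x):

Σ_{i=1}^m α_i = 1,   Σ_{i=1}^m α_i (∇F_i(x) + ∇^2F_i(x)s) = 0,    (37)

α_i ≥ 0,   ψ_i(x, s) ≤ τ,   i = 1, ..., m,    (38)

α_i (ψ_i(x, s) − τ) = 0,   i = 1, ..., m.    (39)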

Since ∇2Fi(x) is positive definite and the multipliers αi are nonnegative for all i, from both conditions in (37), the result follows.

So s(x), defined in (34), is a Newton direction for a standard scalar optimization problem (min_x Σ_{i=1}^m αiFi(x)), implicitly induced by weighting the given objective functions by the nonnegative, a priori unknown, KKT multipliers. We now give upper bounds for the norm of the Newton direction.

Corollary 5.4. For any x ∈ ℝn, we have

where ρ > 0 is given by Lemma 5.1.

Proof. Let us prove the first inequality. By Proposition 5.2, we have θ(x) ≤ 0, which, by (35), means that maxi ψi (x, s(x)) ≤ 0. Hence, by (33),

From Lemma 5.1, 2, we get

The proof follows from these two inequalities and the operator norm properties.

We now prove the second inequality. We know that the complementarity condition (39) holds for (τ, s) = (θ(x), s(x)). So, by (33) and (35), αi > 0 implies θ(x) = ψi (x, s(x)), and then, using the first equality in (37), we have

Replacing ∑mi=1 αiFi(x) by its expression obtained from (36), using Lemma 5.1, 2 and the fact that ∑i αi = 1 with αi ≥ 0 for all i, we conclude that

and the proof follows immediately from Proposition 5.2, 1.

We now establish the continuity of the functions θ(·) and s(·).

Proposition 5.5. The functions θ : ℝn → ℝ and s : ℝn → ℝn, given by (35) and (34) respectively, are continuous.

Proof. Take x, x' ∈ ℝn. Let s = s(x), s' = s(x') and i0 be such that maxi ψi(x', s) = ψi0(x', s). Then, from (35), (33) and the Cauchy-Schwarz inequality,

where D2F stands for the second derivative of F. Therefore, from the above inequality and Corollary 5.4 we conclude that

Since the above inequality also holds interchanging x and x' and F is twice continuously differentiable, we conclude that θ(·) is continuous.

Now we prove the continuity of s(·). By the same reasoning as above,

where we used the Cauchy-Schwarz inequality and (33) with x = x'. By Lemma 5.1, 2 λmin(∇2Fi(x)) ≥ ρ for all x ∈ ℝn and all i. Therefore, ψi (x', ·) is strongly convex with modulus ρ for all i and so is maxi ψi (x', ·). Hence, using the fact that maxi ψi (x', ·) is minimized at s', we have

Combining inequality (40) with the above one, we get

Since θ(·) is continuous and F is twice continuously differentiable, using Corollary 5.4 and taking the limit x' → x on both sides of the above inequality, we conclude that s' → s and so s(·) is continuous.

5.2 The multiobjective Newton algorithm

Let us now describe the extension of the classical (scalar-valued) Newton method for the unconstrained multiobjective optimization problem (1).

Algorithm 5.6. The multiobjective Newton method (MNM)

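  1. Choose σ ∈ (0,1) and x0 ∈ ℝn. Set k := 0.

  2. Compute sk := s(xk) as in (34).

  3. Compute θ(xk) as in (35). If θ(xk) = 0, then stop.

  4. Choose tk as the largest t ∈ {1/2j : j = 0, 1, 2, ...} such that

    Fi(xk + tsk) ≤ Fi(xk) + σtθ(xk) for all i = 1, ..., m.

  5. Set xk+1 := xk + tksk, k := k + 1 and go to Step 2.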

As usual, the stopping rule at Step 3 can be replaced by sk = 0 (Proposition 5.2). Now we show that if, at Step 3, θ(xk) ≠ 0, the backtracking procedure prescribed in Step 4 is well-defined, i.e., it ends up in a finite number of (inner) iterations. Indeed, it suffices to note that

for any nonstationary x, t > 0 and all i. As J F(x)s(x) < 0, from Lemma 2.1 it follows that there exists ε = ε(x) > 0 such that the first, and therefore the second, inequality in (41) holds for any t ∈ (0, ε). So, for x = xk, we will need finitely many half reductions in order to obtain tk ∈ (0,εk), where εk = ε(xk). Observe also that the reducing factor 1/2 at Step 4 can be substituted by any ν ∈ (0,1).

Moreover, when θ(xk) ≠ 0 for all k, we know that we have θ(xk) < 0, so from step 4 it follows that

This inequality holds for k = 0, ..., k0 - 1 if the algorithm stops at iteration k0. Finally, we point out that, by Proposition 5.2, whenever the method has finite termination, it ends up with a stationary point and therefore a Pareto optimum for F (Lemma 5.1, 1). Therefore, the relevant case for the convergence analysis is the one in which the algorithm generates infinite sequences {xk}, {sk} and {tk}. So, from now on, we suppose that the method does not stop.
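A compact sketch of the method follows, written for the special case of m = 2 objectives in n = 2 variables so that it needs only the standard library. It assumes that ψi(x, s) = ∇Fi(x)Ts + (1/2)sT∇2Fi(x)s and that the line search accepts t when Fi(x + ts) ≤ Fi(x) + σtθ(x) for both i. The Newton direction is computed through the weighted form s = −[ΣαiHi]^{-1}Σαigi with a bisection over the weight α1 ∈ [0, 1]; this solver, the function names and the test problem are choices made for this illustration, not prescriptions of the text.

```python
import math


def solve2(H, r):
    """Solve the 2x2 linear system H s = r by Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [(d * r[0] - b * r[1]) / det, (a * r[1] - c * r[0]) / det]


def psi(g, H, s):
    """Quadratic model <g, s> + 0.5 * s^T H s."""
    Hs = [H[0][0] * s[0] + H[0][1] * s[1], H[1][0] * s[0] + H[1][1] * s[1]]
    return g[0] * s[0] + g[1] * s[1] + 0.5 * (s[0] * Hs[0] + s[1] * Hs[1])


def newton_direction(g1, H1, g2, H2):
    """Return (s, theta) minimizing max(psi_1, psi_2) for positive definite H1, H2."""
    def s_of(a):
        H = [[a * H1[i][j] + (1 - a) * H2[i][j] for j in range(2)] for i in range(2)]
        g = [a * g1[i] + (1 - a) * g2[i] for i in range(2)]
        return solve2(H, [-g[0], -g[1]])

    def slope(a):
        # Derivative of the concave dual function a -> min_s a*psi_1 + (1-a)*psi_2.
        s = s_of(a)
        return psi(g1, H1, s) - psi(g2, H2, s)

    if slope(0.0) <= 0.0:
        a = 0.0
    elif slope(1.0) >= 0.0:
        a = 1.0
    else:
        lo, hi = 0.0, 1.0
        for _ in range(100):
            a = 0.5 * (lo + hi)
            if slope(a) > 0.0:
                lo = a
            else:
                hi = a
    s = s_of(a)
    return s, max(psi(g1, H1, s), psi(g2, H2, s))


def mnm(F, grads, hessians, x0, sigma=0.1, tol=1e-10, max_iter=50):
    x = list(x0)
    for _ in range(max_iter):
        (g1, g2), (H1, H2) = grads(x), hessians(x)
        s, theta = newton_direction(g1, H1, g2, H2)
        if theta > -tol:  # numerically stationary: stop
            break
        fx, t = F(x), 1.0
        while t > 1e-12:  # backtracking with factor 1/2
            y = [x[0] + t * s[0], x[1] + t * s[1]]
            fy = F(y)
            if fy[0] <= fx[0] + sigma * t * theta and fy[1] <= fx[1] + sigma * t * theta:
                break
            t *= 0.5
        x = y
    return x


# Hypothetical strongly convex test problem.
F = lambda x: ((x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2 + math.exp(x[0]),
               2.0 * (x[0] + 1.0) ** 2 + (x[1] - 1.0) ** 2)
grads = lambda x: ([2.0 * (x[0] - 1.0) + math.exp(x[0]), 4.0 * x[1]],
                   [4.0 * (x[0] + 1.0), 2.0 * (x[1] - 1.0)])
hessians = lambda x: ([[2.0 + math.exp(x[0]), 0.0], [0.0, 4.0]],
                      [[4.0, 0.0], [0.0, 2.0]])
x0 = [2.0, 2.0]
x_star = mnm(F, grads, hessians, x0)
```

With identity Hessians, `newton_direction` returns the steepest descent direction, which gives a quick sanity check of the solver. The tolerance `tol` and the lower bound on `t` are numerical safeguards that do not appear in the algorithm as stated.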

5.3 Convergence analysis of the MNM

Let us begin with a technical result which will be useful in the sequel.

Proposition 5.7. For all k = 0, 1, ... and all i = 1, ..., m, we have

Proof. From Step 4 of Algorithm 5.6, for any i and all j, we have

We complete the proof by adding up for j = 1, ..., k and taking into account that θ(xj) < 0 for all j.

We can now prove that if the sequence {xk} has accumulation points, they are all Pareto optima for F.

Proposition 5.8. If x̄ ∈ ℝn is an accumulation point of {xk}, then x̄ is a Pareto optimum for F.

Proof. There exists a subsequence {xkj}j such that

Using Proposition 5.7 with k = kj and taking the limit j → ∞, from the continuity of F, we conclude that

Therefore, limk→∞tkθ(xk) = 0 and, in particular,

Suppose that x̄ is nonstationary, which, by Proposition 5.2, is equivalent to

Define

Using the definition of θ(·) and Lemma 2.1, we conclude that there exists a nonnegative integer q such that

Since s(·), θ(·) and g(·) are continuous,

and, for j large enough

which, in view of the definition of g and Step 4 of the algorithm, implies that tkj ≥ 2^−q > 0 for j large enough. Hence, from the second limit in (44), we conclude that lim infj→∞ tkj|θ(xkj)| > 0, in contradiction with (42). Therefore, (43) is not true and, by Proposition 5.2, x̄ is stationary. Whence, in view of Lemma 5.1, 1, x̄ is a Pareto optimal point for F.

We can now show that, in order to have full convergence to a Pareto optimum, it suffices to take the initial point x0 in a bounded level set of F.

Theorem 5.9. Let x0 ∈ ℝn be in a compact level set of F and {xk} a sequence generated by Algorithm 5.6. Then {xk} converges to a Pareto optimum x* ∈ ℝn.

Proof. Let Γ0 be the F(x0)-level set of F, that is, Γ0 := {x ∈ ℝn : F(x) ≤ F(x0)}. We know that the sequence {F(xk)} is ℝm+-decreasing, so we have xk ∈ Γ0 for all k. Therefore {xk} is bounded, all its accumulation points belong to Γ0 and so, using Proposition 5.8, we conclude that they are all Pareto optima for F. Moreover, applying Lemma 3.8, 1 with x(k) = xk for all k, we see that all these accumulation points have the same objective value. As F is strongly (and therefore strictly) ℝm+-convex, there exists just one accumulation point of {xk}, say x*, and the proof is complete.

5.4 Convergence rate of the MNM

Here we analyze the convergence rate of any infinite sequence {xk} generated by Algorithm 5.6. First, we provide a bound for θ(xk+1) based on data at the former iterate xk. Then we show that for k large enough, full Newton steps are performed, that is to say, tk = 1. Using this result, we prove superlinear convergence. Finally, under additional regularity assumptions, we prove quadratic convergence.

Lemma 5.10. For any k, there exists αk ∈ ℝm such that Σ_{i=1}^m αki = 1, αki ≥ 0 for all i, with

and

where ρ > 0 is given by Lemma 5.1.

Proof. By virtue of (37)-(38) with x = xk, s = sk, the Lagrangian condition (45) holds for any k with a vector of KKT multipliers α = α(xk) =: αk ≥ 0 such that ∑mi=1 αki = 1.

In order to prove (46), observe that, from Steps 2-3 of Algorithm 5.6 and the properties of αk,

where the last inequality holds in view of Lemma 5.1 for x = xk+1. The function

is convex, so its minimal value is attained where its gradient vanishes, i.e., at

So

The proof follows from this equality and (47).

Let us now see that, for any sequence defined by Algorithm 5.6, full Newton steps will be eventually taken at Step 4.

Theorem 5.11. Let x0 ∈ ℝn be in a compact level set of F and {xk} a sequence generated by Algorithm 5.6. Then {xk} converges to a Pareto optimum point x* ∈ ℝn. Moreover, tk = 1 for k large enough and the convergence of {xk} to x* is superlinear.

Proof. Convergence of {xk} to a Pareto optimum x* was established in Theorem 5.9. By Lemma 3.5 and Proposition 5.2, 2, we know that x* is stationary and

Since F is twice continuously differentiable, for any ε > 0 there exists δ > 0 such that

where B(x*, δ) stands for the open ball with center in x* and radius δ. As {xk} converges to x*, using (48) and the continuity of s(·) (Proposition 5.5), we conclude that {sk} converges to 0. Therefore there exists kε such that for k ≥ kε we have

and using (49) for estimating the integral remainder on Taylor's second order expansion of Fi at xk, we conclude that for any i

So, using (33), (35), Corollary 5.4 and the fact that σ < 1, we obtain

The above inequality shows that if ε < (1 - σ)ρ, then, at Step 4, tk = 1 is accepted for k ≥ kε. Suppose that

Using (45) and (49) for estimating the integral remainder on Taylor's first order expansion of ∑mi=1 αikFi at xk, in xk + sk, we conclude that for any k ≥ kε it holds

which, combined with (46), shows that

Using the above inequality and Corollary 5.4, we conclude that for k ≥ kε we have

Therefore, if k ≥ kε and j ≥ 1 then

To prove superlinear convergence, take 0 < ξ < 1 and define

If ε < ε* and k ≥ kε, using (50) and the convergence of {xk} to x*, we get

Hence,

Combining the above two inequalities, we conclude that, if ε < ε* and k ≥ kε, then

Since ξ was arbitrary in (0, 1), the above quotient tends to zero and the proof is complete.

Recall that in the classical Newton method for minimizing a scalar convex twice continuously differentiable function, the proof of quadratic convergence requires Lipschitz continuity of the objective function's second derivative. Likewise, under the assumption of Lipschitz continuity of D2F, we can see that the MNM also converges quadratically.

Theorem 5.12. Let x0 ∈ ℝn be in a compact level set of F, {xk} a sequence generated by Algorithm 5.6 and D2F Lipschitz continuous on ℝn. Then {xk} converges quadratically to a Pareto optimal solution x* ∈ ℝn.

Proof. By Theorem 5.11, {xk} converges superlinearly to a Pareto optimal point x* and tk = 1 for k large enough. Let L be the Lipschitz constant for D2F. Then, for αk ∈ ℝm as in Lemma 5.10, we have that Σ_{i=1}^m αki∇2Fi is also L-Lipschitz continuous. Therefore, for k large enough,

where the last inequality follows from the second equality, (45) and Taylor's development of ∑mi=1 αkiFi at xk, in xk + sk. Using the above inequality and (46), we obtain

which, combined with Corollary 5.4 yields

Take ξ ∈ (0, 1). Since {xk} converges superlinearly to x*, there exists k0 such that for k ≥ k0 we have (51), (52) and

Therefore, applying the triangle inequality to ||x* − xℓ|| and then to ||xℓ+1 − xℓ||, for ℓ ≥ k0 we obtain

Whence, using the above rightmost inequality for ℓ = k ≥ k0 and the second equality in (51), we have

while using the first inequality in (53) for ℓ = k + 1 and the second equality in (51) (for k + 1 instead of k), we obtain

Quadratic convergence of {xk} to x* follows from the two inequalities above and (52).

6 FINAL REMARKS

Let us make some final comments. First, it is worth noting that, by implementing the vector optimization versions of the steepest descent method [28] and the projected gradient method [21,22,25], we can guarantee convergence to Pareto optima instead of just weak Pareto points. Indeed, it suffices to take an underlying ordering (closed convex) cone K with ℝm+\{0} contained in the interior of K and such that a K-version of Assumption 3.6 (Assumption 4.6) in the unconstrained (constrained) case is satisfied [[21], Section 6].

Without using this technique, in some cases, convergence of these two methods to (strong) Pareto optima can also be obtained. For instance, if the objective function is strictly ℝm+-convex, then, under the assumptions of the convergence theorems, this will actually happen. Indeed, as we saw in the proof of Theorem 3.11 (Theorem 4.7), sequences generated by the MSDM (MPGM) converge to stationary points, which, in this case, are Pareto optima (Lemma 5.1, 1).

We also point out that, with these (and other) descent vector-valued methods, we are just concerned with finding a single optimum and not all the points in the optimal set. Nevertheless, from a numerical point of view, we can expect to somehow approximate the solution set by just performing any of these methods for different initial points (e.g., [18], Section 8). In the so-called weighting method, this kind of idea also appears. To be more precise, in such a case, one weighting vector possibly gives rise to a Pareto optimal point, which means that we should perform the method for different weights in order to find the Pareto frontier. However, in some cases, as shown in [[18], Section 7], almost any choices of the weighting vectors may lead to unbounded problems. With arbitrary choices of initial points, we do not have this disadvantage in the descent vector-valued methods discussed here. We believe this is one of the main motivations for the study of such methods.

ACKNOWLEDGMENTS

This work was supported by FAPESP (grant 2010/20572-0) and by FAPERJ/CNPq through PRONEX-Optimization. We are also grateful to Benar F. Svaiter for his suggestions and to an anonymous referee for his helpful comments.

REFERENCES

[1] BELLO CRUZ JY, LUCAMBIO PÉREZ LR & MELO JG. 2011. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Analysis, 74:5268–5273.

[2] BENTO GC, FERREIRA OP & OLIVEIRA PR. 2012. Unconstrained steepest descent method for multicriteria optimization on Riemannian manifolds. Journal of Optimization Theory and Applications, 154(1):88–107.

[3] BERTSEKAS DP. 2003. Convex Analysis and Optimization. Athena Scientific, Belmont, MA.

[4] BONNEL H, IUSEM AN & SVAITER BF. 2005. Proximal methods in vector optimization. SIAM Journal on Optimization, 15:953–970.

[5] BRANKE J, DEB K, MIETTINEN K & SLOWINSKI R, EDITORS. 2006. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2006. Seminar 06501. Forthcoming.

[6] BRANKE J, DEB K, MIETTINEN K & STEUER RE, EDITORS. 2004. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2004. Seminar 04461. Electronically available at http://drops.dagstuhl.de/portals/index.php?semnr=04461.

[7] BURACHIK R, GRAÑA DRUMMOND LM, IUSEM AN & SVAITER BF. 1995. Full convergence of the steepest descent method with inexact line searches. Optimization, 32(2):137–146.

[8] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Global convergence analysis. Submitted for publication.

[9] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Local convergence. Submitted for publication.

[10] CARRIZOSA E & FRENK JBG. 1998. Dominating sets for convex functions with some applications. Journal of Optimization Theory and Applications, 96(2):281–295.

[11] DAS I & DENNIS JE. 1997. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization, 14(1):63–69.

[12] DAS I & DENNIS JE. 1998. Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8(3):631–657.

[13] EICHFELDER G. 2009. An adaptive scalarization method in multiobjective optimization. SIAM Journal on Optimization, 19(4):1694–1718.

[14] ESCHENAUER H, KOSKI J & OSYCZKA A. 1990. Multicriteria Design Optimization: Procedures and Applications. Springer-Verlag, Berlin.

[15] EVANS GW. 1984. An overview of techniques for solving multiobjective mathematical programs. Management Science, 30(11):1268–1282.

[16] FLIEGE J. 2001. OLAF – A general modeling system to evaluate and optimize the location of an air polluting facility. OR Spektrum, 23:117–136.

[17] FLIEGE J. 2006. An efficient interior-point method for convex multicriteria optimization problems. Mathematics of Operations Research, 31(4):825–845.

[18] FLIEGE J, GRAÑA DRUMMOND LM & SVAITER BF. 2009. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626.

[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.

[20] FU Y & DIWEKAR UM. 2004. An efficient sampling approach to multiobjective optimization. Annals of Operations Research, 132(1-4):109–134.

[21] FUKUDA EH & GRAÑA DRUMMOND LM. 2011. On the convergence of the projected gradient method for vector optimization. Optimization, 60(8-9):1009–1021.

[22] FUKUDA EH & GRAÑA DRUMMOND LM. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications, 54(3):473–493.

[23] GASS S & SAATY T. 1955. The computational algorithm for the parametric objective function. Naval Research Logistics Quarterly, 2(1-2):39–45.

[24] GEOFFRION AM. 1968. Proper efficiency and the theory of vector maximization. Journal of Mathematical Analysis and Applications, 22:618–630.

[25] GRAÑA DRUMMOND LM & IUSEM AN. 2004. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1):5–29.

[26] GRAÑA DRUMMOND LM, MACULAN N & SVAITER BF. 2008. On the choice of parameters for the weighting method in vector optimization. Mathematical Programming, 111(1-2).

[27] GRAÑA DRUMMOND LM, RAUPP FMP & SVAITER BF. 2011. A quadratically convergent Newton method for vector optimization. Optimization, 2011. Accepted for publication.

[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.

[29] JAHN J. 1984. Scalarization in vector optimization. Mathematical Programming, 29:203–218.

[30] JAHN J. 2003. Vector Optimization – Theory, Applications and Extensions. Springer, Erlangen.

[31] JÜSCHKE A, JAHN J & KIRSCH A. 1997. A bicriterial optimization problem of antenna design. Computational Optimization and Applications, 7(3):261–276.

[32] KIM IY & WECK OL. 2005. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation. Structural and Multidisciplinary Optimization, 29(2):149–158.

[33] LAUMANNS M, THIELE L, DEB K & ZITZLER E. 2002. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10:263–282.

[34] LESCHINE TM, WALLENIUS H & VERDINI WA. 1992. Interactive multiobjective analysis and assimilative capacity-based ocean disposal decisions. European Journal of Operational Research, 56:278–289.

[35] LUC DT. 1989. Theory of vector optimization. In Lecture Notes in Economics and Mathematical Systems, 319, Berlin, Springer.

[36] MIETTINEN K & MÄKELÄ MM. 1995. Interactive bundle-based method for nondifferentiable multiobjective optimization: Nimbus. Optimization, 34:231–246.

[37] MIETTINEN KM. 1999. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Norwell.

[38] MOSTAGHIM S, BRANKE J & SCHMECK H. 2006. Multi-objective particle swarm optimization on computer grids. Technical report. 502, Institut AIFB, University of Karlsruhe (TH), December.

[39] PALERMO G, SILVANO C, VALSECCHI S & ZACCARIA V. 2003. A system-level methodology for fast multi-objective design space exploration. ACM, New York.

[40] PRABUDDHA D, GHOSH JB & WELLS CE. 1992. On the minimization of completion time variance with a bicriteria extension. Operations Research, 40(6):1148–1155.

[41] TAVANA M. 2004. A subjective assessment of alternative mission architectures for the human exploration of Mars at NASA using multicriteria decision making. Computers & Operations Research, 31:1147–1164.

[42] WHITE DJ. 1998. Epsilon-dominating solutions in mean-variance portfolio analysis. European Journal of Operational Research, 105:457–466.

[43] ZADEH L. 1963. Optimality and non-scalar-valued performance criteria. IEEE Transactions on Automatic Control, 8(1):59–60.

[44] ZITZLER E, DEB K & THIELE L. 2000. Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation, 8:173–195.

Received: September 28, 2013; Accepted: January 02, 2014

*Corresponding author.

Creative Commons License This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.