A SURVEY ON MULTIOBJECTIVE DESCENT METHODS

Fukuda, Ellen H.; Drummond, Luis Mauricio Graña

doi:10.1590/0101-7438.2014.034.03.0585

Abstract

We present a rigorous and comprehensive survey on extensions to the multicriteria setting of three well-known scalar optimization algorithms. Multiobjective versions of the steepest descent, the projected gradient and the Newton methods are analyzed in detail. At each iteration, the search directions of these methods are computed by solving real-valued optimization problems and, in order to guarantee an adequate objective value decrease, Armijo-like rules are implemented by means of a backtracking procedure. Under standard assumptions, convergence to Pareto (weak Pareto) optima is established. For the Newton method, superlinear convergence is proved and, assuming Lipschitz continuity of the objectives' second derivatives, it is shown that the rate is quadratic.

multiobjective optimization; Newton method; nonlinear optimization; projected gradient method; steepest descent method

1 INTRODUCTION

In multiobjective (or multicriteria) optimization, finitely many objective functions have to be minimized simultaneously. Hardly ever a single point will minimize all of them at once, so we need another notion of optimality. Here, we will use the concepts of Pareto and weak Pareto optimality. A point is called Pareto optimal or efficient, if there does not exist a different point with the same or smaller objective function values, such that there is a strict decrease in at least one objective function value. A point is called weakly Pareto optimal or weakly efficient if there does not exist a different point with strict decrease in all its objective values. Applications for this type of problem can be found in engineering ^[¹⁴[14] ESCHENAUER H, KOSKI J & OSYCZKA A. 1990. Multicriteria Design Optimization: Procedures and Applications. Springer-Verlag, Berlin.^] (especially truss optimization ^[¹²[12] DAS I & DENNIS JE. 1998. Normal-boundary intersection: a new method for generating the pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8(3):631–657.^,³²[32] KIM IY & WECK OL. 2005. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation. Structured and Multidisciplinary Optimization, 29(2):149–158.^], design ^[²⁰[20] FU Y & DIWEKAR UM. 2004. An efficient sampling approach to multiobjective optimization. Annals of Operations Research, 132(1-4):109–134.^,³¹[31] JÜSCHKE A, JAHN J & KIRSCH A. 1997. A bicriterial optimization problem of antenna design. Computational Optimization and Applications, 7(3):261–276.^], space exploration ^[³⁹[39] PALERMO G, SILVANO C, VALSECCHI S & ZACCARIA V. 2003. A system-level methodology for fast multi-objective design space exploration. ACM, New York.^,⁴¹[41] TAVANA M. 2004. A subjective assessment of alternative mission architectures for the human exploration of mars at NASA using multicriteria decision making. 
Computational and Operations Research, 31:1147–1164.^]), statistics ^[¹⁰[10] CARRIZOSA E & FRENK JBG. 1998. Dominating sets for convex functions with some applications. Journal of Optimization Theory and Applications, 96(2):281–295.^], management science ^[¹⁵[15] EVANS GW. 1984. An overview of techniques for solving multiobjective mathematical programs. Management Science, 30(11):1268–1282.^,⁴⁰[40] PRABUDDHA D, GHOSH JB & WELLS CE. 1992. On the minimization of completion time variance with a bicriteria extension. Operations Research, 40(6):1148–1155.^,⁴²[42] WHITE DJ. 1998. Epsilon-dominating solutions in mean-variance portfolio analysis. European Journal of Operational Research, 105:457–466.^], environmental analysis ^[³⁴[34] LESCHINE TM, WALLENIUS H & VERDINI WA. 1992. Interactive multiobjective analysis and assimilative capacity-based ocean disposal decisions. European Journal of Operational Research, 56:278–289.^,¹⁶[16] FLIEGE J. 2001. OLAF – A general modeling system to evaluate and optimize the location of an air polluting facility. OR Spektrum, 23:117–136.^], etc.

One of the main solution strategies for multiobjective optimization problems is the scalarization approach ^[²³[23] GASS S & SAATY T. 1955. The computational algorithm for the parametric objective function. Naval Research Logistics Quarterly, 2(1-2):39–45, 1955.^,²⁴[24] GEOFFRION AM. 1968. Proper efficiency and the theory of vector maximization. Journal of Mathematical Analysis and Applications, 22:618–630.^,⁴³[43] ZADEH L. 1963. Optimality and non-scalar-valued performance criteria. IEEE Transactions on Automatic Control, 8(1):59–60.^]: we obtain the optimal solutions of the vector-valued problem by solving one or several parametrized single-objective (i.e., scalar-valued) optimization problems. Among the scalarization techniques, we have the so-called weighting method, where one minimizes a nonnegative linear combination of the objective functions ^[²⁶[26] GRAÑA DRUMMOND LM, MACULAN N & SVAITER BF. 2008. On the choice of parameters for the weighting method in vector optimization. Mathematical Programming, 111(1-2).^,²⁹[29] JAHN J. 1984. Scalarization in vector optimization. Mathematical Programming, 29:203–218.^,³⁰[30] JAHN J. 2003. Vector Optimization – Theory, Applications and Extensions. Springer, Erlangen.^,³⁷[37] MIETTINEN KM. 1999. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Norwell.^]. The parameters are not known in advance and the modeler or decision-maker has to choose them. For some problems, this choice can be problematic, as shown in the example provided in ^[¹⁸[18] FLIEGE J, GRAÑA DRUMMOND LM & SVAITER BF. 2009. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626.^], Section 7], where almost all choices of the parameters lead to unbounded scalar problems. 
Only recently, adaptive scalarization techniques, where scalarization parameters are chosen automatically during the course of the algorithm such that a certain quality of approximation is maintained, have been proposed ^[¹³[13] EICHFELDER G. 2009. An adaptive scalarization method in multiobjective optimization. SIAM Journal on Optimization, 19(4):1694–1718.^,¹⁷[17] FLIEGE J. 2006. An efficient interior-point method for convex multicriteria optimization problems. Mathematics of Operations Research, 31(4):825–845.^]. Still other techniques, working only in the bicriteria case, can be viewed as choosing a fixed grid in a particular parameter space ^[¹¹[11] DAS I & DENNIS JE. 1997. A closer look at drawbacks of minimizing weighted sums of objectives for pareto set generation in multicriteria optimization problems. Structural Optimization, 14(1):63–69.^,¹²[12] DAS I & DENNIS JE. 1998. Normal-boundary intersection: a new method for generating the pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8(3):631–657.^].

Multicriteria optimization algorithms that do not scalarize have recently been developed (see, e.g., ^[⁵[5] BRANKE J, DEB K, MIETTINEN K & SLOWINSKI R, EDITORS. 2006. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2006. Seminar 06501. Forthcoming.^,⁶[6] BRANKE J, DEB K, MIETTINEN K & STEUER RE, EDITORS. 2004. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2004. Seminar 04461. Electronically available at http://drops.dagstuhl.de/portals/index.php?semnr=04461.
^] for an overview on the subject). Some of these techniques are extensions of scalar optimization algorithms (notably the steepest descent algorithm ^[¹⁹[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.^] with at most linear convergence), while others borrow heavily from ideas developed in heuristic optimization ^[³³[33] LAUMANNS M, THIELE L, DEB K & ZITZLER E. 2002. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10:263–282.^,³⁸[38] MOSTAGHIM S, BRANKE J & SCHMECK H. 2006. Multi-objective particle swarm optimization on computer grids. Technical report. 502, Institut AIFB, University of Karlsruhe (TH), December.^]. For the latter, no convergence proofs are known, and empirical results show that convergence generally is, as in the scalar case, quite slow ^[⁴⁴[44] ZITZLER E, DEB K & THIELE L. 2000. Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation, 8:173–195.^]. Other parameter-free multicriteria optimization techniques use an ordering of the different criteria, i.e., an ordering of importance of the components of the objective function vector. In this case, the ordering has to be prespecified. Moreover, the corresponding optimization process is usually augmented by an interactive procedure ^[³⁶[36] MIETTINEN K & MÄKELÄ MM. 1995. Interactive bundle-based method for nondifferentiable multiobjective optimization: Nimbus. Optimization, 34:231–246.^].

Following the research line opened in 2000 with ^[¹⁹[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.^], other classical (scalar) optimization procedures were extended in recent years, not just for the multicriteria setting, but also for vector optimization problems, i.e., when other underlying ordering cones are used instead of the nonnegative orthant. In ^[²⁸[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.^], a steepest descent method is proposed, while in ^[¹[1] BELLO CRUZ JY, LUCAMBIO PÉREZ LR & MELO JG. 2011. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Analysis, 74:5268–5273.^,²¹[21] FUKUDA EH & GRAÑA DRUMMOND LM. 2011. On the convergence of the projected gradient method for vector optimization. Optimization, 60(8-9):1009–1021.^,²²[22] FUKUDA EH & GRAÑA DRUMMOND LM. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications, 54(3): 473–493.^,²⁵[25] GRAÑA DRUMMOND LM & IUSEM AN. 2004. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1): 5–29.^], several versions of the projected gradient method are studied. Proximal point type methods are analyzed in ^[⁴[4] BONNEL H, IUSEM AN & SVAITER BF. 2005. Proximal methods in vector optimization. SIAM Journal on Optimization, 15:953–970.^]. Trust region strategies for multicriteria were developed in ^[⁸[8] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Global convergence analysis. Submitted for publication.^,⁹[9] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Local convergence. 
Submitted for publication.^]. Even a steepest descent method for multicriteria optimization on Riemannian manifolds was developed in ^[²[2] BENTO GC, FERREIRA OP & OLIVEIRA PR. 2012. Unconstrained steepest descent method for multicriteria optimization on Riemannian manifolds. Journal of Optimization Theory and Applications, 154(1):88–107.^]. In 2009, Fliege et al. ^[¹⁸[18] FLIEGE J, GRAÑA DRUMMOND LM & SVAITER BF. 2009. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626.^] came up with a multiobjective version of the Newton method, while later on, Graña Drummond et al. ^[²⁷[27] GRAÑA DRUMMOND LM, RAUPP FMP & SVAITER BF. 2011. A quadratically convergent Newton method for vector optimization. Optimization, 2011. Accepted for publication.^] proposed a Newton method for vector optimization.

These extensions share an essential feature: they are all descent methods, i.e., the vector objective value decreases at each iteration in the partial order induced by the underlying cones. Under reasonable assumptions, full convergence of sequences produced by these algorithms is established in all those works. As expected, for the vector-valued Newton methods, convergence is local and fast, while for the others it is global (and not so fast).

In this survey we study just three of these procedures: the steepest descent ^[¹⁹[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.^,²⁸[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.^], the projected gradient ^[²¹[21] FUKUDA EH & GRAÑA DRUMMOND LM. 2011. On the convergence of the projected gradient method for vector optimization. Optimization, 60(8-9):1009–1021.^,²²[22] FUKUDA EH & GRAÑA DRUMMOND LM. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications, 54(3): 473–493.^,²⁵[25] GRAÑA DRUMMOND LM & IUSEM AN. 2004. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1): 5–29.^] and the Newton ^[¹⁸[18] FLIEGE J, GRAÑA DRUMMOND LM & SVAITER BF. 2009. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626.^,²⁷[27] GRAÑA DRUMMOND LM, RAUPP FMP & SVAITER BF. 2011. A quadratically convergent Newton method for vector optimization. Optimization, 2011. Accepted for publication.^] methods. For the sake of simplicity, we just analyze their multiobjective versions and, in order to emphasize the underlying ideas and avoid technicalities, we do not present the results in their maximum degree of generality. For instance, we assume that the objectives are continuously differentiable in the whole space, while, sometimes, we just need local differentiability, etc.

The outline of this paper is as follows. In Section 2 we establish the notation, as well as the basic concepts, and define the (smooth) unconstrained multiobjective problem. We also state and prove a simple but important result on descent directions. In Section 3, we study the multiobjective steepest descent method. First, we introduce the search direction and analyze its properties. Then we describe the algorithm and, finally, we perform the convergence analysis of the method. Basically, we establish that, under convexity of the objectives and another natural assumption, any sequence produced by the method is globally convergent to a weak Pareto optimal point. In Section 4 we define the smooth constrained multicriteria problem, as well as the multiobjective projected gradient algorithm and we present its convergence analysis, which is quite similar to the previous one. In Section 5, for twice continuously differentiable strongly convex objectives, we define the multiobjective Newton method and, assuming similar conditions as those of the previous cases, we establish its convergence to Pareto optimal points with superlinear rate. Under the additional hypothesis of Lipschitz continuity of the objectives' second derivatives, we prove quadratic convergence. In Section 6 we make some comments on how to obtain (strong) Pareto optima with the first two algorithms and on how to approach the whole efficient frontier with the three methods.

2 PRELIMINARIES

Throughout this work, ⟨·,·⟩ will stand for the usual canonical inner product in ℝ^p (here, we will just deal with p = n or p = m), i.e., ⟨u, v⟩ = ∑_{i=1}^p u_i v_i = u^T v. Likewise, ||·|| will always be the Euclidean norm, i.e., ||u|| = √⟨u, u⟩. For an m × n real matrix A, ||A|| will denote its operator (Euclidean) norm. As usual, ℝ₊ and ℝ₊₊ designate the sets of nonnegative and positive real numbers, respectively.

Let ℝ^m₊ := ℝ₊ ×···× ℝ₊ and ℝ^m₊₊ := ℝ₊₊ ×···× ℝ₊₊ be the nonnegative orthant or Paretian cone and the positive orthant of ℝ^m, respectively. Consider the partial order induced by ℝ^m₊: for u, v ∈ ℝ^m, u ≤ v (alternatively, v ≥ u) if v - u ∈ ℝ^m₊, as well as the following stronger relation: u < v (alternatively, v > u) if v - u ∈ ℝ^m₊₊. In other words, u ≤ v stands for u_i ≤ v_i for all i = 1,..., m, and u < v should also be understood as a (strict) componentwise inequality.

Let F : ℝⁿ → ℝ^m be a continuously differentiable mapping. Our problem consists of finding Pareto or weak Pareto points for F in ℝⁿ, and we denote it by

$$\min_{x \in \mathbb{R}^n} F(x). \tag{1}$$

Recall that x^* ∈ ℝⁿ is a Pareto optimal (or efficient) point for F, if

there is no x ∈ ℝⁿ such that F(x) ≤ F(x^*) and F(x) ≠ F(x^*),

that is to say, there does not exist x ∈ ℝⁿ such that F_i(x) ≤ F_i(x^*) for all i = 1, ..., m and F_{i₀}(x) < F_{i₀}(x^*) for at least one i₀. A point x^* ∈ ℝⁿ is a weakly Pareto optimal (or weakly efficient) point for F, if

there is no x ∈ ℝⁿ such that F(x) < F(x^*),

i.e., there does not exist x ∈ ℝⁿ such that F_i(x) < F_i (x^*) for all i = 1, ..., m. Clearly, if x^* ∈ ℝⁿ is a Pareto optimal point, then x^* is a weak Pareto point, and the converse is not always true.
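The two optimality notions above are defined over all of ℝⁿ, but the underlying order relations are easy to illustrate on a finite list of objective vectors. The following is a minimal sketch under that simplifying assumption; the function names (`dominates`, `strictly_dominates`, `pareto_points`, `weak_pareto_points`) and the sample data are hypothetical and not part of the survey.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u <= v componentwise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def strictly_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u < v in every component."""
    return all(a < b for a, b in zip(u, v))


def pareto_points(values: List[Sequence[float]]) -> List[int]:
    """Indices of vectors that no other vector in the list dominates."""
    return [i for i, u in enumerate(values)
            if not any(dominates(w, u) for j, w in enumerate(values) if j != i)]


def weak_pareto_points(values: List[Sequence[float]]) -> List[int]:
    """Indices of vectors that no vector in the list strictly dominates."""
    return [i for i, u in enumerate(values)
            if not any(strictly_dominates(w, u) for w in values)]


# Hypothetical objective vectors F(x) for four candidate points.
values = [(1.0, 3.0), (2.0, 2.0), (1.0, 4.0), (3.0, 3.0)]
print(pareto_points(values))       # [0, 1]
print(weak_pareto_points(values))  # [0, 1, 2]
```

In this sample, (1, 4) is weakly Pareto but not Pareto within the list: (1, 3) is no worse in both components and better in the second, yet no vector improves both components at once.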

As we will see in our first lemma, a necessary condition for optimality of a point x ∈ ℝⁿ is stationarity (or criticality):

$$JF(x)(\mathbb{R}^n) \cap \left[-\mathbb{R}^m_{++}\right] = \emptyset,$$

where JF(x) stands for the Jacobian matrix of F at x (the m×n matrix with entries (JF(x))_{i,j} = ∂F_i(x)/∂x_j), JF(x)(ℝⁿ) := {JF(x)v : v ∈ ℝⁿ} and -ℝ^m₊₊ := {-u : u ∈ ℝ^m₊₊}. So x ∈ ℝⁿ is stationary for F if, and only if, for all v ∈ ℝⁿ we have JF(x)v ∉ -ℝ^m₊₊, that is to say, for all v there exists j = j(v) such that

$$(JF(x)v)_j = \nabla F_j(x)^T v \ge 0, \tag{2}$$

and this condition implies

$$\max_{i=1,\dots,m} \nabla F_i(x)^T v \ge 0 \quad \text{for all } v \in \mathbb{R}^n.$$

Note that for m = 1, we retrieve the classical stationarity condition for unconstrained scalar-valued optimization: ∇ F(x) = 0.
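As a small worked illustration of the stationarity condition (the example is ours and is not taken from the survey), take n = 1, m = 2 and F(x) = (x², (x - 1)²). Then

$$JF(x)v = \begin{pmatrix} 2xv \\ 2(x-1)v \end{pmatrix}, \qquad v \in \mathbb{R}.$$

Both components are negative if and only if xv < 0 and (x - 1)v < 0. For x ∈ [0, 1] this is impossible: v > 0 would require x < 0, and v < 0 would require x > 1. For x > 1 the choice v = -1 gives JF(x)v < 0, and for x < 0 the choice v = 1 does. Hence, in this example, the set of stationary points is exactly the interval [0, 1].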

By definition, a point x ∈ ℝⁿ is nonstationary if there exists a direction v ∈ ℝⁿ such that J F(x)v < 0. The next result shows not only that such v is a ℝ^m₊-descent direction for F at x (there exists ε > 0 such that F(x + tv) < F(x) for all t ∈ (0,ε]), but it also estimates the ℝ^m₊- function decrease. As we will see later on, this result allows us to implement, by a backtracking procedure, an Armijo-like rule on all the multiobjective (descent) methods we will study here.

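Lemma 2.1. Let F : ℝⁿ → ℝ^m be continuously differentiable, v ∈ ℝⁿ such that JF(x)v < 0 and σ ∈ (0,1). Then there exists some ε > 0 (which may depend on x, v and σ) such that

$$F(x + tv) < F(x) + \sigma t\, JF(x)v \quad \text{for all } t \in (0, \varepsilon]. \tag{3}$$

In particular, v is a descent direction for F at x.

Proof. Since F is differentiable, ∇F_i(x)^T v < 0 and σ ∈ (0,1), we have

$$\lim_{t \to 0^+} \frac{F_i(x + tv) - F_i(x)}{t} = \nabla F_i(x)^T v < \sigma \nabla F_i(x)^T v \quad \text{for all } i = 1, \dots, m.$$

So there exists ε > 0 such that

$$F_i(x + tv) < F_i(x) + \sigma t \nabla F_i(x)^T v \quad \text{for all } t \in (0, \varepsilon] \text{ and all } i = 1, \dots, m.$$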
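To make the Armijo-like estimate of Lemma 2.1 concrete, the following sketch checks the componentwise inequality for a few step sizes. The test mapping `F`, its Jacobian `jac`, the point, the direction and σ = 0.25 are all hypothetical choices made for illustration; for this particular data the inequality holds exactly when t < 1.5.

```python
def F(x):
    # Hypothetical example: two convex quadratics on R^2.
    return [x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + x[1] ** 2]


def jac(x):
    # Rows are the gradients of F_1 and F_2 at x.
    return [[2.0 * x[0], 2.0 * x[1]], [2.0 * (x[0] - 1.0), 2.0 * x[1]]]


def armijo_holds(F, jac, x, v, t, sigma):
    """Check F(x + t v) < F(x) + sigma * t * JF(x) v in every component."""
    fx = F(x)
    ft = F([xi + t * vi for xi, vi in zip(x, v)])
    slopes = [sum(g * vi for g, vi in zip(row, v)) for row in jac(x)]
    return all(a < b + sigma * t * s for a, b, s in zip(ft, fx, slopes))


x, v, sigma = [0.5, 1.0], [0.0, -1.0], 0.25  # here JF(x) v = (-2, -2) < 0
results = {t: armijo_holds(F, jac, x, v, t, sigma) for t in (2.0, 1.0, 0.5, 0.25)}
print(results)  # {2.0: False, 1.0: True, 0.5: True, 0.25: True}
```

The step t = 2 overshoots, while every smaller trial step passes, which is the behavior a backtracking procedure relies on.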

3 THE MULTIOBJECTIVE STEEPEST DESCENT METHOD

In this section we propose an extension of the classical steepest descent method. First we state and prove some results that will allow us to define the algorithm. Then we present its convergence analysis and, finally, we briefly describe a numerically robust version of the method which admits relative errors in the computation of the search directions at each iteration.

3.1 The search direction for the multiobjective steepest descent method

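For a given point x ∈ ℝⁿ, define the function φ_x : ℝⁿ → ℝ by

$$\varphi_x(v) := \max_{i=1,\dots,m} \nabla F_i(x)^T v. \tag{4}$$

Being the maximum of linear functions, φ_x is convex and positively homogeneous (φ_x(λv) = λφ_x(v) for all λ > 0 and all v ∈ ℝⁿ). Moreover, since u ↦ max_{i=1,...,m} u_i is Lipschitz continuous with constant 1, we also have

$$|\varphi_x(v) - \varphi_x(w)| \le \|JF(x)(v - w)\| \tag{5}$$

for all v, w ∈ ℝⁿ. Consider now the unconstrained scalar-valued minimization problem

$$\min_{v \in \mathbb{R}^n} \; \varphi_x(v) + \frac{1}{2}\|v\|^2. \tag{6}$$

Since the objective function is proper, closed and strongly convex, this problem always has a (unique) optimal solution v(x), which we call the steepest descent direction, and which is therefore given by

$$v(x) := \arg\min_{v \in \mathbb{R}^n} \left\{ \varphi_x(v) + \frac{1}{2}\|v\|^2 \right\}. \tag{7}$$

Let us call θ(x) the optimal value of (6), that is to say,

$$\theta(x) := \min_{v \in \mathbb{R}^n} \left\{ \varphi_x(v) + \frac{1}{2}\|v\|^2 \right\} = \varphi_x(v(x)) + \frac{1}{2}\|v(x)\|^2. \tag{8}$$

If m = 1, we fall back to scalar minimization and φ_x(v) = ∇F(x)^T v, so v(x) = -∇F(x) and we retrieve the classical steepest descent direction at x ∈ ℝⁿ.

Note that, in order to get rid of the possible nondifferentiability of (6), we can reformulate the problem as

$$\begin{array}{ll} \min & \tau + \dfrac{1}{2}\|v\|^2 \\ \text{s.t.} & \nabla F_i(x)^T v - \tau \le 0, \quad i = 1, \dots, m, \end{array}$$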

a convex problem with variables (τ, v) ∈ ℝ × ℝⁿ for which there are efficient numerical solvers.
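As a rough numerical illustration of the direction subproblem, the sketch below evaluates the max-type objective φ_x(v) + (1/2)||v||² on a coarse grid for two hypothetical gradients and picks the best grid point. Brute-force search is used only to keep the example dependency-free; it is not how the survey proposes to solve the problem, and the helper names `phi` and `objective` are ours.

```python
def phi(grads, v):
    """phi_x(v): the largest directional derivative among the objectives."""
    return max(sum(g * c for g, c in zip(grad, v)) for grad in grads)


def objective(grads, v):
    """Objective of the direction subproblem: phi_x(v) + 0.5 * ||v||^2."""
    return phi(grads, v) + 0.5 * sum(c * c for c in v)


# Hypothetical gradients of two objectives at some point x.
grads = [(1.0, 0.0), (0.0, 1.0)]

# Brute-force search on the grid {-1, -0.95, ..., 1}^2 (illustration only).
grid = [i / 20 for i in range(-20, 21)]
theta, v = min((objective(grads, (a, b)), (a, b)) for a in grid for b in grid)
print(v, theta)  # (-0.5, -0.5) -0.25
```

For these gradients the exact minimizer happens to lie on the grid: v = (-1/2, -1/2) with optimal value -1/4, and both directional derivatives equal -1/2 there.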

Let us now give a characterization of stationarity in terms of θ(·) and v(·) and study some features of these functions.

Proposition 3.1. Let v: ℝⁿ → ℝⁿ and θ : ℝⁿ → ℝ be given by (7) and (8), respectively. Then, the following statements hold.

1. θ(x) ≤ 0 for all x ∈ ℝⁿ.
2. The function θ(·) is continuous.
3. The following conditions are equivalent:
- (a) The point x is nonstationary.
- (b) θ(x) < 0.
- (c) v(x) ≠ 0.
  
  In particular, x is stationary if and only if θ(x) = 0.

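Proof. 1. From (7) and (8), for any x ∈ ℝⁿ, we have

$$\theta(x) = \min_{v \in \mathbb{R}^n} \left\{ \varphi_x(v) + \frac{1}{2}\|v\|^2 \right\} \le \varphi_x(0) + \frac{1}{2}\|0\|^2 = 0.$$

2. Let us prove the continuity of θ at an arbitrary but fixed x⁰ ∈ ℝⁿ. First let us show that the function x ↦ v(x) is bounded on compact subsets of ℝⁿ. Let C ⊂ ℝⁿ be compact. Note that for all x ∈ C and all i = 1, ..., m, we have

$$\frac{1}{2}\|v(x)\|^2 \le -\varphi_x(v(x)) \le -\nabla F_i(x)^T v(x) \le \|\nabla F_i(x)\|\,\|v(x)\|,$$

where we used the Cauchy-Schwarz inequality, the definitions of φ_x and θ, as well as item 1. Therefore, since F is continuously differentiable, we conclude that there exists κ = κ(C) > 0 such that ||v(x)|| ≤ κ for all x ∈ C. In particular, there exists κ > 0 such that

$$\|v(x)\| \le \kappa \quad \text{for all } x \in B(x^0, 1), \tag{9}$$

where B(x⁰, 1) is the closed ball with center at x⁰ and radius 1.

Now, for x ∈ B(x⁰, 1) and i ∈ {1, ..., m}, define

$$\phi_{x,i} : B(x^0, 1) \to \mathbb{R}$$

by

$$\phi_{x,i}(z) := \nabla F_i(z)^T v(x) + \frac{1}{2}\|v(x)\|^2.$$

From (9), the family {ϕ_{x,i}}, with x ∈ B(x⁰, 1) and i = 1, ..., m, is equicontinuous. Consider now the family {Φ_x}, with x ∈ B(x⁰, 1), where Φ_x = max_{i=1,...,m} ϕ_{x,i}. Since u ↦ max_i u_i is Lipschitz continuous with constant 1, {Φ_x} is also an equicontinuous family of functions. Then for any ε > 0 there exists 0 < δ < 1 such that, for all x ∈ B(x⁰, 1),

$$\|z - x^0\| < \delta \;\Longrightarrow\; |\Phi_x(z) - \Phi_x(x^0)| < \varepsilon.$$

Hence for ||z - x⁰|| < δ, from the definitions of the functions θ, v, Φ_{x⁰} and the fact that Φ_{x⁰}(x⁰) = θ(x⁰), we have

$$\theta(z) \le \varphi_z(v(x^0)) + \frac{1}{2}\|v(x^0)\|^2 = \Phi_{x^0}(z) \le \Phi_{x^0}(x^0) + |\Phi_{x^0}(z) - \Phi_{x^0}(x^0)| < \theta(x^0) + \varepsilon,$$

so θ(z) - θ(x⁰) < ε. Interchanging the roles of z and x⁰, we conclude that, for ||z - x⁰|| < δ,

$$|\theta(z) - \theta(x^0)| < \varepsilon.$$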

3. (a) ⇒ (b). Suppose that x is nonstationary. Then, there exists v ∈ ℝⁿ such that φ_x(v) = max_i ∇F_i(x)^T v < 0. Therefore, from the optimality of v(x) and the fact that φ_x is positively homogeneous,

$$\theta(x) \le \varphi_x(\tau v) + \frac{1}{2}\|\tau v\|^2 = \tau\left( \varphi_x(v) + \frac{\tau}{2}\|v\|^2 \right)$$

for all τ > 0. Taking 0 < τ < -2φ_x(v)/||v||², we see that θ(x) < 0.

(b) ⇒ (c). Since, by item 1, θ(x) ≤ 0, it is enough to notice that v(x) = 0 implies θ(x) = 0.

(c) ⇒ (a). If v(x) ≠ 0, then max_i ∇F_i(x)^Tv(x) = φ_x (v(x)) < θ(x) ≤ 0, so J F(x)v(x) ∈ J F(x)(ℝⁿ ) ∩[-ℝ^m₊₊] and x is nonstationary.

Note that for a nonstationary point x, using the above characterization, we have θ(x) < 0 and so, from (8), we get φ_x(v(x)) < -(1/2)||v(x)||² < 0, which in turn implies that JF(x)v(x) < 0. Whence, by Lemma 2.1, there exists ε > 0 such that

$$F(x + t v(x)) < F(x) + \sigma t\, JF(x)v(x) \quad \text{for all } t \in (0, \varepsilon]$$

and, in particular, v(x) is a descent direction, i.e., F(x + t v(x)) < F(x) for all t ∈ (0,ε].

Finally, observe that, while proving Proposition 3.1, we saw that v: ℝⁿ → ℝⁿ, defined by (7), is bounded on compact sets of ℝⁿ. Actually, something stronger holds, namely that v(·) is a continuous function ^[²⁸[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.^], Lemma 3.3].

3.2 The multiobjective steepest descent algorithm

We are now in a position to define the extension of the classical steepest descent method for the unconstrained multiobjective optimization problem (1).

Algorithm 3.2. The multiobjective steepest descent method (MSDM)

1. Choose σ ∈ (0, 1) and x⁰ ∈ ℝⁿ. Set k := 0.

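2. Compute v^k := v(x^k), the solution of (6) at x = x^k, i.e.,

$$v^k = \arg\min_{v \in \mathbb{R}^n} \left\{ \varphi_{x^k}(v) + \frac{1}{2}\|v\|^2 \right\}.$$

3. Compute θ(x^k) = φ_{x^k}(v^k) + (1/2)||v^k||². If θ(x^k) = 0, then stop.

4. Choose t_k as the largest t ∈ {1/2^j: j = 0, 1, 2,...} such that

$$F(x^k + t v^k) \le F(x^k) + \sigma t\, JF(x^k) v^k.$$

5. Set x^{k+1} := x^k + t_k v^k, k := k + 1 and go to Step 2.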

Some comments are in order. Since v ↦ φ_{x^k}(v) + (1/2)||v||² is strongly convex, v^k exists and is unique. Note that, by Proposition 3.1, the stopping rule θ(x^k) = 0 can be replaced by v^k = 0. Also note that, if at iteration k the algorithm does not reach Step 4, i.e., if it stops at Step 3, then, by Proposition 3.1, x^k is a stationary point for F.

If Step 4 is reached at iteration k, by our comments after Proposition 3.1 and Lemma 2.1, the computation of the step length in Step 4 terminates after a finite number of halvings. Observe that instead of the factor 1/2 in the reductions of Step 4 we could have used any other scalar ν ∈ (0, 1). If the algorithm does not stop, from Step 4 and the fact that JF(x^k)v^k < 0, we have that the objective values sequence {F(x^k)} is ℝ^m₊-decreasing, i.e.,

$$F(x^{k+1}) < F(x^k) \quad \text{for all } k.$$

If the algorithm stops at iteration k₀, the above inequality holds for k = 0, 1, ..., k₀ - 1.
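The steps of Algorithm 3.2 can be sketched in a few lines for the special case of two objectives. This is a minimal illustration, not the authors' implementation, and it rests on assumptions that should be read carefully: the direction is computed with the standard closed form for m = 2 (v is minus the minimum-norm convex combination of the two gradients), the exact test θ(x^k) = 0 is replaced by a tolerance `tol`, and an iteration cap `max_iter` is added. The names `msdm`, `direction_two_objectives`, `F` and `jac` are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def direction_two_objectives(g1, g2):
    """Steepest descent direction and theta for m = 2 (closed form, assumed)."""
    d = [a - b for a, b in zip(g1, g2)]
    dd = dot(d, d)
    # lam minimizes ||lam * g1 + (1 - lam) * g2||^2 over [0, 1].
    lam = 0.5 if dd == 0.0 else min(1.0, max(0.0, -dot(g2, d) / dd))
    v = [-(lam * a + (1.0 - lam) * b) for a, b in zip(g1, g2)]
    theta = max(dot(g1, v), dot(g2, v)) + 0.5 * dot(v, v)
    return v, theta


def msdm(F, jac, x0, sigma=0.1, tol=1e-8, max_iter=1000):
    """Sketch of the multiobjective steepest descent method for m = 2."""
    x = list(x0)
    for _ in range(max_iter):
        g1, g2 = jac(x)
        v, theta = direction_two_objectives(g1, g2)   # Steps 2 and 3
        if abs(theta) <= tol:                          # relaxed stopping rule
            break
        fx, slopes, t = F(x), [dot(g1, v), dot(g2, v)], 1.0
        while True:                                    # Step 4: backtracking
            x_new = [xi + t * vi for xi, vi in zip(x, v)]
            if all(a <= b + sigma * t * s for a, b, s in zip(F(x_new), fx, slopes)):
                break
            t *= 0.5
        x = x_new                                      # Step 5
    return x


def F(x):
    # Hypothetical test problem: two convex quadratics on R^2.
    return [x[0] ** 2 + x[1] ** 2, (x[0] - 1.0) ** 2 + x[1] ** 2]


def jac(x):
    return [[2.0 * x[0], 2.0 * x[1]], [2.0 * (x[0] - 1.0), 2.0 * x[1]]]


print(msdm(F, jac, [2.0, 1.0]))  # [1.0, 0.0]
```

For this test problem the stationary points form the segment [0, 1] × {0}. Starting from (2, 1), the first trial step t = 1 fails the Armijo-like test for the second objective, t = 1/2 is accepted, and the next iterate (1, 0) already has θ = 0.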

3.3 Convergence analysis of the MSDM: the general case

Since Algorithm 3.2 either terminates at a stationary point or produces infinitely many iterates, from now on we assume that an infinite sequence {x^k} of nonstationary points is generated. We begin by presenting a simple application of the standard convergence argument for the scalar-valued steepest descent method.

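Theorem 3.3. Let {x^k} be a sequence produced by Algorithm 3.2. Every accumulation point of {x^k}, if any, is a stationary point.

Proof. Let x̄ be an accumulation point of the sequence {x^k}. Consider v(x̄) and θ(x̄), given by (7) and (8), respectively, i.e., the solution and the optimal value of (6) at x = x̄. According to Proposition 3.1, it is enough to prove that θ(x̄) = 0.

We know that the sequence {F(x^k)} is ℝ^m₊-decreasing (i.e., componentwise strictly decreasing), so we have that

$$\lim_{k \to \infty} F(x^k) = F(\bar{x}).$$

Therefore,

$$\lim_{k \to \infty} \|F(x^k) - F(x^{k+1})\| = 0.$$

But, by Steps 2-5 of Algorithm 3.2,

$$F(x^k) - F(x^{k+1}) \ge -\sigma t_k\, JF(x^k) v^k \ge 0,$$

and therefore

$$\lim_{k \to \infty} t_k\, JF(x^k) v^k = 0, \tag{10}$$

which, componentwise, can be written as

$$\lim_{k \to \infty} t_k \nabla F_i(x^k)^T v^k = 0 \quad \text{for all } i = 1, \dots, m.$$

Considering that t_k ∈ (0,1] for all k, we have the following two possibilities:

$$\limsup_{k \to \infty} t_k > 0 \tag{11}$$

or

$$\lim_{k \to \infty} t_k = 0. \tag{12}$$

First assume that (11) holds. Then there exist a subsequence {x^{k_j}}_j converging to x̄ and t̄ > 0 such that lim_{j→∞} t_{k_j} = t̄. So, using (10) and the inequalities φ_{x^{k_j}}(v^{k_j}) ≤ θ(x^{k_j}) ≤ 0, we get

$$\lim_{j \to \infty} \max_{i} \nabla F_i(x^{k_j})^T v^{k_j} = 0 \quad \text{and hence} \quad 0 = \lim_{j \to \infty} \theta(x^{k_j}) = \theta(\bar{x}),$$

where the last equality is a consequence of Proposition 3.1, 2. So θ(x̄) = 0 and, by Proposition 3.1, 3, x̄ is stationary.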

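Now assume that (12) holds. Due to the fact that v(·) is bounded on compact sets, v^k = v(x^k) for all k and {x^k} has a subsequence converging to x̄, it follows that the sequence {v^k} also has a bounded subsequence. Therefore there exist v̄ ∈ ℝⁿ and a subsequence {v^{k_j}}_j such that lim_{j→∞} x^{k_j} = x̄, lim_{j→∞} v^{k_j} = v̄ and lim_{j→∞} t_{k_j} = 0. Note that we have

$$\max_{i} \nabla F_i(x^{k_j})^T v^{k_j} \le \theta(x^{k_j}) < 0,$$

so letting j → ∞, we get

$$\max_{i} \nabla F_i(\bar{x})^T \bar{v} \le \theta(\bar{x}) \le 0. \tag{13}$$

Take now a fixed but arbitrary positive integer q. Since t_{k_j} → 0, for j large enough, we have

$$t_{k_j} < \frac{1}{2^q},$$

which means that the Armijo-like condition at x^{k_j} in Step 4 of Algorithm 3.2 is not satisfied for t = 1/2^q, i.e.,

$$F\left(x^{k_j} + \frac{1}{2^q} v^{k_j}\right) \not\le F(x^{k_j}) + \sigma \frac{1}{2^q} JF(x^{k_j}) v^{k_j}.$$

So for all such j there exists i = i(k_j) ∈ {1,...,m} such that

$$F_i\left(x^{k_j} + \frac{1}{2^q} v^{k_j}\right) > F_i(x^{k_j}) + \sigma \frac{1}{2^q} \nabla F_i(x^{k_j})^T v^{k_j}.$$

Since {i(k_j)}_j ⊂ {1,...,m}, there exist a subsequence {k_{j_ℓ}}_ℓ and an index i₀ such that i₀ = i(k_{j_ℓ}) for all ℓ = 1, 2, ... and

$$F_{i_0}\left(x^{k_{j_\ell}} + \frac{1}{2^q} v^{k_{j_\ell}}\right) > F_{i_0}(x^{k_{j_\ell}}) + \sigma \frac{1}{2^q} \nabla F_{i_0}(x^{k_{j_\ell}})^T v^{k_{j_\ell}}.$$

Taking the limit ℓ → ∞ in the above inequality, we obtain

$$F_{i_0}\left(\bar{x} + \frac{1}{2^q} \bar{v}\right) \ge F_{i_0}(\bar{x}) + \sigma \frac{1}{2^q} \nabla F_{i_0}(\bar{x})^T \bar{v}.$$

Since this inequality holds for any positive integer q and for i₀ (depending on q), by Lemma 2.1 it follows that JF(x̄)v̄ ≮ 0, so

$$\max_{i} \nabla F_i(\bar{x})^T \bar{v} \ge 0,$$

which, together with (13), implies θ(x̄) = 0. Therefore, we conclude that x̄ is a stationary point.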

Assume now that x⁰ ∈ ℝⁿ is such that the level set {x ∈ ℝⁿ : F(x) ≤ F(x⁰)} is bounded. Since {F(x^k)} is ℝ^m₊-decreasing, the whole sequence {x^k} is contained in the above set, so it is bounded and has at least one accumulation point, which, by the previous theorem, is stationary for F.

3.4 Full convergence of the MSDM: the ℝ^m₊-convex case

In order to characterize v(x), the optimal solution of (6), we begin this section by recalling a well-known result on nonsmooth unconstrained optimization, which establishes optimality conditions for max-type real-valued objective functions.

Proposition 3.4. Let h_i : ℝⁿ → ℝ be a differentiable function for i = 1, ..., m. If x̄ ∈ ℝⁿ is an unconstrained minimizer of max_{i=1,...,m} h_i(x), then there exists α = α(x̄) ∈ ℝ^m, with α_i ≥ 0 for all i and ∑_{i=1}^m α_i = 1, such that, for τ̄ := max_i h_i(x̄), it holds

$$\sum_{i=1}^m \alpha_i \nabla h_i(\bar{x}) = 0, \qquad \alpha_i \left( h_i(\bar{x}) - \bar{\tau} \right) = 0 \quad \text{for all } i = 1, \dots, m.$$

If h_i is convex for all i, the above conditions are also sufficient for the optimality of x̄.

Proof. First note that the problem min_{x∈ℝⁿ} max_{i=1,...,m} h_i(x) can be reformulated as

$$\begin{array}{ll} \min & \tau \\ \text{s.t.} & h_i(x) - \tau \le 0, \quad i = 1, \dots, m, \end{array} \tag{14}$$

with optimal solution (τ̄, x̄) ∈ ℝ × ℝⁿ. Since the Lagrangian of this problem is given by L((τ,x),α) := τ + ∑_{i=1}^m α_i(h_i(x) - τ), its Karush-Kuhn-Tucker (KKT) conditions at (τ̄, x̄) are just those stated in this proposition. Note now that (14) has a Slater point (e.g., (τ̂, 0) ∈ ℝ × ℝⁿ, where τ̂ > h_i(0) for all i). So there exists a vector of KKT multipliers α = α(x̄) such that those conditions are satisfied.

For the converse, just note that as (14) is a convex problem on the variables (τ, x), the KKT conditions are sufficient for the optimality of (τ̄, x̄).

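By (6)-(7), v(x) = argmin_v max_i ∇F_i(x)^T v + (1/2)||v||². Therefore, in view of the above proposition, there exists ω(x) := α(v(x)) ∈ ℝ^m₊ with ∑_{i=1}^m ω_i(x) = 1 such that

$$\sum_{i=1}^m \omega_i(x) \left( \nabla F_i(x) + v(x) \right) = 0,$$

and so

$$v(x) = -\sum_{i=1}^m \omega_i(x) \nabla F_i(x). \tag{15}$$

The above equality can be rewritten as

$$v(x) = -JF(x)^T \omega(x). \tag{16}$$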

Note that (15) tells us that the steepest descent direction v(x) is a classical steepest descent direction for the scalar-valued problem min_x ∑^m_i=1 ω_i F_i(x), whose weights ω_i := ω_i (x) ≥ 0 are the a priori unknown KKT multipliers for problem (14), with h_i = F_i for all i, at (θ(x), v(x)).
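For two objectives, the weights can be written down explicitly. The following worked example relies on a standard duality fact that is not derived in the text above and should be read as an assumption here: the multipliers ω(x) minimize the norm of the convex combination of the gradients over the unit simplex. Writing g₁ := ∇F₁(x), g₂ := ∇F₂(x) with g₁ ≠ g₂, and ω(x) = (λ, 1 - λ), this gives

$$\lambda^* = \min\left\{ 1, \max\left\{ 0, \frac{\langle g_2, g_2 - g_1 \rangle}{\|g_1 - g_2\|^2} \right\} \right\}, \qquad v(x) = -\left( \lambda^* g_1 + (1 - \lambda^*) g_2 \right).$$

For instance, with g₁ = (1, 0) and g₂ = (0, 1) we get ⟨g₂, g₂ - g₁⟩ = 1 and ||g₁ - g₂||² = 2, so λ* = 1/2 and

$$v(x) = -\left(\tfrac{1}{2}, \tfrac{1}{2}\right), \qquad \theta(x) = \max\left\{ -\tfrac{1}{2}, -\tfrac{1}{2} \right\} + \tfrac{1}{2} \cdot \tfrac{1}{2} = -\tfrac{1}{4}.$$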

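Let us now introduce an important concept. We say that the mapping F : ℝⁿ → ℝ^m is ℝ^m₊-convex if

$$F(\lambda u + (1 - \lambda) w) \le \lambda F(u) + (1 - \lambda) F(w) \quad \text{for all } u, w \in \mathbb{R}^n \text{ and all } \lambda \in [0, 1]. \tag{17}$$

Clearly, F : ℝⁿ → ℝ^m is ℝ^m₊-convex if and only if its components F_i : ℝⁿ → ℝ are all convex. Under this additional assumption, we have the following extension of the well-known scalar inequality which establishes that a convex differentiable function always overestimates its linear approximation:

$$F(u) \ge F(w) + JF(w)(u - w) \quad \text{for all } u, w \in \mathbb{R}^n. \tag{18}$$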

In fact, this vector inequality is equivalent to the following m scalar inequalities F_i(u) ≥ F_i (w) + ∇F_i (w)^T(u - w) for all u, w ∈ ℝⁿ and all i.

Let us now state and prove a simple result relating Pareto optimality and stationarity.

Lemma 3.5. Let x̄ be a weak Pareto optimum for F. Then x̄ is a stationary point. If, in addition, F is ℝ^m₊-convex, this condition is also sufficient for weak Pareto optimality. So, under ℝ^m₊-convexity, the concepts of weak efficiency and stationarity are equivalent.

Proof. First, assume that x̄ is weak Pareto. As we know, if x̄ is not a stationary point, there exists v ∈ ℝⁿ such that JF(x̄)v < 0, so (3) holds for x = x̄, in contradiction with the weak Pareto optimality of x̄.

Now suppose that x̄ is stationary for the ℝ^m₊-convex mapping F and take any x ∈ ℝⁿ. Since x̄ is stationary, (2) holds for x = x̄, v := x - x̄ and some j = j(v). Therefore, by (18) for u = x and w = x̄, F_j(x) ≥ F_j(x̄), and so x̄ is a weak Pareto optimum for F.

We will keep on studying the case in which the algorithm does not have finite termination and therefore produces infinite sequences {x^k}, {v^k} and {t_k}. Let us now state the additional assumption under which, in the ℝ^m₊-convex case, we will prove full convergence of {x^k} to a stationary point or, in view of the above lemma, to a weak unconstrained optimum of F.

Assumption 3.6. Every ℝ^m₊-decreasing sequence {y^k} ⊆ F(ℝⁿ) := {F(x): x ∈ ℝⁿ} is ℝ^m₊-bounded from below by a point in F(ℝⁿ), i.e., for any {y^k} contained in F(ℝⁿ) with y^{k+1} < y^k for all k, there exists x̄ ∈ ℝⁿ such that F(x̄) ≤ y^k for all k.

Some comments concerning the generality/restrictiveness of this assumption are in order. In the classical unconstrained (convex) optimization case, this condition is equivalent to existence of solutions of the optimization problem. This assumption, known as ℝ^m₊- completeness, is standard for ensuring existence of efficient points for vector optimization problems [^[³⁵[35] LUC DT. 1989. Theory of vector optimization. In Lecture Notes in Economics and Mathematical Systems, 319, Berlin, Springer.^], Section 3].

We will need the following technical result in order to prove that the MSDM is convergent.

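Lemma 3.7. Suppose that F is ℝ^m₊-convex and let x̂ ∈ ℝⁿ be such that F(x̂) ≤ F(x^k) for some k. Then, we have

$$\|x^{k+1} - \hat{x}\|^2 \le \|x^k - \hat{x}\|^2 + \|x^{k+1} - x^k\|^2.$$

Proof. By (16), there exists w^k := w(x^k) ∈ ℝ^m₊ such that

$$v^k = -JF(x^k)^T w^k. \tag{19}$$

Using the ℝ^m₊-convexity of F, we have that (18) holds, so F(x^k) + JF(x^k)(x̂ - x^k) ≤ F(x̂). Since F(x̂) ≤ F(x^k), we get

$$JF(x^k)(\hat{x} - x^k) \le 0.$$

Taking into account that w^k ∈ ℝ^m₊ and using the above vector inequality as well as (19), we get

$$\langle v^k, x^k - \hat{x} \rangle = (w^k)^T JF(x^k)(\hat{x} - x^k) \le 0.$$

Recall that x^{k+1} = x^k + t_k v^k, with t_k > 0. Therefore, from the above inequality we get

$$\langle x^{k+1} - x^k, x^k - \hat{x} \rangle \le 0,$$

which implies the desired inequality, because

$$\|x^{k+1} - \hat{x}\|^2 = \|x^k - \hat{x}\|^2 + \|x^{k+1} - x^k\|^2 + 2 \langle x^{k+1} - x^k, x^k - \hat{x} \rangle.$$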

We still need a couple of technical results. The first one, basically, establishes that any accumulation point of a sequence produced by the MSDM furnishes an ℝ^m₊-lower bound for the whole objective values sequence. We state it for an arbitrary sequence of nonstationary points, because we will also need it for the convergence analysis of another method.

Lemma 3.8.

1. Let {x^(k)} be any infinite sequence of nonstationary points in ℝⁿ such that F(x^(k+1)) ≤ F(x^(k)) for all k. If x is an accumulation point of {x^(k)}, then

F(x) ≤ F(x^(k)) for all k    (20)

and

lim_k→∞ F(x^(k)) = F(x).

Also, F is constant in the set of accumulation points of {x^(k)}.

2. If {x^k}, generated by Algorithm 3.2, has an accumulation point, then all results of item 1 hold for this sequence.

Proof. 1. Let {x^(kj)}j be a subsequence that converges to x. Take a fixed but arbitrary k. For j large enough, we have k_j > k and F(x^(kj)) ≤ F(x^(k)). So letting j → ∞, we get

Since the above inequality holds componentwise and F_i (x^(kj)) → F_i (x) as j → ∞ for all i, we have lim_k→∞ F(x^(k)) = F (x)

Now let x̃ be another accumulation point of {x^(k)}. Then there exists a subsequence {x^(kℓ)}_ℓ that converges to x̃. By (20), F(x) ≤ F(x^(kℓ)) for all ℓ, so letting ℓ → ∞, we get F(x) ≤ F(x̃). Interchanging the roles of x and x̃, by the same reasoning, we get F(x̃) ≤ F(x). These two ℝ^m₊-inequalities imply that F(x) = F(x̃).

2. It suffices to note that any (infinite) sequence {x^k} generated by Algorithm 3.2 satisfies F(x^k+1) < F(x^k) for all k.

We now study the speed of convergence to zero of the sequences {v^k} and {θ(x^k)}.

Lemma 3.9.Suppose that {F(x^k)} is ℝ^m₊-bounded from below by y. Then

Proof. We know that F_i(x^k+1) ≤ F_i(x^k)+σt_k∇F_i(x^k)^Tv^k for any i and all k. Adding up from k = 0 to k = N, we get

where we used (4) and (8) for x = x^k. As y_i ≤ F_i(x^N+1) for all i and θ(x^k) < 0 for all k (Proposition 3.1, 3), we obtain

Since the last inequality holds for any nonnegative integer N, the conclusions follow immediately.

Before stating and proving the last theorem of this section, let us introduce the main tool for the convergence analysis. Recall that a sequence {u^k} ⊂ ℝⁿ is quasi-Fejér convergent to a set U ⊂ ℝⁿ if for every u ∈ U there exists a sequence {ε_k} ⊂ ℝ, ε_k ≥ 0 for all k, such that

||u^k+1 − u||² ≤ ||u^k − u||² + ε_k for all k, with ∑^∞_k=0 ε_k < ∞.

We will need the following well-known result concerning quasi-Fejér convergent sequences, whose proof can be found in ^[⁷[7] BURACHIK R, GRAÑA DRUMMOND LM, IUSEM AN & SVAITER BF. 1995. Full convergence of the steepest descent method with inexact line searches. Optimization, 32(2):137–146.^].

Theorem 3.10.If a sequence {u^k} is quasi-Fejér convergent to a nonempty set U ⊂ ℝⁿ, then {u^k} is bounded. If, furthermore, {u^k} has a cluster point u which belongs to U, then lim_k→∞u^k = u.

In ^[⁷[7] BURACHIK R, GRAÑA DRUMMOND LM, IUSEM AN & SVAITER BF. 1995. Full convergence of the steepest descent method with inexact line searches. Optimization, 32(2):137–146.^] it is proved that the steepest descent method for smooth (scalar) convex minimization, with step size obtained using the Armijo rule implemented by a backtracking procedure, is globally convergent to a solution (essentially, under the sole assumption of existence of optima). Using the same techniques, we extend this result to the multiobjective setting, that is to say, we show that any sequence produced by the MSDM converges to an optimum of problem (1), no matter how poor the initial guess might be.

Theorem 3.11.Suppose that F is ℝ^m₊-convex and that Assumption 3.6 holds. Then any sequence {x^k} produced by Algorithm 3.2 converges to a weak Pareto optimal point x^* ∈ ℝⁿ.

Proof. As {F(x^k)} is an ℝ^m₊-decreasing sequence, by Assumption 3.6, there exists x̂ ∈ ℝⁿ such that

F(x̂) ≤ F(x^k) for all k.    (21)

Now observe that 0 < t_k ≤ 1 for all k, so

||x^k+1 − x^k||² = t_k²||v^k||² ≤ t_k||v^k||².

Therefore, from (21), Lemma 3.9 and the above inequality, it follows that

∑^∞_k=0 ||x^k+1 − x^k||² < ∞.    (22)

Define

L := {x ∈ ℝⁿ : F(x) ≤ F(x^k) for all k}.

Using the ℝ^m₊-convexity of F and (21), from Lemma 3.7, we see that for any x ∈ L

||x − x^k+1||² ≤ ||x − x^k||² + ||x^k+1 − x^k||² for all k.

As L is nonempty, because x̂ ∈ L, from (22) and the above inequality, it follows that {x^k} is quasi-Fejér convergent to the set L. Therefore, by Theorem 3.10, {x^k} is bounded and so it has accumulation points. Let x^* be one of them. By Lemma 3.8, x^* ∈ L. Then, using Theorem 3.10 again, we see that the whole sequence {x^k} converges to x^*. We finish the proof by observing that Theorem 3.3 guarantees that x^* is stationary and so, from the ℝ^m₊-convexity of F, by virtue of Lemma 3.5, x^* is a weak Pareto optimum.
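The scheme whose convergence was just established can be illustrated with a minimal Python sketch for m = 2 objectives. Nothing here is taken from the survey: the helper names, the test problem and the tolerances are assumptions. In particular, the direction is computed through a standard dual characterization for two objectives, v = −(λg₁ + (1 − λ)g₂) with λ ∈ [0, 1] minimizing ||λg₁ + (1 − λ)g₂||, together with θ(x) = −(1/2)||v(x)||², and the exact stopping test θ(x^k) = 0 is replaced by a small tolerance.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def steepest_direction(g1, g2):
    """Direction v(x) and value theta(x) for two gradients g1, g2 (m = 2 only)."""
    d = [p - q for p, q in zip(g1, g2)]
    dd = dot(d, d)
    # lam minimizes ||lam*g1 + (1 - lam)*g2||^2 over [0, 1] (assumed dual form)
    lam = 0.5 if dd == 0.0 else min(1.0, max(0.0, -dot(d, g2) / dd))
    v = [-(lam * p + (1.0 - lam) * q) for p, q in zip(g1, g2)]
    return v, -0.5 * dot(v, v)

def msdm(F, grads, x, sigma=0.1, tol=1e-12, max_iter=1000):
    """Steepest descent with Armijo backtracking on every objective."""
    for _ in range(max_iter):
        g1, g2 = grads(x)
        v, theta = steepest_direction(g1, g2)
        if abs(theta) <= tol:          # numerical stand-in for theta(x^k) = 0
            break
        fx = F(x)
        slopes = (dot(g1, v), dot(g2, v))
        t = 1.0
        while t > 1e-16:               # halve t until the Armijo-like rule holds
            y = [xi + t * vi for xi, vi in zip(x, v)]
            fy = F(y)
            if all(fy[i] <= fx[i] + sigma * t * slopes[i] for i in range(2)):
                break
            t *= 0.5
        x = y
    return x

# Hypothetical convex test problem: squared distances to A and B.
A, B = (0.0, 0.0), (2.0, 0.0)

def F(x):
    return (sum((xi - ai) ** 2 for xi, ai in zip(x, A)),
            sum((xi - bi) ** 2 for xi, bi in zip(x, B)))

def grads(x):
    return ([2.0 * (xi - ai) for xi, ai in zip(x, A)],
            [2.0 * (xi - bi) for xi, bi in zip(x, B)])

x_star = msdm(F, grads, [3.0, 2.0])
```

For this test problem the Pareto set is the segment joining A and B, so the returned point is expected to lie on it up to the tolerance.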

3.5 The MSDM with relative errors

Here we briefly sketch an inexact version of the MSDM. Instead of computing v^k = v(x^k) = argmin_v∈ℝⁿ φ_xk(v) + (1/2)||v||² in Step 2 of Algorithm 3.2, we accept approximate solutions of the problem with some prespecified tolerance. For a nonstationary point x ∈ ℝⁿ and a given tolerance ε ∈ [0, 1), we say that a direction v ∈ ℝⁿ is an ε-approximate solution of min_v∈ℝⁿ φ_x(v) + (1/2)||v||² if

φ_x(v) + (1/2)||v||² ≤ (1 − ε)θ(x),

where φ_x(v) = max_i=1,...,m ∇F_i(x)^Tv and θ(·) is the optimal value of the above optimization problem. Calling h_x the objective function in this problem, i.e., h_x(v) := φ_x(v) + (1/2)||v||², we see that v is an ε-approximate solution if, and only if, the relative error of h_x at v is at most ε, that is to say,

(h_x(v) − θ(x)) / |θ(x)| ≤ ε.

Clearly, the exact direction v(x) is always ε-approximate at x for any ε ∈ [0, 1). Moreover, v(x) is the unique 0-approximate direction at x. Also note that given a nonstationary point x ∈ ℝⁿ and ε ∈ [0, 1), an ε-approximate direction v is always a descent direction. Indeed, from Proposition 3.1 and the definition of ε-approximate solution, it follows that ∇F_i(x)^Tv ≤ h_x(v) ≤ (1 − ε)θ(x) < 0 for all i. Whence, J F(x)v < 0 and so, by Lemma 2.1, v is a descent direction for F at x.

We can now define the inexact MSDM in almost the same way as its exact counterpart: in Step 1, there is also the choice of the tolerance parameter ε ∈ [0, 1); in Step 2, v^k is taken as an ε-approximate solution of min_v∈ℝⁿ h_xk(v); and, finally, the stopping criterion in Step 3 becomes h_xk(v^k) = 0. In ^[¹⁹[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.^,²⁸[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.^] it is shown that the method is well-defined and that it has the same convergence features as the exact one.
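The acceptance test for an inexact direction can be written in a few lines of Python. This is only a sketch under an assumption: it reads "relative error of h_x at v at most ε" as h_x(v) ≤ (1 − ε)θ(x), which is what the relative-error condition amounts to when θ(x) < 0. The function names and the sample gradients are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def h(grad_list, v):
    """h_x(v) = max_i grad F_i(x)^T v + (1/2)||v||^2."""
    return max(dot(g, v) for g in grad_list) + 0.5 * dot(v, v)

def is_eps_approximate(grad_list, v, theta, eps):
    """True if v passes the assumed test h_x(v) <= (1 - eps) * theta(x)."""
    return h(grad_list, v) <= (1.0 - eps) * theta

# Hypothetical data: gradients (2, 0) and (0, 2) at some nonstationary x.
grad_list = [(2.0, 0.0), (0.0, 2.0)]
v_exact = (-1.0, -1.0)      # exact direction for these gradients
theta = h(grad_list, v_exact)
v_half = (-0.5, -0.5)       # a shortened, inexact direction
```

With these numbers θ(x) = −1 and h_x(v_half) = −0.75, so the shortened direction is accepted for ε = 0.3 and rejected for ε = 0.2, while the exact direction passes even with ε = 0.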

4 THE MULTIOBJECTIVE PROJECTED GRADIENT METHOD

The scheme of this section is similar to the previous one. Here, for the constrained multiobjective problem, we propose an extension of the classical scalar-valued projected gradient method. First, we prove some technical results. Then, we present the convergence analysis of the method. And, finally, we sketch a numerically robust version of the method which admits relative errors on computing the search directions at each iteration.

Let F : ℝⁿ → ℝ^m be a continuously differentiable mapping and C ⊆ ℝⁿ be a closed and convex set. We are now interested in the following constrained multiobjective optimization problem:

For this problem, we seek a Pareto optimum (or efficient point), i.e., a feasible point (x^* ∈ C) such that there does not exist x ∈ C with F(x) ≤ F(x^*) and F(x) ≠ F(x^*), or we look for a weak Pareto optimum (or weakly efficient point), that is to say, a feasible x^* such that there does not exist x ∈ C with F(x) < F(x^*).

4.1 The search direction for the multiobjective projected gradient method

For the constrained multiobjective problem, a necessary condition for optimality of a point x ∈ C is stationarity (or criticality):

−ℝ^m₊₊ ∩ J F(x)(C − x) = ∅,

where J F(x)(C − x) := {J F(x)(z − x) : z ∈ C}. So x ∈ C is stationary for F if, and only if, for all v ∈ C − x we have J F(x)v ∉ −ℝ^m₊₊. Note that when m = 1, we retrieve the classical stationarity condition for constrained scalar-valued optimization: 〈∇F(x), z − x〉 ≥ 0 for all z ∈ C.

Let us now develop an extension of the classical (scalar) projected gradient method for the vector-valued problem (23). First, for x ∈ C, we extend the search direction, originally defined for C = ℝⁿ in (7), by

v(x) := argmin_v∈C−x βφ_x(v) + (1/2)||v||²,    (24)

where β > 0 is a parameter and φ_x is defined in (4). Note that, in view of the strong convexity of the minimand, for any x ∈ C, the projected gradient direction v(x) is always well-defined.

Now we extend the optimal value function θ(·) (given by (8) in the unconstrained case):

θ(x) := βφ_x(v(x)) + (1/2)||v(x)||².    (25)

We also need the following extension of Proposition 3.1 to the constrained case.

Proposition 4.1.Let v: C → ℝⁿ and θ : C → ℝ be given by (24) and (25), respectively. Then, the following statements hold.

1. θ(x) ≤ 0 for all x ∈ C.
2. The function θ(·) is continuous.
3. The following conditions are equivalent:
- (a) The point x ∈ C is nonstationary.
- (b) θ(x) < 0.
- (c) v(x) ≠ 0.

In particular, x is stationary if and only if θ(x) = 0.

Proof. 1. Recalling that 0 ∈ C - x, the result follows as in the proof of Proposition 3.1.

2. Let x ∈ C and {x^(k)} ⊂ C be a sequence such that lim_k→∞ x^(k) = x. Since v(x) ∈ C - x, we have v(x) + x - x^(k) ∈ C - x^(k). Thus, from (24) and (25), we have

Hence, from the first inequality of (5), we obtain

Since F is continuously differentiable and u → max_i{u_i} is continuous, taking lim sup_k→∞ on both sides of the above inequality yields

Now, observe that since v(x^(k)) ∈ C - x^(k), we have v(x^(k)) + x^(k) - x ∈ C - x. Once again from (24) and (25), we obtain

Using once again the first inequality of (5), we have

Note that for all i and any z ∈ C, from the Cauchy-Schwarz inequality and item 1, we have

−β||∇F_i(z)|| ||v(z)|| + (1/2)||v(z)||² ≤ β∇F_i(z)^Tv(z) + (1/2)||v(z)||² ≤ θ(z) ≤ 0.

Therefore, since F is continuously differentiable and {x^(k)} is bounded, because it converges, we see that ||v(x^(k))|| is bounded for all k. So, taking lim inf_k→∞ on both sides of the above inequality, we get

where the second inequality follows from (5). We complete the proof by combining this inequality with (26).

3. All results follow as in the proof of Proposition 3.1. (A formal proof of these equivalences for a more general case can be found in [^[²⁵[25] GRAÑA DRUMMOND LM & IUSEM AN. 2004. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1): 5–29.^], Proposition 3].)

4.2 The multiobjective projected gradient algorithm

We can now define the extension of the classical (scalar) projected gradient method for the constrained multiobjective optimization problem (23).

Algorithm 4.2. The multiobjective projected gradient method (MPGM)

Step 1. Choose β > 0, σ ∈ (0,1) and x⁰ ∈ C. Set k := 0.
Step 2. Compute v^k := v(x^k) as in (24).
Step 3. Compute θ(x^k) as in (25). If θ(x^k) = 0, then stop.
Step 4. Choose t_k as the largest t ∈ {1/2^j : j = 0, 1, 2, ...} such that F(x^k + tv^k) ≤ F(x^k) + σ t J F(x^k)v^k.
Step 5. Set x^k+1 := x^k + t_kv^k, k := k + 1 and go to Step 2.

Note that, whenever C = ℝⁿ, taking β = 1, we retrieve the MSDM. Observe that, by virtue of Proposition 4.1, an alternative stopping criterion in Step 3 is v^k = 0. Note also that Algorithm 4.2 either terminates in a finite number of iterations with a stationary point or it generates an infinite sequence of nonstationary points (Proposition 4.1).

Now observe that if Step 4 is reached, by virtue of Lemma 2.1, in finitely many half reductions, we obtain the step length which is used in Step 5 to define the next iterate. Note that instead of the factor 1/2 in the reductions of Step 4, we could have used any other scalar ν ∈ (0,1). Finally, observe that since θ(x^k) < 0 implies J F(x^k)v^k < 0, from Steps 3-5, it follows that whenever the method generates an infinite sequence, the objective values are ℝ^m₊-decreasing, i.e.,

F(x^k+1) < F(x^k) for all k.

If the method stops at iteration k₀, this inequality holds for k = 0, 1, ..., k₀ − 1.
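The steps of the MPGM can be sketched in Python for m = 2 objectives and a box C = [lo, hi]. This is an illustration under stated assumptions, not the authors' implementation: the subproblem is taken to be min over v ∈ C − x of βφ_x(v) + (1/2)||v||², it is solved through its Lagrangian dual over weights (λ, 1 − λ), whose inner minimizer is P_C(x − β(λg₁ + (1 − λ)g₂)) − x, and the concave dual is maximized by ternary search. All names, the test problem and the tolerances are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project_box(z, lo, hi):
    return [min(h, max(l, zi)) for zi, l, h in zip(z, lo, hi)]

def pg_direction(x, g1, g2, beta, lo, hi, iters=100):
    """Approximate direction and value of the assumed subproblem (m = 2)."""
    def v_of(lam):
        z = [xi - beta * (lam * a + (1.0 - lam) * b)
             for xi, a, b in zip(x, g1, g2)]
        return [p - xi for p, xi in zip(project_box(z, lo, hi), x)]

    def dual(lam):
        v = v_of(lam)
        return beta * (lam * dot(g1, v) + (1.0 - lam) * dot(g2, v)) + 0.5 * dot(v, v)

    a, b = 0.0, 1.0
    for _ in range(iters):             # ternary search on the concave dual
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if dual(m1) < dual(m2):
            a = m1
        else:
            b = m2
    v = v_of(0.5 * (a + b))
    theta = beta * max(dot(g1, v), dot(g2, v)) + 0.5 * dot(v, v)
    return v, theta

def mpgm(F, grads, x, lo, hi, beta=1.0, sigma=0.1, tol=1e-10, max_iter=500):
    for _ in range(max_iter):
        g1, g2 = grads(x)
        v, theta = pg_direction(x, g1, g2, beta, lo, hi)
        if theta >= -tol:              # numerical stand-in for theta(x^k) = 0
            break
        fx = F(x)
        slopes = (dot(g1, v), dot(g2, v))
        t = 1.0
        while t > 1e-16:               # Armijo-like backtracking of Step 4
            y = [xi + t * vi for xi, vi in zip(x, v)]
            fy = F(y)
            if all(fy[i] <= fx[i] + sigma * t * slopes[i] for i in range(2)):
                break
            t *= 0.5
        x = y
    return x

# Hypothetical test problem: squared distances to (0, 0) and (2, 0) on [1, 3]^2.
def F(x):
    return (x[0] ** 2 + x[1] ** 2, (x[0] - 2.0) ** 2 + x[1] ** 2)

def grads(x):
    return ([2.0 * x[0], 2.0 * x[1]], [2.0 * (x[0] - 2.0), 2.0 * x[1]])

lo, hi = [1.0, 1.0], [3.0, 3.0]
x_star = mpgm(F, grads, [3.0, 3.0], lo, hi)
```

Since every step has the form x + tv with v ∈ C − x and t ∈ (0, 1], the iterates of this sketch stay in the box, in line with the feasibility property of the method.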

4.3 Convergence analysis of the MPGM: the general case

As we know, Algorithm 4.2 has finite termination, ending with a stationary point, or generates an infinite sequence of nonstationary iterates. In the sequel we study the second case. First, we show a simple fact: the algorithm produces feasible sequences.

Lemma 4.3.Let {x^k} ⊂ ℝⁿ be a sequence generated by Algorithm 4.2. Then, we have {x^k} ⊂ C.

Proof. Let us use an inductive argument. From the initialization of the algorithm, x⁰ ∈ C. Now assume that x^k ∈ C. Observe that, by Step 4 of Algorithm 4.2, t_k ∈ (0, 1] and, by Step 2, v^k ∈ C - x^k . Then, for some z^k ∈ C, v^k = z^k - x^k and from the convexity of C, it follows that x^k+1 = x^k + t_kv^k = (1 -t_k)x^k + t_kz^k ∈ C and the proof is complete.

Now we see that whenever a sequence produced by Algorithm 4.2 has accumulation points, they are all feasible stationary points for the problem.

Theorem 4.4.Every accumulation point, if any, of a sequence {x^k} generated by Algorithm 4.2 is a feasible stationary point.

Proof. The feasibility follows combining the fact that C is closed with Lemma 4.3. The rest of the proof is similar to the one of Theorem 3.3: it suffices to use the definitions of v(x) and θ(x) given in (24) and (25), respectively.

4.4 Full convergence of the MPGM: the ℝ^m₊-convex case

Let us note that, as being the maximum of convex functions, the objective of the minimization problem which defines v(x) in (24) is also convex and, as we also have that C - x is a closed convex subset of ℝⁿ, by ^[³[3] BERTSEKAS DP. 2003. Convex Analysis and Optimization. Athena Scientific, Belmont, MA.^], Proposition 4.7.2, a necessary condition for the optimality of v(x) is the existence of w(x) ∈ ℝ^m with

such that

This condition can be written as

Now using a very well-known characterization of orthogonal projections (see, for instance, [^[³[3] BERTSEKAS DP. 2003. Convex Analysis and Optimization. Athena Scientific, Belmont, MA.^], Proposition 2.2.1]), we have

where P_C(z) := argmin{||u − z|| : u ∈ C} is well-defined because C is closed and convex. We conclude that there exists w(x) ∈ ℝ^m satisfying (27) such that

Observe that in the constrained scalar-valued case (m = 1), we have J F(x)^T = ∇F(x) and w(x) = 1, which means that

v(x) = P_C(x − β∇F(x)) − x,

and this is precisely the search direction used in the projected gradient method for real-valued optimization. So this is an extension to the multiobjective setting of the classical projected gradient method.

Lemma 4.5. Suppose that F is ℝ^m₊-convex. Let {x^k} be an infinite sequence generated by Algorithm 4.2. If for x̂ ∈ C and k ≥ 0, we have F(x̂) ≤ F(x^k), then

where w^k := w(x^k) ∈ ℝ^m is such that (27) and (29) hold with x = x^k.

Proof. Since x^k+1 = x^k + t_kv^k, we have

Let us analyze the rightmost term of the above expression. Since v^k = v(x^k) and w^k = w(x^k), by (28) with x = x^k, we get

Taking v = x̂ − x^k ∈ C − x^k in the above inequality, we obtain

Now, observe that the convexity of F and (18) yield J F(x^k)(x̂ − x^k) ≤ F(x̂) − F(x^k). This fact, together with w^k ≥ 0 and F(x̂) ≤ F(x^k), implies

Also, since J F(x^k)v^k < 0, because x^k is nonstationary, we have 〈w^k, J F(x^k)v^k〉 < 0. Thus, recalling that β > 0, from (31) it follows that

The above inequality, together with (30), shows that

And the result follows because t_k ≥ 0.

As in the unconstrained case, we need to make an extra assumption in order to prove full convergence of the method.

Assumption 4.6.Every ℝ^m₊-decreasing sequence {y^k} ⊂ F(C) := {F(x): x ∈ C} is ℝ^m₊-bounded from below by an element of F(C).

We can now state and prove the main result on the convergence analysis of the MPGM: under conditions similar to those of Theorem 3.11, regardless of the initial (feasible) guess, the method converges to an optimum of problem (23).

Theorem 4.7.Assume that F : ℝⁿ → ℝ^m is ℝ^m₊-convex and that Assumption 4.6 holds. Then, any sequence {x^k} generated by Algorithm 4.2 converges to a weak Pareto optimum.

Proof. Let us define the set

L := {x ∈ C : F(x) ≤ F(x^k) for all k}

and take x̂ ∈ L, which exists by Assumption 4.6. Since F is ℝ^m₊-convex, it follows from Lemma 4.5 that

First, we prove that the sequence {x^k} is quasi-Fejér convergent to the set L. Let {e¹,..., e^m} be the canonical basis of ℝ^m. We observe that, for each k, we can write

Let us note that, from (27), 0 ≤ w^k_i ≤ 1 for all i and k. Thus, from (32), we have

Defining ε_k := 2βt_k ∑^m_i=1 |〈eⁱ, J F(x^k)v^k〉|, we have that ε_k ≥ 0 and

Thus, let us prove that ∑^∞_k=0 ε_k < ∞. From the Armijo-like condition in Step 4 of the algorithm, we obtain

Since x^k is nonstationary, we also have for all i and k that

which means that -t_k〈eⁱ, J F(x^k)v^k〉 = t_k|〈eⁱ, J F(x^k)v^k〉|. Hence, we obtain

Adding up from k = 0 to k = N at the above inequality, where N is any positive integer, we get

Since N is an arbitrary positive integer, β, σ > 0 and F(x̂) ≤ F(x^k) for all k, we conclude that ∑^∞_k=0 ε_k < ∞. Therefore, recalling that x̂ was an arbitrary element of L, we see that {x^k} is quasi-Fejér convergent to L.

By Theorem 3.10, it follows that {x^k} is bounded. So, since C is closed, {x^k} ⊂ C has at least one accumulation point, say x^*, which is feasible and stationary by Theorem 4.4. Using now the ℝ^m₊-convexity of F, from Lemma 3.5, it follows that x^* ∈ C is a weak Pareto optimum.

As in the end of the proof of Theorem 3.11, we now see that x^* ∈ L. Since F(x^k+1) < F(x^k) for all k, by Lemma 3.8, 1, F(x^*) ≤ F(x^k) for all k, and x^* ∈ L. So, once again from Theorem 3.10, we conclude that {x^k} converges to x^* ∈ C, a weak Pareto optimum.

4.5 The MPGM with relative errors

As we did for the MSDM in Subsection 3.5, we present here a quick overview of an MPGM version implemented with relative errors in the search directions. Given a nonstationary x ∈ C and a tolerance ε ∈ [0, 1), we say that v ∈ C − x is an ε-approximate solution of the problem min_v∈C−x φ_x(v) + (1/2)||v||² if

where h_x(v) := φx (v) + 1/2||v||², with φ_x given by (4).

Similar comments as those made in Subsection 3.5 for ε-approximate solutions for unconstrained problems can be made here for the constrained case. The modifications of the (exact) MPGM are basically the same as those proposed in Subsection 3.5 for the (exact) MSDM: in Step 2 of Algorithm 4.2, instead of defining v^k as the exact solution of min_v∈C−x^k φ_xk(v) + (1/2)||v||², we take v^k as an ε-approximate solution of this problem. The stopping rule at Step 3 (θ(x^k) = 0) is replaced by h_xk(v^k) = 0.

For a more general case (vector optimization), a comprehensive study of the inexact version of the MPGM can be found in ^[²²[22] FUKUDA EH & GRAÑA DRUMMOND LM. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications, 54(3): 473–493.^], where, under essentially the same hypotheses made here for the (exact) MPGM, all convergence results are fully proved.

5 THE MULTIOBJECTIVE NEWTON METHOD

We now go back to the unconstrained multiobjective problem (1). In order to define a Newton-type search direction for this kind of problem, we first recall some convexity notions for vector-valued mappings. We say that F : ℝⁿ → ℝ^m is strictly ℝ^m₊-convex if the inequality (17) is satisfied strictly, that is, if F is componentwise strictly convex. We also say that F is strongly ℝ^m₊-convex if there exists û ∈ ℝ^m₊₊ such that

for all x, y ∈ ℝⁿ and all λ ∈ (0, 1), which is equivalent to saying that F is componentwise strongly convex (each component F_i has û_i as its modulus of strong convexity). Clearly, as in the scalar case, strong ℝ^m₊-convexity implies strict ℝ^m₊-convexity, which, in turn, implies ℝ^m₊-convexity.

The following result relates Pareto optimality with strict ℝ^m₊-convexity and characterizes strong ℝ^m₊-convexity in terms of the eigenvalues of the objective functions' Hessians.

Lemma 5.1.Assume that F : ℝⁿ → ℝ^m is twice continuously differentiable. Then, the following properties hold.

1. If F is strictly ℝ^m₊-convex and x ∈ ℝⁿ is stationary for F, then x is a Pareto optimal point.
2. The mapping F is strongly ℝ^m₊-convex if and only if there exists ρ > 0 such that

λmin(∇²F_i(x)) ≥ ρ for all x ∈ ℝⁿ and all i = 1, ..., m,

where λmin : ℝ^n×n → ℝ denotes the smallest eigenvalue function.

Proof. It follows from [^[²⁷[27] GRAÑA DRUMMOND LM, RAUPP FMP & SVAITER BF. 2011. A quadratically convergent Newton method for vector optimization. Optimization, 2011. Accepted for publication.^], Corollary 2.2 and Proposition 2.3].

5.1 The search direction for the multiobjective Newton method

From now on, we will assume that F is strongly ℝ^m₊-convex and twice continuously differentiable. For all i = 1, ..., m, let us define ψ_i : ℝⁿ × ℝⁿ → ℝ by

ψ_i(x, s) := ∇F_i(x)^Ts + (1/2)s^T∇²F_i(x)s.    (33)

Hence, for any i, s → F_i(x) + ψ_i(x, s) is the local quadratic approximation of F_i at x. For x ∈ ℝⁿ, we define the Newton direction for F at x as

s(x) := argmin_s∈ℝⁿ max_i=1,...,m ψ_i(x, s).    (34)

Since ψ_i(x, ·) is strongly convex for all i, s → max_i ψ_i(x, s) is also strongly convex and it has a unique minimizer in ℝⁿ, and so s(x) is well-defined. Clearly, for m = 1 we retrieve the Newton direction for unconstrained real-valued optimization.

Observe that whenever ∇²F_i(x) is equal to I, the identity matrix of ℝ^n×n, for all i, we have that max_i=1,...,m ψ_i (x, s) = φ_x(s) + (1/2)||s||², where φ is given by (4). Therefore, in multiobjective optimization, if ∇²F_i(x) = I for all i, then, as it happens in the scalar case, the Newton and the steepest descent directions coincide.

Let us call θ(x) the optimal value of the minimization problem min_s∈ℝⁿ max_i=1,...,m ψ_i(x, s), that is to say,

θ(x) := min_s∈ℝⁿ max_i=1,...,m ψ_i(x, s) = max_i=1,...,m ψ_i(x, s(x)).    (35)

In order to compute s(x), we can reformulate the (possibly nonsmooth) min-max problem as:

min τ subject to ψ_i(x, s) − τ ≤ 0 for all i = 1, ..., m,

a smooth convex problem with variables (τ, s) ∈ ℝ × ℝⁿ for which there are many efficient numerical solvers.

We now present analogous results as those of Propositions 3.1 and 4.1.

Proposition 5.2.Let s : ℝⁿ → ℝⁿ and θ : ℝⁿ → ℝ be given by (34) and (35), respectively. Then, the following statements hold.

1. θ(x) ≤ 0 for all x ∈ ℝⁿ.
2. The following conditions are equivalent:
- (a) The point x is nonstationary.
- (b) θ(x) < 0.
- (c) s(x) ≠ 0.
3. In particular, x is stationary if and only if θ(x) = 0.

Proof. 1. By (33) and (35), θ(x) ≤ max_i ψ_i (x, 0) =0.

2. (a) ⇒ (b). By (a), there exists s̃ ∈ ℝⁿ such that J F(x)s̃ ∈ −ℝ^m₊₊, which means that ∇F_i(x)^Ts̃ < 0 for all i. From (33) and (35), for τ > 0, we get

θ(x) ≤ max_i ψ_i(x, τs̃) = τ max_i {∇F_i(x)^Ts̃ + (τ/2)s̃^T∇²F_i(x)s̃}.

Therefore, for τ > 0 small enough the right hand side of the above inequality is negative and (b) holds.

(b) ⇒ (c). Note that if θ(x) < 0 then, in view of (33) and (35), we have s(x) ≠ 0.

(c) ⇒ (a). From the strong ℝ^m₊-convexity of F, it follows that ∇²F_i(x) is positive definite for all i, so from (c) and item 1, we get

Hence, J F(x)s(x) < 0 and (a) holds.

In particular, we just saw that, for any nonstationary x ∈ ℝⁿ, s(x) is a descent direction. The next lemma supplies an implicit definition of s(x), which will be used in order to obtain bounds for ||s(x)||. These bounds will allow us to prove continuity of s(·), θ(·) and will be used to establish the rate of convergence of the Newton method.

Lemma 5.3. For any x ∈ ℝⁿ, the Newton direction defined in (34) can be written as

s(x) = −[∑^m_i=1 α_i∇²F_i(x)]^-1 ∑^m_i=1 α_i∇F_i(x),    (36)

with α_i = α_i(x) ≥ 0 and ∑^m_i=1 α_i = 1.

Proof. We know that the convex problem min_s max_i ψ_i (x, s), where ψ_i is defined in (33), has a unique solution s(x). Applying Proposition 3.4, we see that there exists α := α(x) ∈ ℝ^m such that the following conditions hold at s = s(x) and τ = θ(x):

Since ∇²F_i(x) is positive definite and the multipliers α_i are nonnegative for all i, from both conditions in (37), the result follows.

So s(x), defined in (34), is a Newton direction for a standard scalar optimization problem (min_x ∑^m_i=1 α_iF_i(x)), implicitly induced by weighting the given objective functions by the non-negative a priori unknown KKT multipliers. We now give upper bounds for the norm of the Newton direction.

Corollary 5.4.For any x ∈ ℝⁿ, we have

where ρ > 0 is given by Lemma 5.1.

Proof. Let us prove the first inequality. By Proposition 5.2, we have θ(x) ≤ 0, which, by (35), means that max_i ψ_i (x, s(x)) ≤ 0. Hence, by (33),

From Lemma 5.1, 2, we get

The proof follows from these two inequalities and the operator norm properties.

We now prove the second inequality. We know that the complementarity condition (39) holds for (τ, s) = (θ(x), s(x)). So, by (33) and (35), α_i > 0 implies θ(x) = ψ_i (x, s(x)), and then, using the first equality in (37), we have

Replacing ∑^m_i=1 α_i∇F_i(x) by its expression obtained from (36), using Lemma 5.1, 2 and the fact that ∑_i α_i = 1 with α_i ≥ 0 for all i, we conclude that

and the proof follows immediately from Proposition 5.2, 1.

We now establish the continuity of the functions θ(·) and s(·).

Proposition 5.5. The functions θ : ℝⁿ → ℝ and s : ℝⁿ → ℝⁿ, given by (35) and (34) respectively, are continuous.

Proof. Take x, x' ∈ ℝⁿ. Let s = s(x), s' = s(x') and i₀ such that max_i ψ_i (x', s) = ψ_i0 (x', s). Then, from (35), (33) and the Cauchy-Schwarz inequality,

where D²F stands for the second derivative of F. Therefore, from the above inequality and Corollary 5.4 we conclude that

Since the above inequality also holds interchanging x and x^' and F is twice continuously differentiable, we conclude that θ(·) is continuous.

Now we prove the continuity of s(·). By the same reasoning as above,

where we used the Cauchy-Schwarz inequality and (33) with x = x'. By Lemma 5.1, 2 λmin(∇²F_i(x)) ≥ ρ for all x ∈ ℝⁿ and all i. Therefore, ψ_i (x', ·) is strongly convex with modulus ρ for all i and so is max_i ψ_i (x', ·). Hence, using the fact that max_i ψ_i (x', ·) is minimized at s', we have

Combining inequality (40) with the above one, we get

Since θ(·) is continuous and F is twice continuously differentiable, using Corollary 5.4 and taking the limit x' → x in both sides of the above inequality, we conclude that s' → s and so s(·) is continuous.

5.2 The multiobjective Newton algorithm

Let us now describe the extension of the classical (scalar-valued) Newton method for the unconstrained multiobjective optimization problem (1).

Algorithm 5.6. The multiobjective Newton method (MNM)

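Step 1. Choose σ ∈ (0,1) and x⁰ ∈ ℝⁿ. Set k := 0.
Step 2. Compute s^k := s(x^k) as in (34).
Step 3. Compute θ(x^k) as in (35). If θ(x^k) = 0, then stop.
Step 4. Choose t_k as the largest t ∈ {1/2^j : j = 0, 1, 2, ...} such that F_i(x^k + ts^k) ≤ F_i(x^k) + σtθ(x^k) for all i = 1, ..., m.
Step 5. Set x^k+1 := x^k + t_ks^k, k := k + 1 and go to Step 2.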

As usual, the stopping rule at Step 3 can be replaced by s^k = 0 (Proposition 5.2). Now we show that if, at Step 3, θ(x^k) ≠ 0, the backtracking procedure prescribed in Step 4 is well-defined, i.e., it ends up in a finite number of (inner) iterations. Indeed, it suffices to note that

for any nonstationary x, t > 0 and all i. As J F(x)s(x) < 0, from Lemma 2.1 it follows that there exists ε = ε(x) > 0 such that the first, and therefore the second, inequality in (41) hold for any t ∈ (0, ε). So, for x = x^k, we will need finitely many half reductions in order to obtain t_k ∈ (0,ε_k), where ε_k = ε(x^k). Observe also that the reducing factor 1/2 at Step 4 can be substituted by any ν ∈ (0,1).

Moreover, when θ(x^k) ≠ 0 for all k, we know that we have θ(x^k) < 0, so from Step 4 it follows that

F(x^k+1) < F(x^k) for all k.

This inequality holds for k = 0, ..., k₀ - 1 if the algorithm stops at iteration k₀. Finally, we point out that, by Proposition 5.2, whenever the method has finite termination, it ends up with a stationary point and therefore a Pareto optimum for F (Lemma 5.1, 1). Therefore, the relevant case for the convergence analysis is the one in which the algorithm generates infinite sequences {x^k}, {s^k} and {t_k}. So, from now on, we suppose that the method does not stop.
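The Newton iteration can be sketched in Python for m = 2 objectives in ℝ². This is a hedged illustration, not the authors' implementation. It assumes the quadratic model ψ_i(x, s) = ∇F_i(x)^Ts + (1/2)s^T∇²F_i(x)s and an Armijo-like test of the form F_i(x + ts) ≤ F_i(x) + σtθ(x). The direction is computed as the Newton direction of a weighted sum αF₁ + (1 − α)F₂, with α ∈ [0, 1] chosen by ternary search on the concave Lagrangian dual of the min-max problem. All function names, the test problem and the tolerances are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def solve2(H, g):
    """Return s with H s = -g for a 2x2 matrix H (Cramer's rule)."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [-(d * g[0] - b * g[1]) / det, -(a * g[1] - c * g[0]) / det]

def psi(g, H, s):
    """Assumed quadratic model g^T s + (1/2) s^T H s."""
    Hs = [dot(H[0], s), dot(H[1], s)]
    return dot(g, s) + 0.5 * dot(s, Hs)

def newton_direction(g1, H1, g2, H2, iters=100):
    def s_of(al):
        g = [al * p + (1.0 - al) * q for p, q in zip(g1, g2)]
        H = [[al * H1[i][j] + (1.0 - al) * H2[i][j] for j in range(2)]
             for i in range(2)]
        return solve2(H, g)

    def dual(al):
        s = s_of(al)
        return al * psi(g1, H1, s) + (1.0 - al) * psi(g2, H2, s)

    a, b = 0.0, 1.0
    for _ in range(iters):             # ternary search on the concave dual
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if dual(m1) < dual(m2):
            a = m1
        else:
            b = m2
    s = s_of(0.5 * (a + b))
    return s, max(psi(g1, H1, s), psi(g2, H2, s))

def mnm(F, grads, hessians, x, sigma=0.1, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        (g1, g2), (H1, H2) = grads(x), hessians(x)
        s, theta = newton_direction(g1, H1, g2, H2)
        if theta >= -tol:              # numerical stand-in for theta(x^k) = 0
            break
        fx = F(x)
        t = 1.0
        while t > 1e-16:               # backtracking with the assumed Armijo test
            y = [xi + t * si for xi, si in zip(x, s)]
            fy = F(y)
            if all(fy[i] <= fx[i] + sigma * t * theta for i in range(2)):
                break
            t *= 0.5
        x = y
    return x

# Hypothetical strongly convex test problem.
def F(x):
    return (x[0] ** 2 + 2.0 * x[1] ** 2, (x[0] - 2.0) ** 2 + 0.5 * x[1] ** 2)

def grads(x):
    return ([2.0 * x[0], 4.0 * x[1]], [2.0 * (x[0] - 2.0), x[1]])

def hessians(x):
    return ([[2.0, 0.0], [0.0, 4.0]], [[2.0, 0.0], [0.0, 1.0]])

x_star = mnm(F, grads, hessians, [3.0, 2.0])
```

For this test problem the Pareto set is the segment {(x₁, 0) : 0 ≤ x₁ ≤ 2}, so the returned point is expected to lie on it up to the tolerance.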

5.3 Convergence analysis of the MNM

Let us begin with a technical result which will be useful in the sequel.

Proposition 5.7.For all k = 0, 1, ... and all i = 1, ..., m, we have

Proof. From Step 3 of Algorithm 5.6, for any i and all j, we have

We complete the proof by adding up for j = 1, ..., k and taking into account that θ(x^j) < 0 for all j.

We can now prove that if the sequence {x^k} has accumulation points, they are all Pareto optima for F.

Proposition 5.8. If x ∈ ℝⁿ is an accumulation point of {x^k}, then x is a Pareto optimum for F.

Proof. There exists a subsequence {x^kj}_j such that

Using Proposition 5.7 with k = k_j and taking the limit j → ∞, from the continuity of F, we conclude that

Therefore, lim_k→∞t_kθ(x^k) = 0 and, in particular,

Suppose that x is nonstationary, which, by Proposition 5.2, is equivalent to

Define

Using the definition of θ(·) and Lemma 2.1, we conclude that there exists a nonnegative integer q such that

Since s(·), θ(·) and g(·) are continuous,

and, for j large enough

which, in view of the definition of g and Step 4 of the algorithm, implies that t_k_j ≥ 2^-q > 0 for j large enough. Hence, from the second limit in (44), we conclude that lim inf_j→∞ t_k_j|θ(x^k_j)| > 0, in contradiction with (42). Therefore, (43) is not true and, by Proposition 5.2, x is stationary. Whence, in view of Lemma 5.1, 1, x is a Pareto optimal point for F.

We can now show that, in order to have full convergence to a Pareto optimum, it suffices to take the initial point x⁰ in a bounded level set of F.

Theorem 5.9.Let x⁰ ∈ ℝⁿ be in a compact level set of F and {x^k} a sequence generated by Algorithm 5.6. Then {x^k} converges to a Pareto optimum x^* ∈ ℝⁿ.

Proof. Let Γ₀ be the F(x⁰)-level set of F, that is, Γ₀ := {x ∈ ℝⁿ : F(x) ≤ F(x⁰)}. We know that the sequence {F(x^k)} is ℝ^m₊-decreasing, so we have x^k ∈ Γ₀ for all k. Therefore {x^k} is bounded, all its accumulation points belong to Γ₀ and so, using Proposition 5.8, we conclude that they are all Pareto optima for F. Moreover, applying Lemma 3.8, 1 with x^(k) = x^k for all k, we see that all these accumulation points have the same objective value. As F is strongly (and therefore strictly) ℝ^m₊-convex, there exists just one accumulation point of {x^k}, say x^*, and the proof is complete.

5.4 Convergence rate of the MNM

Here we analyze the convergence rate of any infinite sequence {x^k} generated by Algorithm 5.6. First, we provide a bound for θ(x^k+1) based on data at the former iterate x^k. Then we show that for k large enough, full Newton steps are performed, that is to say, t_k = 1. Using this result, we prove superlinear convergence. Finally, under additional regularity assumptions, we prove quadratic convergence.

Lemma 5.10.For any k, there exist α^k ∈ ℝ^m such that ∑^m_i=1 α^k_i = 1, α^k_i ≥ 0 for all i with

and

where ρ > 0 is given by Lemma 5.1.

Proof. By virtue of (37)-(38) with x = x^k, s = s^k, the Lagrangian condition (45) holds for any k with a vector of KKT multipliers α = α(x^k) =: α^k ≥ 0 such that ∑^m_i=1 α^k_i = 1.

In order to prove (46), observe that, from Steps 2-3 of Algorithm 5.6 and the properties of α^k,

where the last inequality holds in view of Lemma 5.1 for x = x^k+1. The function

is convex, so its minimal value is attained where its gradient vanishes, i.e., at

So

The proof follows from this equality and (47).

Let us now see that, for any sequence defined by Algorithm 5.6, full Newton steps will be eventually taken at Step 4.

Theorem 5.11. Let x⁰ ∈ ℝⁿ be in a compact level set of F and {x^k} a sequence generated by Algorithm 5.6. Then {x^k} converges to a Pareto optimum point x^* ∈ ℝⁿ. Moreover, t_k = 1 for k large enough and the convergence of {x^k} to x^* is superlinear.

Proof. Convergence of {x^k} to a Pareto optimum x^* was established in Theorem 5.9. By Lemma 3.5 and Proposition 5.2, 3, we know that x^* is stationary and

Since F is twice continuously differentiable, for any ε > 0 there exists δ > 0 such that

where B(x^*, δ) stands for the open ball with center in x^* and radius δ. As {x^k} converges to x^*, using (48) and the continuity of s(·) (Proposition 5.5), we conclude that {s^k} converges to 0. Therefore there exists k_ε such that for k ≥ k_ε we have

and using (49) for estimating the integral remainder on Taylor's second order expansion of F_i at x^k, we conclude that for any i

So, using (33), (35), Corollary 5.4 and the fact that σ < 1, we obtain

The above inequality shows that if ε < (1 - σ)ρ, then, at Step 4, t_k = 1 is accepted for k ≥ k_ε. Suppose that

Using (45) and (49) for estimating the integral remainder on Taylor's first order expansion of ∑^m_i=1 α^k_i∇F_i at x^k, in x^k + s^k, we conclude that for any k ≥ k_ε it holds

which, combined with (46), shows that

Using the above inequality and Corollary 5.4, we conclude that for k ≥ k_ε we have

Therefore, if k ≥ k_ε and j ≥ 1 then

To prove superlinear convergence, take 0 < ξ < 1 and define

If ε < ε^* and k ≥ k_ε, using (50) and the convergence of {x^k} to x^*, we get

Hence,

Combining the above two inequalities, we conclude that, if ε < ε^* and k ≥ k_ε, then

Since ξ was arbitrary in (0, 1), the above quotient tends to zero and the proof is complete.

Recall that in the classical Newton method for minimizing a scalar convex twice continuously differentiable function, the proof of quadratic convergence requires Lipschitz continuity of the objective function's second derivative. Likewise, under the assumption of Lipschitz continuity of D²F, we can see that the MNM also converges quadratically.

Theorem 5.12. Let x⁰ ∈ ℝⁿ be in a compact level set of F, {x^k} a sequence generated by Algorithm 5.6 and D²F Lipschitz continuous on ℝⁿ. Then {x^k} converges quadratically to a Pareto optimal solution x^* ∈ ℝⁿ.

Proof. By Theorem 5.11, {x^k} converges superlinearly to a Pareto optimal point x^* and t_k = 1 for k large enough. Let L be the Lipschitz constant for D²F. Then, for α^k ∈ ℝ^m as in Lemma 5.10, we have that ∑^m_i=1 α^k_i∇²F_i is also L-Lipschitz continuous. Therefore, for k large enough,

where the last inequality follows from the second equality, (45) and Taylor's development of ∑^m_i=1 α^k_i∇F_i at x^k, in x^k + s^k. Using the above inequality and (46), we obtain

which, combined with Corollary 5.4 yields

Take ξ ∈ (0, 1). Since {x^k} converges superlinearly to x^*, there exists k₀ such that for k ≥ k₀ we have (51), (52) and

Therefore, applying the triangle inequality to ||x^* - x^ℓ|| and then to ||x^{ℓ+1} - x^ℓ||, for ℓ ≥ k₀ we obtain

Whence, using the above rightmost inequality for ℓ = k ≥ k₀ and the second equality in (51), we have

while using the first inequality in (53) for ℓ = k + 1 and the second equality in (51) (for k + 1 instead of k), we obtain

Quadratic convergence of {x^k} to x^* follows from the two inequalities above and (52).

6 FINAL REMARKS

Let us make some final comments. First, it is worth noticing that, by implementing the vector optimization versions of the steepest descent method [28] and the projected gradient method [21, 22, 25], we can guarantee convergence to Pareto optima instead of just weak Pareto points. Indeed, it suffices to take an underlying ordering (closed convex) cone K with ℝ^m₊\{0} contained in the interior of K and such that a K-version of Assumption 3.6 (Assumption 4.6) in the unconstrained (constrained) case is satisfied [21, Section 6].

Without using this technique, in some cases, convergence of these two methods to (strong) Pareto optima can also be obtained. For instance, if the objective function is strictly ℝ^m₊-convex, then, under the assumptions of the convergence theorems, this will actually happen. Indeed, as we saw in the proof of Theorem 3.11 (Theorem 4.7), sequences generated by the MSDM (MPGM) converge to stationary points, which, in this case, are Pareto optima (Lemma 5.1, 1).

We also point out that, with these (and other) descent vector-valued methods, we are just concerned with finding a single optimum and not all the points in the optimal set. Nevertheless, from a numerical point of view, we can expect to approximate the solution set by running any of these methods from different initial points (e.g., [18, Section 8]). This kind of idea also appears in the so-called weighting method. More precisely, in that case one weighting vector possibly gives rise to one Pareto optimal point, which means that we should perform the method for different weights in order to find the Pareto frontier. However, in some cases, as shown in [18, Section 7], almost any choice of the weighting vector may lead to unbounded problems. With arbitrary choices of initial points, the descent vector-valued methods discussed here do not have this disadvantage. We believe this is one of the main motivations for the study of such methods.
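The multi-start idea can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the implementation used in [18]: the two quadratic objectives, the points `A` and `B`, the parameter values and all function names are hypothetical. It runs a steepest descent iteration with Armijo-like backtracking for m = 2 objectives. For two objectives the search direction has a closed form: it is the negative of the minimum-norm element of the segment joining the two gradients. For this particular test problem the Pareto set is the segment between `A` and `B`.

```python
import numpy as np

A = np.array([1.0, 0.0])    # minimizer of F_1 (hypothetical test data)
B = np.array([-1.0, 0.0])   # minimizer of F_2 (hypothetical test data)


def F(x):
    """Objective values F(x) = (||x - A||^2, ||x - B||^2)."""
    return np.array([np.sum((x - A) ** 2), np.sum((x - B) ** 2)])


def JF(x):
    """Jacobian of F: one gradient per row."""
    return np.array([2.0 * (x - A), 2.0 * (x - B)])


def direction(J):
    """Steepest descent direction for m = 2.

    Returns -(lam * g1 + (1 - lam) * g2), where lam in [0, 1]
    minimizes the norm of the convex combination of the gradients.
    """
    g1, g2 = J
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        lam = 0.5  # both gradients coincide, any weight works
    else:
        lam = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return -(lam * g1 + (1.0 - lam) * g2)


def msdm(x0, sigma=0.1, tol=1e-8, max_iter=1000):
    """Steepest descent with Armijo-like backtracking (sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        J = JF(x)
        v = direction(J)
        if np.linalg.norm(v) < tol:   # approximately stationary
            break
        Fx, slope, t = F(x), J @ v, 1.0
        # Halve t until every objective decreases sufficiently.
        while t > 1e-12 and np.any(F(x + t * v) > Fx + sigma * t * slope):
            t *= 0.5
        x = x + t * v
    return x


def multi_start(starts):
    """Run the method from several initial points, one limit point each."""
    return np.array([msdm(x0) for x0 in starts])
```

Calling `multi_start` on a handful of initial points returns one approximately stationary point per run. Which part of the Pareto set is covered depends on how the initial points are chosen, so in practice the starts would be sampled from a region of interest.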

ACKNOWLEDGMENTS

This work was supported by FAPESP (grant 2010/20572-0) and by FAPERJ/CNPq through PRONEX-Optimization. We are also grateful to Benar F. Svaiter for his suggestions and to an anonymous referee for his helpful comments.

REFERENCES

[1] BELLO CRUZ JY, LUCAMBIO PÉREZ LR & MELO JG. 2011. Convergence of the projected gradient method for quasiconvex multiobjective optimization. Nonlinear Analysis, 74:5268–5273.
[2] BENTO GC, FERREIRA OP & OLIVEIRA PR. 2012. Unconstrained steepest descent method for multicriteria optimization on Riemannian manifolds. Journal of Optimization Theory and Applications, 154(1):88–107.
[3] BERTSEKAS DP. 2003. Convex Analysis and Optimization. Athena Scientific, Belmont, MA.
[4] BONNEL H, IUSEM AN & SVAITER BF. 2005. Proximal methods in vector optimization. SIAM Journal on Optimization, 15:953–970.
[5] BRANKE J, DEB K, MIETTINEN K & SLOWINSKI R, EDITORS. 2006. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2006. Seminar 06501. Forthcoming.
[6] BRANKE J, DEB K, MIETTINEN K & STEUER RE, EDITORS. 2004. Practical Approaches to Multi-Objective Optimization. Dagstuhl Seminar, 2004. Seminar 04461. Electronically available at http://drops.dagstuhl.de/portals/index.php?semnr=04461.
[7] BURACHIK R, GRAÑA DRUMMOND LM, IUSEM AN & SVAITER BF. 1995. Full convergence of the steepest descent method with inexact line searches. Optimization, 32(2):137–146.
[8] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Global convergence analysis. Submitted for publication.
[9] CARRIZO GA & MACIEL C. 2013. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem: Local convergence. Submitted for publication.
[10] CARRIZOSA E & FRENK JBG. 1998. Dominating sets for convex functions with some applications. Journal of Optimization Theory and Applications, 96(2):281–295.
[11] DAS I & DENNIS JE. 1997. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization, 14(1):63–69.
[12] DAS I & DENNIS JE. 1998. Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8(3):631–657.
[13] EICHFELDER G. 2009. An adaptive scalarization method in multiobjective optimization. SIAM Journal on Optimization, 19(4):1694–1718.
[14] ESCHENAUER H, KOSKI J & OSYCZKA A. 1990. Multicriteria Design Optimization: Procedures and Applications. Springer-Verlag, Berlin.
[15] EVANS GW. 1984. An overview of techniques for solving multiobjective mathematical programs. Management Science, 30(11):1268–1282.
[16] FLIEGE J. 2001. OLAF – A general modeling system to evaluate and optimize the location of an air polluting facility. OR Spektrum, 23:117–136.
[17] FLIEGE J. 2006. An efficient interior-point method for convex multicriteria optimization problems. Mathematics of Operations Research, 31(4):825–845.
[18] FLIEGE J, GRAÑA DRUMMOND LM & SVAITER BF. 2009. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626.
[19] FLIEGE J & SVAITER BF. 2000. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494.
[20] FU Y & DIWEKAR UM. 2004. An efficient sampling approach to multiobjective optimization. Annals of Operations Research, 132(1-4):109–134.
[21] FUKUDA EH & GRAÑA DRUMMOND LM. 2011. On the convergence of the projected gradient method for vector optimization. Optimization, 60(8-9):1009–1021.
[22] FUKUDA EH & GRAÑA DRUMMOND LM. 2013. Inexact projected gradient method for vector optimization. Computational Optimization and Applications, 54(3):473–493.
[23] GASS S & SAATY T. 1955. The computational algorithm for the parametric objective function. Naval Research Logistics Quarterly, 2(1-2):39–45.
[24] GEOFFRION AM. 1968. Proper efficiency and the theory of vector maximization. Journal of Mathematical Analysis and Applications, 22:618–630.
[25] GRAÑA DRUMMOND LM & IUSEM AN. 2004. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1):5–29.
[26] GRAÑA DRUMMOND LM, MACULAN N & SVAITER BF. 2008. On the choice of parameters for the weighting method in vector optimization. Mathematical Programming, 111(1-2).
[27] GRAÑA DRUMMOND LM, RAUPP FMP & SVAITER BF. 2011. A quadratically convergent Newton method for vector optimization. Optimization, 2011. Accepted for publication.
[28] GRAÑA DRUMMOND LM & SVAITER BF. 2005. A steepest descent method for vector optimization. Journal of Computational and Applied Mathematics, 175(2):395–414.
[29] JAHN J. 1984. Scalarization in vector optimization. Mathematical Programming, 29:203–218.
[30] JAHN J. 2003. Vector Optimization – Theory, Applications and Extensions. Springer, Erlangen.
[31] JÜSCHKE A, JAHN J & KIRSCH A. 1997. A bicriterial optimization problem of antenna design. Computational Optimization and Applications, 7(3):261–276.
[32] KIM IY & WECK OL. 2005. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation. Structural and Multidisciplinary Optimization, 29(2):149–158.
[33] LAUMANNS M, THIELE L, DEB K & ZITZLER E. 2002. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10:263–282.
[34] LESCHINE TM, WALLENIUS H & VERDINI WA. 1992. Interactive multiobjective analysis and assimilative capacity-based ocean disposal decisions. European Journal of Operational Research, 56:278–289.
[35] LUC DT. 1989. Theory of vector optimization. In Lecture Notes in Economics and Mathematical Systems, 319, Berlin, Springer.
[36] MIETTINEN K & MÄKELÄ MM. 1995. Interactive bundle-based method for nondifferentiable multiobjective optimization: Nimbus. Optimization, 34:231–246.
[37] MIETTINEN KM. 1999. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Norwell.
[38] MOSTAGHIM S, BRANKE J & SCHMECK H. 2006. Multi-objective particle swarm optimization on computer grids. Technical Report 502, Institut AIFB, University of Karlsruhe (TH), December.
[39] PALERMO G, SILVANO C, VALSECCHI S & ZACCARIA V. 2003. A system-level methodology for fast multi-objective design space exploration. ACM, New York.
[40] PRABUDDHA D, GHOSH JB & WELLS CE. 1992. On the minimization of completion time variance with a bicriteria extension. Operations Research, 40(6):1148–1155.
[41] TAVANA M. 2004. A subjective assessment of alternative mission architectures for the human exploration of Mars at NASA using multicriteria decision making. Computers & Operations Research, 31:1147–1164.
[42] WHITE DJ. 1998. Epsilon-dominating solutions in mean-variance portfolio analysis. European Journal of Operational Research, 105:457–466.
[43] ZADEH L. 1963. Optimality and non-scalar-valued performance criteria. IEEE Transactions on Automatic Control, 8(1):59–60.
[44] ZITZLER E, DEB K & THIELE L. 2000. Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation, 8:173–195.

Publication Dates

Publication in this collection
Sep-Dec 2014

History

Received
28 Sept 2013
Accepted
02 Jan 2014

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
