1 INTRODUCTION
In multiobjective (or multicriteria) optimization, finitely many objective functions have to be minimized simultaneously. Hardly ever will a single point minimize all of them at once, so we need another notion of optimality. Here, we will use the concepts of Pareto and weak Pareto optimality. A point is called Pareto optimal or efficient if there does not exist a different point with the same or smaller objective function values and a strict decrease in at least one objective function value. A point is called weakly Pareto optimal or weakly efficient if there does not exist a different point with a strict decrease in all objective function values. Applications for this type of problem can be found in engineering [14] (especially truss optimization [12, 32], design [20, 31], space exploration [39, 41]), statistics [10], management science [15, 40, 42], environmental analysis [34, 16], etc.
One of the main solution strategies for multiobjective optimization problems is the scalarization approach [23, 24, 43]: we obtain the optimal solutions of the vector-valued problem by solving one or several parametrized single-objective (i.e., scalar-valued) optimization problems. Among the scalarization techniques, we have the so-called weighting method, where one minimizes a nonnegative linear combination of the objective functions [26, 29, 30, 37]. The parameters are not known in advance and the modeler or decision-maker has to choose them. For some problems, this choice can be problematic, as shown in the example provided in [18, Section 7], where almost all choices of the parameters lead to unbounded scalar problems. Only recently have adaptive scalarization techniques been proposed, where the scalarization parameters are chosen automatically during the course of the algorithm so that a certain quality of approximation is maintained [13, 17]. Still other techniques, working only in the bicriteria case, can be viewed as choosing a fixed grid in a particular parameter space [11, 12].
Multicriteria optimization algorithms that do not scalarize have recently been developed (see, e.g., [5, 6] for an overview of the subject). Some of these techniques are extensions of scalar optimization algorithms (notably the steepest descent algorithm [19], with at most linear convergence), while others borrow heavily from ideas developed in heuristic optimization [33, 38]. For the latter, no convergence proofs are known, and empirical results show that convergence generally is, as in the scalar case, quite slow [44]. Other parameter-free multicriteria optimization techniques use an ordering of the different criteria, i.e., an ordering of importance of the components of the objective function vector. In this case, the ordering has to be prespecified. Moreover, the corresponding optimization process is usually augmented by an interactive procedure [36].
Following the research line opened in 2001 with [19], other classical (scalar) optimization procedures were extended in recent years, not just to the multicriteria setting, but also to vector optimization problems, i.e., when other underlying ordering cones are used instead of the nonnegative orthant. In [28], a steepest descent method is proposed, while in [1, 21, 22, 25], several versions of the projected gradient method are studied. Proximal point type methods are analyzed in [4]. Trust region strategies for multicriteria optimization were developed in [8, 9]. Even a steepest descent method for multicriteria optimization on Riemannian manifolds was developed in [2]. In 2009, Fliege et al. [18] came up with a multiobjective version of Newton's method, while later on, Graña Drummond et al. [27] proposed a Newton method for vector optimization.
These extensions share an essential feature: they are all descent methods, i.e., the vector objective value decreases at each iteration in the partial order induced by the underlying cones. Under reasonable assumptions, full convergence of sequences produced by these algorithms is established in all those works. As expected, for the vectorvalued Newton methods, convergence is local and fast, while for the others it is global (and not so fast).
In this survey we study just three of these procedures: the steepest descent [19, 28], the projected gradient [21, 22, 25] and the Newton [18, 27] methods. For the sake of simplicity, we just analyze their multiobjective versions and, in order to emphasize the underlying ideas and avoid technicalities, we do not present the results in their maximum degree of generality. For instance, we assume that the objectives are continuously differentiable on the whole space, while sometimes just local differentiability is needed, etc.
The outline of this paper is as follows. In Section 2 we establish the notation, as well as the basic concepts, and define the (smooth) unconstrained multiobjective problem. We also state and prove a simple but important result on descent directions. In Section 3, we study the multiobjective steepest descent method. First, we introduce the search direction and analyze its properties. Then we describe the algorithm and, finally, we perform the convergence analysis of the method. Basically, we establish that, under convexity of the objectives and another natural assumption, any sequence produced by the method is globally convergent to a weak Pareto optimal point. In Section 4 we define the smooth constrained multicriteria problem, as well as the multiobjective projected gradient algorithm and we present its convergence analysis, which is quite similar to the previous one. In Section 5, for twice continuously differentiable strongly convex objectives, we define the multiobjective Newton method and, assuming similar conditions as those of the previous cases, we establish its convergence to Pareto optimal points with superlinear rate. Under the additional hypothesis of Lipschitz continuity of the objectives' second derivatives, we prove quadratic convergence. In Section 6 we make some comments on how to obtain (strong) Pareto optima with the first two algorithms and on how to approach the whole efficient frontier with the three methods.
2 PRELIMINARIES
Throughout this work, ⟨·,·⟩ will stand for the usual canonical inner product in ℝ^{p} (here, we will just deal with p = n or p = m), i.e., ⟨u, v⟩ = ∑^{p}_{i=1} u_{i}v_{i} = u^{T}v. Likewise, ‖·‖ will always be the Euclidean norm, i.e., ‖u‖ = √⟨u, u⟩. For an m × n real matrix A, ‖A‖ will denote its operator (Euclidean) norm. As usual, ℝ_{+} and ℝ_{++} designate the sets of nonnegative and positive real numbers, respectively.
Let ℝ^{m}_{+} := ℝ_{+} ×···× ℝ_{+} and ℝ^{m}_{++} := ℝ_{++} ×···× ℝ_{++} be the nonnegative orthant (or Paretian cone) and the positive orthant of ℝ^{m}, respectively. Consider the partial order induced by ℝ^{m}_{+}: for u, v ∈ ℝ^{m}, u ≤ v (alternatively, v ≥ u) if v − u ∈ ℝ^{m}_{+}, as well as the following stronger relation: u < v (alternatively, v > u) if v − u ∈ ℝ^{m}_{++}. In other words, u ≤ v stands for u_{i} ≤ v_{i} for all i = 1, ..., m, and u < v should also be understood as a (strict) componentwise inequality.
Let

F : ℝ^{n} → ℝ^{m}

be a continuously differentiable mapping. Our problem consists of finding Pareto or weak Pareto points for F in ℝ^{n}, and we denote it by

min_{x∈ℝ^{n}} F(x).    (1)
Recall that x^{*} ∈ ℝ^{n} is a Pareto optimal (or efficient) point for F, if
there is no x ∈ ℝ^{n} such that F(x) ≤ F(x^{*}) and F(x) ≠ F(x^{*}),
that is to say, there does not exist x ∈ ℝ^{n} such that F_{i}(x) ≤ F_{i}(x^{*}) for all i = 1, ..., m and F_{i_0}(x) < F_{i_0}(x^{*}) for at least one i_{0}. A point x^{*} ∈ ℝ^{n} is a weakly Pareto optimal (or weakly efficient) point for F, if
there is no x ∈ ℝ^{n} such that F(x) < F(x^{*}),
i.e., there does not exist x ∈ ℝ^{n} such that F_{i}(x) < F_{i} (x^{*}) for all i = 1, ..., m. Clearly, if x^{*} ∈ ℝ^{n} is a Pareto optimal point, then x^{*} is a weak Pareto point, and the converse is not always true.
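In computational terms, the two relations above amount to componentwise comparisons of objective vectors. A minimal sketch (the function names are ours, not from the text):

```python
def dominates(fx, fy):
    """True if objective vector fx Pareto-dominates fy:
    fx <= fy componentwise, with strict inequality in at least one component."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def strictly_dominates(fx, fy):
    """True if fx < fy componentwise; the relation behind weak Pareto optimality."""
    return all(a < b for a, b in zip(fx, fy))
```

So x^{*} is Pareto optimal when no point x yields dominates(F(x), F(x^{*})), and weakly Pareto optimal when no point yields strictly_dominates(F(x), F(x^{*})).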
As we will see in our first lemma, a necessary condition for optimality of a point x ∈ ℝ^{n} is stationarity (or criticality):

J F(x)(ℝ^{n}) ∩ (−ℝ^{m}_{++}) = ∅,

where J F(x) stands for the Jacobian matrix of F at x (the m × n matrix with entries (J F(x))_{i,j} = ∂F_{i}(x)/∂x_{j}), J F(x)(ℝ^{n}) := {J F(x)v : v ∈ ℝ^{n}} and −ℝ^{m}_{++} := {−u : u ∈ ℝ^{m}_{++}}. So x ∈ ℝ^{n} is stationary for F if, and only if, for all v ∈ ℝ^{n} we have J F(x)v ∉ −ℝ^{m}_{++}, that is to say, for all v there exists j = j(v) such that

∇F_{j}(x)^{T}v ≥ 0,    (2)

and this condition implies

max_{i=1,...,m} ∇F_{i}(x)^{T}v ≥ 0 for all v ∈ ℝ^{n}.
Note that for m = 1, we retrieve the classical stationarity condition for unconstrained scalarvalued optimization: ∇ F(x) = 0.
By definition, a point x ∈ ℝ^{n} is nonstationary if there exists a direction v ∈ ℝ^{n} such that J F(x)v < 0. The next result shows not only that such a v is an ℝ^{m}_{+}-descent direction for F at x (there exists ε > 0 such that F(x + tv) < F(x) for all t ∈ (0, ε]), but it also estimates the function decrease. As we will see later on, this result allows us to implement, by a backtracking procedure, an Armijo-like rule in all the multiobjective (descent) methods we will study here.
Lemma 2.1. Let F : ℝ^{n} → ℝ^{m} be continuously differentiable, v ∈ ℝ^{n} such that J F(x)v < 0 and σ ∈ (0, 1). Then there exists some ε > 0 (which may depend on x, v and σ) such that

F(x + tv) ≤ F(x) + σt J F(x)v for all t ∈ (0, ε].    (3)
In particular, v is a descent direction for F at x.
Proof. Since F is differentiable, ∇F_{i}(x)^{T}v < 0 and σ ∈ (0, 1), for each i = 1, ..., m we have

lim_{t→0^{+}} (F_{i}(x + tv) − F_{i}(x))/t = ∇F_{i}(x)^{T}v < σ∇F_{i}(x)^{T}v < 0.

So there exists ε > 0 such that

F_{i}(x + tv) − F_{i}(x) ≤ σt∇F_{i}(x)^{T}v for all t ∈ (0, ε] and all i = 1, ..., m. □
3 THE MULTIOBJECTIVE STEEPEST DESCENT METHOD
In this section we propose an extension of the classical steepest descent method. First we state and prove some results that will allow us to define the algorithm. Then we present its convergence analysis and, finally, we briefly describe a numerically robust version of the method which admits relative errors in the computation of the search directions at each iteration.
3.1 The search direction for the multiobjective steepest descent method
For a given point x ∈ ℝ^{n}, define the function φ_{x} : ℝ^{n} → ℝ by

φ_{x}(v) := max_{i=1,...,m} ∇F_{i}(x)^{T}v.    (4)
Being the maximum of finitely many linear functions, φ_{x} is convex and positively homogeneous (φ_{x}(λv) = λφ_{x}(v) for all λ > 0 and all v ∈ ℝ^{n}). Moreover, since u ↦ max_{i=1,...,m} u_{i} is Lipschitz continuous with constant 1, we also have

|φ_{x}(v) − φ_{x}(w)| ≤ ‖J F(x)(v − w)‖ ≤ ‖J F(x)‖ ‖v − w‖    (5)
for all v, w ∈ ℝ^{n}. Consider now the unconstrained scalar-valued minimization problem

min_{v∈ℝ^{n}} φ_{x}(v) + (1/2)‖v‖².    (6)
Since the objective function is proper, closed and strongly convex, this problem always has a (unique) optimal solution v(x), which we call the steepest descent direction; it is therefore given by

v(x) = argmin_{v∈ℝ^{n}} φ_{x}(v) + (1/2)‖v‖².    (7)
Let us call θ(x) the optimal value of (6), that is to say,

θ(x) = φ_{x}(v(x)) + (1/2)‖v(x)‖².    (8)
If m = 1, we fall back to scalar minimization and φ_{x}(v) = ∇F(x)^{T}v, so v(x) = −∇F(x) and we retrieve the classical steepest descent direction at x ∈ ℝ^{n}.
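The elementary properties of φ_{x} noted above (positive homogeneity and the Lipschitz bound (5)) are easy to verify numerically. A minimal sketch, with our own helper phi and an assumed Jacobian jac = J F(x):

```python
import numpy as np

def phi(jac, v):
    """phi_x(v) = max_i grad F_i(x)^T v, for jac = J F(x) of shape (m, n)."""
    return np.max(jac @ v)

jac = np.array([[1.0, -2.0], [0.5, 3.0]])
v, w = np.array([1.0, 1.0]), np.array([-1.0, 2.0])

# positive homogeneity: phi(lam * v) = lam * phi(v) for lam > 0
assert np.isclose(phi(jac, 2.5 * v), 2.5 * phi(jac, v))
# Lipschitz bound: |phi(v) - phi(w)| <= ||J F(x)|| * ||v - w||
assert abs(phi(jac, v) - phi(jac, w)) <= np.linalg.norm(jac, 2) * np.linalg.norm(v - w)
```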
Note that, in order to get rid of the possible nondifferentiability of the objective of (6), we can reformulate the problem as

min τ + (1/2)‖v‖²
s.t. ∇F_{i}(x)^{T}v ≤ τ, i = 1, ..., m,
a convex problem with variables (τ, v) ∈ ℝ × ℝ^{n} for which there are efficient numerical solvers.
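As an illustration, this smooth reformulation can be handed to any constrained solver. A minimal sketch using SciPy's SLSQP (the helper name is ours, not from the survey):

```python
import numpy as np
from scipy.optimize import minimize

def steepest_descent_direction(jac):
    """Solve min_{(tau, v)} tau + 0.5*||v||^2  s.t.  grad F_i(x)^T v <= tau,
    the smooth reformulation of the direction subproblem.
    jac: array of shape (m, n) holding J F(x). Returns (v(x), theta(x))."""
    m, n = jac.shape
    obj = lambda z: z[0] + 0.5 * z[1:] @ z[1:]            # z = (tau, v)
    cons = [{"type": "ineq", "fun": lambda z, g=g: z[0] - g @ z[1:]} for g in jac]
    res = minimize(obj, np.zeros(n + 1), constraints=cons, method="SLSQP")
    v = res.x[1:]
    theta = np.max(jac @ v) + 0.5 * v @ v                 # optimal value of (6)
    return v, theta

# single objective: recovers v(x) = -grad F(x)
v, theta = steepest_descent_direction(np.array([[2.0, 0.0]]))
```

For m = 1 with ∇F(x) = (2, 0)^{T}, this returns v ≈ (−2, 0)^{T} and θ ≈ −2, matching the closed form v(x) = −∇F(x).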
Let us now give a characterization of stationarity in terms of θ(·) and v(·) and study some features of these functions.
Proposition 3.1. Let v : ℝ^{n} → ℝ^{n} and θ : ℝ^{n} → ℝ be given by (7) and (8), respectively. Then, the following statements hold.
1. θ(x) ≤ 0 for all x ∈ ℝ^{n}.
2. The function θ(·) is continuous.

3. The following conditions are equivalent:

(a) the point x is nonstationary;
(b) θ(x) < 0;
(c) v(x) ≠ 0.
Proof. 1. From (7) and (8), for any x ∈ ℝ^{n}, we have θ(x) ≤ φ_{x}(0) + (1/2)‖0‖² = 0.
2. Let us prove the continuity of θ at an arbitrary but fixed x^{0} ∈ ℝ^{n}. First let us show that the function x ↦ v(x) is bounded on compact subsets of ℝ^{n}. Let C ⊂ ℝ^{n} be compact. Note that for all x ∈ C and all i = 1, ..., m, we have

(1/2)‖v(x)‖² ≤ −φ_{x}(v(x)) ≤ −∇F_{i}(x)^{T}v(x) ≤ ‖∇F_{i}(x)‖ ‖v(x)‖,
where we used the Cauchy–Schwarz inequality, the definitions of φ_{x} and θ, as well as item 1. Therefore, since F is continuously differentiable, we conclude that there exists κ = κ(C) > 0 such that ‖v(x)‖ ≤ κ for all x ∈ C. In particular, there exists κ > 0 such that

‖v(x)‖ ≤ κ for all x ∈ B(x^{0}, 1),    (9)

where B(x^{0}, 1) is the closed ball with center x^{0} and radius 1.
Now, for x ∈ B(x^{0}, 1) and i ∈ {1, ..., m}, define

ϕ_{x,i} : B(x^{0}, 1) → ℝ

by

ϕ_{x,i}(z) := ∇F_{i}(z)^{T}v(x) + (1/2)‖v(x)‖².

From (9), the family {ϕ_{x,i}} is equicontinuous. Consider now the family {Φ_{x}}, where Φ_{x} = max_{i=1,...,m} ϕ_{x,i}. Since u ↦ max_{i} u_{i} is Lipschitz continuous with constant 1, {Φ_{x}} is also an equicontinuous family of functions. Then, for any ε > 0, there exists 0 < δ < 1 such that, for all x ∈ B(x^{0}, 1),

|Φ_{x}(z) − Φ_{x}(x^{0})| < ε whenever ‖z − x^{0}‖ < δ.
Hence, for ‖z − x^{0}‖ < δ, from the definitions of the functions θ, v and Φ_{x^{0}}, and the fact that Φ_{x^{0}}(x^{0}) = θ(x^{0}), we have

θ(z) ≤ φ_{z}(v(x^{0})) + (1/2)‖v(x^{0})‖² = Φ_{x^{0}}(z) < Φ_{x^{0}}(x^{0}) + ε = θ(x^{0}) + ε,

so θ(z) − θ(x^{0}) < ε. Interchanging the roles of z and x^{0}, we conclude that, for ‖z − x^{0}‖ < δ,

|θ(z) − θ(x^{0})| < ε.
3. (a) ⇒ (b). Suppose that x is nonstationary. Then, there exists v ∈ ℝ^{n} such that φ_{x}(v) = max_{i} ∇F_{i}(x)^{T}v < 0. Therefore, from the optimality of v(x) and the fact that φ_{x} is positively homogeneous,

θ(x) ≤ φ_{x}(τv) + (1/2)‖τv‖² = τφ_{x}(v) + (τ²/2)‖v‖²

for all τ > 0. Taking 0 < τ < −2φ_{x}(v)/‖v‖², we see that θ(x) < 0.
(b) ⇒ (c). Since, by item 1, θ(x) ≤ 0, it is enough to notice that v(x) = 0 implies θ(x) = 0.
(c) ⇒ (a). If v(x) ≠ 0, then max_{i} ∇F_{i}(x)^{T}v(x) = φ_{x}(v(x)) = θ(x) − (1/2)‖v(x)‖² < θ(x) ≤ 0, so J F(x)v(x) ∈ J F(x)(ℝ^{n}) ∩ (−ℝ^{m}_{++}) and x is nonstationary. □
Note that for a nonstationary point x, using the above characterization, we have θ(x) < 0 and so, from (8), we get φ_{x}(v(x)) ≤ θ(x) < 0, which in turn implies that J F(x)v(x) < 0. Whence, by Lemma 2.1, there exists ε > 0 such that

F(x + tv(x)) ≤ F(x) + σt J F(x)v(x) for all t ∈ (0, ε]

and, in particular, v(x) is a descent direction, i.e., F(x + tv(x)) < F(x) for all t ∈ (0, ε].
Finally, observe that, while proving Proposition 3.1, we saw that v : ℝ^{n} → ℝ^{n}, defined by (7), is bounded on compact sets of ℝ^{n}. Actually, something stronger can be shown, namely that v(·) is a continuous function [28, Lemma 3.3].
3.2 The multiobjective steepest descent algorithm
We are now in a position to define the extension of the classical steepest descent method for the unconstrained multiobjective optimization problem (1).
Algorithm 3.2. The multiobjective steepest descent method (MSDM)
1. Choose σ ∈ (0, 1) and x^{0} ∈ ℝ^{n}. Set k := 0.
2. Compute v^{k} = v(x^{k}), the optimal solution of (6) with x = x^{k}.
3. Compute θ(x^{k}) = φ_{x^{k}}(v^{k}) + (1/2)‖v^{k}‖². If θ(x^{k}) = 0, then stop.
4. Choose t_{k} as the largest t ∈ {1/2^{j} : j = 0, 1, 2, ...} such that

F(x^{k} + tv^{k}) ≤ F(x^{k}) + σt J F(x^{k})v^{k}.

5. Set x^{k+1} := x^{k} + t_{k}v^{k}, k := k + 1 and go to Step 2.
Some comments are in order. Since the objective function of (6) is strongly convex, v^{k} exists and is unique. Note that, by Proposition 3.1, the stopping rule θ(x^{k}) = 0 can be replaced by v^{k} = 0. Also note that, if at iteration k the algorithm does not reach Step 4, i.e., if it stops at Step 3, then, by Proposition 3.1, x^{k} is a stationary point of F.
If Step 4 is reached at iteration k, by our comments after Proposition 3.1 and Lemma 2.1, the computation of the step length in Step 4 ends after a finite number of halvings. Observe that instead of the factor 1/2 in the reductions of Step 4 we could have used any other scalar ν ∈ (0, 1). If the algorithm does not stop, from Step 4 and the fact that J F(x^{k})v^{k} < 0, we have that the objective values sequence {F(x^{k})} is ℝ^{m}_{+}-decreasing, i.e.,

F(x^{k+1}) < F(x^{k}) for all k.
If the algorithm stops at iteration k_{0}, the above inequality holds for k = 0, 1, ..., k_{0}  1.
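For concreteness, the loop of Algorithm 3.2 can be prototyped in a few lines. This is only a sketch under our own choices: SciPy's SLSQP solves subproblem (6) at each iterate, and a small numerical tolerance stands in for the exact stopping rule θ(x^{k}) = 0:

```python
import numpy as np
from scipy.optimize import minimize

def msdm(F, JF, x0, sigma=1e-4, tol=1e-8, max_iter=100):
    """Multiobjective steepest descent sketch. F: R^n -> R^m, JF its Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        J = JF(x)
        # Step 2: v^k solves min tau + ||v||^2/2  s.t.  grad F_i^T v <= tau
        obj = lambda z: z[0] + 0.5 * z[1:] @ z[1:]
        cons = [{"type": "ineq", "fun": lambda z, g=g: z[0] - g @ z[1:]} for g in J]
        v = minimize(obj, np.zeros(x.size + 1), constraints=cons, method="SLSQP").x[1:]
        theta = np.max(J @ v) + 0.5 * v @ v
        # Step 3: stop at (numerical) stationarity
        if theta > -tol:
            return x
        # Step 4: Armijo-like backtracking over t in {1, 1/2, 1/4, ...}
        t = 1.0
        while np.any(F(x + t * v) > F(x) + sigma * t * (J @ v)):
            t *= 0.5
        # Step 5
        x = x + t * v
    return x

# bicriteria toy problem: F(x) = ((x-1)^2, (x+1)^2); Pareto set = [-1, 1]
F = lambda x: np.array([(x[0] - 1.0) ** 2, (x[0] + 1.0) ** 2])
JF = lambda x: np.array([[2.0 * (x[0] - 1.0)], [2.0 * (x[0] + 1.0)]])
x_star = msdm(F, JF, [3.0])
```

Starting from x^{0} = 3, the iterates decrease both objectives until a point of the Pareto set [−1, 1] is reached.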
3.3 Convergence analysis of the MSDM: the general case
Since Algorithm 3.2 either ends up with a stationary point or produces infinitely many iterates, from now on we assume that an infinite sequence {x^{k}} of nonstationary points is generated. We begin by presenting a simple application of the standard convergence argument for the scalar-valued steepest descent method.
Theorem 3.3. Let {x^{k}} be a sequence produced by Algorithm 3.2. Every accumulation point of {x^{k}}, if any, is a stationary point.
Proof. Let x be an accumulation point of the sequence {x^{k}}. Consider v(x) and θ(x), given by (7) and (8), respectively, i.e., the solution and the optimal value of (6) at x. According to Proposition 3.1, it is enough to prove that θ(x) = 0.
We know that the sequence {F(x^{k})} is ℝ^{m}_{+}-decreasing (i.e., componentwise strictly decreasing), so, since x is an accumulation point of {x^{k}}, we have that

lim_{k→∞} F(x^{k}) = F(x).

Therefore,

lim_{k→∞} (F(x^{k+1}) − F(x^{k})) = 0.

But, by Steps 2–5 of Algorithm 3.2,

F(x^{k+1}) ≤ F(x^{k}) + σt_{k} J F(x^{k})v^{k}, with J F(x^{k})v^{k} < 0,

and therefore

lim_{k→∞} t_{k} J F(x^{k})v^{k} = 0,

which, componentwise, can be written as

lim_{k→∞} t_{k}∇F_{i}(x^{k})^{T}v^{k} = 0, i = 1, ..., m.    (10)
Considering that t_{k} ∈ (0, 1] for all k, we have the following two possibilities:

lim sup_{k→∞} t_{k} > 0    (11)

or

lim_{k→∞} t_{k} = 0.    (12)
First assume that (11) holds. Then there exist a subsequence {x^{k_j}}_{j} converging to x and t̄ > 0 such that lim_{j→∞} t_{k_j} = t̄. So, using (10), we get

0 = lim_{j→∞} t_{k_j}∇F_{i}(x^{k_j})^{T}v^{k_j} = t̄ lim_{j→∞} ∇F_{i}(x^{k_j})^{T}v^{k_j} for all i,

so

0 = lim_{j→∞} φ_{x^{k_j}}(v^{k_j}) ≤ lim_{j→∞} θ(x^{k_j}) = θ(x),

where the last equality is a consequence of Proposition 3.1, item 2. So, by Proposition 3.1, items 1 and 3, we conclude that θ(x) = 0, and x is stationary.
Now assume that (12) holds. Due to the fact that v(·) is bounded on compact sets, v^{k} = v(x^{k}) for all k and {x^{k}} has a convergent subsequence, it follows that the sequence {v^{k}} also has a bounded subsequence. Therefore there exist v ∈ ℝ^{n} and a subsequence {(x^{k_j}, v^{k_j})}_{j} such that lim_{j→∞} x^{k_j} = x, lim_{j→∞} v^{k_j} = v and lim_{j→∞} t_{k_j} = 0. Note that we have

θ(x^{k_j}) = φ_{x^{k_j}}(v^{k_j}) + (1/2)‖v^{k_j}‖²,

so letting j → ∞, we get

θ(x) = φ_{x}(v) + (1/2)‖v‖².    (13)
Take now a fixed but arbitrary positive integer q. Since t_{k_j} → 0, for j large enough we have

t_{k_j} < 1/2^{q},

which means that the Armijo-like condition at x^{k_j} in Step 4 of Algorithm 3.2 is not satisfied for t = 1/2^{q}, i.e.,

F(x^{k_j} + (1/2^{q})v^{k_j}) ≰ F(x^{k_j}) + σ(1/2^{q}) J F(x^{k_j})v^{k_j}.

So for all such j there exists i = i(k_j) ∈ {1, ..., m} such that

F_{i}(x^{k_j} + (1/2^{q})v^{k_j}) > F_{i}(x^{k_j}) + σ(1/2^{q})∇F_{i}(x^{k_j})^{T}v^{k_j}.
Since {i(k_j)}_{j} ⊂ {1, ..., m}, there exist a subsequence {k_{j_ℓ}}_{ℓ} and an index i_{0} such that i_{0} = i(k_{j_ℓ}) for all ℓ = 1, 2, ... and

F_{i_0}(x^{k_{j_ℓ}} + (1/2^{q})v^{k_{j_ℓ}}) > F_{i_0}(x^{k_{j_ℓ}}) + σ(1/2^{q})∇F_{i_0}(x^{k_{j_ℓ}})^{T}v^{k_{j_ℓ}}.

Taking the limit ℓ → ∞ in the above inequality, we obtain

F_{i_0}(x + (1/2^{q})v) ≥ F_{i_0}(x) + σ(1/2^{q})∇F_{i_0}(x)^{T}v.

Since this inequality holds for any positive integer q (with i_{0} depending on q), by Lemma 2.1 it follows that J F(x)v ≮ 0, so

φ_{x}(v) = max_{i} ∇F_{i}(x)^{T}v ≥ 0,

which, together with (13), implies θ(x) ≥ 0 and hence, by item 1 of Proposition 3.1, θ(x) = 0. Therefore, we conclude that x is a stationary point. □
Assume now that the level set {x ∈ ℝ^{n} : F(x) ≤ F(x^{0})} is bounded. Since {F(x^{k})} is ℝ^{m}_{+}-decreasing, the whole sequence {x^{k}} is contained in this set, so it is bounded and has at least one accumulation point, which, by the previous theorem, is stationary for F.
3.4 Full convergence of the MSDM: the ℝ^{m}_{+}-convex case
In order to characterize v(x), the optimal solution of (6), we begin this section by recalling a wellknown result on nonsmooth unconstrained optimization, which establishes optimality conditions for maxtype realvalued objective functions.
Proposition 3.4. Let h_{i} : ℝ^{n} → ℝ be a differentiable function for i = 1, ..., m. If x̄ ∈ ℝ^{n} is an unconstrained minimizer of max_{i=1,...,m} h_{i}(x), then there exists α = α(x̄) ∈ ℝ^{m}, with α_{i} ≥ 0 for all i and ∑^{m}_{i=1} α_{i} = 1, such that, for τ̄ := max_{i} h_{i}(x̄), it holds that

∑^{m}_{i=1} α_{i}∇h_{i}(x̄) = 0 and α_{i}(h_{i}(x̄) − τ̄) = 0 for all i = 1, ..., m.
If h_{i} is convex for all i, the above conditions are also sufficient for the optimality of x̄.
Proof. First note that the problem min_{x∈ℝ^{n}} max_{i=1,...,m} h_{i}(x) can be reformulated as

min τ
s.t. h_{i}(x) ≤ τ, i = 1, ..., m,    (14)

with optimal solution (τ̄, x̄) ∈ ℝ × ℝ^{n}. Since the Lagrangian of this problem is given by L((τ, x), α) := τ + ∑^{m}_{i=1} α_{i}(h_{i}(x) − τ), its Karush–Kuhn–Tucker (KKT) conditions at (τ̄, x̄) are just those stated in this proposition. Note now that (14) has a Slater point (e.g., (τ̂, 0) ∈ ℝ × ℝ^{n}, where τ̂ > h_{i}(0) for all i). So there exists a vector of KKT multipliers α = α(x̄) such that those conditions are satisfied.
For the converse, just note that since (14) is a convex problem in the variables (τ, x), the KKT conditions are sufficient for the optimality of (τ̄, x̄). □
By (6)–(7), v(x) = argmin_{v} max_{i} ∇F_{i}(x)^{T}v + (1/2)‖v‖². Therefore, in view of the above proposition, there exists ω(x) := α(v(x)) ∈ ℝ^{m}_{+} with ∑^{m}_{i=1} ω_{i}(x) = 1 such that

∑^{m}_{i=1} ω_{i}(x)∇F_{i}(x) + v(x) = 0,

and so

v(x) = −∑^{m}_{i=1} ω_{i}(x)∇F_{i}(x).    (15)

The above equality can be rewritten as

v(x) = −J F(x)^{T}ω(x).    (16)
Note that (15) tells us that the multiobjective steepest descent direction v(x) is a classical steepest descent direction for the scalar-valued problem min_{x} ∑^{m}_{i=1} ω_{i}F_{i}(x), whose weights ω_{i} := ω_{i}(x) ≥ 0 are the a priori unknown KKT multipliers for problem (14), with h_{i}(v) = ∇F_{i}(x)^{T}v + (1/2)‖v‖² for all i, at (θ(x), v(x)).
Let us now introduce an important concept. We say that the mapping F : ℝ^{n} → ℝ^{m} is ℝ^{m}_{+}-convex if

F(λu + (1 − λ)w) ≤ λF(u) + (1 − λ)F(w) for all u, w ∈ ℝ^{n} and all λ ∈ [0, 1].    (17)
Clearly, F : ℝ^{n} → ℝ^{m} is ℝ^{m}_{+}-convex if and only if its components F_{i} : ℝ^{n} → ℝ are all convex. Under this additional assumption, we have the following extension of the well-known scalar inequality which establishes that a convex differentiable function always overestimates its linear approximation:

F(u) ≥ F(w) + J F(w)(u − w) for all u, w ∈ ℝ^{n}.    (18)
In fact, this vector inequality is equivalent to the following m scalar inequalities F_{i}(u) ≥ F_{i} (w) + ∇F_{i} (w)^{T}(u  w) for all u, w ∈ ℝ^{n} and all i.
Let us now state and prove a simple result relating Pareto optimality and stationarity.
Lemma 3.5. Let x̄ ∈ ℝ^{n} be a weak Pareto optimum for F. Then x̄ is a stationary point. If, in addition, F is ℝ^{m}_{+}-convex, this condition is also sufficient for weak Pareto optimality. So, under ℝ^{m}_{+}-convexity, the concepts of weak efficiency and stationarity are equivalent.
Proof. First, assume that x̄ is a weak Pareto optimum. As we know, if x̄ is not a stationary point, there exists v ∈ ℝ^{n} such that J F(x̄)v < 0, so (3) holds for x = x̄, in contradiction with the weak Pareto optimality of x̄.

Now suppose that x̄ is stationary for the ℝ^{m}_{+}-convex mapping F and take any x ∈ ℝ^{n}. Since x̄ is stationary, (2) holds at x̄ for v := x − x̄ and some j = j(x̄). Therefore, by (18) with u = x and w = x̄, we get F_{j}(x) ≥ F_{j}(x̄) + ∇F_{j}(x̄)^{T}(x − x̄) ≥ F_{j}(x̄), and so x̄ is a weak Pareto optimum for F. □
We keep on studying the case in which the algorithm does not have finite termination and therefore produces infinite sequences {x^{k}}, {v^{k}} and {t_{k}}. Let us now state the additional assumption under which, in the ℝ^{m}_{+}-convex case, we will prove full convergence of {x^{k}} to a stationary point or, in view of the above lemma, to a weak Pareto optimum of F.
Assumption 3.6. Every ℝ^{m}_{+}-decreasing sequence {y^{k}} ⊆ F(ℝ^{n}) := {F(x) : x ∈ ℝ^{n}} is ℝ^{m}_{+}-bounded from below by a point in F(ℝ^{n}), i.e., for any {y^{k}} contained in F(ℝ^{n}) with y^{k+1} < y^{k} for all k, there exists x̄ ∈ ℝ^{n} such that F(x̄) ≤ y^{k} for all k.
Some comments concerning the generality/restrictiveness of this assumption are in order. In the classical unconstrained (convex) optimization case, this condition is equivalent to the existence of solutions of the optimization problem. This assumption, known as ℝ^{m}_{+}-completeness, is standard for ensuring the existence of efficient points in vector optimization problems [35, Section 3].
We will need the following technical result in order to prove that the MSDM is convergent.
Lemma 3.7. Suppose that F is ℝ^{m}_{+}-convex and let x̄ ∈ ℝ^{n} be such that F(x̄) ≤ F(x^{k}) for some k. Then, we have

‖x^{k+1} − x̄‖² ≤ ‖x^{k} − x̄‖² + t_{k}²‖v^{k}‖².
Proof. By (16), there exists w^{k} := ω(x^{k}) ∈ ℝ^{m}_{+} such that

v^{k} = −J F(x^{k})^{T}w^{k}.    (19)

Using the ℝ^{m}_{+}-convexity of F, we have that (18) holds, so F(x^{k}) + J F(x^{k})(x̄ − x^{k}) ≤ F(x̄). Since F(x̄) ≤ F(x^{k}), we get

J F(x^{k})(x̄ − x^{k}) ≤ 0.

Taking into account that w^{k} ∈ ℝ^{m}_{+} and using the above vector inequality as well as (19), we get

⟨v^{k}, x̄ − x^{k}⟩ = −(w^{k})^{T} J F(x^{k})(x̄ − x^{k}) ≥ 0.

Recall that x^{k+1} = x^{k} + t_{k}v^{k}, with t_{k} > 0. Therefore, from the above inequality we get

⟨x^{k+1} − x^{k}, x̄ − x^{k}⟩ = t_{k}⟨v^{k}, x̄ − x^{k}⟩ ≥ 0,

which implies the desired inequality, because

‖x^{k+1} − x̄‖² = ‖x^{k} − x̄‖² + ‖x^{k+1} − x^{k}‖² − 2⟨x^{k+1} − x^{k}, x̄ − x^{k}⟩ ≤ ‖x^{k} − x̄‖² + t_{k}²‖v^{k}‖². □
We still need a couple of technical results. The first one, basically, establishes that any accumulation point of a sequence produced by the MSDM furnishes an ℝ^{m}_{+}lower bound for the whole objective values sequence. We state it for an arbitrary sequence of nonstationary points, because we will also need it for the convergence analysis of another method.
Lemma 3.8. Let {x^{(k)}} be any infinite sequence of nonstationary points in ℝ^{n} such that F(x^{(k+1)}) ≤ F(x^{(k)}) for all k.

1. If x is an accumulation point of {x^{(k)}}, then

F(x) ≤ F(x^{(k)}) for all k    (20)

and

lim_{k→∞} F(x^{(k)}) = F(x).
Also, F is constant in the set of accumulation points of {x^{(k)}}.
2. If {x^{k}}, generated by Algorithm 3.2, has an accumulation point, then all results of item 1 hold for this sequence.
Proof. 1. Let {x^{(k_j)}}_{j} be a subsequence that converges to x. Take a fixed but arbitrary k. For j large enough, we have k_{j} > k and F(x^{(k_j)}) ≤ F(x^{(k)}). So letting j → ∞, we get

F(x) ≤ F(x^{(k)}),

which proves (20). Since the above inequality holds componentwise and F_{i}(x^{(k_j)}) → F_{i}(x) as j → ∞ for all i, we also have lim_{k→∞} F(x^{(k)}) = F(x).
Now let x̃ be another accumulation point of {x^{(k)}}. Then there exists a subsequence {x^{(k_ℓ)}}_{ℓ} that converges to x̃. By (20), F(x) ≤ F(x^{(k_ℓ)}) for all ℓ, so letting ℓ → ∞, we get F(x) ≤ F(x̃). Interchanging the roles of x and x̃, by the same reasoning, we get F(x̃) ≤ F(x). These two ℝ^{m}_{+}-inequalities imply that F(x̃) = F(x).
2. It suffices to note that any (infinite) sequence {x^{k}} generated by Algorithm 3.2 satisfies F(x^{k+1}) < F(x^{k}) for all k.
We now study the speed of convergence to zero of the sequences {v^{k}} and {θ(x^{k})}.
Lemma 3.9. Suppose that {F(x^{k})} is ℝ^{m}_{+}-bounded from below by y. Then

∑^{∞}_{k=0} t_{k}‖v^{k}‖² < +∞ and ∑^{∞}_{k=0} t_{k}|θ(x^{k})| < +∞.
Proof. We know that F_{i}(x^{k+1}) ≤ F_{i}(x^{k}) + σt_{k}∇F_{i}(x^{k})^{T}v^{k} for any i and all k. Adding up from k = 0 to k = N, we get

F_{i}(x^{N+1}) ≤ F_{i}(x^{0}) + σ ∑^{N}_{k=0} t_{k}∇F_{i}(x^{k})^{T}v^{k} ≤ F_{i}(x^{0}) + σ ∑^{N}_{k=0} t_{k}(θ(x^{k}) − (1/2)‖v^{k}‖²),

where we used (4) and (8) for x = x^{k}. As y ≤ F(x^{N+1}) and θ(x^{k}) < 0 for all i, k (Proposition 3.1, item 3), we obtain

σ ∑^{N}_{k=0} t_{k}((1/2)‖v^{k}‖² + |θ(x^{k})|) ≤ F_{i}(x^{0}) − y_{i}.
Since the last inequality holds for any nonnegative integer N, the conclusions follow immediately.
Before stating and proving the last theorem of this section, let us introduce the main tool for the convergence analysis. Recall that a sequence {u^{k}} ⊂ ℝ^{n} is quasi-Fejér convergent to a set U ⊂ ℝ^{n} if for every u ∈ U there exists a sequence {ε_{k}} ⊂ ℝ, with ε_{k} ≥ 0 for all k and ∑^{∞}_{k=0} ε_{k} < +∞, such that

‖u^{k+1} − u‖² ≤ ‖u^{k} − u‖² + ε_{k} for all k.
We will need the following well-known result concerning quasi-Fejér convergent sequences, whose proof can be found in [7].
Theorem 3.10. If a sequence {u^{k}} is quasi-Fejér convergent to a nonempty set U ⊂ ℝ^{n}, then {u^{k}} is bounded. If, furthermore, {u^{k}} has a cluster point u which belongs to U, then lim_{k→∞} u^{k} = u.
In ^{[}^{7}^{]} it is proved that the steepest descent method for smooth (scalar) convex minimization, with step size obtained using the Armijo rule implemented by a backtracking procedure, is globally convergent to a solution (essentially, under the sole assumption of existence of optima). Using the same techniques, we extend this result to the multiobjective setting, that is to say, we show that any sequence produced by the MSDM converges to an optimum of problem (1), no matter how poor the initial guess might be.
Theorem 3.11. Suppose that F is ℝ^{m}_{+}convex and that Assumption 3.6 holds. Then any sequence {x^{k}} produced by Algorithm 3.2 converges to a weak Pareto optimal point x^{*} ∈ ℝ^{n}.
Proof. As {F(x^{k})} is an ℝ^{m}_{+}-decreasing sequence, by Assumption 3.6, there exists x̄ ∈ ℝ^{n} such that

F(x̄) ≤ F(x^{k}) for all k.    (21)
Now observe that 0 < t_{k} ≤ 1 for all k, so

t_{k}²‖v^{k}‖² ≤ t_{k}‖v^{k}‖² for all k.

Therefore, from (21), Lemma 3.9 and the above inequality, it follows that

∑^{∞}_{k=0} t_{k}²‖v^{k}‖² < +∞.    (22)
Define

L := {x ∈ ℝ^{n} : F(x) ≤ F(x^{k}) for all k}.

Using the ℝ^{m}_{+}-convexity of F and (21), from Lemma 3.7 we see that, for any x ∈ L,

‖x^{k+1} − x‖² ≤ ‖x^{k} − x‖² + t_{k}²‖v^{k}‖² for all k.
As L is nonempty, because x̄ ∈ L, from (22) and the above inequality it follows that {x^{k}} is quasi-Fejér convergent to the set L. Therefore, {x^{k}} has accumulation points. Let x^{*} be one of them. By Lemma 3.8, x^{*} ∈ L. Then, using Theorem 3.10, we see that the whole sequence {x^{k}} converges to x^{*}. We finish the proof by observing that Theorem 3.3 guarantees that x^{*} is stationary and so, from the ℝ^{m}_{+}-convexity of F, by virtue of Lemma 3.5, x^{*} is a weak Pareto optimum. □
3.5 The MSDM with relative errors
Here we briefly sketch an inexact version of the MSDM. Instead of computing the exact solution v^{k} = v(x^{k}) of (6) in Step 2 of Algorithm 3.2, we accept approximate solutions of the problem within some prespecified tolerance. For a nonstationary point x ∈ ℝ^{n} and a given tolerance ε ∈ [0, 1), we say that a direction v ∈ ℝ^{n} is an ε-approximate solution of min_{v∈ℝ^{n}} φ_{x}(v) + (1/2)‖v‖² if

φ_{x}(v) + (1/2)‖v‖² ≤ (1 − ε)θ(x),

where φ_{x}(v) = max_{i=1,...,m} ∇F_{i}(x)^{T}v and θ(·) is the optimal value of the above optimization problem. Calling h_{x} the objective function of this problem, i.e., h_{x}(v) := φ_{x}(v) + (1/2)‖v‖², we see that v is an ε-approximate solution if, and only if, the relative error of h_{x} at v is at most ε, that is to say,

(h_{x}(v) − θ(x))/|θ(x)| ≤ ε.
Clearly, the exact direction v(x) is always ε-approximate at x for any ε ∈ [0, 1). Moreover, v(x) is the unique 0-approximate direction at x. Also note that, given a nonstationary point x ∈ ℝ^{n} and ε ∈ [0, 1), an ε-approximate direction v is always a descent direction. Indeed, from Proposition 3.1 and the definition of ε-approximate solution, it follows that ∇F_{i}(x)^{T}v ≤ φ_{x}(v) ≤ h_{x}(v) ≤ (1 − ε)θ(x) < 0 for all i. Whence, J F(x)v < 0 and so, by Lemma 2.1, v is a descent direction for F at x.
We can now define the inexact MSDM in almost the same way as its exact counterpart: in Step 1, there is also the choice of the tolerance parameter ε ∈ [0, 1); in Step 2, v^{k} is taken as an ε-approximate solution of min_{v∈ℝ^{n}} h_{x^{k}}(v); and, finally, the stopping criterion in Step 3 becomes h_{x^{k}}(v^{k}) = 0. In [19, 28] it is shown that the method is well-defined and that it has the same convergence features as the exact one.
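In computational terms, the acceptance test for an approximate direction is a one-line check. A sketch with our own helper name, assuming θ(x) < 0 at the nonstationary point x:

```python
def accepts(h_v, theta, eps):
    """Accept v as an eps-approximate direction at a nonstationary x:
    the relative error of h_x at v must be at most eps, i.e.
    h_x(v) - theta(x) <= eps * |theta(x)|, with theta(x) < 0."""
    return h_v - theta <= eps * abs(theta)

# with theta(x) = -2 and eps = 0.1, any v with h_x(v) <= -1.8 is accepted
```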
4 THE MULTIOBJECTIVE PROJECTED GRADIENT METHOD
The scheme of this section is similar to that of the previous one. Here, for the constrained multiobjective problem, we propose an extension of the classical scalar-valued projected gradient method. First, we prove some technical results. Then, we present the convergence analysis of the method. Finally, we sketch a numerically robust version of the method which admits relative errors in the computation of the search directions at each iteration.
Let F : ℝ^{n} → ℝ^{m} be a continuously differentiable mapping and C ⊆ ℝ^{n} a closed and convex set. We are now interested in the following constrained multiobjective optimization problem:

min_{x∈C} F(x).    (23)
For this problem, we seek a Pareto optimum (or efficient point), i.e., a feasible point x^{*} ∈ C such that there does not exist x ∈ C with F(x) ≤ F(x^{*}) and F(x) ≠ F(x^{*}), or we look for a weak Pareto optimum (or weakly efficient point), that is to say, a feasible x^{*} such that there does not exist x ∈ C with F(x) < F(x^{*}).
4.1 The search direction for the multiobjective projected gradient method
For the constrained multiobjective problem, a necessary condition for optimality of a point x ∈ C is stationarity (or criticality):

J F(x)(C − x) ∩ (−ℝ^{m}_{++}) = ∅,

where J F(x)(C − x) := {J F(x)(z − x) : z ∈ C}. So x ∈ C is stationary for F if, and only if, for all v ∈ C − x we have J F(x)v ∉ −ℝ^{m}_{++}. Note that when m = 1, we retrieve the classical stationarity condition for constrained scalar-valued optimization: ⟨∇F(x), z − x⟩ ≥ 0 for all z ∈ C.
Let us now develop an extension of the classical (scalar) projected gradient method for the vector-valued problem (23). First, for x ∈ C, we extend the search direction, originally defined for C = ℝ^{n} in (7), by

v(x) := argmin_{v∈C−x} βφ_{x}(v) + (1/2)‖v‖²,    (24)

where β > 0 is a parameter and φ_{x} is defined in (4). Note that, in view of the strong convexity of the minimand, for any x ∈ C, the projected gradient direction v(x) is always well-defined.
Now we extend the optimal value function θ(·) (given by (8) in the unconstrained case):

θ(x) := βφ_{x}(v(x)) + (1/2)‖v(x)‖².    (25)
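For simple constraint sets, the direction subproblem is again a small smooth program. Below is a sketch for a box C = {x : lo ≤ x ≤ hi}, using the same SLSQP reformulation as in the unconstrained case; the placement of β follows our reading of (24), under which, for m = 1, the direction reduces to the classical v(x) = P_C(x − β∇F(x)) − x:

```python
import numpy as np
from scipy.optimize import minimize

def projected_gradient_direction(jac, x, lo, hi, beta=1.0):
    """Solve min_{v in C - x} beta*max_i grad F_i(x)^T v + 0.5*||v||^2
    for the box C = [lo, hi], via the auxiliary variable tau."""
    m, n = jac.shape
    obj = lambda z: beta * z[0] + 0.5 * z[1:] @ z[1:]     # z = (tau, v)
    cons = [{"type": "ineq", "fun": lambda z, g=g: z[0] - g @ z[1:]} for g in jac]
    bnds = [(None, None)] + [(l - xi, h - xi) for l, h, xi in zip(lo, hi, x)]
    res = minimize(obj, np.zeros(n + 1), constraints=cons, bounds=bnds,
                   method="SLSQP")
    return res.x[1:]

# m = 1, grad F(x) = (2,), x = 0, C = [-0.5, 10]:
# the unconstrained direction -2 is clipped by C - x to -0.5
v = projected_gradient_direction(np.array([[2.0]]), [0.0], [-0.5], [10.0])
```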
We also need the following extension of Proposition 3.1 to the constrained case.
Proposition 4.1. Let v : C → ℝ^{n} and θ : C → ℝ be given by (24) and (25), respectively. Then, the following statements hold.

1. θ(x) ≤ 0 for all x ∈ C.

2. The function θ(·) is continuous.

3. The following conditions are equivalent: (a) the point x is nonstationary; (b) θ(x) < 0; (c) v(x) ≠ 0.
Proof. 1. Recalling that 0 ∈ C  x, the result follows as in the proof of Proposition 3.1.
2. Let x ∈ C and {x^{(k)}} ⊂ C be a sequence such that lim_{k→∞} x^{(k)} = x. Since v(x) ∈ C − x, we have v(x) + x − x^{(k)} ∈ C − x^{(k)}. Thus, from (24) and (25), we have

θ(x^{(k)}) ≤ βφ_{x^{(k)}}(v(x) + x − x^{(k)}) + (1/2)‖v(x) + x − x^{(k)}‖².
Hence, from the first inequality of (5), we obtain
Since F is continuously differentiable and u ↦ max_{i}{u_{i}} is continuous, taking lim sup_{k→∞} on both sides of the above inequality yields

lim sup_{k→∞} θ(x^{(k)}) ≤ βφ_{x}(v(x)) + (1/2)‖v(x)‖² = θ(x).    (26)
Now, observe that since v(x^{(k)}) ∈ C − x^{(k)}, we have v(x^{(k)}) + x^{(k)} − x ∈ C − x. Once again from (24) and (25), we obtain

θ(x) ≤ βφ_{x}(v(x^{(k)}) + x^{(k)} − x) + (1/2)‖v(x^{(k)}) + x^{(k)} − x‖².
Using once again the first inequality of (5), we have
Note that, for all i and any z ∈ C, from the Cauchy–Schwarz inequality and item 1, we have that (1/2)‖v(z)‖² ≤ β‖∇F_{i}(z)‖ ‖v(z)‖. Therefore, since F is continuously differentiable and {x^{(k)}} is bounded (because it converges), we see that {v(x^{(k)})} is bounded. So, taking lim inf_{k→∞} on both sides of the above inequality, we get
where the second inequality follows from (5). We complete the proof by combining this inequality with (26).
3. All results follow as in the proof of Proposition 3.1. (A formal proof of these equivalences in a more general setting can be found in [25, Proposition 3].)
4.2 The multiobjective projected gradient algorithm
We can now define the extension of the classical (scalar) projected gradient method for the constrained multiobjective optimization problem (23).
Algorithm 4.2. The multiobjective projected gradient method (MPGM)
Note that, whenever C = ℝ^{n}, taking β = 1, we retrieve the MSDM. Observe that, by virtue of Proposition 4.1, an alternative stopping criterion in Step 3 is v^{k} = 0. Note also that Algorithm 4.2 either terminates in a finite number of iterations with a stationary point or it generates an infinite sequence of nonstationary points (Proposition 4.1).
Now observe that if Step 4 is reached then, by virtue of Lemma 2.1, finitely many half reductions yield the step length which is used in Step 5 to define the next iterate. Note that instead of the factor 1/2 in the reductions of Step 4, we could have used any other scalar ν ∈ (0,1). Finally, observe that since θ(x^{k}) < 0 implies J F(x^{k})v^{k} < 0, from Steps 3–5 it follows that, whenever the method generates an infinite sequence, the objective values are ℝ^{m}_{+}-decreasing, i.e.,
If the method stops at iteration k_{0}, this inequality holds for k = 0, 1, ..., k_{0} − 1.
4.3 Convergence analysis of the MPGM: the general case
As we know, Algorithm 4.2 either terminates finitely with a stationary point or generates an infinite sequence of nonstationary iterates. In the sequel we study the second case. First, we show a simple fact: the algorithm produces feasible sequences.
Lemma 4.3. Let {x^{k}} ⊂ ℝ^{n} be a sequence generated by Algorithm 4.2. Then, we have {x^{k}} ⊂ C.
Proof. Let us use an inductive argument. From the initialization of the algorithm, x^{0} ∈ C. Now assume that x^{k} ∈ C. Observe that, by Step 4 of Algorithm 4.2, t_{k} ∈ (0, 1] and, by Step 2, v^{k} ∈ C − x^{k}. Then v^{k} = z^{k} − x^{k} for some z^{k} ∈ C and, from the convexity of C, it follows that x^{k+1} = x^{k} + t_{k}v^{k} = (1 − t_{k})x^{k} + t_{k}z^{k} ∈ C, and the proof is complete.
Now we see that whenever a sequence produced by Algorithm 4.2 has accumulation points, they are all feasible stationary points for the problem.
Theorem 4.4. Every accumulation point, if any, of a sequence {x^{k}} generated by Algorithm 4.2 is a feasible stationary point.
Proof. The feasibility follows combining the fact that C is closed with Lemma 4.3. The rest of the proof is similar to the one of Theorem 3.3: it suffices to use the definitions of v(x) and θ(x) given in (24) and (25), respectively.
4.4 Full convergence of the MPGM: the ℝ^{m}_{+}convex case
Let us note that, being the maximum of convex functions, the objective of the minimization problem which defines v(x) in (24) is also convex and, since C − x is a closed convex subset of ℝ^{n}, by [3, Proposition 4.7.2], a necessary condition for the optimality of v(x) is the existence of w(x) ∈ ℝ^{m} with
such that
This condition can be written as
Now, using a very well-known characterization of orthogonal projections (see, for instance, [3, Proposition 2.2.1]), we have
where P_{C}(z) := argmin{‖u − z‖ : u ∈ C} is well-defined because C is closed and convex. We conclude that there exists w(x) ∈ ℝ^{m} satisfying (27) such that
Observe that in the constrained scalarvalued case (m = 1), we have J F(x)^{T} = ∇F(x) and w(x) = 1, which means that
and this is precisely the search direction used in the projected gradient method for real-valued optimization. So this is an extension to the multiobjective setting of the classical projected gradient method.
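As a sanity check of this reduction, in the scalar case the direction v(x) = P_{C}(x − β∇F(x)) − x drives the classical projected gradient iteration. A minimal numpy sketch for a box C; the quadratic test function is ours:

```python
import numpy as np

def proj_box(z, lo, hi):
    # orthogonal projection P_C onto the box C = [lo, hi]
    return np.clip(z, lo, hi)

def direction(x, grad, beta, lo, hi):
    # scalar case (m = 1) of (29): v(x) = P_C(x - beta * grad F(x)) - x
    return proj_box(x - beta * grad(x), lo, hi) - x

# hypothetical test problem: F(x) = 0.5 ||x - c||^2 with c outside the box
c = np.array([2.0, -3.0])
grad = lambda x: x - c
lo, hi = np.zeros(2), np.ones(2)
x = np.array([0.5, 0.5])
for _ in range(50):
    x = x + direction(x, grad, 1.0, lo, hi)   # full steps suffice for this F
# the minimizer of F over the box is the projection of c onto it
assert np.allclose(x, proj_box(c, lo, hi))
```

At the limit point the direction vanishes, which is exactly the stationarity condition x = P_{C}(x − β∇F(x)).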
Lemma 4.5. Suppose that F is ℝ^{m}_{+}-convex. Let {x^{k}} be an infinite sequence generated by Algorithm 4.2. If for x̂ ∈ C and k ≥ 0 we have F(x̂) ≤ F(x^{k}), then
where w^{k} := w(x^{k}) ∈ ℝ^{m} is such that (27) and (29) hold with x = x^{k}.
Proof. Since x^{k+1} = x^{k} + t_{k}v^{k}, we have
Let us analyze the rightmost term of the above expression. Since v^{k} = v(x^{k}) and w^{k} = w(x^{k}), by (28) with x = x^{k}, we get
Taking v = x̂ − x^{k} ∈ C − x^{k} in the above inequality, we obtain
Now, observe that the convexity of F and (18) yield J F(x^{k})(x̂ − x^{k}) ≤ F(x̂) − F(x^{k}). This fact, together with w^{k} ≥ 0 and F(x̂) ≤ F(x^{k}), implies
Also, since J F(x^{k})v^{k} < 0, because x^{k} is nonstationary, we have 〈w^{k}, J F(x^{k})v^{k}〉 < 0. Thus, recalling that β > 0, from (31) it follows that
The above inequality, together with (30), shows that
And the result follows because t_{k} ≥ 0.
As in the unconstrained case, we need to make an extra assumption in order to prove full convergence of the method.
Assumption 4.6. Every ℝ^{m}_{+}-decreasing sequence {y^{k}} ⊂ F(C) := {F(x) : x ∈ C} is ℝ^{m}_{+}-bounded from below by an element of F(C).
We can now state and prove the main result on the convergence analysis of the MPGM: under conditions similar to those of Theorem 3.11, regardless of the initial (feasible) guess, the method converges to an optimum of problem (23).
Theorem 4.7. Assume that F : ℝ^{n} → ℝ^{m} is ℝ^{m}_{+}convex and that Assumption 4.6 holds. Then, any sequence {x^{k}} generated by Algorithm 4.2 converges to a weak Pareto optimum.
Proof. Let us define the set
and take x̂ ∈ L; such a point exists by Assumption 4.6. Since F is ℝ^{m}_{+}-convex, it follows from Lemma 4.5 that
First, we prove that the sequence {x^{k}} is quasi-Fejér convergent to the set L. Let {e^{1}, ..., e^{m}} be the canonical basis of ℝ^{m}. We observe that, for each k, we can write
Let us note that, from (27), 0 ≤ w^{k}_{i} ≤ 1 for all i and k. Thus, from (32), we have
Defining ε_{k} := −2βt_{k} ∑^{m}_{i=1} 〈e^{i}, J F(x^{k})v^{k}〉, we have that ε_{k} ≥ 0 and ‖x^{k+1} − x̂‖² ≤ ‖x^{k} − x̂‖² + ε_{k}.
Let us now prove that ∑^{∞}_{k=0} ε_{k} < ∞. From the Armijo-like condition in Step 4 of the algorithm, we obtain
Since x^{k} is nonstationary, we also have, for all i and k, that 〈e^{i}, J F(x^{k})v^{k}〉 < 0, which means that |t_{k}〈e^{i}, J F(x^{k})v^{k}〉| = −t_{k}〈e^{i}, J F(x^{k})v^{k}〉. Hence, we obtain
Summing the above inequality from k = 0 to k = N, where N is any positive integer, we get
Since N is an arbitrary positive integer, β, σ > 0 and F(x̂) ≤ F(x^{k}) for all k, we conclude that ∑^{∞}_{k=0} ε_{k} < ∞. Therefore, recalling that x̂ was an arbitrary element of L, we see that {x^{k}} is quasi-Fejér convergent to L.
By Theorem 3.10, it follows that {x^{k}} is bounded. So {x^{k}} ⊂ C has at least one accumulation point, say x^{*}, which is feasible because C is closed, and stationary by Theorem 4.4. Using now the ℝ^{m}_{+}-convexity of F, from Lemma 3.5 it follows that x^{*} ∈ C is a weak Pareto optimum.
As at the end of the proof of Theorem 3.11, we now see that x^{*} ∈ L. Since F(x^{k+1}) < F(x^{k}) for all k, by Lemma 3.8, 1, F(x^{*}) ≤ F(x^{k}) for all k, so x^{*} ∈ L. Hence, once again from Theorem 3.10, we conclude that {x^{k}} converges to x^{*} ∈ C, a weak Pareto optimum.
4.5 The MPGM with relative errors
As we did for the MSDM in Subsection 3.5, we present here a quick overview of an MPGM version implemented with relative errors in the search directions. Given a nonstationary x ∈ C and a tolerance ε ∈ [0, 1), we say that v ∈ C − x is an ε-approximate solution of the problem min_{v∈C−x} φ_{x}(v) + (1/2)‖v‖² if
where h_{x}(v) := φ_{x}(v) + (1/2)‖v‖², with φ_{x} given by (4).
Similar comments to those made in Subsection 3.5 for ε-approximate solutions of unconstrained problems can be made here for the constrained case. The modifications of the (exact) MPGM are basically the same as those proposed in Subsection 3.5 for the (exact) MSDM: in Step 2 of Algorithm 4.2, instead of defining v^{k} as the exact solution of min_{v∈C−x^{k}} φ_{x^{k}}(v) + (1/2)‖v‖², we take v^{k} as an ε-approximate solution of this problem. The stopping rule at Step 3 (θ(x^{k}) = 0) is replaced by h_{x^{k}}(v^{k}) = 0.
For a more general setting (vector optimization), a comprehensive study of the inexact version of the MPGM can be found in [22], where, under essentially the same hypotheses made here for the (exact) MPGM, all convergence results are fully proved.
5 THE MULTIOBJECTIVE NEWTON METHOD
We now go back to the unconstrained multiobjective problem (1). In order to define a Newton-type search direction for this kind of problem, we first recall some convexity notions for vector-valued mappings. We say that F : ℝ^{n} → ℝ^{m} is strictly ℝ^{m}_{+}-convex if the inequality (17) is satisfied strictly, that is, if F is componentwise strictly convex. We also say that F is strongly ℝ^{m}_{+}-convex if there exists û ∈ ℝ^{m}_{++} such that
F(λx + (1 − λ)y) ≤ λF(x) + (1 − λ)F(y) − (1/2)λ(1 − λ)‖x − y‖² û
for all x, y ∈ ℝ^{n} and all λ ∈ (0, 1), which is equivalent to saying that F is componentwise strongly convex (each component F_{i} has û_{i} as its modulus of strong convexity). Clearly, as in the scalar case, strong ℝ^{m}_{+}-convexity implies strict ℝ^{m}_{+}-convexity, which, in turn, implies ℝ^{m}_{+}-convexity.
The following result relates Pareto optimality with strict ℝ^{m}_{+}convexity and characterizes strong ℝ^{m}_{+}convexity in terms of the eigenvalues of the objective functions' Hessians.
Lemma 5.1. Assume that F : ℝ^{n} → ℝ^{m} is twice continuously differentiable. Then, the following properties hold.
1. If F is strictly ℝ^{m}_{+}-convex and x ∈ ℝ^{n} is stationary for F, then x is a Pareto optimal point.
2. The mapping F is strongly ℝ^{m}_{+}-convex if and only if there exists ρ > 0 such that
λ_{min}(∇²F_{i}(x)) ≥ ρ for all x ∈ ℝ^{n} and all i = 1, ..., m,
where λ_{min} : ℝ^{n×n} → ℝ denotes the smallest-eigenvalue function.
Proof. It follows from [27, Corollary 2.2 and Proposition 2.3].
5.1 The search direction for the multiobjective Newton method
From now on, we will assume that F is strongly ℝ^{m}_{+}-convex and twice continuously differentiable. For i = 1, ..., m, let us define ψ_{i} : ℝ^{n} × ℝ^{n} → ℝ by
Hence, for any i, s ↦ F_{i}(x) + ψ_{i}(x, s) is the local quadratic approximation of F_{i} at x. For x ∈ ℝ^{n}, we define the Newton direction for F at x as
Since ψ_{i}(x, ·) is strongly convex for all i, the function s ↦ max_{i} ψ_{i}(x, s) is also strongly convex, so it has a unique minimizer in ℝ^{n} and s(x) is well-defined. Clearly, for m = 1 we retrieve the Newton direction for unconstrained real-valued optimization.
Observe that whenever ∇²F_{i}(x) is equal to I, the identity matrix of ℝ^{n×n}, for all i, we have max_{i=1,...,m} ψ_{i}(x, s) = φ_{x}(s) + (1/2)‖s‖², where φ_{x} is given by (4). Therefore, in multiobjective optimization, if ∇²F_{i}(x) = I for all i, then, as happens in the scalar case, the Newton and the steepest descent directions coincide.
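In this identity-Hessian case with m = 2, the minimizer of max_{i} 〈∇F_{i}(x), s〉 + (1/2)‖s‖² even has a closed form: by duality, s = −(α∇F₁(x) + (1 − α)∇F₂(x)), where α is the projection onto [0, 1] of the unconstrained minimizer of ‖α∇F₁(x) + (1 − α)∇F₂(x)‖. A sketch, with hypothetical gradient values:

```python
import numpy as np

def two_obj_direction(g1, g2):
    """Minimizer of max_i <g_i, s> + 0.5||s||^2 for m = 2 (identity Hessians):
    s = -(a*g1 + (1 - a)*g2), with a in [0, 1] minimizing ||a*g1 + (1 - a)*g2||,
    a one-dimensional convex quadratic solved by clipping its vertex."""
    d = g1 - g2
    a = 0.5 if d @ d == 0 else float(np.clip((g2 - g1) @ g2 / (d @ d), 0.0, 1.0))
    return -(a * g1 + (1 - a) * g2)

g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
s = two_obj_direction(g1, g2)
assert g1 @ s < 0 and g2 @ s < 0                      # common descent direction
assert np.allclose(two_obj_direction(g1, -g1), 0.0)   # conflicting gradients: stationary
```

The second assertion illustrates stationarity: when the two gradients point in opposite directions, no common descent direction exists and s vanishes.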
Let us call θ(x) the optimal value of the minimization problem min_{s∈ℝ^{n}} max_{i=1,...,m} ψ_{i}(x, s), that is to say,
In order to compute s(x), we can reformulate the (possibly nonsmooth) min–max problem as:
a smooth convex problem with variables (τ, v) ∈ ℝ × ℝ^{n} for which there are many efficient numerical solvers.
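This reformulation can be handed to any general-purpose NLP solver. A sketch using scipy's SLSQP; the quadratic test data are hypothetical, and in the scalar case the result should match −∇²F(x)⁻¹∇F(x):

```python
import numpy as np
from scipy.optimize import minimize

def newton_direction(grads, hessians):
    """Solve min_{(tau, s)} tau  s.t.  psi_i(x, s) <= tau for all i,
    where psi_i(x, s) = <grad F_i(x), s> + 0.5 s' Hess F_i(x) s."""
    n = grads[0].size
    cons = [{'type': 'ineq',
             'fun': lambda z, g=g, H=H:
                 z[0] - (g @ z[1:] + 0.5 * z[1:] @ H @ z[1:])}
            for g, H in zip(grads, hessians)]
    res = minimize(lambda z: z[0], np.zeros(n + 1),
                   constraints=cons, method='SLSQP')
    return res.x[1:], res.x[0]     # s(x) and theta(x)

# scalar sanity check: s(x) = -H^{-1} g and theta(x) = -0.5 g' H^{-1} g
g, H = np.array([1.0, 0.0]), np.eye(2)
s, theta = newton_direction([g], [H])
assert np.allclose(s, [-1.0, 0.0], atol=1e-3)
assert abs(theta + 0.5) < 1e-3
```

For larger instances a dedicated convex QCQP solver would be preferable, but the reformulation itself is exactly the one displayed above.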
We now present analogous results as those of Propositions 3.1 and 4.1.
Proposition 5.2. Let s : ℝ^{n} → ℝ^{n} and θ : ℝ^{n} → ℝ be given by (34) and (35), respectively. Then, the following statements hold.
Proof. 1. By (33) and (35), θ(x) ≤ max_{i} ψ_{i}(x, 0) = 0.
2. (a) ⇒ (b). By (a), there exists v̄ ∈ ℝ^{n} such that J F(x)v̄ ∈ −ℝ^{m}_{++}, which means that ∇F_{i}(x)^{T}v̄ < 0 for all i. From (33) and (35), for τ > 0, we get
Therefore, for τ > 0 small enough, the right-hand side of the above inequality is negative and (b) holds.
(b) ⇒ (c). Note that if θ(x) < 0 then, in view of (33) and (35), we have s(x) ≠ 0.
(c) ⇒ (a). From the strong ℝ^{m}_{+}convexity of F, it follows that ∇^{2}F_{i}(x) is positive definite for all i, so from (c) and item 1, we get
Hence, J F(x)s(x) < 0 and (a) holds.
In particular, we just saw that, for any nonstationary x ∈ ℝ^{n}, s(x) is a descent direction. The next lemma supplies an implicit definition of s(x), which will be used in order to obtain bounds for s(x). These bounds will allow us to prove continuity of s(·), θ(·) and will be used to establish the rate of convergence of the Newton method.
Lemma 5.3. For any x ∈ ℝ^{n}, the Newton direction defined in (34) can be written as
with α_{i} = α_{i}(x) ≥ 0 and ∑^{m}_{i=1} α_{i} = 1.
Proof. We know that the convex problem min_{s} max_{i} ψ_{i} (x, s), where ψ_{i} is defined in (33), has a unique solution s(x). Applying Proposition 3.4, we see that there exists α := α(x) ∈ ℝ^{m} such that the following conditions hold at s = s(x) and τ = θ(x):
Since ∇^{2}F_{i}(x) is positive definite and the multipliers α_{i} are nonnegative for all i, from both conditions in (37), the result follows.
So s(x), defined in (34), is the Newton direction of a standard scalar optimization problem (min_{x} ∑^{m}_{i=1} α_{i}F_{i}(x)), implicitly induced by weighting the given objective functions with the a priori unknown nonnegative KKT multipliers. We now give upper bounds for the norm of the Newton direction.
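Solving the Lagrangian condition for s(x) makes this weighted-Newton reading explicit; under the stated positive definiteness of the Hessians, this is presumably the closed form behind Lemma 5.3:

```latex
s(x) = -\Big[\sum_{i=1}^{m}\alpha_i(x)\,\nabla^2 F_i(x)\Big]^{-1}
        \sum_{i=1}^{m}\alpha_i(x)\,\nabla F_i(x),
\qquad \alpha_i(x)\ge 0,\quad \sum_{i=1}^{m}\alpha_i(x)=1.
```

For m = 1 this collapses to the classical Newton step −∇²F(x)⁻¹∇F(x).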
Corollary 5.4. For any x ∈ ℝ^{n}, we have
where ρ > 0 is given by Lemma 5.1.
Proof. Let us prove the first inequality. By Proposition 5.2, we have θ(x) ≤ 0, which, by (35), means that max_{i} ψ_{i}(x, s(x)) ≤ 0. Hence, by (33),
From Lemma 5.1, 2, we get
The proof follows from these two inequalities and the operator norm properties.
We now prove the second inequality. We know that the complementarity condition (39) holds for (τ, s) = (θ(x), s(x)). So, by (33) and (35), α_{i} > 0 implies θ(x) = ψ_{i}(x, s(x)), and then, using the first equality in (37), we have
Replacing ∑^{m}_{i=1} α_{i}∇F_{i}(x) by its expression obtained from (36), using Lemma 5.1, 2 and the fact that ∑_{i} α_{i} = 1 with α_{i} ≥ 0 for all i, we conclude that
and the proof follows immediately from Proposition 5.2, 1.
We now establish the continuity of the functions θ(·) and s(·).
Proposition 5.5. The functions θ : ℝ^{n} → ℝ and s : ℝ^{n} → ℝ^{n}, given by (35) and (34) respectively, are continuous.
Proof. Take x, x' ∈ ℝ^{n}. Let s = s(x), s' = s(x') and let i_{0} be such that max_{i} ψ_{i}(x', s') = ψ_{i_{0}}(x', s'). Then, from (35), (33) and the Cauchy–Schwarz inequality,
where D^{2}F stands for the second derivative of F. Therefore, from the above inequality and Corollary 5.4 we conclude that
Since the above inequality also holds with x and x' interchanged and F is twice continuously differentiable, we conclude that θ(·) is continuous.
Now we prove the continuity of s(·). By the same reasoning as above,
where we used the Cauchy–Schwarz inequality and (33) with x = x'. By Lemma 5.1, 2, λ_{min}(∇²F_{i}(x)) ≥ ρ for all x ∈ ℝ^{n} and all i. Therefore, ψ_{i}(x', ·) is strongly convex with modulus ρ for all i, and so is max_{i} ψ_{i}(x', ·). Hence, using the fact that max_{i} ψ_{i}(x', ·) is minimized at s', we have
Combining inequality (40) with the above one, we get
Since θ(·) is continuous and F is twice continuously differentiable, using Corollary 5.4 and taking the limit x' → x on both sides of the above inequality, we conclude that s' → s, so s(·) is continuous.
5.2 The multiobjective Newton algorithm
Let us now describe the extension of the classical (scalar-valued) Newton method for the unconstrained multiobjective optimization problem (1).
Algorithm 5.6. The multiobjective Newton method (MNM)
As usual, the stopping rule at Step 3 can be replaced by s^{k} = 0 (Proposition 5.2). We now show that if θ(x^{k}) ≠ 0 at Step 3, then the backtracking procedure prescribed in Step 4 is well-defined, i.e., it ends in a finite number of (inner) iterations. Indeed, it suffices to note that
for any nonstationary x, any t > 0 and all i. As J F(x)s(x) < 0, from Lemma 2.1 it follows that there exists ε = ε(x) > 0 such that the first, and therefore the second, inequality in (41) holds for any t ∈ (0, ε). So, for x = x^{k}, we need finitely many half reductions in order to obtain t_{k} ∈ (0, ε_{k}), where ε_{k} = ε(x^{k}). Observe also that the reducing factor 1/2 at Step 4 can be replaced by any ν ∈ (0, 1).
Moreover, whenever θ(x^{k}) ≠ 0, we have θ(x^{k}) < 0, so from Step 4 it follows that
This inequality holds for k = 0, ..., k_{0} − 1 if the algorithm stops at iteration k_{0}. Finally, we point out that, by Proposition 5.2, whenever the method has finite termination, it ends with a stationary point and therefore a Pareto optimum for F (Lemma 5.1, 1). Therefore, the relevant case for the convergence analysis is the one in which the algorithm generates infinite sequences {x^{k}}, {s^{k}} and {t_{k}}. So, from now on, we suppose that the method does not stop.
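To illustrate the full-step behavior analyzed below, consider the (hypothetical) strongly convex pair F_{i}(x) = (1/2)‖x − c_{i}‖², whose Hessians are the identity. Here the min–max subproblem reduces to the steepest descent subproblem (cf. the remark after (34)), which is solvable in closed form for m = 2; the Pareto set is the segment [c₁, c₂], and a single full Newton step already lands on it:

```python
import numpy as np

def direction_identity_hessians(g1, g2):
    # closed-form minimizer of max_i <g_i, s> + 0.5||s||^2 (m = 2, Hessians = I):
    # s = -(a*g1 + (1-a)*g2) with a in [0,1] minimizing the combination's norm
    d = g1 - g2
    a = 0.5 if d @ d == 0 else float(np.clip((g2 - g1) @ g2 / (d @ d), 0.0, 1.0))
    return -(a * g1 + (1 - a) * g2)

# F_i(x) = 0.5 ||x - c_i||^2, so grad F_i(x) = x - c_i and Hess F_i(x) = I
c1, c2 = np.array([0.0, 0.0]), np.array([4.0, 2.0])
x = np.array([10.0, -7.0])
for _ in range(5):
    x = x + direction_identity_hessians(x - c1, x - c2)   # full Newton steps
# the Pareto set of this pair is the segment [c1, c2]; one full step lands on it,
# after which the Newton direction vanishes (stationarity)
s = direction_identity_hessians(x - c1, x - c2)
assert np.linalg.norm(s) < 1e-12
```

For these quadratics the Armijo-like test of Step 4 accepts t_{k} = 1 from the start, which is why we skip the backtracking in this sketch; Theorem 5.11 below shows that full steps are eventually accepted in general.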
5.3 Convergence analysis of the MNM
Let us begin with a technical result which will be useful in the sequel.
Proposition 5.7. For all k = 0, 1, ... and all i = 1, ..., m, we have
Proof. From Step 3 of Algorithm 5.6, for any i and all j, we have
We complete the proof by adding up for j = 1, ..., k and taking into account that θ(x^{j}) < 0 for all j.
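Spelling out the sum, and using the Armijo-like condition of Step 4, the resulting estimate of Proposition 5.7 presumably reads:

```latex
F_i(x^{k}) \;\le\; F_i(x^{0}) + \sigma \sum_{j=0}^{k-1} t_j\,\theta(x^{j}),
\qquad i = 1,\dots,m,\ k = 0,1,\dots
```

Since every term t_{j}θ(x^{j}) is negative, the partial sums are bounded below whenever {F_{i}(x^{k})} is, which is what drives the limit arguments in the next proof.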
We can now prove that if the sequence {x^{k}} has accumulation points, they are all Pareto optima for F.
Proposition 5.8. If x̄ ∈ ℝ^{n} is an accumulation point of {x^{k}}, then x̄ is a Pareto optimum for F.
Proof. There exists a subsequence {x^{kj}}_{j} such that
Using Proposition 5.7 with k = k_{j} and taking the limit j → ∞, from the continuity of F, we conclude that
Therefore, lim_{k→∞}t_{k}θ(x^{k}) = 0 and, in particular,
Suppose that x̄ is nonstationary, which, by Proposition 5.2, is equivalent to
Define
Using the definition of θ(·) and Lemma 2.1, we conclude that there exists a nonnegative integer q such that
Since s(·), θ(·) and g(·) are continuous,
and, for j large enough
which, in view of the definition of g and Step 4 of the algorithm, implies that t_{k_{j}} ≥ 2^{−q} > 0 for j large enough. Hence, from the second limit in (44), we conclude that lim sup_{j→∞} t_{k_{j}}θ(x^{k_{j}}) < 0, in contradiction with (42). Therefore, (43) is not true and, by Proposition 5.2, x̄ is stationary. Whence, in view of Lemma 5.1, 1, x̄ is a Pareto optimal point for F.
We can now show that, in order to have full convergence to a Pareto optimum, it suffices to take the initial point x^{0} in a bounded level set of F.
Theorem 5.9. Let x^{0} ∈ ℝ^{n} be in a compact level set of F and {x^{k}} a sequence generated by Algorithm 5.6. Then {x^{k}} converges to a Pareto optimum x^{*} ∈ ℝ^{n}.
Proof. Let Γ_{0} be the F(x^{0})-level set of F, that is, Γ_{0} := {x ∈ ℝ^{n} : F(x) ≤ F(x^{0})}. We know that the sequence {F(x^{k})} is ℝ^{m}_{+}-decreasing, so x^{k} ∈ Γ_{0} for all k. Therefore {x^{k}} is bounded and all its accumulation points belong to Γ_{0}; using Proposition 5.8, we conclude that they are all Pareto optima for F. Moreover, applying Lemma 3.8, 1 with x^{(k)} = x^{k} for all k, we see that all these accumulation points have the same objective value. As F is strongly (and therefore strictly) ℝ^{m}_{+}-convex, {x^{k}} has exactly one accumulation point, say x^{*}, and the proof is complete.
5.4 Convergence rate of the MNM
Here we analyze the convergence rate of any infinite sequence {x^{k}} generated by Algorithm 5.6. First, we provide a bound for θ(x^{k+1}) based on data at the former iterate x^{k}. Then we show that for k large enough, full Newton steps are performed, that is to say, t_{k} = 1. Using this result, we prove superlinear convergence. Finally, under additional regularity assumptions, we prove quadratic convergence.
Lemma 5.10. For any k, there exists α^{k} ∈ ℝ^{m} with ∑^{m}_{i=1} α^{k}_{i} = 1 and α^{k}_{i} ≥ 0 for all i, such that
and
where ρ > 0 is given by Lemma 5.1.
Proof. By virtue of (37)–(38) with x = x^{k} and s = s^{k}, the Lagrangian condition (45) holds for any k with a vector of KKT multipliers α = α(x^{k}) =: α^{k} ≥ 0 such that ∑^{m}_{i=1} α^{k}_{i} = 1.
In order to prove (46), observe that, from Steps 2–3 of Algorithm 5.6 and the properties of α^{k},
where the last inequality holds in view of Lemma 5.1 for x = x^{k+1}. The function
is convex, so its minimal value is attained where its gradient vanishes, i.e., at
So
The proof follows from this equality and (47).
Let us now see that, for any sequence defined by Algorithm 5.6, full Newton steps will be eventually taken at Step 4.
Theorem 5.11. Let x^{0} ∈ ℝ^{n} be in a compact level set of F and {x^{k}} a sequence generated by Algorithm 5.6. Then {x^{k}} converges to a Pareto optimal point x^{*} ∈ ℝ^{n}. Moreover, t_{k} = 1 for k large enough and the convergence of {x^{k}} to x^{*} is superlinear.
Proof. Convergence of {x^{k}} to a Pareto optimum x^{*} was established in Theorem 5.9. By Lemma 3.5 and Proposition 5.2, 3, we know that x^{*} is stationary and
Since F is twice continuously differentiable, for any ε > 0 there exists δ > 0 such that
where B(x^{*}, δ) stands for the open ball with center in x^{*} and radius δ. As {x^{k}} converges to x^{*}, using (48) and the continuity of s(·) (Proposition 5.5), we conclude that {s^{k}} converges to 0. Therefore there exists k_{ε} such that for k ≥ k_{ε} we have
and, using (49) to estimate the integral remainder in Taylor's second-order expansion of F_{i} at x^{k}, we conclude that, for any i,
So, using (33), (35), Corollary 5.4 and the fact that σ < 1, we obtain
The above inequality shows that if ε < (1 − σ)ρ then, at Step 4, t_{k} = 1 is accepted for k ≥ k_{ε}. Suppose that
Using (45) and (49) to estimate the integral remainder in Taylor's first-order expansion of ∑^{m}_{i=1} α^{k}_{i}∇F_{i} at x^{k}, evaluated at x^{k} + s^{k}, we conclude that, for any k ≥ k_{ε}, it holds that
which, combined with (46), shows that
Using the above inequality and Corollary 5.4, we conclude that for k ≥ k_{ε} we have
Therefore, if k ≥ k_{ε} and j ≥ 1 then
To prove superlinear convergence, take 0 < ξ < 1 and define
If ε < ε^{*} and k ≥ k_{ε}, using (50) and the convergence of {x^{k}} to x^{*}, we get
Hence,
Combining the above two inequalities, we conclude that, if ε < ε^{*} and k ≥ k_{ε}, then
Since ξ was arbitrary in (0, 1), the above quotient tends to zero and the proof is complete.
Recall that in the classical Newton method for minimizing a scalar convex twice continuously differentiable function, the proof of quadratic convergence requires Lipschitz continuity of the objective function's second derivative. Likewise, under the assumption of Lipschitz continuity of D^{2}F, we can see that the MNM also converges quadratically.
Theorem 5.12. Let x^{0} ∈ ℝ^{n} be in a compact level set of F, let {x^{k}} be a sequence generated by Algorithm 5.6 and suppose that D²F is Lipschitz continuous on ℝ^{n}. Then {x^{k}} converges quadratically to a Pareto optimal point x^{*} ∈ ℝ^{n}.
Proof. By Theorem 5.11, {x^{k}} converges superlinearly to a Pareto optimal point x^{*} and t_{k} = 1 for k large enough. Let L > 0 be the Lipschitz constant of D²F. Then, for α^{k} ∈ ℝ^{m} as in Lemma 5.10, ∑^{m}_{i=1} α^{k}_{i}∇²F_{i} is also Lipschitz continuous. Therefore, for k large enough,
where the last inequality follows from the second equality, (45) and Taylor's expansion of ∑^{m}_{i=1} α^{k}_{i}∇F_{i} at x^{k}, evaluated at x^{k} + s^{k}. Using the above inequality and (46), we obtain
which, combined with Corollary 5.4 yields
Take ξ ∈ (0, 1). Since {x^{k}} converges superlinearly to x^{*}, there exists k_{0} such that for k ≥ k_{0} we have (51), (52) and
Therefore, applying the triangle inequality to ‖x^{*} − x^{ℓ}‖ and then to ‖x^{ℓ+1} − x^{ℓ}‖, for ℓ ≥ k_{0} we obtain
Whence, using the above rightmost inequality for ℓ = k ≥ k_{0} and the second equality in (51), we have
while using the first inequality in (53) for ℓ = k + 1 and the second equality in (51) (for k + 1 instead of k), we obtain
Quadratic convergence of {x^{k}} to x^{*} follows from the two inequalities above and (52).
6 FINAL REMARKS
Let us make some final comments. First, it is worth noticing that, by implementing the vector optimization versions of the steepest descent method [28] and the projected gradient method [21, 22, 25], we can guarantee convergence to Pareto optima instead of just weak Pareto points. Indeed, it suffices to take an underlying (closed convex) ordering cone K with ℝ^{m}_{+}\{0} contained in the interior of K and such that a version of Assumption 3.6 (Assumption 4.6) in the unconstrained (constrained) case is satisfied [21, Section 6].
Without using this technique, in some cases, convergence of these two methods to (strong) Pareto optima can also be obtained. For instance, if the objective function is strictly ℝ^{m}_{+}convex, then, under the assumptions of the convergence theorems, this will actually happen. Indeed, as we saw in the proof of Theorem 3.11 (Theorem 4.7), sequences generated by the MSDM (MPGM) converge to stationary points, which, in this case, are Pareto optima (Lemma 5.1, 1).
We also point out that, with these (and other) descent vector-valued methods, we are only concerned with finding a single optimum, not all the points of the optimal set. Nevertheless, from a numerical point of view, we can expect to approximate the solution set by running any of these methods from different initial points (e.g., [18, Section 8]). The same kind of idea appears in the so-called weighting method: there, one weighting vector gives rise to (at most) one Pareto optimal point, which means that we should run the method for different weights in order to trace the Pareto frontier. However, in some cases, as shown in [18, Section 7], almost any choice of the weighting vectors may lead to unbounded problems. With arbitrary choices of initial points, the descent vector-valued methods discussed here do not have this disadvantage. We believe this is one of the main motivations for the study of such methods.