
STOCHASTIC KNAPSACK PROBLEM: APPLICATION TO TRANSPORTATION PROBLEMS

ABSTRACT

In this paper, we study the stochastic knapsack problem with expectation constraint. We solve the relaxed version of this problem using a stochastic gradient algorithm in order to provide upper bounds for a branch-and-bound framework. Two approaches to estimate the required gradients are studied: one based on Integration by Parts and one using Finite Differences. The Finite Differences method is a robust and simple approach that yields efficient results despite the fact that the estimated gradients are biased, while Integration by Parts rests on a more theoretical analysis and enlarges the field of applications. Numerical results on a dataset from the literature as well as on a set of randomly generated instances are given.

Keywords:
Stochastic knapsack problem; transportation problem; probabilistic constraint; branch-and-bound; Integration by Parts

1 INTRODUCTION

The deterministic knapsack problem is a well-known and well-studied NP-hard combinatorial optimization problem. It consists in filling a knapsack with items out of a given set such that the weight capacity of the knapsack is respected and the total reward is maximized. Several variants of the knapsack problem have been considered for many practical problems, among them network design problems. In telecommunication network design, the knapsack problem is often used as a subproblem to model the capacity of the edges (see Kellerer et al., 2004). It is now generally admitted that uncertainty is an inherent property of many practical problems. The main reason is that the use of average parameters could lead to severe service quality deterioration, whereas the use of extreme values could result in costly conservative solutions. In telecommunication networks, the main sources of uncertainty are the traffic demands and the costs. Additional sources could be network and equipment failures. There are two main approaches to deal with uncertainty, namely robust optimization (Ben-Tal & Nemirowski, 2008) and stochastic optimization (Birge & Louveaux, 1997). Demand uncertainty in telecommunication network design problems leads to several stochastic models, among them the stochastic knapsack problem. The latter was used either with chance constraints (Kosuch & Lisser, 2010) or within two-stage stochastic knapsack problems, either for the single knapsack formulation or for the multiple knapsack one (Tönissen et al., 2016, 2017). The stochastic knapsack problem with random item sizes has met a large interest in the literature in the last decade. There are at least two ways to deal with a possible overload. When the overload is not tolerated, Bhalgat et al. (2011) proposed to remove the last inserted item. In Chen & Ross (2014), the authors proposed that the knapsack returns zero when it overflows. In the case where the overflow is accepted, the knapsack constraint is replaced by a probabilistic constraint (see Goel & Indyk, 1999; Kosuch & Lisser, 2010).

In stochastic optimization there are three major approaches to deal with stochasticity: probabilistic constraints, two- or multi-stage settings, and on-line or dynamic modeling. In the first case, all decisions are made before the random parameters are revealed. The objective function and/or the constraints are thus formulated using expectations and probability measures. In the second case, recourse actions occur once the random parameters are known. These recourse actions are formulated as a penalty that has to be paid in case of a violated constraint, or are used to correct the decisions made in previous stages. In the last approach, on-line problems are solved dynamically: the algorithm implemented to solve the problem is provided with further information during its execution and reacts depending on this new information as well as on previous decisions. The solution of an on-line problem is a decision policy, while in the first two cases actual decisions are computed.

In this paper we focus on a particular variant of the single-stage stochastic knapsack problem with random weights: the expectation constrained knapsack problem. The paper is organized as follows: In Section 2 the mathematical formulation of the problem is presented and discussed. In Section 3 we present the stochastic gradient algorithm used to solve the corresponding relaxed problem, together with two methods to estimate the required gradients. In contrast to the Approximation by Convolution method studied in Kosuch & Lisser (2010), which gives an approximation of the gradient, the method of Integration by Parts used in this article replaces the expected function by another one with exactly the same expectation. In Sections 4 and 5 the convergence of the stochastic gradient algorithm is analyzed from a theoretical as well as a numerical point of view, and test results concerning the solution of the relaxed problem are presented. In Section 6 we use these methods to solve the combinatorial problem described in Cohn & Barnhart (1998) using the branch-and-bound framework presented in Kosuch & Lisser (2010). Numerical results on a set of randomly generated data are given and analyzed.

2 MATHEMATICAL FORMULATIONS

We consider a stochastic knapsack problem of the following form: Given a set of n items, we have to choose items without knowing the exact value of their weights. Therefore, we handle the weights as random variables and assume that the weight χ_i of item i is independently normally distributed with mean μ_i > 0 and standard deviation σ_i. Furthermore, each item has a fixed reward per weight unit r_i > 0. The choice of a reward per weight unit can be motivated by the fact that the value of an item often depends on its weight, which we do not know in advance.¹ We denote by χ, μ, σ and r the corresponding n-dimensional vectors. The aim is to maximize the expected total gain 𝔼[Σ_{i=1}^n r_i χ_i x_i]. Our knapsack has a fixed weight capacity c > 0, but due to the stochastic nature of the weights we need to define precisely what respecting the capacity means. In this paper, we solve the following expectation constrained knapsack problem:

¹ All the methods used and results presented in this article remain valid in the case where the rewards are deterministic, or random but independent of the weights. We do not use the fact that the rewards (per item) are normally distributed; any other distribution could therefore be considered.

Expectation Constrained Knapsack Problem (ECKP)

max_{x ∈ {0,1}^n} 𝔼[Σ_{i=1}^n r_i χ_i x_i]    (1)

s.t. 𝔼[ℍ_{ℝ+}(c − g(x, χ))] ≥ p    (2)

where 𝔼[·] denotes the expectation, g(x, χ) := Σ_{i=1}^n χ_i x_i is the total weight of the chosen items, ℍ_{ℝ+}(·) denotes the indicator function of the positive real interval, i.e. ℍ_{ℝ+}(y) = 1 if y ∈ ℝ+ and 0 otherwise, and p ∈ (0.5, 1) is the prescribed probability.

The probability level p is a decision parameter that restricts the percentage of cases where the capacity is exceeded; it is not connected with the amount of the overload. Constraint (2) can be equivalently reformulated as the following chance constraint:

ℙ{g(x, χ) ≤ c} ≥ p    (3)

Without loss of generality, we assume that 𝔼[ℍ_{ℝ+}(c − χ_i)] ≥ p for all i ∈ {1,…, n}: any item that does not satisfy this constraint could be excluded from the beginning. It follows that the optimal solution vector x* has at least one non-zero component.
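Under the normality assumption, both sides of the equivalence between (2) and (3) can be evaluated numerically. The following sketch (a hypothetical 3-item instance; the numbers are ours, not taken from the paper) estimates Θ(x) = 𝔼[ℍ_{ℝ+}(c − g(x, χ))] by Monte Carlo sampling and compares it with the closed-form value of ℙ{g(x, χ) ≤ c}, using the fact that g(x, χ) is itself normally distributed:

```python
import math
import random

def theta_mc(x, mu, sigma, c, n_samples=200_000, seed=7):
    """Monte Carlo estimate of Theta(x) = E[1_{R+}(c - g(x, chi))]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        g = sum(xi * rng.gauss(m, s) for xi, m, s in zip(x, mu, sigma))
        hits += (c - g >= 0)
    return hits / n_samples

def theta_exact(x, mu, sigma, c):
    """Closed form under independent normal weights: P{g(x, chi) <= c}."""
    m = sum(xi * mi for xi, mi in zip(x, mu))
    s = math.sqrt(sum((xi * si) ** 2 for xi, si in zip(x, sigma)))
    return 0.5 * (1.0 + math.erf((c - m) / (s * math.sqrt(2.0))))

# hypothetical 3-item instance (illustrative data only)
mu, sigma, c = [10.0, 8.0, 6.0], [2.0, 1.5, 1.0], 20.0
x = [1.0, 1.0, 0.0]
est, exact = theta_mc(x, mu, sigma, c), theta_exact(x, mu, sigma, c)
```

Both evaluations agree up to sampling noise; the closed form is only available because linear combinations of independent normals stay normal.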

We call J the objective function of the above maximization problem, defined by J(x) = 𝔼[Σ_{i=1}^n r_i χ_i x_i]. We denote j(x, χ) = Σ_{i=1}^n r_i χ_i x_i.

Furthermore, we refer to the function on the left-hand side of constraint (2) as Θ and to the function inside the expectation of Θ as θ, i.e. Θ(x) = 𝔼[θ(x, χ)] = 𝔼[ℍ_{ℝ+}(c − g(x, χ))].

Throughout this paper, we assume that the weights are normally distributed. Normally distributed weights have the convenient property that any linear combination of them is still normally distributed. This property simplifies calculations but is not a necessary condition. To some extent, in the case of normal distributions, it is even possible to compute the gradient of the expectation function directly. Assuming normal distributions is, to a large extent, just a practical frame in order to produce numerical results and theoretical formulations. Whenever we use properties of the normal distribution, we explain how to go beyond this special assumption.

Normally distributed random variables can have negative realizations, and assuming normally distributed weights might thus seem contradictory to the fact that item weights are always strictly positive. However, in most real-life applications the standard deviation is several times smaller than the mean value of the unknown parameter. In this case, the probability of negative weights becomes negligible.
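For illustration (our numbers, not the paper's): with a standard-deviation-to-mean ratio of 1/4, a weight of mean 10 lies four standard deviations above zero, so a negative realization has probability Φ(−4) ≈ 3·10⁻⁵:

```python
import math

def normal_cdf(z):
    """Standard normal CDF expressed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical weight distribution with sigma/mu = 1/4
mu, sigma = 10.0, 2.5
p_negative = normal_cdf((0.0 - mu) / sigma)  # P{chi < 0} = Phi(-4)
```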

3 PROBLEM SOLVING METHOD

Due to its combinatorial nature, ECKP can be solved using a branch-and-bound framework as presented in Kosuch & Lisser (2010). To obtain upper bounds, the authors propose to solve the corresponding continuous optimization problem. A stochastic gradient algorithm can be used to solve this problem. This method needs to evaluate the gradient of the expected value function, whose integrand is an indicator function ℍ_{ℝ+}(·). In Kosuch & Lisser (2010), this computation has been done using Approximation by Convolution. In this paper, we study two different approaches: the first one is a non-biased estimator based on Integration by Parts (called hereafter IP-method). The second approach is a Finite Differences estimator (FD-method) presented in Andrieu et al. (2007). Like the Approximation by Convolution method, the FD-method provides a biased estimator of the gradient.

Instead of replacing {0, 1}^n by [0, 1]^n when relaxing ECKP, the theoretical analysis will compel us to consider the complement in [0, 1]^n of a neighborhood of 0. Considering that an empty knapsack is not an optimal solution, the optimal solution vector of the continuous problem contains at least one component x_k with x_k ≥ 1/n. We are thus allowed to replace [0, 1]^n by {x ∈ [0, 1]^n : ||x||_∞ ≥ 1/n} =: X_cont. Accordingly, we obtain the following feasible set of the relaxed ECKP:

X_cont^ad = {x ∈ X_cont : Θ(x) ≥ p}

3.1 The stochastic gradient type algorithm

To solve the relaxed version of ECKP, we use a stochastic gradient type algorithm. Applying the gradient method is promising as the objective function is concave and, in addition, constraint (2) defines a convex feasible set due to the assumption that the weights are independently normally distributed.

We propose to solve ECKP with a Stochastic Arrow-Hurwicz algorithm (Culioli & Cohen, 1995), hereafter called SAH-algorithm (see Algorithm 3.1), that uses Lagrangian multipliers to deal with the probability constraint.

Algorithm 3.1
Stochastic Arrow-Hurwicz Algorithm.

ℒ(x, λ) = 𝔼[l(x, λ, χ)] denotes the Lagrangian function associated with ECKP.

ℒ(x, λ) = 𝔼[Σ_{i=1}^n r_i χ_i x_i] − λ(p − 𝔼[ℍ_{ℝ+}(c − g(x, χ))])

It follows that

r^k + λ^k τ^k = ∇_x l(x^k, λ^k, χ^k)

and

p − θ(x^{k+1}, χ^k) = l'_λ(x^{k+1}, λ^k, χ^k)

where r^k, λ^k and τ^k are defined in Algorithm 3.1. Note that in the deterministic form of the Arrow-Hurwicz algorithm we can use the gradients of the Lagrangian itself. Here, however, this function is an expectation function and its gradient is thus difficult to compute. By drawing independent samples of the random variables at each iteration of the algorithm, the expectation of the gradient is approximated (see Culioli & Cohen, 1995). At each iteration of the algorithm, we need to calculate the gradient of θ at (x^k, χ^k). However, θ is essentially an indicator function, so its gradient is zero almost everywhere and carries no useful information. In the following subsection, we present two ways to bypass this difficulty: approximation using Finite Differences, or reformulation of the constraint function Θ using Integration by Parts.
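Since the listing of Algorithm 3.1 is not reproduced here, the following Python sketch only illustrates the generic shape of such a projected stochastic primal-dual iteration: a sampled gradient ascent step in x projected onto [0, 1]^n, and a dual step that increases λ when the sampled constraint is violated. The instance data, the step-size sequences and the finite-difference gradient estimate used inside are illustrative assumptions, not the paper's exact parametrization:

```python
import random

rng = random.Random(0)

# hypothetical instance: rewards per weight unit, normal weights, capacity, level p
r     = [1.0, 1.2, 0.8]
mu    = [10.0, 8.0, 6.0]
sigma = [2.0, 1.5, 1.0]
c, p  = 20.0, 0.9
n = len(r)

def g(x, chi):
    """Total weight of the chosen items."""
    return sum(xi * wi for xi, wi in zip(x, chi))

def theta(x, chi):
    """Indicator 1_{R+}(c - g(x, chi))."""
    return 1.0 if c - g(x, chi) >= 0 else 0.0

def grad_theta_fd(x, chi, delta=0.05):
    """Finite-difference estimate of the gradient of theta (biased)."""
    grad = []
    for h in range(n):
        xp = x[:]; xp[h] += delta
        xm = x[:]; xm[h] -= delta
        grad.append((theta(xp, chi) - theta(xm, chi)) / (2.0 * delta))
    return grad

x, lam = [0.5] * n, 1.0
for k in range(1, 501):
    chi = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    eps, rho = 1.0 / (10.0 * k), 1.0 / k          # assumed step-size sequences
    gt = grad_theta_fd(x, chi)
    # primal step on the sampled Lagrangian, projected back onto [0, 1]^n
    x = [min(1.0, max(0.0, xi + eps * (ri * wi + lam * gi)))
         for xi, ri, wi, gi in zip(x, r, chi, gt)]
    # dual step: increase lambda when the sampled constraint is violated
    lam = max(0.0, lam + rho * (p - theta(x, chi)))
```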

3.1.1 Computation of the gradient of θ

The first method consists in approximating the h th component of the gradient of θ by the corresponding difference ratio

[θ(x + δν^h, χ) − θ(x − δν^h, χ)] / (2δ)

where δ > 0 and ν^h ∈ {0, 1}^n is such that ν^h_h = 1 and ν^h_i = 0 for i ≠ h. This leads to the following approximation of the h-th component of the gradient of θ:

(∇_x θ_δ(x, χ))_h = [ℍ_{ℝ+}(c − g(x + δν^h, χ)) − ℍ_{ℝ+}(c − g(x − δν^h, χ))] / (2δ)
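This estimator is biased: for fixed δ its expectation equals the central difference of Θ itself, not the derivative of Θ. The sketch below (a hypothetical 2-item instance; all numbers are ours) averages the estimator over many samples and compares it with the analytic gradient component that is available under normal weights:

```python
import math
import random

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# hypothetical instance (illustrative data only)
mu, sigma, c = [10.0, 8.0], [2.0, 1.5], 20.0
x, h, delta = [1.0, 0.5], 0, 0.05

def theta(xv, chi):
    """Indicator 1_{R+}(c - g(x, chi))."""
    return 1.0 if c - sum(xi * wi for xi, wi in zip(xv, chi)) >= 0 else 0.0

def fd_component(xv, chi):
    """Finite-difference estimate of the h-th gradient component of theta."""
    xp = xv[:]; xp[h] += delta
    xm = xv[:]; xm[h] -= delta
    return (theta(xp, chi) - theta(xm, chi)) / (2.0 * delta)

rng = random.Random(3)
samples = 200_000
fd_mean = sum(fd_component(x, [rng.gauss(m, s) for m, s in zip(mu, sigma)])
              for _ in range(samples)) / samples

# analytic h-th gradient component of Theta(x) = F((c - m(x)) / s(x))
m = sum(xi * mi for xi, mi in zip(x, mu))
s = math.sqrt(sum((xi * si) ** 2 for xi, si in zip(x, sigma)))
z = (c - m) / s
exact_h = normal_pdf(z) * (-mu[h] / s - (c - m) * sigma[h] ** 2 * x[h] / s ** 3)
```

The averaged estimate lands close to the analytic value but carries a small systematic offset of order δ², which is the bias referred to in the text.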

The second method (IP-method) uses Integration by Parts to reformulate 𝔼[θ(x, χ)] and to obtain a function θ̃ such that 𝔼[θ̃(x, χ)] = 𝔼[θ(x, χ)] = Θ(x). The function θ̃ obtained by Integration by Parts is differentiable, and the idea is to use the gradient of θ̃ in the SAH-algorithm.

Andrieu et al. (2007) presented how to compute the gradient of an indicator function in expectation using Integration by Parts (see Theorem 5.5 in Andrieu, 2004). We state and prove their theorem for the case of ECKP. The variables and functions used in this proposition are defined in Section 2:

Proposition 1. Let 𝕐_{ℝ+}(·) be a primitive of ℍ_{ℝ+}(·). Let x ∈ X_cont^ad and let κ = κ(x) ∈ {1,…, n} be defined such that x_κ = ||x||_∞ ≥ 1/n. Then, using Integration by Parts, we get

Θ(x) = 𝔼[𝕐_{ℝ+}(c − g(x, χ)) M_κ(x, χ)]

where

M_κ(x, χ) = −(χ_κ − μ_κ)/(σ_κ² x_κ)

It follows

∇_x Θ(x) = 𝔼[−ℍ_{ℝ+}(c − g(x, χ)) M_κ(x, χ) χ + 𝕐_{ℝ+}(c − g(x, χ)) ∇_x M_κ(x, χ)]

Proof. Let φ denote the density function of the random vector χ = (χ 1 , ..., χ n ) and define

u'_{χ_κ}(x, χ) := −ℍ_{ℝ+}(c − g(x, χ)) x_κ  and  v(x, χ) := −φ(χ)/x_κ

It follows

Θ(x) = ∫_{ℝⁿ} ℍ_{ℝ+}(c − g(x, χ)) φ(χ) dχ = ∫_{ℝⁿ} u'_{χ_κ}(x, χ) v(x, χ) dχ

Integration by Parts with respect to χ_κ leads to

Θ(x) = [u(x, χ) v(x, χ)]_{χ_κ=−∞}^{+∞} − ∫_{ℝⁿ} u(x, χ) v'_{χ_κ}(x, χ) dχ = −∫_{ℝⁿ} u(x, χ) v'_{χ_κ}(x, χ) dχ = −∫_{ℝⁿ} 𝕐_{ℝ+}(c − g(x, χ)) v'_{χ_κ}(x, χ) dχ

In our case the random variables are independently distributed. With φ_i being the density function of χ_i, we get

φ'_{χ_κ}(χ) = ∏_{i≠κ} φ_i(χ_i) · (φ_κ)'_{χ_κ}(χ_κ) = ∏_{i≠κ} φ_i(χ_i) · (−(χ_κ − μ_κ)/σ_κ²) φ_κ(χ_κ) = −(χ_κ − μ_κ)/σ_κ² · φ(χ)

It follows

v'_{χ_κ}(x, χ) = ∂/∂χ_κ (−φ(χ)/x_κ) = (χ_κ − μ_κ)/(x_κ σ_κ²) · φ(χ)

and therefore

Θ(x) = −∫_{ℝⁿ} 𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²) φ(χ) dχ = 𝔼[−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²)]

If ∇_x Θ(x) = 𝔼[∇_x (−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²))], we get the result. Thus, it remains to prove the following claim:

Claim 3.1

∇_x Θ(x) = 𝔼[∇_x (−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²))]

Proof of the claim. This is a classical result under the assumption that the gradient of the function inside the integral is uniformly bounded. In our case, this bound is easily obtained thanks to the assumption that x_κ ≥ 1/n. First of all, observe that we can choose 𝕐_{ℝ+}(y) = ℍ_{ℝ+}(y) · y = [y]+. Therefore, the κ-th component of

∇_x (−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²))

can be reformulated as follows:

(∇_x (−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²)))_κ = ℍ_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ) χ_κ / (x_κ σ_κ²) + [c − g(x, χ)]+ (χ_κ − μ_κ) / (x_κ² σ_κ²)

For (χ_κ − μ_κ) > 0, and as c − g(x, χ) ≥ 0 ⟺ ℍ_{ℝ+}(c − g(x, χ)) = 1, it follows

0 ≤ (∇_x (−𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²)))_κ ≤ n (χ_κ − μ_κ) χ_κ / σ_κ² + n² (χ_κ − μ_κ)(c − g(x, χ)) / σ_κ²

If (χ_κ − μ_κ) < 0, the bounds are reversed.

For the other components h ≠ κ of the gradient, we simply have ℍ_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²) χ_h, and the same bounds hold as for the κ-th component. Note that θ̃ is differentiable except in 0. □
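The identity Θ(x) = 𝔼[θ̃(x, χ)] established above can be checked numerically. With 𝕐_{ℝ+}(y) = [y]+ as in the proof, the sketch below (a hypothetical 2-item instance; the data are ours) averages θ̃ over samples and compares the result with the closed-form value of Θ(x) under normal weights:

```python
import math
import random

# hypothetical instance (illustrative data only)
mu, sigma, c = [10.0, 8.0], [2.0, 1.5], 22.0
x = [1.0, 1.0]
kappa = max(range(len(x)), key=lambda i: x[i])  # index of the largest component

def theta_tilde(chi):
    """Integrand from Integration by Parts: -[c - g]_+ (chi_k - mu_k)/(x_k sigma_k^2)."""
    g = sum(xi * wi for xi, wi in zip(x, chi))
    return -max(c - g, 0.0) * (chi[kappa] - mu[kappa]) / (x[kappa] * sigma[kappa] ** 2)

rng = random.Random(11)
samples = 200_000
ip_mean = sum(theta_tilde([rng.gauss(m, s) for m, s in zip(mu, sigma)])
              for _ in range(samples)) / samples

# closed form: Theta(x) = P{g(x, chi) <= c} for independent normal weights
m = sum(xi * mi for xi, mi in zip(x, mu))
s = math.sqrt(sum((xi * si) ** 2 for xi, si in zip(x, sigma)))
exact = 0.5 * (1.0 + math.erf((c - m) / (s * math.sqrt(2.0))))
```

Unlike the finite-difference estimator, this estimator is unbiased: the sample mean converges to Θ(x) itself.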

4 CONVERGENCE OF THE STOCHASTIC ARROW-HURWICZ ALGORITHM

4.1 Theoretical convergence

Culioli & Cohen (1995) showed how to obtain the convergence of the SAH-algorithm in the case where the objective function J is convex even though j is not (for a global survey see Carpentier, 2010). More precisely, they adapt the classical SAH-algorithm to the case where the constraint is given in expectation by considering a subgradient in both the primal variables x and the dual variables λ instead of only in λ. The drawback is that technical assumptions have to be checked on both gradients, which have to be linearly bounded. The way we transform the expected function in the constraint permits us to verify all the assumptions, with some limitations explained below.

Theorem 1 (Culioli & Cohen, 1995). Suppose the following assumptions are satisfied:

  • h1: θ(·, χ) is differentiable with gradient uniformly bounded with respect to χ.

  • h2: The associated Lagrangian admits a saddle point x̃ and J is strictly convex.

  • h3:χ ∈ ℝn , θ(·, χ) is locally Lipschitz continuous.

  • h4: There exist c₁, c₂ > 0 such that ∀χ ∈ ℝⁿ, ∀x, y ∈ X_cont: ||θ(x, χ) − θ(y, χ)|| ≤ c₁||x − y|| + c₂.

  • h5: Θ(x) is Lipschitz continuous and concave.

  • h6: There exist α, β > 0 such that ∀χ ∈ ℝⁿ, ∀x ∈ X_cont: ||∇_x j(x, χ)|| ≤ α||x − x̃|| + β.

  • h7: The sequence (ε^k)_k is monotone non-increasing.

  • h8: There exist γ, δ > 0 such that ∀x ∈ X_cont: 𝔼[θ(x, χ) − Θ(x)]² ≤ γ||x − x̃||² + δ.

Then, the sequence (x^k, λ^k) is bounded and x^k converges weakly towards x̃.

We now check all the assumptions in our setting, focusing on the IP-method:

  • h1: As presented in Subsection 3.1.1, it is possible to replace θ(x, χ) by a differentiable function θ̃(x, χ) such that 𝔼[θ̃(x, χ)] = 𝔼[θ(x, χ)] (IP-method).

Using the IP-method we obtain the following gradient:

∇_x θ̃(x, χ) = ℍ_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²) χ + [c − g(x, χ)]+ (χ_κ − μ_κ)/(x_κ² σ_κ²) ν^κ

The fact that ℍ+(cg(x, χ)) = 1 only for cg(x, χ) ≥ 0 limits χ to a compact domain and h1 is checked.

  • h2: We suppose that the Lagrangian function admits a saddle point. Note that strict convexity of J is only a sufficient condition for the stability of the Lagrangian function; in our case, the function J is linear.

  • h3: For all χ ∈ ℝⁿ, θ̃(·, χ) is locally Lipschitz continuous, according to the expression of ∇_x θ̃(x, χ), at all points x ∈ X_cont since x_κ ≥ 1/n.

  • h4: Choose c₂ = 1. It follows that ∀χ ∈ ℝⁿ, ∀x, y ∈ X_cont: ||θ(x, χ) − θ(y, χ)|| ≤ c₂.

  • h5 As we assume the item weights to be normally distributed, Θ(x) = ℙ{g(x, χ) ≤ c} is Lipschitz continuous: Let F be the cumulative distribution function of the standard normal distribution. Then we have

Θ(x) = F((c − Σ_{i=1}^n μ_i x_i) / √(Σ_{i=1}^n σ_i² x_i²))

It is easy to see that Θ is continuous on X cont . In addition, we have 0 ≤ Θ(x) ≤ 1 for all x ∈ ℝn and as X cont is a compact set, it follows that Θ is Lipschitz continuous on X cont .

We do not actually need the normal distribution to establish this assumption: since we replaced θ by θ̃, we get

∀χ, ∀x, y ∈ X_cont: ||θ̃(x, χ) − θ̃(y, χ)|| ≤ τ ||x − y||

where τ bounds ||∇_x θ̃|| (given in h1) on X_cont. Integrating this inequality then yields the corresponding bound on Θ.

The question of the concavity of Θ is more complex and depends on the sample values of the items. There are two different questions: is the constraint concave on the whole hypercube, and if not, is it concave on a subset of the hypercube? The answer to the first question is definitely no. For further details concerning chance constraints, we refer to Prekopa (1995). The second question depends on the initial parameters: as a first approach, we observe that in the case of a one-dimensional vector x, Θ is concave for x < x̃, where x̃ depends on the initial conditions.

In the case of a higher dimensional problem, the situation becomes more complex. Although there exists a domain for x near 0 where Θ is concave (see Fig. 1 for the values of Θ observed for a two-dimensional vector), the boundaries of this domain still depend on the characteristics of the sample set.

Figure 1:
Constraint function Θ in the 2-dimensional case.

A practical check combined with a theoretical analysis of the concavity is then possible. We begin by computing the eigenvalues of the Hessian matrix H(x̃) at a point x̃ of X_cont close to 0 and find them all negative. As the determinant of the Hessian matrix is a continuous function of x, there exists a topologically connected component C ⊂ X_cont containing x̃ such that det H(x) ≠ 0 for all x ∈ C. We consider the boundary C̄ \ C of this set and set p = min{Θ(x) : x ∈ C̄ \ C}.

To conclude on the above observations, we formulate the following proposition:

Proposition 2. Let χ_1, …, χ_n be a fixed set of independently normally distributed random variables and let c be a fixed capacity such that, for vectors x with all components near 0, the Hessian matrix has all eigenvalues strictly negative. Then, there exists p = p(χ_1, …, χ_n, c) such that Θ(x) is concave on the set {x ∈ X_cont : Θ(x) ≥ p}.

In other words, we establish that for a given instance there exists p such that the corresponding Lagrangian relaxation is concave on the feasible set.

  • h6: j is linear in each component of x; h6 is thus satisfied.

  • h7: Condition h7 is satisfied for all sequences of type const/k with const > 0.

  • h8 We have

θ(x, χ) ≤ 1 and Θ(x) = ℙ{g(x, χ) ≤ c} ≥ 0 ⟹ θ(x, χ) − Θ(x) ≤ 1 ⟹ 𝔼[θ(x, χ) − Θ(x)]² ≤ 1

It follows that condition h8 is satisfied for all δ >1.

4.2 Implementation of the SAH algorithm

4.2.1 Convergence of the Stochastic Arrow-Hurwicz Algorithm involving FD-method

A stopping criterion for the SAH-algorithm cannot be based on observing a decreasing speed of variation of the objective value. The classical approach is to set a maximum number of iterations. We fixed the number of iterations to 500 for the FD-method, based on several tests in which only slight variations could be observed after around 300 iterations (for numerical results see Section 5 and Table 2).

Table 1:
Numerical results for the continuous ECKP.

Table 2:
Numerical results for the continuous ECKP.

4.2.2 Enhancements in problem formulation when using the Stochastic Arrow-Hurwicz Algorithm involving IP-method

Some improvements in our implementation are worth mentioning: During our first numerical tests, we noticed that the SAH-algorithm involving the IP-method was inefficient and in some cases did not even converge (see Fig. 2). We therefore analyzed and modified the implementation of the IP-method. The first enhancement concerns the formulation of ECKP. The expectation constraint states

𝔼[ℍ_{ℝ+}(c − g(x, χ))] ≥ p ⟺ p − 𝔼[ℍ_{ℝ+}(c − g(x, χ))] ≤ 0

Figure 2:
Initial divergence for the Stochastic Arrow-Hurwicz algorithm solving the continuous ECKP: FD-method (bold curve) versus initial IP-method.

We thus get the Lagrangian

ℒ(x, λ) = 𝔼[Σ_{i=1}^n r_i χ_i x_i] − λ(p − 𝔼[ℍ_{ℝ+}(c − g(x, χ))])

Using the IP-method, we rewrite this Lagrangian as follows:

ℒ(x, λ) = 𝔼[Σ_{i=1}^n r_i χ_i x_i] − λ(p + 𝔼[𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²)])    (4)

Let us denote by l̃ the function inside the expectation of the Lagrangian (4), i.e.

l̃(x, λ, χ) = Σ_{i=1}^n r_i χ_i x_i − λ(p + 𝕐_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²))

It follows

(∇_x l̃(x, λ, χ))_h = r_h χ_h + λ(ℍ_{ℝ+}(c − g(x, χ)) (χ_κ − μ_κ)/(x_κ σ_κ²) χ_h + [c − g(x, χ)]+ (χ_κ − μ_κ)/(x_κ² σ_κ²) ν^κ_h)

Remark that the term multiplying λ is zero whenever the capacity constraint is not satisfied by the sample. In these cases, all the components of the current x^k are incremented (as all the components of (r₁χ₁, …, r_nχ_n)ᵀ are positive), although at least one component should be decremented in order to better fit the capacity.

The constraint in expectation can be equivalently reformulated as

𝔼[ℍ_{ℝ+}(g(x, χ) − c)] ≤ 1 − p ⟺ 𝔼[ℍ_{ℝ+}(g(x, χ) − c)] − (1 − p) ≤ 0

In this case the h-th component of the gradient of l̃ is

(∇_x l̃(x, λ, χ))_h = r_h χ_h − λ(ℍ_{ℝ+}(g(x, χ) − c) (χ_κ − μ_κ)/(x_κ σ_κ²) χ_h − [g(x, χ) − c]+ (χ_κ − μ_κ)/(x_κ² σ_κ²) ν^κ_h)

Here the term multiplying λ is zero whenever the capacity constraint is satisfied by the sample. In this case the components of x^k are incremented by the components of the positive vector (r₁χ₁^k, …, r_nχ_n^k)ᵀ (multiplied by the corresponding step size). When the capacity constraint is not satisfied, the term with coefficient λ is subtracted from this vector in order to correct x^k. This term is positive whenever (χ_κ^k − μ_κ) > 0, and the Lagrange multiplier λ thus plays its assigned role of a penalty factor in case the constraint is violated.
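As an illustration of this sign behavior, the following sketch (hypothetical helper and data, ours rather than the paper's) implements the sampled gradient of l̃ for the reformulated constraint and checks that the λ-term vanishes exactly on samples that satisfy the capacity constraint:

```python
def grad_l_tilde(x, lam, chi, r, mu, sigma, c, kappa):
    """Sampled gradient of l~ for the constraint E[1_{R+}(g - c)] - (1 - p) <= 0."""
    n = len(x)
    g = sum(xi * wi for xi, wi in zip(x, chi))
    overload = 1.0 if g - c >= 0 else 0.0
    factor = (chi[kappa] - mu[kappa]) / (x[kappa] * sigma[kappa] ** 2)
    grad = []
    for h in range(n):
        penalty = overload * factor * chi[h]
        if h == kappa:
            # extra term from differentiating 1/x_kappa
            penalty -= max(g - c, 0.0) * factor / x[kappa]
        grad.append(r[h] * chi[h] - lam * penalty)
    return grad

# hypothetical 2-item instance
r, mu, sigma, c = [1.0, 1.0], [10.0, 8.0], [2.0, 1.5], 20.0
x, lam, kappa = [1.0, 0.5], 2.0, 0

g_ok  = grad_l_tilde(x, lam, [9.0, 8.0], r, mu, sigma, c, kappa)   # g = 13 <= c
g_bad = grad_l_tilde(x, lam, [16.0, 10.0], r, mu, sigma, c, kappa) # g = 21 > c
```

On the satisfied sample the gradient reduces to the reward part (r_h χ_h); on the violated sample with χ_κ > μ_κ the penalty pushes all components downward.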

A second improvement can be obtained by a specific choice of the component to integrate by parts. The h-th component of the gradient of the Lagrangian function obtained by the IP-method after reformulation is:

(∇_x l̃(x, λ, χ))_h = r_h χ_h − λ(ℍ_{ℝ+}(g(x, χ) − c) (χ_κ − μ_κ)/(x_κ σ_κ²) χ_h − [g(x, χ) − c]+ (χ_κ − μ_κ)/(x_κ² σ_κ²) ν^κ_h)

In case of an overload, we expect this gradient to be negative for some indices h in order to decrease the total weight of the knapsack. However, for h ≠ κ and ℍ_{ℝ+}(g(x, χ) − c) = 1, the term multiplying λ is positive if and only if (χ_κ − μ_κ) > 0, which is not always the case. When this event does not occur, the gradient is strictly positive and all components of x with index different from κ are incremented despite the overload. Due to this observation, we propose the following improved choice of the index κ: Instead of just choosing κ in the k-th iteration such that x_κ^k = ||x^k||_∞ (see Proposition 1), we choose κ as follows:

κ = argmax_{i=1,…,n} {x_i^k : x_i^k ≥ 1/n and (χ_i^k − μ_i) > 0}

However, if {x_i^k : x_i^k ≥ 1/n and (χ_i^k − μ_i) > 0} = ∅, we choose κ as before, i.e. such that x_κ^k = ||x^k||_∞ ≥ 1/n.
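A direct transcription of this selection rule (hypothetical helper name; the data in the checks are ours):

```python
def choose_kappa(x, chi, mu):
    """Improved choice of the Integration-by-Parts index kappa:
    among components with x_i >= 1/n whose sampled weight exceeds its mean,
    take the one with the largest x_i; otherwise fall back to the largest component."""
    n = len(x)
    candidates = [i for i in range(n) if x[i] >= 1.0 / n and chi[i] - mu[i] > 0]
    if candidates:
        return max(candidates, key=lambda i: x[i])
    return max(range(n), key=lambda i: x[i])  # original rule: x_kappa = ||x||_inf
```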

With this modification we were able to significantly improve the convergence of SAH-algorithm involving the IP-method (see Fig. 3) and could reduce the maximum number of iterations to 1000 with effective results.

Figure 3:
Results after modifications on IP method: FD-method (gray curve) versus IP-method (bold curve).

5 NUMERICAL RESULTS

All numerical tests have been carried out on an Intel PC with 2GB RAM. The SAH-algorithm as well as the branch-and-bound framework have been implemented in the C language. The random numbers needed for the computations of the stochastic gradient algorithm were generated in advance using the GSL C library and stored in a file.

We tested the SAH-algorithm on the instance used in Cohn & Barnhart (1998), called hereafter C./B., as well as on 50 randomly generated instances for each dimension. The data given in Tables 2 and 3 are average values over these instances. As noticed at the end of Section 2, the choice of normally distributed weights implies that weights can theoretically take negative values. The test instances were generated in such a way that the variance/mean ratio is below 1/4 (see Cohn & Barnhart, 1998 or Kosuch & Lisser, 2010 for details). This implies a very low probability of realizing negative weights: Although a high number of scenarios was generated (either 500 or 1000 for each run of the SAH-algorithm), we observed no negative weight realization.

Table 3:
Numerical results for the (combinatorial) ECKP.

In Table 2, the numerical results of the SAH-algorithm involving the FD- or IP-method are compared with those obtained using a Second Order Cone Programming (SOCP) solver: as mentioned, ECKP can be equivalently reformulated as a chance-constrained knapsack problem that, in turn, can be reformulated as a deterministic equivalent SOCP problem (for details see Boyd et al., 1998 or Kosuch & Lisser, 2010). To solve the SOCP problem we used the SOCP solver by MOSEK in the C programming language, which applies an interior point method (MOSEK, 2009). The gaps given in Tables 2 and 3 are the relative gaps between the MOSEK solution and the approximate solution obtained with the stochastic gradient algorithm.
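
For reference, a commonly stated form of this deterministic equivalent is the following sketch, under the assumption of independent weights $\chi_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, rewards $r_i$, capacity $c$ and prescribed probability $p \geq 1/2$ (this notation is not restated in the present section):

```latex
\max_{x \in [0,1]^n} \; \sum_{i=1}^n r_i x_i
\quad \text{s.t.} \quad
\sum_{i=1}^n \mu_i x_i \;+\; \Phi^{-1}(p)\,
\sqrt{\sum_{i=1}^n \sigma_i^2 x_i^2} \;\le\; c,
```

where $\Phi^{-1}$ is the quantile function of the standard normal distribution; the square-root term is a Euclidean norm of $(\sigma_1 x_1, \dots, \sigma_n x_n)$, which makes the constraint second-order-cone representable.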

First of all, we remark that the optimal solution values of the SAH-algorithm involving the FD-method are comparable to those produced by the IP-method. Some fluctuations occur; they can be attributed to implementation choices, such as the choice of the two σ-sequences of the SAH-algorithm. Choosing the right parametrization for these sequences has an important influence on the convergence of the algorithm and on the best solution found.
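
As a sketch of the machinery these sequences drive, the following central finite-difference gradient estimate uses a perturbation width σ together with hypothetical decreasing sequences for the step size and the width; the function `g` and both parametrizations are illustrative assumptions, since the tuned sequences used in the experiments are not given here.

```python
def fd_gradient(g, x, sigma):
    """Central finite-difference estimate of the gradient of g at x
    with perturbation width sigma; the estimate is biased, with an
    O(sigma^2) error for smooth g."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += sigma
        xm = list(x); xm[i] -= sigma
        grad.append((g(xp) - g(xm)) / (2.0 * sigma))
    return grad

# Hypothetical decreasing sequences: step size eps_k for the gradient
# update and width sigma_k for the finite differences.
def eps(k):   return 1.0 / (k + 10.0)
def sigma(k): return 1.0 / (k + 1.0) ** 0.25
```

For a quadratic such as g(x) = Σ x_i², central differences are exact up to floating-point error, which is one reason the bias of the FD-method is often harmless in practice.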

In terms of running time, both methods outperform the SOCP-algorithm for small and medium size instances. For higher dimensional problems, this is still true for the FD-method. However, the SAH-algorithm involving the IP-method needs approximately twice the time of the FD-method. This is due to the total number of iterations, which we fixed at 500 when using the FD-method, while 1000 iterations are needed with the IP-method to obtain equally good solutions. For higher dimensional instances, the MOSEK interior point method therefore needs less CPU time than the IP-method.

6 SOLVING THE (COMBINATORIAL) ECKP - NUMERICAL RESULTS

The combinatorial problem has been solved using a branch-and-bound algorithm as described in Cohn & Barnhart (1998) and Kosuch & Lisser (2010). The algorithm has been tested on the instances previously used for the tests of the SAH-algorithm.

We stored the random numbers needed for the test runs of SAH in a batch file. As the total number of runs during the branch-and-bound algorithm is unknown, and the number of random numbers needed for all those runs is generally very high, we only stored random numbers for a limited number of runs. Before each run of SAH, we randomly chose one of the stored instances of random numbers. Note that, as the runs of SAH are independent, one stored instance of random numbers would theoretically be sufficient.
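
The stored-batch scheme can be sketched as follows; the batch sizes and the use of standard normal draws are assumptions for illustration, not the paper's actual batch file format.

```python
import random

def make_batches(n_batches, batch_size, seed=0):
    """Pre-generate batches of standard normal draws, playing the role
    of the batch file of stored random numbers."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(batch_size)]
            for _ in range(n_batches)]

def pick_batch(batches, rng):
    """Before each SAH run, pick one stored batch at random; since the
    runs are independent, reusing a single batch would also be valid."""
    return rng.choice(batches)

batches = make_batches(n_batches=20, batch_size=500)
chosen = pick_batch(batches, random.Random(1))
```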

The results given in Table 3 indicate that the branch-and-bound algorithm that uses the SOCP-program to obtain upper bounds considers more nodes than the one using the SAH-algorithm. This is not due to a better choice of upper bounds in the latter method, as in both algorithms the upper bounds are supposed to be the same (i.e., the optima of the corresponding relaxed problems). However, as the best solution found by the approximate SAH-algorithm might be slightly smaller than the solution value of the relaxed problem, more branches are pruned than with the primal-dual SOCP-algorithm. Note that this could theoretically also cause the pruning of a subtree that contains the optimal solution.
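
The pruning behaviour discussed above can be sketched with a generic depth-first branch-and-bound over 0-1 vectors, where `bound` stands for the upper-bounding routine (SAH or the SOCP solver). All names here are illustrative, not the paper's implementation; if `bound` underestimates the true relaxation optimum, the `<=` test prunes more nodes, at the theoretical risk of cutting the optimal subtree.

```python
def branch_and_bound(n, bound, evaluate, feasible):
    """Depth-first branch-and-bound over 0-1 vectors (generic sketch).

    bound(fixed) -> upper bound for the subproblem in which the
                    variables in `fixed` (a dict i -> 0/1) are decided
    evaluate(x)  -> objective value of a complete 0-1 vector x
    feasible(x)  -> whether x satisfies the (expectation) constraint
    """
    best_val, best_x, nodes = float("-inf"), None, 0
    stack = [dict()]
    while stack:
        fixed = stack.pop()
        nodes += 1
        if bound(fixed) <= best_val:      # prune: cannot beat incumbent
            continue
        if len(fixed) == n:               # leaf: complete assignment
            x = [fixed[i] for i in range(n)]
            val = evaluate(x)
            if feasible(x) and val > best_val:
                best_val, best_x = val, x
            continue
        i = len(fixed)                    # branch on next free variable
        for v in (0, 1):
            child = dict(fixed); child[i] = v
            stack.append(child)
    return best_val, best_x, nodes
```

On a tiny deterministic knapsack (rewards 3, 4, 5; weights 2, 3, 4; capacity 5, with the sum of rewards of undecided items as a valid bound), the sketch returns the optimum 7 attained by the first two items.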

Similarly, the SAH-algorithm involving the IP-method considers a greater average number of nodes than when using the FD-method. This implies that the solutions of the relaxed subproblems produced by the FD-method are not as good as those obtained with the IP-method. As both methods perform equally well on the relaxed overall problems (see Section 5), we conclude that the FD-method is less robust: instead of choosing particular σ-sequences for each instance, or even for each subproblem solved during the branch-and-bound algorithm, we fixed one parametrization per dimension. However, the subproblems solved during the branch-and-bound algorithm are mostly lower dimensional, and the SAH-algorithm using the fixed σ-sequences performs worse on these subproblems. This seems to be especially the case when using the FD-method.

If we only allow an average computing time of 1 hour, the SOCP-algorithm can only be used up to dimension 50. In contrast, when using the FD-method and allowing 500 iterations in the SAH-algorithm, we are able to solve problems up to dimension 250 within an average CPU time of 1 hour.

7 CONCLUSION

In this paper, we studied and analysed two methods for computing the gradient of the constraint of the Stochastic Knapsack Problem with expectation constraint. Beyond comparing the efficiency of the two methods, it is interesting to analyse what can be concluded from using each of them. On the one hand, the FD-method is very simple, robust, efficient and fast, but has the disadvantage of only computing an approximate value of the gradient of the constraint. On the other hand, the IP-method introduces no approximation in the computation, but is more complex to handle correctly. Nonetheless, this second method opens more opportunities for investigation. First of all, working with an approximate value of the gradient does not appear to be a source of trouble here, as it remains a first order approximation. Secondly, we keep in mind that when using integration by parts we always have to choose one factor of a product to integrate and one to differentiate, and the way these factors are chosen is not fixed. This allows, for instance, checking the boundary conditions on Θ̃. In the future, we will examine ways of choosing the different parts in the Integration by Parts, and we will also extend our approach to other stochastic combinatorial problems.

REFERENCES

  • 1
    ANDRIEU L. 2004. Optimisation sous contrainte en probabilité. Ecole Nationale des Ponts et Chaussées.
  • 2
    ANDRIEU L, COHEN G & VÁZQUEZ-ABAD F. 2007. Stochastic programming with probability constraints. http://fr.arxiv.org/abs/0708.0281 (Accessed 24 October 2008).
    » http://fr.arxiv.org/abs/0708.0281
  • 3
    BEN-TAL A & NEMIROWSKI A. 2008. Selected topics in robust convex optimization. Math. Programming, 112(1): 125-158.
  • 4
    BHALGAT A, GOEL A & KHANNA S. 2011. Improved approximation results for stochastic knapsack problems. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1647-1665.
  • 5
    BIRGE JR & LOUVEAUX F. 1997. Introduction to stochastic programming. Springer-Verlag, New York.
  • 6
    BOYD S, LEBRET H, LOBO MS & VANDENBERGHE L. 1998. Applications of second-order cone programming. Linear Algebra and its Applications, 284: 193-228.
  • 7
    CARPENTIER P. 2010. Méthodes numériques en optimisation stochastique. Ecole Nationale Supérieure des Techniques Avancées. http://www.ensta.fr/~pcarpent/MNOS/ (Accessed March 2010).
    » http://www.ensta.fr/~pcarpent/MNOS/
  • 8
    CHEN K & ROSS S. 2014. An adaptive stochastic knapsack problem. European Journal of Operational Research, 239(3): 625-635.
  • 9
    COHN A & BARNHART C. 1998. The stochastic knapsack problem with random weights: A heuristic approach to robust transportation planning. In Proceedings of the Triennial Symposium on Transportation Analysis (TRISTAN III).
  • 10
    CULIOLI JC & COHEN G. 1995. Optimisation stochastique sous contraintes en espérance. Comptes rendus de l’Académie des sciences, Paris, Série I, 320(6): 753-758.
  • 11
    GOEL A & INDYK P. 1999. Stochastic load balancing and related problems. In: Proceedings of Foundations of Computer Science, pages 579-586.
  • 12
    KELLERER H, PFERSCHY U & PISINGER D. 2004. Knapsack problems. Springer, Berlin.
  • 13
    KOSUCH S & LISSER A. 2010. Upper bounds for the 0-1 stochastic knapsack problem and a b&b algorithm. Annals of Operations Research, 176(1): 77-93.
  • 14
    MOSEK. 1998-2009. The MOSEK optimization tools manual. Version 6.0 (Revision 53). http://www.mosek.com/fileadmin/products/6_0/tools/doc/pdf/tools.pdf
    » http://www.mosek.com/fileadmin/products/6_0/tools/doc/pdf/tools.pdf
  • 15
    PREKOPA A. 1995. Stochastic Programming. Kluwer Academic Publishers (Dordrecht, Boston).
  • 16
    TÖNISSEN D, BOUMAN P, VAN DEN AKKER J & HOOGEVEEN J. 2016. Decomposition approaches for recoverable robust optimization problems. European Journal of Operational Research, 251(3): 739-750.
  • 17
    TÖNISSEN D, VAN DEN AKKER J & HOOGEVEEN J. 2017. Column generation strategies and decomposition approaches for the two-stage stochastic knapsack problem. Computers and Operations Research, 83: 125-139.
  • 1
    All the methods used and results presented in this article are still valid in the case where the rewards are deterministic or random but independent of the weights. We do not use the fact that the rewards (per item) are normally distributed, therefore any other distribution could be considered.

Publication Dates

  • Publication in this collection
    Sep-Dec 2017

History

  • Received
    15 May 2017
  • Accepted
    01 Nov 2017
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br