
METAHEURISTICS EVALUATION: A PROPOSAL FOR A MULTICRITERIA METHODOLOGY*

* A preliminary version of this work was presented at Euro XXV, Vilnius, July 2012.

ABSTRACT

In this work we propose a multicriteria evaluation scheme for heuristic algorithms based on the classic Condorcet ranking technique. Weights are associated with the ranking of each algorithm within the set under comparison. We used five criteria and a function on the set of natural numbers to create a ranking. The comparison involves three well-known combinatorial optimization problems: the Traveling Salesperson Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP) and the Quadratic Assignment Problem (QAP). The tested instances came from public libraries. Each algorithm was used with essentially the same structure, the same local search was applied and the initial solutions were built in a similar way. It is important to note that this work does not propose new algorithms: the results for the three problems are shown only to illustrate how the evaluation technique operates. Four metaheuristics (GRASP, Tabu Search, ILS and VNS) are therefore used only for the comparisons.

Keywords:
comparison among heuristics; metaheuristics; TSP; CVRP; QAP

1 INTRODUCTION

1.1 Heuristics evaluation in the literature

This work is dedicated to a proposal of a multicriteria evaluation scheme for heuristic algorithms, which we call the Weight Ordering Method (WOM). It involves an application of the Condorcet ranking technique, presented in Item 1.2. WOM is introduced in Item 1.3. Sections 2 and 3 briefly describe, respectively, the three problems and the four metaheuristics used in the tests. The use of the evaluation technique is detailed in Section 4 with the aid of an example. Section 5 presents the results of the comparison among the metaheuristics when applied to the three problems. The conclusions are presented in Section 6.

The use of metaheuristics to find good-quality solutions for discrete optimization problems has the double advantage of relying on algorithms based on already known models and of the efficiency of the methods themselves. This is very important when dealing with problems that have an exponential number of feasible solutions. When many techniques are available, it is clearly important to evaluate their efficiency with respect to a given problem. A number of direct evaluation techniques, both deterministic and probabilistic, are commonly used, as in Aiex et al. (2, 3), whose observations lead to the hypothesis that the times needed by local-search-based heuristics to reach a given target value follow an exponential distribution. The use of instance collections, available on the Internet for many problems, appears there and in a number of other works as an efficient way to evaluate metaheuristics and to compare their efficiency over a variety of situations.
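The time-to-target analysis mentioned above can be illustrated with a short Python sketch. Following the usual construction of time-to-target (TTT) plots, the i-th smallest of n observed times is paired with the cumulative probability p_i = (i - 1/2)/n. The function names are hypothetical, and the shifted-exponential fit shown (location = smallest time, scale = mean time minus location, which are the maximum-likelihood estimates) is a simple illustrative choice, not necessarily the fitting procedure used by Aiex et al.

```python
import math

def ttt_points(times):
    """Empirical TTT-plot points: the i-th smallest time is paired with p_i = (i - 1/2) / n."""
    ordered = sorted(times)
    n = len(ordered)
    return [(t, (i - 0.5) / n) for i, t in enumerate(ordered, start=1)]

def fit_shifted_exponential(times):
    """Maximum-likelihood fit of a two-parameter (shifted) exponential distribution."""
    loc = min(times)
    scale = sum(times) / len(times) - loc  # must be > 0, i.e. times not all equal
    return loc, scale

def shifted_exp_cdf(t, loc, scale):
    """Theoretical curve to compare against the empirical points."""
    return 0.0 if t < loc else 1.0 - math.exp(-(t - loc) / scale)
```

Plotting the empirical points together with the fitted curve gives a visual check of the exponential hypothesis.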

Tuning the parameters of an algorithm for improved efficiency is also a task that benefits from an assessment system. Averages of execution times and final solution values, with their standard deviations, are often used. A normal distribution is commonly assumed on these occasions, an option criticized by Taillard et al. (23) because this hypothesis does not always hold: for example, if there are many global optima, the distribution will have a truncated tail, since it is impossible to go beyond the optimum.

Several techniques for this purpose exist in the literature; according to this reference, the most common are:

  1. When dealing with optimization, a set of problem instances is solved with the methods to be compared, and the mean and standard deviation (possibly also other measures such as median, minimum, maximum, etc.) of the values obtained in a series of runs of each algorithm are calculated.

  2. In the context of exact problem solving, the computational effort required to obtain the best solution is measured, and its mean, standard deviation and so on are calculated.

  3. A maximum computational effort and a target value are fixed, and the number of times each method reaches the target within the allowed computational effort is counted.
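A minimal Python sketch of techniques 1 and 3, assuming each run is summarized by its final objective value (minimization) and, for technique 3, by a hypothetical log of (effort, value) pairs recorded at each improvement. The function names and the log format are illustrative assumptions.

```python
import statistics

def summarize(final_values):
    """Technique 1: descriptive statistics of the final values of a series of runs."""
    return {
        "mean": statistics.mean(final_values),
        "stdev": statistics.stdev(final_values),  # sample standard deviation, needs >= 2 runs
        "median": statistics.median(final_values),
        "min": min(final_values),
        "max": max(final_values),
    }

def target_hits(run_logs, target, max_effort):
    """Technique 3: number of runs that reach `target` (minimization) within `max_effort`.
    Each run log is a hypothetical list of (effort, value) pairs recorded at every improvement."""
    return sum(
        any(effort <= max_effort and value <= target for effort, value in log)
        for log in run_logs
    )
```

As the next paragraph notes, a mean alone is not enough to claim a statistically significant advantage; the full summary and the hit counts give a more complete picture.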

In practice, the measures computed with the first and second techniques are often very primitive: it is common to calculate only averages, which are insufficient to establish a statistically significant advantage of one solution method over another.

1.2 The Condorcet technique

This work proposes a multicriteria evaluation scheme based on the classic Condorcet ranking technique (1, 4, 18). This technique replaces concept values by orders (rankings), which are then submitted to a pairwise comparison. The initial concepts can be either qualitative or quantitative. For instance, a blind test for wine quality evaluation could involve a group of tasters, each one ranking a set of similar products by considering mouth sensations, bouquet, color and so on. When applied to algorithm evaluation, we could run the algorithms on a given set of instances and consider value rankings for different criteria, such as final value averages, processing times and so on. These results can be presented as a matrix in which we can evaluate coherence and inconsistency levels in order to arrive at a decision concerning the quality of the studied options (see Item 4.2).

1.3 The Weight Ordering Method (WOM)

In this method we begin with a Condorcet-type ranking matrix. We use weights associated with the ranking of the algorithms in the set being compared, with respect to instances of a given problem. In this work, we define five evaluation criteria (see Item 4.1 below) and apply a function on the set of natural numbers to the ranking given by each criterion. The criteria are defined such that better results correspond to lower criterion values.

In this work, we cross three combinatorial optimization problems, the Traveling Salesperson Problem (TSP), the Capacitated Vehicle Routing Problem (CVRP) and the Quadratic Assignment Problem (QAP), with four different metaheuristics: the Greedy Randomized Adaptive Search Procedure (GRASP), Iterated Local Search (ILS), Tabu Search (TS) and Variable Neighborhood Search (VNS). To do so, we took instance collections of each problem from public libraries and made ten independent runs on each instance, with an execution time limit of 600 seconds. We looked for solutions with Optimal or Best-Known Values (OBKV), according to the most recent information available on the Internet.

The algorithms were programmed in C and run on a Linux platform. To allow for a very basic comparison, each algorithm was used with essentially the same structure, the same local search was used in every case and the initial solutions were built in a similar way. The only differences are the problem-specific characteristics: the problem constraints and the objective function calculation. We adopted this option because many improvements to these algorithms already exist in the literature, and the objective of this paper is to use the problem-algorithm crossing to show how the method works, not to propose algorithm improvements.

The use of a sort function makes the orders easier to visualize. Summing the values obtained by each algorithm on each problem makes it easier to compare the algorithms and their sensitivity to each problem. It also allows us to evaluate the overall performance of an algorithm.

2 TEST PROBLEMS USED IN THE STUDY

The problems used in this study and presented below are widely known in the scientific community and often used as benchmarks for the validation of new algorithms, owing to their computational complexity (8, 21).

2.1 Traveling Salesperson Problem (TSP)

In simple terms, the Traveling Salesperson Problem (TSP) starts from a list of cities and the distances between each pair of them; the task is to leave an origin city and follow the shortest possible circuit that visits each city exactly once before returning to the origin. It was formulated as a mathematical problem by Karl Menger (17). The TSP is one of the most intensively studied combinatorial optimization problems. It is of great importance from both a practical and a theoretical point of view, given its relationship to other combinatorial optimization problems, and it is used as a benchmark for many optimization methods. Although it is computationally difficult (NP-hard), a large number of exact methods and heuristics have been applied to it, so that instances with tens of thousands of cities can be solved.
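As an illustration of the objective being minimized, the following Python sketch computes the length of a closed tour from city coordinates. It assumes plain Euclidean distances; TSPLIB instances specify their own distance conventions (for example, rounding to the nearest integer), which this sketch does not reproduce. The function name is hypothetical.

```python
import math

def tour_length(coords, tour):
    """Length of the circuit that visits the cities in `tour` order and returns to the origin."""
    if sorted(tour) != list(range(len(coords))):
        raise ValueError("the tour must visit each city exactly once")
    total = 0.0
    for pos, city in enumerate(tour):
        nxt = tour[(pos + 1) % len(tour)]  # the last city connects back to the origin
        total += math.dist(coords[city], coords[nxt])
    return total
```

For the four corners of a unit square visited in order, the function returns the perimeter, 4.0.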

2.2 Capacitated Vehicle Routing Problem (CVRP)

Since the work published by Dantzig & Ramser in 1959 (5), many papers on the Vehicle Routing Problem (VRP) have appeared in the literature. Some studies address different variants, such as more than one depot, delivery time limits, different types of vehicles, and delivery and collection of products, among others. In this work we use its classical model, which consists of serving a set of customers with a fleet of vehicles of the same capacity. Each vehicle leaves from a depot and returns to it, and the sum of the demands of the customers served by a vehicle cannot exceed its capacity.
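The capacity constraint of the classical CVRP can be checked as in the Python sketch below, which evaluates a candidate solution given as a list of routes. It assumes Euclidean distances and the usual objective of minimizing total travel distance; the function name and the data layout (dictionaries keyed by customer id) are hypothetical choices for illustration.

```python
import math

def cvrp_cost(depot, customers, demand, capacity, routes):
    """Total travel distance of a CVRP solution; raises ValueError if it is infeasible.

    depot: (x, y); customers: {id: (x, y)}; demand: {id: quantity};
    routes: list of routes, each a list of customer ids served by one vehicle
    that leaves the depot and returns to it.
    """
    served = sorted(c for route in routes for c in route)
    if served != sorted(customers):
        raise ValueError("every customer must be served exactly once")
    total = 0.0
    for route in routes:
        load = sum(demand[c] for c in route)
        if load > capacity:
            raise ValueError(f"route {route} carries {load} > capacity {capacity}")
        stops = [depot] + [customers[c] for c in route] + [depot]
        total += sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))
    return total
```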

2.3 Quadratic Assignment Problem (QAP)

Consider the problem of allocating activities to locations (one activity per location), taking into account the distances between locations and the flows, in conveniently defined units, between pairs of activities. The Quadratic Assignment Problem (QAP), proposed by Koopmans & Beckmann (14), is the problem of finding a minimum-cost allocation of activities to locations, where the cost is the sum, over all pairs of activities, of the products of flow and distance.
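In the Koopmans-Beckmann formulation, with flows $f_{ij}$ between activities, distances $d_{kl}$ between locations and a permutation $\pi$ placing activity $i$ at location $\pi(i)$, the cost is $\sum_i \sum_j f_{ij}\, d_{\pi(i)\pi(j)}$. A direct Python transcription follows, with hypothetical function names; the brute-force search is only meant for toy instances, since it enumerates all n! permutations.

```python
from itertools import permutations

def qap_cost(flow, dist, perm):
    """Koopmans-Beckmann QAP cost: activity i is placed at location perm[i], and each
    ordered pair of activities (i, j) contributes flow[i][j] * dist[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))

def qap_brute_force(flow, dist):
    """Exact optimum (cost, permutation) by full enumeration; only for very small n."""
    return min((qap_cost(flow, dist, list(p)), list(p)) for p in permutations(range(len(flow))))
```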

3 METAHEURISTICS USED IN THE STUDY

The metaheuristic implementations used here vary greatly in efficiency. This was deemed appropriate, since it makes the behavior of WOM easier to observe.

3.1 Tabu Search

Tabu Search was introduced by Fred Glover (9, 10) for integer programming problems and later refined by Taillard (22) for the QAP. This metaheuristic is based on the establishment of restrictions that guide a heuristic search in exploring the solution space, trying to prevent the search from returning to previously visited solutions. These restrictions work in different ways, such as excluding certain alternatives from the search by classifying them as temporarily forbidden (tabu), or modifying their evaluations and selection probabilities; aspiration criteria allow the tabu status of a move to be overridden, for example when it leads to a solution better than the best found so far.
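A minimal tabu search for the QAP is sketched below in Python. The neighborhood is the set of pairwise swaps of locations, a swap is kept tabu for a fixed number of iterations (`tenure`) after it is performed, and the aspiration criterion accepts a tabu swap only if it improves on the best cost found so far. The tabu attribute (the swapped positions), the tenure and the iteration limit are illustrative assumptions; this is neither Taillard's robust tabu search nor the implementation used in this study.

```python
import itertools
import random

def qap_cost(flow, dist, perm):
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))

def tabu_search_qap(flow, dist, iters=200, tenure=5, seed=0):
    """Tabu search over pairwise swaps of a QAP permutation (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(flow)
    current = list(range(n))
    rng.shuffle(current)                      # random initial solution
    best, best_cost = current[:], qap_cost(flow, dist, current)
    tabu_until = {}                           # (r, s) -> last iteration in which the swap is tabu
    for it in range(iters):
        chosen = None
        for r, s in itertools.combinations(range(n), 2):
            cand = current[:]
            cand[r], cand[s] = cand[s], cand[r]
            cost = qap_cost(flow, dist, cand)
            is_tabu = tabu_until.get((r, s), -1) >= it
            if is_tabu and cost >= best_cost:  # aspiration: a tabu swap must beat the best so far
                continue
            if chosen is None or cost < chosen[0]:
                chosen = (cost, r, s, cand)
        if chosen is None:                    # every swap is tabu and none aspirates
            break
        cost, r, s, current = chosen          # best admissible move, even if it worsens the cost
        tabu_until[(r, s)] = it + tenure
        if cost < best_cost:
            best, best_cost = current[:], cost
    return best, best_cost
```

Accepting the best admissible move even when it is worse than the current solution is what lets the search leave local optima, while the tabu list keeps it from immediately undoing that move.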

3.2 GRASP

GRASP (Greedy Randomized Adaptive Search Procedure), proposed by Feo & Resende (7), can be seen as a metaheuristic that combines, in its construction phase, the good characteristics of purely random algorithms and of purely greedy processes. It is a multistart iterative process in which each iteration consists of two phases: the construction phase, where a feasible solution is built, and the local search phase, where a local optimum is sought in the neighborhood of the constructed solution; if necessary, the best solution found so far is then updated.
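The two GRASP phases can be sketched in Python for the TSP. The construction phase extends the tour from the last city, choosing at random among the candidates whose distance lies within a fraction `alpha` of the range between the nearest and the farthest unvisited city (a restricted candidate list, RCL); the local search phase applies a first-improvement 2-opt. Both choices, the function names and the parameter values are assumptions for illustration and do not describe the implementation used in the study.

```python
import math
import random

def tour_length(pts, tour):
    # tour[i - 1] with i = 0 is the last city, so the circuit is closed
    return sum(math.dist(pts[tour[i - 1]], pts[tour[i]]) for i in range(len(tour)))

def greedy_randomized_tour(pts, alpha, rng):
    """Construction phase: random choice from a restricted candidate list (RCL)."""
    unvisited = set(range(1, len(pts)))
    tour = [0]
    while unvisited:
        d = {c: math.dist(pts[tour[-1]], pts[c]) for c in unvisited}
        lo, hi = min(d.values()), max(d.values())
        rcl = sorted(c for c in unvisited if d[c] <= lo + alpha * (hi - lo))
        nxt = rng.choice(rcl)
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(pts, tour):
    """Local search phase: first-improvement 2-opt until no segment reversal shortens the tour."""
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(pts, cand) < tour_length(pts, tour) - 1e-12:
                    tour, improved = cand, True
    return tour

def grasp_tsp(pts, iterations=20, alpha=0.3, seed=0):
    """Multistart loop: construct, improve, keep the best tour found."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        tour = two_opt(pts, greedy_randomized_tour(pts, alpha, rng))
        if best is None or tour_length(pts, tour) < tour_length(pts, best):
            best = tour
    return best, tour_length(pts, best)
```

With `alpha = 0` the construction is purely greedy (nearest neighbor), and with `alpha = 1` it is purely random, which matches the trade-off described above.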

3.3 ILS

ILS (Iterated Local Search), proposed by Lourenço et al. (15), is a simple method that iteratively applies local search to perturbations of the current solution, leading to a random walk in the space of local optima. To apply an ILS algorithm, four procedures must be specified: (a) the generation of the initial solution; (b) the perturbation, which generates new starting points for the local search; (c) the acceptance criterion, which decides from which solution the search will continue; and (d) the local search procedure, which defines the search space.
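The four ILS components (a) to (d) map directly onto code. In the Python sketch below, for the QAP, (a) is a random permutation, (b) applies a few random swaps, (c) accepts a new local optimum if it is not worse than the current one, and (d) is a best-improvement swap descent. These concrete choices, the function names and the parameter values are hypothetical.

```python
import random

def qap_cost(flow, dist, perm):
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))

def swap_local_search(flow, dist, perm):
    """(d) Local search: best-improvement pairwise swaps until a local optimum is reached."""
    perm, cost = perm[:], qap_cost(flow, dist, perm)
    while True:
        best_move = None
        for r in range(len(perm)):
            for s in range(r + 1, len(perm)):
                perm[r], perm[s] = perm[s], perm[r]
                c = qap_cost(flow, dist, perm)
                perm[r], perm[s] = perm[s], perm[r]  # undo the trial swap
                if c < cost and (best_move is None or c < best_move[0]):
                    best_move = (c, r, s)
        if best_move is None:
            return perm, cost
        cost, r, s = best_move
        perm[r], perm[s] = perm[s], perm[r]

def perturb(perm, strength, rng):
    """(b) Perturbation: apply `strength` random swaps to the current local optimum."""
    perm = perm[:]
    for _ in range(strength):
        r, s = rng.sample(range(len(perm)), 2)
        perm[r], perm[s] = perm[s], perm[r]
    return perm

def ils_qap(flow, dist, iterations=100, strength=2, seed=0):
    rng = random.Random(seed)
    start = list(range(len(flow)))
    rng.shuffle(start)                                      # (a) initial solution
    current, cur_cost = swap_local_search(flow, dist, start)
    best, best_cost = current, cur_cost
    for _ in range(iterations):
        cand, cand_cost = swap_local_search(flow, dist, perturb(current, strength, rng))
        if cand_cost <= cur_cost:                           # (c) acceptance: better or equal
            current, cur_cost = cand, cand_cost
        if cur_cost < best_cost:
            best, best_cost = current, cur_cost
    return best, best_cost
```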

3.4 VNS

VNS (Variable Neighborhood Search) was proposed by Hansen & Mladenović (11, 12). It is based on a systematic change of neighborhood combined with a random procedure to determine the starting points of the local search. The basic VNS scheme is very simple and easy to implement. Unlike other metaheuristics based on local search methods, VNS does not follow a trajectory; instead, it explores increasingly distant neighborhoods of the current solution and moves from the current solution to a new one if and only if an improvement occurs. According to the authors, the advantage of using several neighborhoods is that a local optimum with respect to one neighborhood is not necessarily a local optimum with respect to another: thus, the descent (or ascent) should continue until the current solution is a local minimum (or maximum) with respect to all the preselected neighborhood structures.
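A basic VNS loop can be sketched as follows in Python, again for the QAP. Here the k-th neighborhood is assumed, for illustration only, to be reached by k random swaps (shaking), followed by a swap-based descent; the search moves to the new local optimum and returns to the first neighborhood only if it improves, and otherwise tries the next, more distant neighborhood. The function names and parameters are hypothetical.

```python
import random

def qap_cost(flow, dist, perm):
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))

def descent(flow, dist, perm):
    """First-improvement pairwise-swap descent to a local optimum."""
    perm, cost = perm[:], qap_cost(flow, dist, perm)
    improved = True
    while improved:
        improved = False
        for r in range(len(perm)):
            for s in range(r + 1, len(perm)):
                perm[r], perm[s] = perm[s], perm[r]
                c = qap_cost(flow, dist, perm)
                if c < cost:
                    cost, improved = c, True               # keep the improving swap
                else:
                    perm[r], perm[s] = perm[s], perm[r]    # undo the trial swap
    return perm, cost

def shake(perm, k, rng):
    """Random solution in the hypothetical k-th neighborhood: k random swaps."""
    perm = perm[:]
    for _ in range(k):
        r, s = rng.sample(range(len(perm)), 2)
        perm[r], perm[s] = perm[s], perm[r]
    return perm

def basic_vns(flow, dist, k_max=3, rounds=50, seed=0):
    rng = random.Random(seed)
    x = list(range(len(flow)))
    rng.shuffle(x)
    x, fx = descent(flow, dist, x)
    for _ in range(rounds):
        k = 1
        while k <= k_max:
            y, fy = descent(flow, dist, shake(x, k, rng))
            if fy < fx:
                x, fx, k = y, fy, 1     # improvement: move there and restart from N_1
            else:
                k += 1                  # no improvement: try a more distant neighborhood
    return x, fx
```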

4 DETAILS ON CONDORCET AND WOM TECHNIQUES

In this section we present in more detail the proposed performance criteria, the application of the Condorcet method and the use of its results to generate the indicators associated with WOM.

4.1 Performance criteria for the proposed versions

Following the previously cited concern of Taillard et al. (23) about the dependence of heuristic efficiency on instance type, we built a multicriteria evaluation with five comparison criteria. The criteria are defined so that lower values represent better results.

  1. Number of not-OBKV solutions obtained (nopt)

  2. Average relative value distance (avd)

This is the average of the values (obtained value - OBKV)/OBKV over the tests that did not reach the OBKV, expressed as a percentage.

  3. Quality index (qual)

This is a tailor-made function used to express performance whether or not the algorithm reaches the OBKV. We used:

$$qual = \frac{1}{nSolOtm} + nSolNotOtm \times AvgError \qquad (4.1)$$

where nSolOtm is the number of solutions with the OBKV value, nSolNotOtm is the number of solutions with worse values and AvgError is the average error percentage of the executions where the OBKV was not reached. The term 1/nSolOtm is disregarded if no OBKV value is found.

As an example, let us consider that in 10 executions of an instance, 8 produced the OBKV and 2 presented an average error of 1.3%. The index value is then I = 1/8 + (2 × 1.3) = 0.125 + 2.6 = 2.725. If the average error of those two executions were 21.6%, we would have I = 0.125 + (2 × 21.6) = 43.325. The index is thus sensitive to the presence of bad solutions, and its value decreases as the number of OBKV solutions grows.

  4. Average execution time (exec)

This includes only the executions in which the OBKV was obtained before the maximum execution time (600 seconds).

  5. Average stagnation time complement (stag)

This is the average difference between the maximum execution time of 600 seconds and the time of the last improvement in the solution value before the algorithm stops by the maximum-time criterion. Whenever the algorithm reaches the OBKV, this value is set to zero. This criterion can be associated with an algorithm's capacity to avoid getting stuck in local optima.
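The five criteria for one instance can be computed from its ten runs as in the Python sketch below. Each run is assumed to be summarized by its final value and the time of its last improvement (which, for a run reaching the OBKV, is taken as its execution time); qual follows the worked example above, 1/nSolOtm + nSolNotOtm × AvgError. Where the text leaves details open, the sketch makes labeled assumptions: a run reaches the OBKV when its value is less than or equal to it (minimization), avd is 0 when every run reaches the OBKV, exec is undefined (None) when none does, and stag averages over all runs, with zeros for runs that reach the OBKV. The function name is hypothetical.

```python
from statistics import mean

T_MAX = 600.0  # execution time limit (seconds) used in the study

def instance_criteria(runs, obkv):
    """runs: list of (final_value, time_of_last_improvement) pairs for one instance.
    Returns the five criteria; lower is better for all of them."""
    hits = [t for v, t in runs if v <= obkv]                  # runs that reached the OBKV
    misses = [(v, t) for v, t in runs if v > obkv]
    errors = [100.0 * (v - obkv) / obkv for v, _ in misses]   # % distance to the OBKV
    avd = mean(errors) if errors else 0.0
    qual = len(misses) * avd                                  # nSolNotOtm * AvgError
    if hits:
        qual += 1.0 / len(hits)                               # 1/nSolOtm, dropped if no OBKV is reached
    return {
        "nopt": len(misses),
        "avd": avd,
        "qual": qual,
        "exec": mean(hits) if hits else None,                 # undefined when no run reached the OBKV
        "stag": mean([0.0] * len(hits) + [T_MAX - t for _, t in misses]),
    }
```

For 8 runs at the OBKV and 2 runs with a 1.3% error, the sketch reproduces the qual value of 2.725 from the example above.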

4.2 Some details on the Condorcet technique

As discussed in Item 1.2, the Condorcet technique is based on a pairwise evaluation that orders pairs of results in order to obtain a measure of performance differences, as detailed in what follows.

Let W be a set of w objects $o_1, o_2, \ldots, o_w$ and let us consider their w! permutations. Each permutation $p_k$ ($1 \le k \le w!$) induces a classification $O_k$, that is, an order relation in which object $o_i$ is said to be better (or easier) than $o_j$ according to $O_k$ if $o_i$ precedes $o_j$ in $O_k$ ($o_i < o_j$). The pair $(o_i, o_j)$ is said to show a discrepancy between $O_p$ and $O_q$ if and only if $o_i < o_j$ in the order $O_p$ and $o_j < o_i$ in the order $O_q$. The distance between $O_p$ and $O_q$, $dist(O_p, O_q)$, is defined as the number of discrepant pairs among the $C_{w,2}$ possible ones. We can then define the relative error of $O_p$ with respect to $O_q$ as:

$$\varepsilon(O_p, O_q) = \frac{dist(O_p, O_q)}{C_{w,2}} \qquad (4.2)$$
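The discrepancy count defining the Condorcet distance can be computed directly, as in this Python sketch; each order is a sequence of object labels from best to worst. The relative error is taken here as the fraction of discrepant pairs, the distance divided by $C_{w,2}$, which is an assumption consistent with the percentage disagreements reported later in this section. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def condorcet_distance(order_p, order_q):
    """Number of pairs ranked o_i < o_j in one order and o_j < o_i in the other.
    Each order is a sequence of object labels, from best (first) to worst."""
    pos_p = {o: k for k, o in enumerate(order_p)}
    pos_q = {o: k for k, o in enumerate(order_q)}
    return sum(
        (pos_p[a] < pos_p[b]) != (pos_q[a] < pos_q[b])
        for a, b in combinations(order_p, 2)
    )

def relative_error(order_p, order_q):
    """Fraction of the C(w, 2) pairs that are discrepant (assumed form of the relative error)."""
    return condorcet_distance(order_p, order_q) / comb(len(order_p), 2)
```

For example, the orders "abc" and "cba" disagree on all three pairs, giving a relative error of 1.0, while "abc" and "bac" disagree only on the pair (a, b).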

This technique can be used to compare, pair by pair, the performance of a set of algorithms with respect to a given instance. Let $|W| = w$ be the number of algorithms to be compared with a total of z evaluation criteria, whose values refer to a given instance, each criterion generating a possibly different order. A criterion-algorithm table is obtained for each instance, where each position contains the algorithm number and its corresponding criterion value. In this table, the algorithm number in each entry corresponds to that of the corresponding column.

We illustrate the method with the QAP instance Tai25a, which was used, among other QAP instances, to test five VNS variations in (16) (Table 1).

Table 1
Instance Tai25a - The matrix with the values obtained by the algorithms.

Next, we sort each line in nondecreasing order of the corresponding criterion value, carrying the algorithm identifiers along with their values. After this step, the initial matrix becomes that of Table 2.

Table 2
Instance Tai25a - Criteria values in nondecreasing order.

In the next step, we examine each pair of values along each line, considering the algorithms that produced the corresponding results. We represent the comparison results by a matrix in which each column corresponds to a pair of algorithms and each entry is a value $k \in \{-1, 0, +1\}$, chosen as +1 for >, -1 for < and 0 for =. For example, Criterion a gives the second position to Algorithm 2 (value 9.0000) and the fourth one to Algorithm 4 (value 10.0000), hence (a,[2,4]) = -1, while Criterion d gives the first position to Algorithm 1 and the second one to Algorithm 4 (both with value 0.0000), hence (d,[1,4]) = 0.

Table 3
Instance Tai25a - The value pair comparison matrix.
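The value pair comparison matrix of Table 3 follows from the criterion values alone, as in the Python sketch below: for each criterion and each algorithm pair [i, j], k is the sign of $v_i - v_j$. Sorting the lines first (Table 2) does not change these signs. The function name and the nested-dictionary layout are hypothetical.

```python
from itertools import combinations

def pair_comparison_matrix(values):
    """values: {criterion: {algorithm: value}} (Table 1 layout, lower is better).
    Returns {criterion: {(i, j): k}} for i < j, with k = +1 if v_i > v_j,
    k = -1 if v_i < v_j and k = 0 if the values are equal (Table 3 layout)."""
    table = {}
    for crit, row in values.items():
        table[crit] = {
            (i, j): (row[i] > row[j]) - (row[i] < row[j])   # sign of v_i - v_j
            for i, j in combinations(sorted(row), 2)
        }
    return table
```

With the hypothetical values {"a": {1: 3.0, 2: 1.0, 3: 2.0}}, for example, the entries for criterion a are +1 for [1,2], +1 for [1,3] and -1 for [2,3].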

The next step is the determination of the (Condorcet) distances. An expanded matrix is built in which the lines correspond to criteria pairs and the columns to algorithm pairs. We say there is a discrepancy (expressed by a 1 in Table 4) when the entries corresponding to a criteria pair in Table 3 have opposite signs; the remaining entries of Table 4 are zero. For example, (a,[1,2]) = 1 and (b,[1,2]) = -1, so ([a,b],[1,2]) = 1 in Table 4. The line sums express the distances between the orders given by the two criteria of each pair, while the column sums show how many criteria pairs disagree on each pair of algorithms.

The last row and column of Table 4 are used to evaluate the indicators. If all criteria pairs show very high disagreement, for example over 75%, it is advisable to question the validity of the criteria.

For the Tai25a instance, only the criteria pair [a,d] shows a high disagreement (70%). The other pairs are more consistent, which indicates that this criteria set has good evaluation capacity for the algorithms applied to this instance. We can also look at the column sums. It is interesting to observe that column [1,5] shows no discrepancy, which is the same as saying that Algorithms 1 and 5 are equivalent according to all the criteria used.

The Condorcet method proceeds by calculating the relative errors given by Eqn. 4.2 and preparing comparison tables based on those results. The number of comparisons grows as $O(w^2)$ for each instance. The final evaluation would be done by inspection, since it is difficult to establish logical criteria that could be used for computational evaluation. Since the number of alternatives may be large, depending on the value of w, we consider that the Condorcet technique becomes impractical.

Table 4
Instance Tai25a - Comparison between pairs (by algorithms and by criteria).
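The discrepancy matrix of Table 4 and its line and column sums can be derived from a Table 3-style comparison matrix, as in this Python sketch. A 1 is recorded when two criteria give opposite signs (a product of -1) for an algorithm pair; dividing a line sum by the number of algorithm pairs gives the disagreement percentage of a criteria pair (for w = 5 algorithms, 7 discrepancies out of 10 pairs give 70%). The function name and data layout are hypothetical.

```python
from itertools import combinations

def discrepancy_matrix(comp):
    """comp: {criterion: {(i, j): k}} with k in {-1, 0, +1} (Table 3 layout).
    Entry [(c1, c2)][(i, j)] is 1 when the two criteria give opposite signs for the pair."""
    crits = sorted(comp)
    alg_pairs = sorted(next(iter(comp.values())))
    disc = {
        (c1, c2): {p: int(comp[c1][p] * comp[c2][p] < 0) for p in alg_pairs}
        for c1, c2 in combinations(crits, 2)
    }
    row_sums = {cp: sum(row.values()) for cp, row in disc.items()}        # distance per criteria pair
    col_sums = {p: sum(row[p] for row in disc.values()) for p in alg_pairs}  # disagreement per algorithm pair
    return disc, row_sums, col_sums
```

A column sum of zero, as for column [1,5] in the Tai25a example, means that no pair of criteria disagrees on how those two algorithms compare.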

4.3 The Weight Ordering Method - WOM

The situation just described calls for an improvement in the evaluation. It led us to propose a Condorcet-like technique in which the comparison can easily be made by calculation, the Weight Ordering Method (WOM). Its advantage is that the results of the comparisons are automatically translated into numeric values. We do this with the aid of a function designed to be injective on the considered value set, so that we can be sure it condenses into numbers the information provided by Table 4 above.

From the ordered array of the Condorcet method (Table 2), we look for equal-valued elements. If they exist, we rearrange the array to condense these values into a single entry; otherwise, we proceed with Table 2 unchanged. In either case, we obtain Table 5, where equal values in the various entries were condensed into a single position (e.g., Algorithms 1, 4 and 5, with Criteria a and d). For the instance Tai25a, Table 5 is:

Table 5
WOM method - rearrangement of equal values for Tai25a.

We do not consider the empty entries in the ordering (e.g., Line (d): Algorithm 2 in column 4 will be second in order, not fourth; Algorithm 3 will be third, not fifth).

With these data we can create an $O_{ij}$-type matrix (Table 6), similar to that of the Condorcet method, where each entry (i,j) contains the number of times Algorithm i appears in position j over the whole criteria set applied to a given instance. For example, Table 5 shows that Algorithm 1 obtained one first position (with Criterion d), three third positions (Criteria a, b and c) and one fourth position (Criterion e).

Table 6
WOM method - ordering matrix for Tai25a.
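The condensation of equal values (Table 5) and the ordering matrix (Table 6) can be sketched in Python as below. Ties share a position and empty positions are skipped, which amounts to a dense ranking; values are compared exactly, so in practice one might round them first. The function names and data layout are assumptions, and the weight function (4.3) itself is not reproduced here.

```python
def dense_ranks(row):
    """row: {algorithm: value}, lower is better. Equal values share a position and
    empty positions are skipped, so values 0, 0, 0, 4, 7 get positions 1, 1, 1, 2, 3."""
    distinct = sorted(set(row.values()))
    position = {v: p for p, v in enumerate(distinct, start=1)}
    return {alg: position[v] for alg, v in row.items()}

def ordering_matrix(values):
    """values: {criterion: {algorithm: value}}.
    Returns O with O[i][j - 1] = number of criteria placing algorithm i in position j."""
    algs = sorted(next(iter(values.values())))
    O = {i: [0] * len(algs) for i in algs}
    for row in values.values():
        for alg, pos in dense_ranks(row).items():
            O[alg][pos - 1] += 1
    return O
```

The dense ranking reproduces the rule stated above for Line (d): after three tied first positions, the next algorithm is second, not fourth.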

To quantify the performance of each algorithm, we use this matrix to associate a weight function with the obtained set of orders, such that a first-ranked algorithm receives a greater value than a second-ranked one, and so on. The suggested weight function (4.3) for a given algorithm considers the number w of algorithms, the position of algorithm i for a given instance, an exponent base k and the matrix $O = [O_{ij}]$ of the instance, as follows:

This function becomes injective for sufficiently high k. Some values have to be tested, given the set of arrangements obtained. For the example, we found that k = 13/3 guarantees the injective property. Here, a higher value corresponds to better performance.

It is crucial to observe that we are already working with an ordered set: since the function values reflect the ordering of the multicriteria evaluation for each algorithm, they correspond to the pairwise ordering used by the Condorcet method, condensing its results into numeric values that indicate the algorithm performance order according to the proposed criteria.

Table 7 is Table 6 with a new column showing WtFunc values. We can see that the best global performance was that of Algorithm 3 (279) and the worst, that of Algorithm 2 (59).

Table 7
WOM method - final algorithm ordering for Tai25a.

It is important to mention that these results are consistent only within a given situation, since the orderings obtained in two different situations may not be consistent with one another, and it may not be meaningful to add up their respective ratings.

5 COMPUTATIONAL RESOURCES AND RESULTS

For each problem, we used about 100 test instances, taken from the respective websites: TSPLIB (24), http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/, for TSP and CVRP, and QAPLIB (19), http://www.seas.upenn.edu/qaplib/, for QAP.

All algorithms started from randomly generated initial solutions. We performed ten executions for each instance, each initialized with a new seed in order to ensure independence. The seeds were randomly selected from the list of prime numbers between 1 and 2,000,000 (6). The tests were run on a computer with an Intel Core 2 Quad 2.4 GHz processor and 4 GB of RAM, under the Linux operating system (openSUSE distribution).
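The study drew its seeds from a published list of primes (6). An equivalent list can be generated locally; the Python sketch below, whose function names are hypothetical, uses a sieve of Eratosthenes and draws distinct primes up to 2,000,000, one seed per execution.

```python
import math
import random

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p with 2 <= p <= n (n >= 1)."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # mark p*p, p*p + p, ... as composite
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]

def draw_seeds(k=10, limit=2_000_000, rng=None):
    """k distinct prime seeds drawn uniformly from the primes up to `limit`."""
    rng = rng if rng is not None else random.Random()
    return rng.sample(primes_up_to(limit), k)
```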

Table 8 contains the list of instances from the three problems, with the corresponding sizes.

Table 8
Tested instances for QAP, TSP and CVRP.

Table 9 shows the values of the weight function associated with the four algorithms on the problems used in the test. We can see that GRASP was the best technique on both TSP and CVRP, while VNS worked more efficiently on QAP.

Table 9
Comparison among the four metaheuristics using WOM.

It may be noted that no algorithm was better than the others on all three problems. Although differences in behavior should be expected from one algorithm-problem pair to another, the results have also been influenced by our use of basic versions, whose detailed descriptions can easily be found in the literature.

A comparison test for WOM was designed with the use of boxplots, produced with R (20). The boxplot description followed the pattern used in Table 6, that is, for each problem we built a set of boxplots for each criterion, involving all four algorithms. The graphs are shown in Appendix 1.

APPENDIX 1: BOXPLOT ANALYSIS

Here we present the boxplot set for each problem, each graphic box corresponding to a criterion, where the plots correspond to the four algorithms, GRASP, ILS, TS and VNS, respectively. To obtain a clearer picture of avd and qual, we rescaled the values to a percentage basis, using the maximum obtained value as reference. The new avd and qual values are calculated as newavd = 100*(avd-OBKV)/max(avd) and newqual = 100*qual/max(qual). The stagnation time stag was also put on a percentage basis. A discussion follows each set.

We begin with the QAP boxplots (Fig. A1-1).

Figure A1-1 Boxplot set for QAP.

For VNS, the number of not-OBKV solutions (nopt) covered the whole set of eleven possible values (from 0 to 10). It therefore seems to be strongly instance-dependent, but all results lie within the interquartile (IQ) zone. ILS ranks second, TS third and GRASP fourth, but all with high median values. The average relative value distance (avd) gave the lowest values for VNS among the four algorithms, with ILS second, GRASP third and TS fourth (only because of its outliers). The quality index (qual) showed no difference with respect to avd. The VNS execution time (exec) behaves similarly to nopt in the IQ zone, but its median value is reasonably low (while the other algorithms have high medians). ILS ranks second, TS third and GRASP fourth. The stagnation time (stag) has the lowest median for VNS. TS presented the highest stagnation times and the highest median; GRASP was second and ILS third. On the other hand, GRASP had the smallest spread of values, followed by ILS, VNS and then TS. We can say that the boxplot comparison matches the WOM results: VNS is easily the first, TS and ILS have close results and GRASP is clearly worse.

The TSP boxplots are in Figure A1-2 below.

Figure A1-2 Boxplot set for TSP.

The criteria nopt and exec were not effective: since the TSP instances have real values, the algorithms spent all the allowed execution time of 600 seconds, in all ten executions for each instance, trying to obtain better solutions within a 1% interval around the OBKV given by the problem library site. We can observe that GRASP produced low avd and qual values. This behavior allows us to interpret its stag behavior as a strong search for better values, most of them falling in the immediate neighborhood of the 1% region around the OBKV. Since GRASP is a multistart method, it would be less likely to get stuck in local optima along this process. The same analysis, applied to the other three algorithms, points to less precision. We have to remember that, by the definition of qual, it approaches avd when the number of successful trials goes to zero. Thus the pictures given by the two criteria are very similar here, and they indicate that the stagnation time was consumed with worse solutions than those found by GRASP. Early stagnation also suggests the influence of local optima. In this respect, VNS is the most susceptible, and it also presents the highest avd and qual values, showing the worst performance in this test. GRASP is evidently the most efficient; to decide whether TS or ILS is second, it is convenient to consider the somewhat lower avd and qual values of TS. TS should then rank second and ILS third. This result is the same as that obtained by the WOM technique (Table 7).

The CVRP boxplots are in Figure A1-3 below.

Figure A1-3 Boxplot set for CVRP.

The analysis is somewhat similar to that made for the TSP results, but there are some interesting differences. CVRP is a more difficult problem than TSP, and this difficulty is reflected in the differences between avd and qual: we can observe that the very sensitive qual indicates larger distances from the OBKV in the final results. This is generally true for the four algorithms. Looking at the avd boxplot, GRASP could be considered the best technique in this case as well, but its qual values show that its output is somewhat unstable. Thus the interpretation of its high stag values, apparently similar to that of TSP, becomes less reliable. The avd values for the other three algorithms are comparable, but the qual boxplot shows an advantage of TS over ILS and VNS. The stag values for ILS and TS are comparable, while VNS shows lower values; according to qual, this early stagnation seems to occur at local optima. It is difficult to classify GRASP in this case. ILS and TS are certainly in a middle position, and very close, while VNS should rank fourth. Here, the result is quite different from that shown by WOM (Table 7), which ranks VNS, TS, GRASP and ILS. The fact that GRASP is a multistart algorithm may also explain why it produced more outliers than the other algorithms. This characteristic can be helpful during the local search in some cases, but not in others.

6 CONCLUSIONS

The WOM technique allows us to choose the level of detail of an algorithm performance study. For example, performance can be checked on a single instance or on a set of instance classes, as in Melo (2010). Different versions of the same algorithm can be compared much more easily than with the Condorcet method, whose output grows with the square of the number of elements and is designed to be read by inspection, because WOM condenses the evaluation results into a single parameter. WOM also adapts easily to the insertion, replacement or removal of a criterion or of an algorithm under study, which allows faster scanning and analysis of the results.
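The quadratic growth mentioned above can be seen in a minimal sketch of a Condorcet-style pairwise table. It assumes that each criterion yields a complete ranking of the algorithms, best first; the function name `condorcet_matrix` and the rankings in the example are hypothetical and are not data from this study. For n algorithms the table holds n(n-1) ordered pairs, while a method that condenses the evaluation into one value per algorithm, as WOM does, reports only n values.

```python
from itertools import permutations


def condorcet_matrix(rankings):
    """Pairwise Condorcet table.

    rankings: one ranking per criterion, each listing algorithm names
    from best to worst (assumption: every ranking is complete).
    Returns a dict mapping (a, b) to the number of criteria that rank
    a above b.
    """
    algorithms = sorted(rankings[0])
    # Position of each algorithm in each ranking (0 = best).
    positions = [{alg: i for i, alg in enumerate(r)} for r in rankings]
    table = {}
    for a, b in permutations(algorithms, 2):
        table[(a, b)] = sum(1 for pos in positions if pos[a] < pos[b])
    return table


# Hypothetical rankings of four algorithms under three criteria.
rankings = [
    ["VNS", "ILS", "TS", "GRASP"],
    ["VNS", "TS", "ILS", "GRASP"],
    ["ILS", "VNS", "TS", "GRASP"],
]
table = condorcet_matrix(rankings)
print(len(table))             # 12 ordered pairs, n*(n-1) for n = 4
print(table[("VNS", "ILS")])  # 2
```

Adding a fifth algorithm would raise the table to 20 entries, whereas a single score per algorithm grows only to 5.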

Being based on the Condorcet method, WOM shows very clearly both the strengths and the weaknesses of each algorithm, and also allows an overall comparison in terms of performance ordering. We believe that even this small example shows its efficiency in comparing techniques and ranking them by performance, even among a much larger number of alternatives.

We think WOM can be very useful in algorithm development, when a researcher has to deal with a number of different but similar algorithm versions, or with several sets of parameter values for a given algorithm. As with the Condorcet method, the proposed set of criteria can be changed or adapted according to the research objective.

A comparison with the boxplot analysis (Appendix 1) shows that most of its results are comparable with those of WOM: TSP matches well, QAP fairly well, and CVRP is the least consistent.

REFERENCES

  • 1
    ABREU NMM, BOAVENTURA NETTO PO, QUERIDO TM & GOUVÊA EF. 2002. Classes of quadratic assignment problem instances: isomorphism and difficulty measure using a statistical approach. Discrete Applied Mathematics, 124(1-3): 103-116.
  • 2
    AIEX RM, RESENDE MGC & RIBEIRO CC. 2002. Probability distribution of solution time in GRASP: an experimental investigation. Journal of Heuristics, 8: 343-373.
  • 3
    AIEX RM, RESENDE MGC & RIBEIRO CC. 2005. TTTPLOTS: a PERL program to create time-to-target plots. AT&T.
  • 4
    BARBUTT CCP. 1990. Automorphismes du permutoèdre et votes de Condorcet. Math. Inform. Sci. Hum., 28E, 111: 73-82.
  • 5
    DANTZIG GB & RAMSER JH. 1959. The truck dispatching problem. Management Science, 6(1): 80-91. INFORMS.
  • 6
    ESTANY CP. 2010. Prime numbers. Available at: http://pinux.info/primos/ (accessed April 2010).
  • 7
    FEO TA & RESENDE MGC. 1995. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6: 109-133.
  • 8
    GAREY MR & JOHNSON DS. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. A Series of Books in the Mathematical Sciences. San Francisco, Calif. Victor Klee, ed.
  • 9
    GLOVER F. 1989. Tabu search-Part I. ORSA Journal on Computing, 1: 190-206.
  • 10
    GLOVER F. 1990. Tabu search-Part II. ORSA Journal on Computing, 2: 4-32.
  • 11
    HANSEN P & MLADENOVIĆ N. 1997. Variable neighborhood search. Computers and Operations Research, 24: 1097-1100.
  • 12
    HANSEN P & MLADENOVIĆ N. 2001. Developments of variable neighborhood search. Les Cahiers du GERAD, G-2001-24.
  • 13
    HOLLAND JH. 1975. Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor.
  • 14
    KOOPMANS TC & BECKMANN MJ. 1957. Assignment problems and the location of economic activities. Econometrica, 25: 53-76.
  • 15
    LOURENÇO HR, MARTIN OC & STÜTZLE T. 2003. Iterated local search. Glover F & Kochenberger GA (editors), Handbook of Metaheuristics, Chapter 11, p. 321-353. Kluwer Academic Publishers.
  • 16
    MELO VA. 2010. QAP: Investigations on the VNS metaheuristic and on the use of the QAP variance on graph isomorphism problems (in Portuguese). D.Sc. Thesis. Program of Production Engineering, COPPE/UFRJ, Rio de Janeiro, Brasil.
  • 17
    MENGER K. 1931. Bericht über ein mathematisches Kolloquium. Monatshefte für Mathematik und Physik, 38: 17-18.
  • 18
    MOREIRA AST. 2006. Hybrid GRASP-Tabu algorithms using the structure of Picard-Queyranne matrix for the QAP (in Portuguese). D.Sc. Thesis. Program of Production Engineering, COPPE/UFRJ, Rio de Janeiro, Brazil.
  • 19
    QAPLIB HOME PAGE. 2012. http://www.seas.upenn.edu/qaplib/ Accessed on: 12/10/2012.
  • 20
    R CORE TEAM. 2015. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/
  • 21
    REITER EE & JOHNSON CM. 2013. Limits of computation: an introduction to the undecidable and the intractable. CRC Press, Boca Raton.
  • 22
    TAILLARD E. 1991. Robust taboo search for the quadratic assignment problem. Parallel Computing, 17: 443-455.
  • 23
    TAILLARD E, WAELTI P & ZUBER J. 2008. Few statistical tests for proportions comparison. European Journal of Operational Research, 185: 1336-1350.
  • 24
    TSPLIB HOME PAGE. 2012. http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/ Accessed on: 12/10/2012.
  • *
    A preliminary version of this work was presented at Euro XXV, Vilnius, July 2012.

APPENDIX 1: BOXPLOT ANALYSIS

Here we present the set of boxplots for each problem. Each panel corresponds to one criterion, and within each panel the boxes correspond to the four algorithms, GRASP, ILS, TS and VNS, in this order.

To obtain a clearer picture of avd and qual, we rescaled their values to a percentage basis, using the maximum observed value as the reference. The rescaled avd and qual values are computed as follows:

$$\mathit{newavd} = \frac{100\,(\mathit{avd} - \mathit{OBKV})}{\max(\mathit{avd})}, \qquad \mathit{newqual} = \frac{100\,\mathit{qual}}{\max(\mathit{qual})}.$$
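A minimal sketch of this rescaling, newavd = 100*(avd - OBKV)/max(avd) and newqual = 100*qual/max(qual), is shown below. It assumes that the avd and qual values passed in all refer to the same instance, so that a single OBKV applies, and that the maximum is taken over those values; the function name `rescale` and the sample values are hypothetical.

```python
def rescale(avd_values, qual_values, obkv):
    """Rescale avd and qual values to a percentage basis:
    newavd = 100*(avd - OBKV)/max(avd), newqual = 100*qual/max(qual).

    Assumption: all values belong to one instance, so one OBKV applies,
    and max() is taken over the values given here.
    """
    max_avd = max(avd_values)
    max_qual = max(qual_values)
    if max_avd <= 0 or max_qual <= 0:
        raise ValueError("maxima must be positive to serve as a reference")
    new_avd = [100 * (a - obkv) / max_avd for a in avd_values]
    new_qual = [100 * q / max_qual for q in qual_values]
    return new_avd, new_qual


# Hypothetical avd and qual values of three algorithms on an instance with OBKV = 100.
new_avd, new_qual = rescale([110.0, 105.0, 120.0], [2.0, 4.0, 1.0], 100.0)
print(new_qual)  # [50.0, 100.0, 25.0]
```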

The stagnation time (stag) was also expressed on a percentage basis.

A discussion follows each set. We begin with the QAP boxplots (Fig. A1-1):

Figure A1-1
Boxplot set for QAP.

For VNS, the number of solutions that did not reach the OBKV (nopt) covered all eleven possible values (from 0 to 10). VNS performance on this criterion thus seems strongly instance-dependent, but all results fall within the interquartile (IQ) range. ILS ranks second, TS third and GRASP fourth, all with high median values.
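To illustrate why nopt can take eleven values, the sketch below counts, for one instance, the runs whose final cost did not reach the OBKV. It assumes a minimization problem and ten runs per instance; the function name and the cost values are hypothetical.

```python
def nopt(run_costs, obkv):
    """Number of runs whose final cost did not reach the OBKV.

    Assumes a minimization problem: a run reaches the OBKV when its
    cost is less than or equal to it. With ten runs the result is an
    integer between 0 and 10, i.e. one of eleven possible values.
    """
    return sum(1 for cost in run_costs if cost > obkv)


# Hypothetical final costs of ten runs on an instance with OBKV = 100.
costs = [100, 100, 102, 101, 100, 100, 105, 100, 100, 100]
print(nopt(costs, 100))  # 3
```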

The average value (avd) was lowest for VNS among the four algorithms, with ILS second, GRASP third and TS fourth (the latter only because of its outliers).

The quality index (qual) showed no difference with respect to avd.

The VNS execution time (exec) behaves like nopt within the IQ range, but its median is reasonably low, whereas the other algorithms have high medians. ILS ranks second, TS third and GRASP fourth.

VNS has the lowest median stagnation time (stag). TS presented the highest stagnation times and the highest median; GRASP was second and ILS third. On the other hand, GRASP had the smallest spread, followed by ILS, VNS and TS.
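The statistics read from these boxplots (median, quartiles, spread and outliers) can be computed as in the sketch below, which also orders algorithms by the median of a criterion, lower being better. It uses Tukey's 1.5 × IQR rule for outliers, the default whisker rule of R's `boxplot`; note that Python's `statistics.quantiles` with `method="inclusive"` may give quartiles slightly different from the hinges R uses. The function names and sample values are hypothetical, not data from this study.

```python
import statistics


def box_summary(values):
    """Quartiles, interquartile range (IQR) and Tukey outliers,
    i.e. points lying more than 1.5 * IQR outside the box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


def rank_by_median(samples):
    """Order algorithm names by increasing median (lower is better)."""
    return sorted(samples, key=lambda alg: statistics.median(samples[alg]))


summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(summary["median"], summary["outliers"])  # 5.5 [40]

# Hypothetical samples of one criterion for the four algorithms.
samples = {
    "GRASP": [5, 6, 7],
    "ILS": [3, 4, 5],
    "TS": [8, 9, 30],
    "VNS": [1, 2, 3],
}
print(rank_by_median(samples))  # ['VNS', 'ILS', 'GRASP', 'TS']
```

Ranking by the median rather than the mean keeps a few outliers, such as the 30 in the TS sample, from dominating the ordering.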

Overall, the boxplot comparison matches the WOM results: VNS is clearly first, TS and ILS are close to each other, and GRASP is clearly last.

The TSP boxplots are in Figure A1-2 below.

Figure A1-2
Boxplot set for TSP.

The criteria nopt and exec did not discriminate between the algorithms here. Since the TSP instances have real-valued costs, in each of the ten executions per instance the algorithms used the whole allowed execution time of 600 seconds trying to obtain solutions within a 1% interval around the OBKV given by the reference site for the problem.
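The success test and time limit described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 1% interval is symmetric around the OBKV and that a run stops as soon as its best cost falls inside it or the 600-second budget is exhausted; `step` is a hypothetical callable standing in for one iteration of any of the four metaheuristics.

```python
import time


def within_tolerance(cost, obkv, tol=0.01):
    """True when cost lies within tol (1% by default) of the OBKV."""
    return abs(cost - obkv) <= tol * abs(obkv)


def run_with_limit(step, obkv, time_limit=600.0, tol=0.01):
    """Call step() until the best cost is within tol of the OBKV or
    time_limit seconds have elapsed. Returns (best_cost, elapsed_seconds).

    Assumption: step() is a hypothetical callable that performs one
    iteration of a metaheuristic and returns the cost it reached.
    """
    start = time.monotonic()
    best = float("inf")
    while True:
        best = min(best, step())
        elapsed = time.monotonic() - start
        if within_tolerance(best, obkv, tol) or elapsed >= time_limit:
            return best, elapsed


print(within_tolerance(1009.5, 1000.0))  # True
print(within_tolerance(1010.5, 1000.0))  # False

# Hypothetical sequence of costs returned by successive iterations.
costs = iter([1100.0, 1050.0, 1005.0])
best, _ = run_with_limit(lambda: next(costs), obkv=1000.0)
print(best)  # 1005.0
```

When the target interval is rarely reached, a loop of this kind runs until the time limit in every execution, which would leave exec (and nopt) with little variation across algorithms.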

GRASP produced low avd and qual values. This suggests that its stag values reflect a persistent search for better values, most of them falling in the immediate neighborhood of the 1% region around the OBKV. Since GRASP is a multistart method, it is less likely to get stuck in local optima during this process.

Applied to the other three algorithms, the same analysis points to lower precision. Recall that, by definition, qual approaches avd as the number of successful trials goes to zero. The two criteria therefore look very similar here, indicating that the stagnation time was spent on solutions worse than those found by GRASP. The early stagnation also suggests the influence of local optima.

In this respect, VNS is the most susceptible, and it also presents the highest avd and qual values, giving the worst performance in this test. GRASP is clearly the most efficient. To decide which of TS and ILS ranks second, it is useful to consider the somewhat lower avd and qual values of TS, which should therefore rank second, with ILS third.

This result is the same as that obtained with the WOM technique (Table 7).

The CVRP boxplots are in Figure A1-3 below.

Figure A1-3
Boxplot set for CVRP.

The analysis is similar to that of the TSP results, although there are some interesting differences. CVRP is a more difficult problem than TSP, and this difficulty is reflected in the differences between avd and qual: the very sensitive qual criterion indicates that the final results lie at greater distances from the OBKV. This holds, in general, for all four algorithms.

Judging by the avd boxplot alone, GRASP could again be considered the best technique, but its qual values show that its output is somewhat unstable. The interpretation of its high stag values, apparently similar to the TSP case, therefore becomes less reliable.

The avd values of the other three algorithms are comparable, but the qual boxplot shows an advantage of TS over ILS and VNS.

The stag values of ILS and TS are comparable, while VNS shows lower values. Judging by qual, this early stagnation appears to occur at local optima.

GRASP is difficult to classify in this case. ILS and TS certainly occupy a middle position, very close to each other, while VNS should rank fourth. Here the result differs considerably from that of WOM (Table 7), which ranks VNS, TS, GRASP and ILS, in this order.

The fact that GRASP is a multistart algorithm may also explain why it shows more outliers than the other algorithms, although outliers appear for all of them. This characteristic can help the local search in some cases, but not in others.

Publication Dates

  • Publication in this collection
    Sep-Dec 2015

History

  • Received
    16 July 2014
  • Accepted
    03 Sept 2015
Sociedade Brasileira de Pesquisa Operacional Rua Mayrink Veiga, 32 - sala 601 - Centro, 20090-050 Rio de Janeiro RJ - Brasil, Tel.: +55 21 2263-0499, Fax: +55 21 2263-0501 - Rio de Janeiro - RJ - Brazil
E-mail: sobrapo@sobrapo.org.br