QUALITY ANALYSIS FOR THE VRP SOLUTIONS USING COMPUTER VISION TECHNIQUES

The Vehicle Routing Problem (VRP) is a classical problem, and when the number of customers is very large, finding the optimal solution can be extremely complex. An effective way to evaluate the quality of solutions is still needed when no optimal solution is known. This work proposes a way to analyze the quality of vehicle routes based only on their geometric properties. The proposed descriptors aim to be invariant to the number of customers, the number of vehicles and the size of the covered area. With the proposed methodology it is possible to obtain the routes and then evaluate the quality of the solutions using computer vision techniques. Although the problems considered have different configurations for the number of customers, vehicles and service area, the experimental results show that the proposal is useful for classifying routes as good or bad. A visual analysis was performed using the Parallel Coordinates and Viz3D techniques, and a classification was then performed by a Backpropagation Neural Network, which reached an accuracy rate of 99.87%.


INTRODUCTION
The vehicle routing problem (VRP) has important applications in many areas with different characteristics. The best known problem in this class is the Traveling Salesman Problem, an NP-hard problem presented in 1934 by the mathematician Karl Menger (Garey & Johnson, 1979 [15]), which consists in finding the sequence of cities to be visited by a traveling salesman so that every city is visited exactly once and the total distance traveled is minimized. Since then, new problems and formulations have been proposed with additional requirements, including vehicle capacities, hours of operation, the maximum length of routes (time or distance), the size and composition of the fleet, the vehicle types that can serve certain customers, the precedence among customers, etc. Bodin & Golden (1981) [5], Christofides (1985) [10], Assad (1988) [3], Ronen (1988) [24] and, more recently, Eksioglu et al. (2009) [13] presented taxonomies that give a more complete classification. Laporte (2009) [20], Kumar & Pannerselvam (2012) [19] and Jaegere et al. (2014) [18] provide recent surveys of the area. More recently, Miranda-Bront et al. (2015) [22] studied the swap body vehicle routing problem (SB-VRP), which is a generalization of the classical VRP.
Due to the computational complexity of the problem, many methods proposed to solve it can be found in the scientific literature (Chaoji et al., 2008 [7]). According to the taxonomy of Eksioglu et al. (2009) [13], the cost can be taken as the travel time, the total distance traveled, the number of vehicles used, or the delay.
The VRP is an NP-hard problem; therefore, if the number of customers is too large, it is very difficult to find the best solution, whether by heuristic or by exact methods. Exact methods usually have a very high computational cost. With heuristic methods, some way is needed to verify that the solution found is satisfactory, that is, optimal or nearly optimal. An effective way to evaluate the quality of solutions is therefore still needed when no optimal solution to the problem is known. This work presents a proposal to analyze the quality of vehicle routes based only on the analysis of their geometric shapes.
The remaining sections of this paper are organized as follows. Section 2 presents works that deal with the expected shapes of routes in routing problems, the main shape descriptors found in the literature and evaluated in this work to analyze the routes, the information visualization techniques used in the visual analysis of the data, and an attribute selection technique based on analysis of variance. Section 3 presents the methodology proposed in this work to evaluate the quality of solutions for the vehicle routing problem, based on the geometric analysis of the route shapes. Section 4 presents results obtained with the type "A" instances from Augerat et al. (1998) [4]. Finally, Section 5 presents the conclusions and future work.

RELATED WORKS
An early attempt to exploit the shape of the routes was made by Gillet & Miller (1974) [16], with a heuristic procedure called the Sweep Method, which suggests that, when multiple vehicles are used, optimal solutions tend to have a configuration resembling flower petals. A related work by Foster & Ryan (1976) [14] showed that, from the set of all possible petal-shaped routes, an optimal petal-shaped solution can be easily found. A more significant improvement was presented by Renaud et al. (1996) [23], who proposed an improved heuristic algorithm capable of obtaining solutions almost as good as those produced by a Tabu search adaptation, but at a lower computational cost, with solution values that lie on average within 2.38% of the best known solutions. More recently, Dondo & Cerdá (2013) [11] presented a sweep-heuristic based formulation for the VRP with cross-docking. Also notable in the literature are the works of Ryan et al. (1993) [26], Vidal et al. (2014) [29] and Laporte et al. (2014) [21].
Accordingly, a petal shape seems to be characteristic of good solutions. Figure 1 (a) shows the routes obtained for instance A-n33-k5, where a petal-shaped structure can be observed, and Figure 1 (b) shows the shapes of the five routes of this solution.

Shape Analysis of Routes
Many problems in computer vision can be reduced to a shape analysis of images, with important applications in many fields such as biology, medicine, visual arts and security (Takemura & Cesar Junior, 2002 [28]). However, the main difficulty is to find measures (descriptors) that are invariant to transformations of the shapes, such as scale changes, rotations, translations and projections. This paper proposes and evaluates some descriptors to accomplish this task. These descriptors are described below.
Total Cost - The total cost is the total distance traveled by all vehicles. CostMin, CostMax and CostAverage are the minimum, the maximum and the average route lengths.
Overlap - The overlap between routes can be indicative of a bad solution, suggesting that better solutions can be obtained by recombining the customers of these routes. Figure 2 shows examples of overlap between two routes (Renaud, Boctor and Laporte, 1996 [23]).
Perimeter - The perimeter P of a shape is defined as the sum of the lengths of its edges. In the case of objects in an image, it can be obtained by adding the distances between the pixels of its boundary, where the distance between pixels in the vertical or horizontal direction is 1.0 and the distance between two pixels in a diagonal direction is √2. Although the perimeter is a very simple descriptor, it has been widely used to obtain more interesting descriptors.
Area - The area A of a shape is another very simple descriptor. In the case of a contour in an image I, the area can be obtained by counting all pixels inside the contour I_C, as in Equation 1:

$$A = \sum_{i=1}^{W} \sum_{j=1}^{H} k_{i,j} \qquad (1)$$

where: W is the image width; H is the image height; k_{i,j} = 1 if pixel (i, j) ∈ I_C and k_{i,j} = 0 otherwise. In the case of a convex region determined by its n vertices (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the area of the convex hull is given by Equation 2:

$$A_H = \frac{1}{2} \left| \sum_{i=1}^{n} \left( x_i y_{i+1} - x_{i+1} y_i \right) \right|, \qquad (x_{n+1}, y_{n+1}) = (x_1, y_1) \qquad (2)$$
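For a route given directly by the coordinates of its stops, rather than by pixels, the perimeter and the area can be computed from the vertices. The following is a minimal sketch under that assumption; the function names and the rectangular example route are illustrative and not part of the original experiments.

```python
import math

def polygon_perimeter(vertices):
    """Sum of the edge lengths of the closed polygon through `vertices`."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))

def polygon_area(vertices):
    """Area by the vertex (shoelace) formula; abs() removes the orientation sign."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical route visiting the corners of a 4 x 3 rectangle.
route = [(0, 0), (4, 0), (4, 3), (0, 3)]
perimeter = polygon_perimeter(route)  # 14.0
area = polygon_area(route)            # 12.0
```

The vertex formula holds for any simple (non self-intersecting) polygon, so it applies to the convex hull of a route as a special case.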
Compacity - The compacity C (Costa and Cesar, 2001 [9]) is a measure defined by Equation 3, where P is the perimeter and A is the area of the shape. The lowest compacity is obtained with a circle, which encloses the largest area for a given perimeter. CompacityMin, CompacityMax and CompacityAverage are the minimum, the maximum and the average compacity of the routes.
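As an illustration, the sketch below assumes the common form C = P²/A, which is consistent with the statement that the circle attains the lowest value; the exact normalization used in Equation 3 may differ (some authors include a factor of 4π). The function name is illustrative.

```python
import math

def compacity(perimeter, area):
    """Assumed form C = P^2 / A (dimensionless, so invariant to scale)."""
    return perimeter ** 2 / area

r, s = 3.0, 3.0
circle = compacity(2 * math.pi * r, math.pi * r * r)  # 4*pi, the minimum
square = compacity(4 * s, s * s)                      # 16.0
thin_rectangle = compacity(2 * (10.0 + 0.5), 10.0 * 0.5)  # elongated shapes score higher
```

Because P² and A scale by the same factor when a shape is enlarged, this ratio does not depend on the size of the area served, which is the kind of invariance sought for the route descriptors.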
Centroid - The centroid is the location of the central point of the shape. From the centroid, important descriptors can be obtained, such as the minimum, maximum and average distance from the centroid to the edge. The centroid is obtained as the average of the vertex coordinates of the shape.
Diameter - The diameter D is the longest distance between any two points of the shape.
Convex Hull - The convex hull of a shape, its area A_H and its perimeter P_H are very useful to characterize shapes and also to obtain other descriptors. The convex hull of a route is the smallest convex polygon that contains the route.
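The three descriptors above can be sketched for a route given as a list of stop coordinates. The convex hull is computed here with Andrew's monotone chain algorithm, chosen only for brevity; the text does not state which hull algorithm was used, and the function names and sample points are assumptions.

```python
import math
from itertools import combinations

def centroid(points):
    """Average of the vertex coordinates."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def diameter(points):
    """Longest distance between any two points (brute force over all pairs)."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Hypothetical stops: four corners of a square plus one interior customer.
stops = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(stops)  # the interior point (1, 1) is discarded
```

The hull area A_H and perimeter P_H then follow by applying the vertex formulas for area and perimeter to `hull`.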
Fractal Dimension - The fractal dimension FD is a value that describes how irregular an object is and how much of the space it occupies. The basic principle for estimating FD is the concept of self-similarity. The FD of a bounded set S in Euclidean n-space is defined as in Equation 4:

$$FD = \lim_{r \to 0} \frac{\log N_r}{\log (1/r)} \qquad (4)$$

where N_r is the least number of copies of S at scale r. The union of the N_r distinct copies must cover the set S completely. One of the most widely used methods for obtaining the fractal dimension of an object in an image is the Box-counting method. The image is resized to a square whose side, measured in pixels, is a power of 2, so that it can be divided into four equal quadrants, each of which can again be divided into four quadrants, and so on. The number of boxes containing black pixels is recorded as a function of the box size (side length of the box). The natural logarithms of these values are plotted, and the fractal dimension is the slope (angular coefficient) of the resulting line.
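The box-counting procedure described above can be sketched as follows. This is a minimal illustration, assuming the shape is given as a set of (row, column) foreground pixels in a square image whose side is a power of 2; the function name and the two test shapes are not from the original work.

```python
import math

def box_counting_dimension(pixels, size):
    """Slope of log(box count) against log(1 / box side) over dyadic box sizes."""
    xs, ys = [], []
    box = size // 2
    while box >= 1:
        # A box is occupied if at least one foreground pixel falls inside it.
        occupied = {(r // box, c // box) for r, c in pixels}
        xs.append(math.log(1.0 / box))
        ys.append(math.log(len(occupied)))
        box //= 2
    # Least-squares slope of the log-log points.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

size = 64
filled_square = {(r, c) for r in range(size) for c in range(size)}
straight_line = {(0, c) for c in range(size)}
```

A filled square gives a dimension close to 2 and a straight line a dimension close to 1; irregular contours fall between these two values.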
Temperature - The temperature T is a measure defined by Equation 5, where P is the perimeter and P_H is the perimeter of the convex hull. The contour temperature is defined based on a thermodynamic formalism. The authors who proposed this feature argue that it bears a strong relationship with the fractal dimension (Costa & Cesar, 2001 [9]; DuPain et al., 1986 [12]).
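A form of the contour temperature commonly cited from the references above is T = 1 / log2(2P / (P − P_H)). The sketch below assumes that form for Equation 5; the function name and the handling of the convex case (where P = P_H) are illustrative choices.

```python
import math

def contour_temperature(perimeter, hull_perimeter):
    """Assumed form T = 1 / log2(2P / (P - P_H)); T tends to 0 for convex shapes."""
    if hull_perimeter > perimeter:
        raise ValueError("the convex hull of a closed contour cannot be longer than the contour")
    excess = perimeter - hull_perimeter
    if excess == 0:
        return 0.0  # limit value: a convex contour coincides with its hull
    return 1.0 / math.log2(2.0 * perimeter / excess)

convex_route = contour_temperature(10.0, 10.0)   # 0.0
winding_route = contour_temperature(4.0, 2.0)    # 1 / log2(4) = 0.5
```

Under this form, T depends only on the ratio P_H / P, so it is unchanged when the route is scaled, and it grows as the route becomes more winding relative to its hull.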
Curvature - The curvature is one of the most important descriptors that can be extracted from a contour. Given a parametric curve S(t) = (x(t), y(t)), the curvature k(t) is defined by Equation 6.
Bending Energy - The bending energy BE is obtained by integrating the squared curvature values along the contour and dividing the result by the curve perimeter, as in Equation 7.
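For a parametric curve, the standard curvature is k(t) = (x'(t) y''(t) − y'(t) x''(t)) / (x'(t)² + y'(t)²)^(3/2). The sketch below approximates it with periodic central differences on a sampled closed contour and then averages the squared curvature along the arc length. The discretization, the function names and the circular test contour are assumptions for illustration, and consecutive sample points are assumed to be distinct.

```python
import math

def curvature(xs, ys):
    """Discrete curvature of a closed curve sampled at (xs[i], ys[i])."""
    n = len(xs)
    ks = []
    for i in range(n):
        xp, xn = xs[i - 1], xs[(i + 1) % n]
        yp, yn = ys[i - 1], ys[(i + 1) % n]
        dx, dy = (xn - xp) / 2.0, (yn - yp) / 2.0                # first derivatives
        ddx, ddy = xn - 2.0 * xs[i] + xp, yn - 2.0 * ys[i] + yp  # second derivatives
        ks.append((dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5)
    return ks

def bending_energy(xs, ys):
    """Squared curvature integrated along the contour, divided by the perimeter."""
    n = len(xs)
    ks = curvature(xs, ys)
    steps = [math.dist((xs[i], ys[i]), (xs[(i + 1) % n], ys[(i + 1) % n]))
             for i in range(n)]
    return sum(k * k * ds for k, ds in zip(ks, steps)) / sum(steps)

# Check on a circle of radius 5, whose curvature is 1/5 everywhere.
radius, samples = 5.0, 360
angles = [2.0 * math.pi * i / samples for i in range(samples)]
xs = [radius * math.cos(a) for a in angles]
ys = [radius * math.sin(a) for a in angles]
ks = curvature(xs, ys)
be = bending_energy(xs, ys)
```

For the circle, the curvature is approximately 1/radius and the bending energy approximately 1/radius², as expected for a contour of constant curvature.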

TECHNIQUES FOR MULTIDIMENSIONAL INFORMATION VISUALIZATION
Due to the large number of descriptors used in this work, information visualization techniques (Card et al., 1999 [6]) are needed to display the high-dimensional data, such as Parallel Coordinates (Inselberg, 1985 [17]), which allows all attributes to be visualized in a 2D chart.
In parallel coordinates, a space of dimension n is mapped to a two-dimensional space using n equidistant axes that are parallel to one of the principal axes. Each axis represents an attribute and, normally, the interval of values of each attribute is linearly mapped onto the corresponding axis. Each data item is shown as a polygonal line that intersects each axis at the point corresponding to the value of the associated attribute, as shown in Figure 3.
Each axis is labeled with the name and the lowest and highest values of its attribute, and the interpretation is facilitated by the immediate estimation of the attribute values along the axes. Other structures can also be identified, such as data distributions, functional dependencies, correlations between attributes (Wegman & Luo, 1996 [30]) and clusters.
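The mapping described above can be sketched as a function that turns each record into the vertices of its polyline. This is a minimal illustration: the function name, the choice of axis positions 0, 1, ..., n-1 and the three sample records are assumptions, and drawing is left to any plotting library.

```python
def parallel_coordinates(records):
    """Return one polyline per record as a list of (axis_index, height in [0, 1])."""
    n_attrs = len(records[0])
    lows = [min(r[j] for r in records) for j in range(n_attrs)]
    highs = [max(r[j] for r in records) for j in range(n_attrs)]
    polylines = []
    for r in records:
        line = []
        for j in range(n_attrs):
            span = highs[j] - lows[j]
            # Linear mapping of the attribute interval onto the axis;
            # a constant attribute is placed at mid-height.
            height = (r[j] - lows[j]) / span if span else 0.5
            line.append((j, height))
        polylines.append(line)
    return polylines

# Three hypothetical records with two attributes each.
records = [(1, 10), (3, 20), (2, 30)]
lines = parallel_coordinates(records)
```

Each attribute is scaled independently, so attributes with very different ranges, such as total cost and temperature, can be compared on the same chart.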
Another visualization technique that is useful for exploring data with clusters is Viz3D (Artero & Oliveira, 2004 [1]), which projects the data onto the surface and the interior of a 3D cylinder whose base consists of a system of radial axes that represent the attributes of the records.
Given the data matrix D_{m×n}, Viz3D maps the n-dimensional coordinates of the m records d_i of D to 3D coordinates (x_i, y_i, z_i) according to Equation 8.
Figure 4 illustrates the Viz3D projection. Artero and Oliveira (2004) [1] argue that the views obtained with this projection are similar to those obtained using Principal Component Analysis (PCA) when it is used to reduce the dimensionality of the data to that of the display space.

ATTRIBUTES SELECTION
When data are grouped in classes and have a large number of attributes with different capabilities to separate the classes, it is necessary to identify the attributes that are most relevant for separating the classes and to remove those that do not separate them well, because these can mix the groups in the visualization. Although the Parallel Coordinates visualization technique can help in determining the most relevant attributes, a traditional technique that can be used in this step is the Analysis of Variance (ANOVA) (Snedecor and Cochran, 1967 [27]). The analysis of variance is a widely used statistical test that basically aims to verify whether there is a significant difference between means and whether the factors influence some dependent variable. From a sample of k groups (classes) with n records, the value of Snedecor's F is determined by Equation 9.
where: c is the number of classes; n is the total number of samples in the set; n_j is the number of samples in class j; x̄_j is the mean of the samples in class j; x̄ is the mean of all samples in the data set; k is the number of degrees of freedom; x_{i,j} is sample i in class j.
When the calculated value of F is greater than the critical value of the Snedecor distribution, the analyzed attribute is considered relevant for separating the classes and should therefore be retained in the analysis.
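As an illustration of this selection step, the sketch below computes the standard one-way ANOVA F statistic for a single attribute: the between-class mean square divided by the within-class mean square. It is the textbook form, which may be written differently from Equation 9, and the function name and sample values are assumptions.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one attribute, given one list of values per class."""
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Mean squares use k - 1 and n - k degrees of freedom, respectively.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical values of one attribute for good and bad solutions.
separating = anova_f([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])      # 54.0
overlapping = anova_f([[1.0, 5.0, 9.0], [2.0, 5.0, 8.0]])     # 0.0
```

An attribute whose class means are far apart relative to the spread inside each class receives a large F and is kept; an attribute whose classes overlap receives a small F. The function assumes at least two classes and some within-class variation.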

ANALYSIS OF SOLUTIONS FOR VEHICLE ROUTING PROBLEM USING TECHNIQUES OF COMPUTATIONAL GEOMETRY
The total cost of the solution is an obvious indicator of the quality of the routes. However, it is not possible to classify a solution as good or bad by analyzing this attribute alone, because for problems with many nodes (customers) to be served it is natural that the total cost is high, even for the optimal solution, since the total cost depends on the area covered by the nodes. Two simple possibilities for evaluating the quality of a solution independently of the number of nodes and of the area they cover are the ratios Q_1 and Q_2, given in Equations 10 and 11, which are two new attributes.

$$Q_2 = \frac{\text{Temperature}}{\text{ConvHullArea}} \qquad (11)$$

In this work, we propose that the descriptors presented in Subsection 2.1 and the descriptors in Equations 10 and 11 may be applied to assess the quality of solutions to the vehicle routing problem by means of a geometrical analysis of the routes. These descriptors are applied individually to the route traveled by each vehicle and to the total path traveled by all vehicles. In this manner, the minimum, maximum and average values of the descriptors are obtained for each solution, yielding a total of 34 attributes, listed in Table 1. In addition to these attributes, the records (solutions) also receive, according to the cost of the solution (the sum of its routes), a last attribute corresponding to Class 1 for good routes and Class 2 for bad routes. Figure 5 shows the ideal solution and a poor solution for instance A-n36-k5, which is small, and for instance A-n80-k10, which is large.
Figure 6, which shows the distribution of total costs in the good and bad classes, reveals that the total cost of the optimal solution of instance A-n80-k10 is often greater than the total cost of a bad solution of instance A-n33-k5, showing that the total cost is not a useful measure for separating good and bad solutions.
The reason is that these are two different problems in terms of the size of the area served and the number of customers and vehicles. Instance A-n33-k5 has a convex hull area of 6,081, 33 customers and 5 vehicles, while instance A-n80-k10 has a convex hull area of 9,044, 80 customers and 10 vehicles. In fact, the great challenge of this work is to obtain effective shape descriptors that do not depend on the number of customers, the number of vehicles or the size of the area served. In the experiments presented in Section 4, the same descriptors were evaluated with the type "A" instances of Augerat et al. (1998) [4], in which the number of customers varies from 32 to 80, the number of vehicles from 5 to 10 and the area served from 6,081 to 9,404 units of area.

EXPERIMENTS
To investigate the usefulness of these attributes in evaluating the quality of solutions, regardless of the number of places to be visited (customers), the number of vehicles to be used and the service areas, the instances of Augerat et al. (1998) [4] were used. For each instance, the 15 best and the 15 worst solutions were selected, giving 405 good solutions (low total cost) and 405 bad solutions (high total cost), that is, 15 good and 15 bad solutions for each of the 27 instances, always respecting the maximum vehicle capacity.

Visual Data Analysis
The visualization in parallel coordinates of all the attributes is shown in Figure 7, where the polylines corresponding to the good solutions are drawn in black and those corresponding to the bad solutions are drawn in gray. Sorting the axes by the values of Snedecor's F makes it easy to identify the most relevant attributes for separating the two classes, as illustrated in Figure 8. The F values for these 34 attributes are shown in Table 2. The ten attributes with the highest (best) values of Snedecor's F, all above 1,000, are shown in Table 3. The visualization, using parallel coordinates, of the 10 attributes with the highest (best) values of F is shown in Figure 9. In this view, there is a clear separation between the two classes, with lower values of these attributes in the class of good solutions and higher values in the class of bad solutions.
It is noted that four of the ten attributes are derived from the temperature, which indicates that this measure can be very useful for distinguishing good from bad routing solutions. The visualization of these records in Viz3D is shown in Figure 10, where a reasonable separation can be observed between the markers of the two classes (good solutions in black and bad solutions in gray).
Using the TemperMax attribute, it is possible to see in Figure 12 that the classes are well separated regardless of the number of customers in the instances and the size of the area covered by the customers. As the optimal costs of the instances of Augerat et al. (1998) [4] are known, it is possible to determine an optimality index for other solutions obtained for these instances, given by Equation 12.

$$\text{Optimality} = \frac{\text{Total Solution Cost}}{\text{Optimal Solution Cost}} \qquad (12)$$

In this case, the Optimality has a value of 1.0 for the optimal solution itself and a value greater than 1.0 for the remaining, worse solutions. The six attributes with the highest correlation with the Optimality values are presented in Table 4. Again, the attributes obtained from the temperature appear among the best.
Using a Backpropagation neural network (Artero, 2009 [2]), this section presents the results of a classification with the same 10 previously selected attributes, aiming to evaluate how effective these ten attributes with the highest values of F are for classifying solutions as good or bad. The neural network used has ten neurons in the input layer, five in the hidden layer and two in the output layer. A logistic transfer function and a learning rate of 0.5 were adopted (Rumelhart et al., 1986 [25]; Artero, 2009 [2]). After 2,000 training iterations, which took only four seconds, the maximum error of the network was 3.19E-14, indicating good convergence of the network, and the success rate was 99.87% (809 hits). The confusion matrix obtained by the classification using the neural network is presented in Equation 13.
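A network with the topology described above (ten inputs, five hidden neurons, two outputs, logistic transfer function, learning rate 0.5) can be sketched in a few lines. This is not the implementation used in the experiments: the class name `TinyMLP`, the weight initialization, the number of epochs and the synthetic data (low attribute values for good solutions, high values for bad ones) are assumptions made only to show the backpropagation update.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    def __init__(self, n_in=10, n_hidden=5, n_out=2, rate=0.5, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]
        self.rate = rate

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        out = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
               for row in self.w_out]
        return hidden, out

    def train_step(self, x, target):
        hidden, out = self.forward(x)
        # Error terms of the output and hidden layers (squared-error loss).
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        delta_hidden = [h * (1.0 - h) * sum(d * row[j] for d, row in zip(delta_out, self.w_out))
                        for j, h in enumerate(hidden)]
        for row, d in zip(self.w_out, delta_out):
            for j, v in enumerate(hidden + [1.0]):
                row[j] += self.rate * d * v
        for row, d in zip(self.w_hidden, delta_hidden):
            for j, v in enumerate(x + [1.0]):
                row[j] += self.rate * d * v

    def predict(self, x):
        _, out = self.forward(x)
        return out.index(max(out))  # 0 = good, 1 = bad

# Synthetic, already normalized data: 20 good and 20 bad records of 10 attributes.
rng = random.Random(1)
good = [[rng.uniform(0.0, 0.4) for _ in range(10)] for _ in range(20)]
bad = [[rng.uniform(0.6, 1.0) for _ in range(10)] for _ in range(20)]
samples = [(x, [1.0, 0.0]) for x in good] + [(x, [0.0, 1.0]) for x in bad]

net = TinyMLP()
for _ in range(200):
    rng.shuffle(samples)
    for x, target in samples:
        net.train_step(x, target)

accuracy = sum(net.predict(x) == t.index(1.0) for x, t in samples) / len(samples)
```

The attributes are assumed to be scaled to the interval [0, 1] before training, which keeps the logistic units away from saturation.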

CONCLUSIONS
Finding optimal solutions to the vehicle routing problem is not a simple task when there is a large number of customers and vehicles, as well as different sizes and shapes of the areas served. Heuristic methods can obtain solutions in an acceptable time; however, when the optimal solution is unknown, it is hard to tell how close a solution is to optimality without running lower bound methods.
This work presents a proposal to evaluate the solutions obtained with heuristic methods, classifying them as good or bad through an analysis of the shapes of the routes. In addition to evaluating the efficacy of some geometric descriptors as attributes in a visual exploration process, a Backpropagation neural network was used to perform an automatic classification, which reached a success rate of 99.87%. This shows that the investigated attributes have a reasonable potential to discriminate the quality of solutions, regardless of the number of customers, which ranged from 32 to 80, and the number of vehicles, which ranged from 5 to 10. The descriptor TemperMax (maximum temperature) stood out in the discrimination of the two classes (Figure 11), as did the other temperature-based descriptors. In future work, other descriptors need to be evaluated, for example descriptors based on moments and on Fourier analysis.

Figure 1 - (a) Solution for the A-n33-k5 instance; (b) Shape of the five routes.

Figure 3 - (a) Data set with four records of dimension five; (b) Visualization, using parallel coordinates, of the set shown in (a).

Figure 6 - Distribution of the values of the attribute CostTotal for the classes good and bad.

Figure 7 - Parallel Coordinates visualization of the 34 attributes (descriptors). Black polylines for the 405 good solutions and gray polylines for the 405 bad solutions.

Figure 8 - Parallel Coordinates visualization of the 34 attributes (descriptors). Black polylines for the 405 good solutions and gray polylines for the 405 bad solutions.

Figure 9 - Visualization in parallel coordinates of the good and bad records, using only the 10 attributes with the highest values of F. Black polylines for the 405 good solutions and gray polylines for the 405 bad solutions.

Figure 10 - Viz3D visualization of the good and bad records, using only the 10 attributes with the highest values of F. Black markers for the 405 good solutions and gray markers for the 405 bad solutions.

Figure 11 - Distribution of the values of the attribute TemperMax in the two classes, good and bad. Black for the 405 good solutions and gray for the 405 bad solutions.

Table 3 - Attributes with the 10 highest (best) values of F.

Table 4 - The six attributes with the highest correlation with the Optimality.