Research Article Performance and parameterization of the algorithm Simplified Generalized Simulated Annealing

The main goal of this study is to find the most effective set of parameters for the Simplified Generalized Simulated Annealing algorithm, SGSA, when applied to distinct cost function as well as to find a possible correlation between the values of these parameters sets and some topological characteristics of the hypersurface of the respective cost function. The SGSA algorithm is an extended and simplified derivative of the GSA algorithm, a Markovian stochastic process based on Tsallis statistics that has been used in many classes of problems, in particular, in biological molecular systems optimization. In all but one of the studied cost functions, the global minimum was found in 100% of the 50 runs. For these functions the best visiting parameter, qV, belongs to the interval [1.2, 1.7]. Also, the temperature decaying parameter, qT, should be increased when better precision is required. Moreover, the similarity in the locus of optimal parameter sets observed in some functions indicates that possibly one could extract topological information about the cost functions from these sets.


Introduction
A large number of problems in physics, chemistry and biology have as central point the minimization of an appropriate energy function for finding the global minimum of a particular target function.The protein folding and the ligand-receptor docking problems are two examples of challenges in the molecular biology field where the development of efficient and robust global optimization algorithms play a central role in order to find the conformational geometry associated to the global minimum of the molecular free energy hypersurface.Biological macromolecules and biomolecular complexes present a very complex free energy landscape with thousands of local minima.This fact dramatically increases the probability of an optimization process to be trapped in local minima and consequently turns the global minimum determination into a very diffi-cult task.To cope with this problem one should choose a powerful optimization algorithm and understand it deeply to obtain a robust and efficient optimization protocol.This work investigates a simplified and extended version of the GSA, Generalized Simulated Annealing, algorithm of optimization (Tsallis and Stariolo, 1995;1996), called SGSA (i.e., Simplified GSA), with the main objective of understanding the role of the SGSA parameters in its performance in order to guide their choices in future biomolecular optimization studies.
The GSA is a generalization of the Simulated Annealing algorithm, SA (Kirkpatrick et al., 1983), also known as Boltzmann machine because it is based on Boltzmann-Gibbs statistics, and of the Fast Simulated Annealing algorithm, FSA (Szu and Hartley, 1987), or Cauchy machine, based on Cauchy-Lorentz probabilistic distribution.
The simulated annealing algorithms family depends on a visiting function that determines how the domain of the function is searched, and on an acceptance function that says if a result of higher "energy" should be accepted or rejected.
In the original SA, the visiting function was simply a random variable choice, due to the binary nature of the variables.The acceptance function, where k B is the Boltzmann constant, gives the Boltzmann-Gibbs distribution nature of the movement of the reference point.In problems with more complex domains, the Boltzmann-Gibbs probability distribution function is also used as a visiting function.It was demonstrated (Geman and Geman, 1984) that in this case the maximum temperature decaying ratio should be T(t) = T 0 / log (1 + t), where the "time" t is the iteration step, to guarantee the theoretical convergence of the algorithm.
The FSA algorithm uses as its visiting function a Cauchy-Lorentz probability distribution function: (2) where x is the variable of interest, and maintains the same acceptance function of the SA algorithm.Szu and Hartley proved that in this case T could decay with the inverse of the computing step, T c (t) = T 0 / (1 + t), because even in relatively low temperatures long range jumps are still possible, which made the annealing faster than in the SA algorithm.
The Simplified Generalized Simulated Annealing algorithm, SGSA, is an extended and simplified derivative of the GSA algorithm, with a reduced computational cost and the capacity to deal successfully with finite domain problems such as grid based receptor-ligand docking methodologies (Meng et al., 1992;Luty et al., 1995;Garrett et al., 1998).

Material and Methods
Given a cost function, the simulated annealing family of algorithms works as follow: 1. From an initial set of values of the parameters of the given cost function, generally randomly chosen, an initial "energy", E ref , of the system is evaluated and an initial "temperature" T = T 0 is selected; 2. a random perturbation is generated into the parameters of the cost function using the visiting function, and the new "energy" of the system, E new , is then calculated; 3. if ∆E = E new -E ref ≤ 0, the new point is better or at least of the same quality as the previous one, the new set of values of the parameters of the function become the reference set; 4. if ∆E > 0, the new point is worse than the reference point but still could be accepted depending upon the acceptance probability function and a random number, as defined in the Metropolis criteria (Metropolis et al., 1953); 5. the "temperature" T is decreased, according to a temperature decaying function; 6. step 2 thru step 5 are repeated during a giving number of steps or until some other stopping criteria becomes satisfied.

The GSA algorithm
The GSA algorithm uses for the acceptance probability function in the cases where E(x t+1 ) > E(x t ), the expression where q A (1.0 < q A < 3.0) is the acceptance parameter.
The visiting function depends on the Tsallis probability density function: where D is the dimension of the cost function, and q V (1.0 < q V < 3.0) is the visiting parameter.From the probability distribution function, G(∆x t ), to which a randomly chosen value is attributed, a perturbation ∆x t is determined in every iteration with the temperature decaying in Equation 4 controlled by where q V = 2 is the FSA temperature decaying case.
Generally Equation 5 has no analytic solution and Equation 6 must be resolved numerically by means of the inversion of a power series (Moret at al., 1996(Moret at al., , 1998)).
Usually T t T t , but there is no specific reason that enforces that.
The idea of generalized in the algorithm comes from the fact that in the parameters limit (q A ; q V ) = (1; 1) reproduces the SA or Boltzmann machine; and (q A ; q V ) = (1; 2) reproduces the FSA or Cauchy machine.
As x was D-dimensional, ∆x t i for every dimension was originally determined using products of sine and cosine functions, that introduces an artifact in the visiting function.Moret (1996) suggested the application of g x q V ( ) independently in every dimension.In any case, two problems arise: the computational cost of the calculus of the inverse of the integral of Equation 5; and second, ∆x t computed in this way is not limited and when the domain is finite it must be normalized.
To cope with these problems, two main simplifications are used in the SGSA.The first is to make, for every dimension x i , with r i randomly chosen, greatly reducing the computational cost, because there is no power series to invert.The second is to make D = 0, that guarantees 0 ≤ g q V ≤ 1, in order to maintain x i always inside a given domain.In this case, A lower value of q V in the g r q i V ( ) gives a more global profile for the visiting function, where long jumps have a greater probability of occurring when compared to the probability given by greater q V values.On the other hand, a greater value of q V gives a more local visiting function profile, with high short jump probability and a very fast decreasing of the long jump probability.
Another difference from traditional GSA was the introduction of an additional temperature decaying parameter, q T , in place of q V in Equation 7, to maintain better and more independent control over the annealing process.A larger value of q T causes a very fast T q T decaying with two possible effects: either the convergence to the global minimum is very fast or the algorithm is trapped in a local minimum.

Results and Discussion
With the objective of understanding the role of the SGSA parameters sets, (q V , q A , q T ), in the algorithm performance, the optimization procedure using six twodimensional functions as case studies is investigated.The choice of two-dimensional functions permits the comparison between the locus of the best SGSA parameters for a particular function and its topology.The six functions studies are: Ackley (Solomatine, 1995), Figure 1(a); log-trigonometric (Kvasnicka and Pospíchal, 1997), Figure 1(b); Lavor (Lavor, 2001), Figure 1(c); Schwefel (Schwefel, 1981), Figure 1(d); Goldstein-Price (Solomatine, 1995), Figure 1 The Ackley and log-trigonometric functions have in common a unique and deep global minimum with several local minima around it.Lavor and Schwefel functions have both a smooth profile with an almost undistinguishable global minimum, because many of the local minima basins are very similar to the global minimum basin.The Goldstein-Price function seems at first an easy objective function, but presents a scale problem with a difference of many magnitude orders between the domain and the function hypersurface.Finally, the De Jong F5 function is, as could be easily seen, a nightmare for optimization algorithms, many deep and small minima basins with minimum values close to the global minimum.
The approach adopted in this study was an exhaustive search for the best parameters set, (q V , q A , q T ), for the SGSA algorithm.Using a stop criteria of 2,500,000 steps, the parameters were scanned using a 0.1 step in the intervals 1.1 ≤ q V ≤ 2.9, 1.1 ≤ q A ≤ 2.9, 1.1 ≤ q T ≤ 3.5, with initial "temperatures" T T The entire process was repeated 50 times for every parameter set with different random initial conditions for each execution.
In Table 1 are shown some of the performance data in terms of mean number of cycles in 50 runs, with two different RMSD (Root Mean Square Deviation) limits, from the exact global minimum solution, for every function studied.The success index presented in the RMSD/Success column shows that for all functions except the De Jong F5 function, the algorithm was able to find the global minimum in all runs for a reasonable number of parameters sets (see cases after the success index).In the De Jong F5 function, in no more than 50% of the runs the algorithm was successful for a particular parameter set.These unsatisfactory results indicate that some work still must be done to improve the algorithm.
From the "Best case" columns it can be seen that the visiting parameter, q V , belongs to the interval [1.2, 1.7].An interval a bit larger, [1.1, 1.8], holds almost all good parameter set intervals (see the "Good cases intervals" columns in Table 1).In the same sense, the temperature decaying parameter, q T , varies in a larger interval [1.2, 2.4], while the good cases intervals are in the interval [1.1, 2.5].It should be noted that when higher precision is required a greater q T should be used in most of the cases, which increases the ability of the algorithm to act as a local search engine.
It was observed that the acceptance parameter, q A , was almost ineffective, indicating that probably the initial acceptance temperature, T q A , should be increased independently of T q V .Another non-exclusive option to cope with this ineffectiveness is the use of negative values for q A , as had been pointed out by Tsallis and Stariolo (1996).
In Figure 2 shows the profile of the SGSA parameters q V and q T for the best q A value (see the "Best case" columns in Table 1), that achieve 100% success in 50 runs (for the De Jong F5 function a success index of more than 20% is used) for each studied function.Some degree of similarity in the profile of optimal parameter sets observed in the functions of Ackley and log-trigonometric, and Lavor and Schwefel (Figure 2), could also be observed in the hypersurface of the respective functions (Figure 1), indicating that possibly one could extract topological information about the cost function from these optimal parameter sets.If some information from a cost function hypersurface is known, an a priori SGSA parameters set range could be chosen.In a case where the function hypersurface is com- pletely unknown, some insight about it could be obtained by means of an exhaustive search in the parameters space.
The results obtained are useful in indicating a direction for the use of this algorithm in problems like protein folding or ligand-protein docking, reducing significantly the number of algorithm parameter choices as well as giving hints about the effect of parameters on the behavior of the algorithm.With these results some improvements were already achieved in studies in progress using the SGSA algorithm in the ligand-protein docking studies in progress, which will be published elsewhere.

620
Performance and parameterization of the SGSA Mean cycle 3 q V q A q T Limit 4 Mean cycle 5 q V q A q T (actual minimum) (success / cases) 1. Maximum Root Mean Square Deviation of the function parameters from the exact global minimum solution values.This value is used as success criterion in the optimization process.2. Minimum SGSA percentage of success in the 50 runs for a particular (q V , q A , q T ) set to be counted and the number of sets that met this requirement.
3. Mean number of cycles where the global minimum was reached among the 50 runs for the best (q V , q A , q T ) set case and the minimum/maximum number of cycles among them.4. The cases represent the number of parameter sets for which the global minimum was reached in a mean number of cycles lower than the chosen limit for every function.5. Mean number of cycles where the global minimum was reached among the 50 runs for the cases of the previous column and the minimum/maximum number of cycles among them.6.One must note that the parameters intervals are independent of one another.7.In the De Jong F5 function case, the parameter set considered to have the better success index is represented on the additional line.8.In the De Jong F5 function case, the limit is determined in function of the success index.

Figure 1 -
Figure 1 -Test functions used in this work.

Table 1 -
Performance data of the SGSA algorithm for the selected two-dimension functions.