Optimum contribution selection using differential evolution

A program to determine optimum contribution selection using differential evolution was developed. The objective function to be optimized was composed by the expected merit of the future progeny and the coancestry among selected parents. Simulated and real datasets of populations with overlapping generations were used to validate and test the performance of the program. The program was computationally efficient and feasible for practical applications. The expected consequences of using the program, in contrast to empirical procedures to control inbreeding and/or to selection based exclusively on expected genetic merit, would be the improvement of the selection response and a more effective control of inbreeding.


Introduction
Selection based on optimum genetic contribution (Woolliams & Thompson, 1994;Meuwissen, 1997) aims to restrict the inbreeding level of a population and to maximize the genetic progress in the long term.Under the same level of inbreeding, optimum contribution selection can provide higher genetic gain when compared to selection based exclusively on expected breeding value (Meuwissen & Sonesson, 1998).
The expected merit of the future progeny and the coancestry among selected parents are the usual components considered in the index (objective function) to be optimized in the optimum contribution selection definition.A negative weight is used for coancestry, aiming to control inbreeding in the long term.With overlapping generations, a more proper genetic contribution seems to be obtained when the juveniles (animals not yet available for reproduction) are also considered in the optimization process or when coancestry computation take in to account previous contributions.However, the advantage of considering these changes to perform optimum contribution selection is not so evident in the literature.
The objective function for optimum contribution selection can be optimized using, for example, Lagrange multipliers (Meuwissen & Sonesson, 1998), semidefinite programming (Pong-Wong & Woolliams, 2007) or evolutionary algorithms (Sorensen et al., 2006).Semidefinite programming is advantageous when the components and the contrasts (restrictions) of the objective function are convex, as the case of the usual objective function mentioned above.Evolutionary algorithms are, however, more flexible regarding the components and the contrasts that can be considered.Differential evolution (Storn & Price, 1995) is a type of evolutionary algorithm, very powerful to optimize diverse objective functions studied in the literature (Price et al., 2005).Despite of the potential of the method, it seems that the feasibility of using Differential evolution for optimum contribution selection has not been investigated so far.
The objectives of this study were the following: to develop a program based on differential evolution to determine optimum genetic contributions; to validate and test its performance using simulated and real data sets and different parameters of the Differential evolution algorithm; to empirically evaluate the consequences of applying different approaches to perform optimum contribution selection in schemes with overlapping generations.

Material and Methods
The optimum genetic contribution was defined with the optimization of the following objective function (OF): OF = w 1 *x'EBV/2N + w 2 *x'Ax/4N 2 , where x = the vector of genetic contributions (number of mates for each candidate) to be optimized, EBV = the vector containing the expected breeding values, A = the numerator relationship matrix, w 1 and w 2 = the weights for the expected merit of the future progeny (x'EBV/2N) and the average coancestry among selected parents (x'Ax/4N 2 ), respectively, and N = the number of required matings.In order to optimize this objective function, the following program based on differential evolution was developed.
The differential evolution algorithm applied consisted in randomly generating an initial 'population' of possible solutions, composed by the number of mates for each candidate.Each 'individual' of the 'population' is considered a 'chromosome' with n 'loci', where n is the number of candidates for selection.The 'alleles' are random values ranging from 0 to the maximum allowed number of mates for each candidate.The 'fitness' of each 'individual', determined by the value of the objective function described above, is calculated according to the value ('allele') of each 'locus'.Once the initial 'population' is established, several 'generations' are simulated.In each 'generation', a 'challenger' is constructed for each 'population member'.If this 'challenger' has superior 'fitness', it will replace the 'population member' in the next 'generation'.To build this 'challenger', three other 'individuals' are chosen at random.We can label these as A, B and C. Each 'allele' is then addressed in turn.With a probability equal to the 'crossover rate' (CR), the 'allele' is simply adopted from the 'population member' that the 'challenger' is challenging.Otherwise, a new 'allele' value is constructed as the value for 'member' A plus the 'mutation factor' (F) times the difference between the values for B and C. Successful 'challengers' replace their respective 'population members' and, together with 'surviving members', constitute a new 'generation' with higher mean 'fitness'.The process continues over enough 'generations' to achieve convergence close to an optimal solution, with the 'fittest' solution being chosen (Price & Storn, 1997).
The differential evolution algorithm was originally designed to work with continuous variables.The approach proposed by Lampinen & Zelinka (1999) was used to provide integer solutions for x (number of mates per candidate).The method consists in letting differential evolution work internally with continuous floating-point values and converting them to integers to calculate the 'fitness'.According to these authors, this is essential to maintain the diversity of the 'population' and the robustness of the algorithm.To enhance the speed of the differential evolution for the optimum contribution selection application, the Colleau (2002) indirect approach to compute coancestry was adopted, and linked lists (Knuth, 1975) were used for the storage and calculations involving sparse matrices.The differential evolution developed program was written in FORTRAN language.
Simulated datasets were used to validate and to evaluate the performance of the differential evolution program (Experiment 1).Phenotypes of a hypothetical trait, with heritability equal to 0.4, were simulated according to the infinitesimal model: y ij = b i + a ij + e ij , in which y ij = the phenotype of the animal j in the contemporary group (CG) i, b i = the effect of the CG i, a ij = the breeding value of the animal j in the CG i and e ij = the random error.Firstly, a base population of six sires and 120 dams, which were assumed to be unrelated, unselected and randomly sampled from a conceptually infinite population, was generated.Each sire was randomly mated to 20 dams.Following a pregnancy rate of 80%, each dam could generate an offspring with randomly defined sex.Ten years (mating seasons) were simulated.For each year, those sires used in two consecutive years and the failed dams were culled, resulting in a replacement rate of 50 and 20% for sires and dams, respectively.For the first ten years, selection for replacement was based on breeding values.Following this structure, 20 replications were simulated.In all these replicates, the differential evolution program was then applied to determine the optimum contribution selection for what would be the eleventh mating season.
It was considered as candidates for selection, the six sires used in the tenth mating season plus the 12 best male juveniles.Females available for reproduction were the 96 dams that conceived in the last simulated year plus the 24 best female juveniles.It was assumed that each available female would produce one offspring, i.e. the female contribution was not optimized in the optimum contribution selection process.The maximal number of allowed mates per male was equal to 40.
The allele values of the initial population were randomly generated from a uniform distribution ranging from 0 to 40.Two distinct ways to constrain the number of mates to be equal to the number of available females (120) were investigated: 1) scaling the allele values, forcing them to sum to number of mates, 'internally' in the differential evolution algorithm; or 2) using an auxiliary vector to scale the allele values just when they were converted to integers, in the subroutine to evaluate the 'fitness'.
Runs of the differential evolution program were carried out using the following sets of operational parameters: size of the population equal to 36 (twice the number of candidates for selection); crossover rate (CR) equal to 0.5 or 0.7; mutation factor (F) equal to 0.9 or randomly sampled from a uniform distribution (ranging from 0.2 to 0.9) for each locus.
For each of the eight possible combinations compromising the type of constraint, the crossover rate (CR) and the mutation factor (F) adopted, four applications of the differential evolution program were performed, varying the weights of the objective function for the genetic merit (w 1 ) and the coancestry (w 2 ).A first application optimizing just the genetic merit (w 1 =1; w 2 =0) was applied to validate the program, as the correct solution is known in this case.Two distinct ways to control coancestry were then tested: imposing a negative weight (w 1 =1; w 2 =-5); and imposing a restriction (w 1 =1; w 2 = 'r'), forcing the average coancestry of the selected parents to be equal to the application which considered w 2 =-5 (this restriction was guaranteed with a 'death' penalty for 'members' of the evolutionary process that broke the constrain).The fourth application, tested the performance of the differential evolution program to optimize an objective function considering just the coancestry (w 1 =0; w 2 =-1).
For each set of weights of the objective function and constrain on the number of mates a modified version of the differential evolution algorithm proposed by Gondro & Kinghorn (2008) was also applied, aiming to improve speed and robustness of convergence.The method basically presents three differences from the differential evolution previously described (adopted values are included in parenthesis): i) low mutation factor (F) for most generations and a high mutation factor (F) for every few generations (F=0.1;F=1.0 every four generations); ii) lower crossover rate (CR) for latter generations (CR=0.5;CR=0.1 if generation>5,000); iii) perform proportional and absolute mutation (Gondro & Kinghorn, 2008) on every few loci (~3%).
Convergence of the evolutionary process was assumed when the range and the mean absolute deviation of the 'fitness', considering all the 'individuals' of a 'generation', were lower than 1x10 -6 .
In order to test the viability of the differential evolution developed program for practical applications, a real data set of a Nelore herd was used (experiment 2).The full pedigree contained 4,763 animals, which were born from 1967 to 2006.Candidates for selection were chosen based on the selection index used by the farm, composed by expected breeding values for different pre-weaning and yearling traits.The candidate sires were those ranked as top 30% and used in the last three mating seasons (from 2004 to 2006), and the top 20% juveniles (males born in 2004 and 2005), totalizing a number of candidates equal to 100.The females available for reproduction were those which were pregnant in the last mating season and the top 20% heifers (females born in 2005), totalizing 500 females to be mated.As in the previous experiment, it was assumed that each available female would produce a fixed number of one offspring, i.e. the female contribution was not optimized in the optimum contribution selection process.
The optimum genetic contribution for each available sire was defined accordingly to the objective function and the differential evolution program described above.The operational parameters of the differential evolution algorithm were: 'population' size = 200 (twice the number of candidates); crossover rate (CR) = 0.5 (or CR = 0.1 if generation>5,000); and mutation factor (F) = 0.1 (or F=1.0 every 4 generations).As proposed by Gondro & Kinghorn (2008), proportional and absolute mutation was performed for approximately 3% of the 'loci'.Convergence of the evolutionary process was assumed when the range and the mean absolute deviation of the 'fitness', considering all the 'individuals' of a 'generation', were lower than 1x10 -4 , or the 'fittest' solution of the 'generation' 5x10 6 .
Different weights of the objective function for the genetic merit (w 1 ) and the coancestry (w 2 ) were empirically defined to evaluate the consequences of applying the optimum contribution selection in contrast to the selection based exclusively on expected breeding values.The four sets of adopted weights were: w 1 =1 and w 2 =-1 (OC_1); w 1 =1 and w 2 =-10 (OC_10); w 1 =1 and w 2 =-20 (OC_20); w 1 =0 and w 2 =-1 (OC_100).
The importance of considering juveniles in the objective function or previous contributions to compute coancestry, in populations with overlapping generations, was empirically evaluated with simulated data (Experiment 3).The same model and structure of the population described in the experiment 1 were used in the simulation of the first 10 years.Other 10 years were simulated, following four selection strategies: 1) selection based on best linear unbiased prediction (BLUP) of the breeding values, with a replacement rate of 50 and 20% for sires and dams, respectively, each sire mated randomly to 20 dams; 2) OC -optimum genetic contribution selection considering the actual available candidates (described latter); 3) OCfoptimum genetic contribution selection considering the actual and the future (juveniles -animals younger than two years of age) available candidates; and 4) OCgoptimum genetic contribution selection using the approach proposed by Grundy et al. (2000), which adopts an augmented numerator relationship matrix (A*) and a vector of committed future contributions (f) to compute coancestry ((x+f)'A*(x+f)).
For the three optimum genetic contribution selection strategies, males considered as candidates for selection were the sires used in the previous mating season and the steers (two years old males) ranked as top 30%.As in the first experiment, failed dams were automatically replaced by the top 20% heifers, and it was assumed that each available female would be mated once per season.Random matings were performed conditionally on the optimum contribution solutions of the sires, with a maximum number of allowed mates per sire equal to 40.Following each strategy and mating season, 20 replicates were simulated.
The optimum genetic contribution for each available sire was defined accordingly to the objective function and the differential evolution program described in the beginning of this section.The operational parameters of the differential evolution algorithm and the convergence criteria were the same as those adopted in experiment 2.
For the optimum contribution strategy considering juveniles (OCf), the estimated contributions of the future candidates were also considered in the optimization process to avoid restriction of the future use of good juveniles due to previous intensive use of their parents, i.e. aiming to attain a better balance of using actual and future good candidates.
The weight for the average coancestry among parents was empirically defined forcing optimum genetic contribution strategies (OC, OCf and OCg) to present the same inbreeding rate as the BLUP strategy, allowing the comparison of the genetic progresses under the same level of inbreeding.

Results and Discussion
The assumed convergence of the differential evolution program was attained very fast (Table 1).The minimum, mean and maximum central processing unit (CPU) time, over the 20 replicates, were equal to 0.2, 3.5 and 17.3 seconds, respectively.For a mate selection application, Abbass (2000) stated that the generation of the numerator relationship matrix made the optimization problem very computationally expensive.This demand was drastically reduced in the present application with the adoption of the indirect approach to compute coancestry (Colleau, 2002), contributing for the efficiency of the program.It is important to note that although the number of candidates ( 18) was small for the optimum genetic contribution selection, the coancestry computations also involved the available females and their ascendants, with a total number of animals ranging from 203 to 241, depending on the replication.
The expected merit of the future progeny with BLUP selection (three best sires mated to 40 dams each) would be equal to 0.6100.The optimum genetic contribution selection applications maximizing only the genetic merit (w 1 =1 and w 2 =0), used to validate the differential evolution program, presented solutions very close or equal to the optimum solution.The expected merits ranged from 0.6075 to 0.6100, varying with the operational parameters.Only the application that considered the modified differential evolution and adopted an auxiliary vector to constrain the number of mates provided the optimum solution.The others presented a different solution because, for some replications, one or even two mates were recommended for the fourth best sire, which had an index value very close to the third best ranked sire (data not shown).
Considering the speed of the optimization process, the applications with a 'light' penalty for coancestry (w 2 =-5) outperformed those with a 'death' penalty for 'individuals' presenting average coancestry greater than 0.0857 ("w 2 =r").Both strategies presented almost the same solutions, however.
Scaling the allele values and forcing them to sum to number of mates, slowed down the efficiency of the differential evolution algorithm, in comparison with the applications that used an auxiliary vector to scale the allele values just for evaluating the 'fitness' (Table 1).This result is in agreement with the statement of Price et al. (2005) that the shrinkage of the boundaries of the allele values could negatively impact on the ability of the differential evolution to converge.
Using a random mutation factor (F~u[0.2;0.9])accelerated the convergence process, in contrast to a fixed mutation factor equal to 0.9.Applications using crossover rates equal to 0.5 or 0.7 presented similar results, i.e. there were no evidences that the use of a higher crossover rate resulted in premature convergence.It is important to emphasize that these operational parameters of the differential evolution are sensitive case, i.e. the best mutation factor and crossover rate for a problem will not necessary be the best for other cases.The adoption of the modified differential evolution attained its objectives of improving speed and robustness of convergence.
The optimization of the objective function considering only the coancestry (w 1 =0 and w 2 =-1), resulted in different solutions being obtained for the expected merit of the future progeny, probably because of the presence of diverse global optima.
Results from Experiment 2, showed that the proportion of inbreed animals observed in the real data set and their average inbreeding coefficient were equal to 33% and 0.037, respectively.The CPU time of the differential evolution convergence ranged from 15.8 (OC_10) to 120.4 seconds b Constrain imposed internally in the differential evolution algorithm (DE) or while evaluating the 'fitness'.c Mutation factor = 0.9 or randomly chosen from ~ unif.[0.2;0.9], for each locus.d Time in seconds, in a PC with a Pentium D 3.4GHz processor and 3 GB RAM.* Using the modified differential evolution proposed by Gondro & Kinghorn (2008).
Table 1 -Average values, over 20 replicates, of the CPU time, number of generations of the evolutionary process (n_gen), expected merit of the future progeny (xEBV) and average coancestry of selected parents (xAx), according to applications of the differential evolution varying: the weights of the objective function (w 1 and w 2 ); the constrain on the number of mates (N=120); the mutation factor (F); and the crossover rate (CR) (OC_100).The convergence criteria that the range and the mean absolute deviation of the 'fitness' were lower than 1x10 -4 , were attained for all the optimum genetic contribution selection applications with fewer than 10,000 generations of the evolutionary process.
The random mating of the 100 sire candidates with the 500 available dams resulted in an expected merit of the future progeny equal to 0.813 standard deviation units and an average coancestry equal to 0.024 (Table 2).As expected, the BLUP (index) selection was the strategy that provided the highest expected merit (in the short term), also presenting an expressive increase in the average coancestry, in comparison to the random strategy.The expected consequences of a higher coancestry are the increase of the inbreeding, the reduction of the genetic variability and the genetic response in the medium/long term, and possibly some indirect undesirable responses, too (Weigel, 2001).
The empirical attempt to control inbreeding by selecting the best 13 sires which presented F<0.012 (average inbreeding of the herd), reduced coancestry by 12.5% and the expected merit by 6.0%, in comparison to the index strategy.The same reduction in coancestry was observed by applying OC_1, without reduction of the expected merit, highlighting the inefficiency of empirical attempts to control inbreeding.
The optimum contribution selection strategies resulted, as expected, in more sires being selected, reduction in coancestry and in the expected merit of the future progeny, in comparison with the index strategy.As an example, the optimization of the objective function with weight equal to -10 for coancestry (OC_10), resulted in 23.1% more sires being selected, a reduction equal to 18.8% for coancestry and equal to 0.6% for the expected merit.
Results from experiment three showed that, as it was desired, the average inbreeding rates were almost the same for all the adopted selection strategies (Figure 1).The average inbreeding coefficient in the year 20 was equal to 0.12 (an annual rate equal to 0.008, in the last 10 years).
BLUP selection resulted in an achieved genetic gain equal to 1.64 phenotypic standard deviation units (s p ), after 20 years of selection (Figure 2).Optimum contribution selection strategies presented higher genetic response than BLUP selection, in agreement with the results from literature (Meuwissen & Sonesson, 1998;Grundy et al. 2000).In the 20 th year, the superiority of optimum genetic contribution strategies (OC, OCg and OCf) over BLUP selection were, respectively, equal to 16.4, 18.2 and 19.1%.An almost same response to selection was observed for the different optimum genetic contribution selection approaches.The importance of changing coancestry computation in schemes with overlapping generations seems to be more related to constraining inbreeding to a predefined level than to maximizing response.As   previously described, constrained inbreeding level was attained in the present study by using different weights on coancestry for each optimum genetic contribution selection approach, forcing them to present the same inbreeding rate as the BLUP strategy.
Meuwissen & Sonesson (1998) observed a superiority of optimum genetic contribution selection over BLUP selection, under a similar level of inbreeding of the present study, ranging from 16 to 36%.This superiority was even greater for populations with lower inbreeding rate.Results of the present study are not directly comparable to those by Meuwissen & Sonesson (1998) due to differences in the structure of the simulated population, in the number of years that optimum genetic contribution selection was applied, in the optimization method, and in the selection approach for females.
It is important to emphasize that the proper definition of the weight for coancestry in the objective function depends on the time horizon being considered; on the level of allowed reduction in the short term genetic progress; and on the average coancestry of the population under investigation.It should be noted that the values used for weighting coancestry in the present study were empirically defined just to test the differential evolution program, i.e. they were not optimized, and more favorable results of optimum contribution selection can undoubtedly be achieved.
For all the experiments of this study, the female contribution was not optimized in the optimum genetic contribution selection process.This is justified just for practical circumstances where low selection pressure is applied over the females (e.g.beef cows without embryo transfer application), i.e. when it is not expected that the optimum genetic contribution selection, also considering the optimization of the female contributions would present a significant superiority over the optimum genetic contribution selection optimizing only the male contributions.
After defining the optimum genetic contribution, the natural next step should be defining the matings.Another approach not investigated in the present study is to perform simultaneously the selection and mating decisions (mate selection).This one step approach seems to provide better control over the inbreeding and higher genetic response (Kinghorn et al., 1999).New studies are planned to be made trying to make the proper adaptations in the DE developed program, allowing its application for mate selection purpose.

Conclusions
It is possible to use differential evolution as an optimization method to perform optimum contribution selection.The differential evolution developed program is computationally efficient and feasible for practical applications.The expected consequences of using the program, in contrast with empirical attempts to control inbreeding and/or selection based exclusively on BLUP, is to improve the selection response and to control inbreeding more effectively.

Figure 1 -
Figure 1 -Average inbreeding coefficient per year and strategy of adopted selection.