Artificial neural networks for adaptability and stability evaluation in alfalfa genotypes

The purpose of this work was to evaluate a methodology for assessing the adaptability and phenotypic stability of alfalfa genotypes, based on training an artificial neural network according to the methodology of Eberhart and Russell. Data from an experiment on dry matter production of 92 alfalfa (Medicago sativa L.) genotypes were used. The experimental design was randomized blocks with two repetitions. The genotypes were submitted to 20 cuttings between November 2004 and June 2006. Each cutting was considered an environment. The artificial neural network classified the genotypes satisfactorily. In addition, the analysis showed high agreement rates with the results obtained by the methodology of Eberhart and Russell.


INTRODUCTION
In plant breeding, when the purpose is to select or recommend genotypes for planting, a detailed study of the interaction between genotypes and environments is of extreme importance.
Several methodologies have been developed for this purpose. Some methods are based on regression models, for example those of Eberhart and Russell (1966) and Cruz et al. (1989). The Bayesian method proposed by Nascimento et al. (2011) and non-parametric methods, such as that of Rocha et al. (2005) and its subsequent modifications (Nascimento et al. 2009a, Nascimento et al. 2009), can also be employed. The AMMI (Additive Main effects and Multiplicative Interaction) model (Gauch Junior 2006) can also be mentioned.
The method of Eberhart and Russell (1966) is widely used today because it is easy to apply and interpret. Its use can be seen in Ferreira et al. (2004), in which alfalfa cultivars were classified for adaptability and stability, and in Nascimento et al. (2010), who evaluated coffee cultivars. However, a limitation of this method is that genotypes are classified for adaptability by a hypothesis test on the angular coefficient (β1), in which the genotype is considered to have specific adaptability to a given set of environments (favorable or unfavorable) when the hypothesis H0: β1 = 1 is rejected. In studies where the number of evaluated environments is small (n < 10), the applied test is not consistent, which can lead to the non-rejection of false null hypotheses. In addition, the small number of observations affects the accuracy of the estimates used for genotype classification.
As an alternative to solve this problem, artificial neural networks were used to classify genotypes in accordance with the methodology of Eberhart and Russell (1966). In this approach, genotypes belonging to the classes defined by Eberhart and Russell (1966) are first simulated. Subsequently, the simulated genotypes are used in the training and validation of the neural networks. Thus, with the trained neural networks, the assessment of genotypes for stability and adaptability is based not only on the genotypes in the study, but on a large collection of genotypes simulated in accordance with the predefined classes.

According to Barbosa et al. (2011), neural networks have recently been used in agriculture to solve problems associated with identifying the early stages of pest or disease development and with the classification of satellite images (França 2010). In genetic improvement, Barbosa et al. (2011) used a neural network as a strategy for genetic diversity analysis.
This study aimed to propose a methodology for analysis of adaptability and phenotypic stability of alfalfa (Medicago sativa L.) genotypes, based on the training of an artificial neural network considering the methodology of Eberhart and Russell (1966).

MATERIALS AND METHODS
The data used to evaluate the proposed methodology came from an experiment conducted by Embrapa Pecuária Sudeste for the development of alfalfa genotypes adapted to the different Brazilian ecosystems. The experimental design was randomized blocks with two repetitions, in which the dry matter production of 92 alfalfa genotypes submitted to 20 cuttings between November 2004 and June 2006 was evaluated. The cuttings were considered different environmental conditions, as they were carried out at different times. The adaptability and stability of the genotypes were also analyzed by the methodology of Eberhart and Russell (1966).
The method proposed by Eberhart and Russell (1966) is based on simple linear regression analysis, which measures the response of each genotype to environmental variation. Accordingly, for an experiment with g genotypes, e environments and r repetitions, the following statistical model is defined:
$$Y_{ij} = \beta_{0i} + \beta_{1i} I_j + \psi_{ij}$$
in which: $Y_{ij}$: mean of genotype i in environment j; $\beta_{0i}$: linear coefficient of the i-th genotype; $\beta_{1i}$: regression coefficient, which measures the response of the i-th genotype to variation in environment j; $I_j$: encoded environmental index; $\psi_{ij}$: random errors, which can be decomposed as $\psi_{ij} = \delta_{ij} + \bar{\varepsilon}_{ij}$, where $\delta_{ij}$ is the regression deviation and $\bar{\varepsilon}_{ij}$ is the mean experimental error.
Estimates of $I_j$ indicate environment quality. Negative values of $I_j$ identify unfavorable environments, whereas positive values of $I_j$ indicate favorable environments.
Estimators of the adaptability and stability parameters are given, respectively, as $\hat{\beta}_{1i} = \sum_j Y_{ij} I_j / \sum_j I_j^2$ and $\hat{\sigma}^2_{di} = (MSD_i - MSR)/r$, in which $MSD_i$ is the mean square deviation of genotype i; $MSR$ is the mean square residue; and r is the number of repetitions.
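As a rough illustration of how these quantities can be computed, the sketch below uses plain Python (the study itself ran this analysis in the Genes software). The function name `eberhart_russell`, its arguments and the toy data are hypothetical. It assumes the usual form of the estimators: the environmental index as the environment mean minus the overall mean, the slope as the sum of Y_ij I_j over the sum of I_j squared, and MSD_i expressed on the plot scale, that is, r times the residual mean square of the regression fitted to the genotype means.

```python
def eberhart_russell(Y, msr, r):
    """Y: list of g rows (genotypes), each holding e environment means.
    msr: mean square residue from the joint ANOVA; r: number of repetitions.
    Returns (index, beta0, beta1, sigma2_d)."""
    g, e = len(Y), len(Y[0])
    grand_mean = sum(sum(row) for row in Y) / (g * e)
    # Environmental index: environment mean minus overall mean (sums to zero).
    index = [sum(Y[i][j] for i in range(g)) / g - grand_mean for j in range(e)]
    sum_index2 = sum(v * v for v in index)
    beta0, beta1, sigma2_d = [], [], []
    for row in Y:
        b0 = sum(row) / e                                       # genotype mean
        b1 = sum(y * v for y, v in zip(row, index)) / sum_index2  # slope
        # Sum of squared deviations from the fitted regression line.
        ss_dev = sum((y - b0 - b1 * v) ** 2 for y, v in zip(row, index))
        msd = r * ss_dev / (e - 2)   # assumption: plot-scale mean square deviation
        beta0.append(b0)
        beta1.append(b1)
        sigma2_d.append((msd - msr) / r)
    return index, beta0, beta1, sigma2_d


# Hypothetical toy data: 3 genotypes, 5 environments, exactly linear responses.
I_true = [-2.0, -1.0, 0.0, 1.0, 2.0]
means = [10.0, 12.0, 14.0]
slopes = [0.5, 1.0, 1.5]
Y_toy = [[m + b * v for v in I_true] for m, b in zip(means, slopes)]
I_hat, b0_hat, b1_hat, s2d_hat = eberhart_russell(Y_toy, msr=0.4, r=2)
```

Because the toy responses are exactly linear and the slopes average one, the function recovers the index and the slopes used to build the data.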
The hypotheses of interest are $H_0: \beta_{1i} = 1$ versus $H_1: \beta_{1i} \neq 1$, and $H_0: \sigma^2_{di} = 0$ versus $H_1: \sigma^2_{di} > 0$. These hypotheses are evaluated by the t and F statistics, respectively. After the hypotheses are evaluated, the genotypes under study can be classified into one of the six classes described in Table 1.
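For reference, the two statistics are commonly written as below. This is a sketch of the usual presentation for balanced data, not a transcription of the original text; in particular, the variance expression for the slope is an assumption.

```latex
t = \frac{\hat{\beta}_{1i} - 1}{\sqrt{\hat{V}(\hat{\beta}_{1i})}},
\qquad
\hat{V}(\hat{\beta}_{1i}) = \frac{MSR}{r \sum_{j} I_j^{2}},
\qquad
F = \frac{MSD_i}{MSR}
```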
These parametric values were selected to create the first three mutually exclusive classes. To obtain the three remaining classes, following the idea of Finlay and Wilkinson (1963), the simulated values were transformed to the logarithmic scale, which introduces a high degree of linearization; in other words, for classes 4, 5 and 6, $\sigma^2_{\psi} = 0$. The stability concept is therefore associated with the capacity of genotypes to respond predictably to environmental stimuli. It must be emphasized that the sets are simulated taking into account the environmental index values of the evaluated data set.
After the 3000 genotypes representing the six classes were obtained, the data set was divided into two: a training set and a test set for the network. The training set, composed of 2400 genotypes, was obtained by random selection of 400 genotypes within each class. The test set, composed of the 600 remaining genotypes (100 of each class), was used for network testing.
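The split described above can be sketched as follows. This is a minimal illustration in plain Python: the function name `stratified_split`, the seed and the integer class labels 1 to 6 are assumptions made for the example, and 500 genotypes per class follows from 400 training plus 100 test genotypes per class.

```python
import random

def stratified_split(labels, n_train_per_class, seed=0):
    """Return (train_idx, test_idx): within each class, n_train_per_class
    randomly chosen items go to training and the remaining items to testing."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train_idx, test_idx = [], []
    for lab in sorted(by_class):
        members = by_class[lab][:]
        rng.shuffle(members)                       # random selection within class
        train_idx.extend(members[:n_train_per_class])
        test_idx.extend(members[n_train_per_class:])
    return train_idx, test_idx

# 3000 simulated genotypes: six classes with 500 genotypes each.
labels = [c for c in range(1, 7) for _ in range(500)]
train_idx, test_idx = stratified_split(labels, n_train_per_class=400)
```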
The network used in this work, a single-hidden-layer back-propagation network (Figure 1), can be represented in functional form (Hastie et al. 2009). Consider that the variables $Z_m$ are functions of weighted sums of the input variables $X_i$, in other words, $Z_m = \gamma(\alpha_{0m} + \alpha_m^T X)$, $m = 1, 2, ..., M$, and that the outputs $Y_k$ are modeled as functions of these combinations, where $T_k = \beta_{0k} + \beta_k^T Z$, $k = 1, 2, ..., K$, and $Y_k = f_k(X) = g_k(T)$, $k = 1, 2, ..., K$, in which $Z = (Z_1, Z_2, ..., Z_M)$ and $T = (T_1, T_2, ..., T_K)$. The sigmoid activation function $\gamma(\upsilon)$ is given as:
$$\gamma(\upsilon) = \frac{1}{1 + e^{-\upsilon}}$$
The output function $g_k(T)$ allows a final transformation of the output vector T. In regression studies, $g_k(T)$ is defined as the identity, in other words, $g_k(T) = T_k$. However, when the network is used for classification into one of the K groups, which is the purpose of the present study, the softmax function is used,
$$g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}},$$
which produces positive estimates whose sum is one (Hastie et al. 2009).
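A forward pass through such a network can be sketched in a few lines of plain Python. The sketch assumes the standard logistic sigmoid for the hidden layer and the standard softmax for the output layer; the function names, the tiny network dimensions and all weight values are hypothetical and serve only to show how an input is mapped to class probabilities.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(t):
    top = max(t)  # subtracting the maximum avoids overflow; result is unchanged
    exps = [math.exp(v - top) for v in t]
    total = sum(exps)
    return [v / total for v in exps]

def forward(x, alpha0, alpha, beta0, beta):
    """x: p inputs; alpha0: M biases; alpha: M rows of p weights;
    beta0: K biases; beta: K rows of M weights. Returns (Z, T, Y)."""
    Z = [sigmoid(a0 + sum(w * xi for w, xi in zip(row, x)))
         for a0, row in zip(alpha0, alpha)]
    T = [b0 + sum(w * z for w, z in zip(row, Z))
         for b0, row in zip(beta0, beta)]
    return Z, T, softmax(T)

# Hypothetical tiny network: p = 3 inputs, M = 2 hidden units, K = 3 classes.
x = [0.2, -0.1, 0.4]
alpha0 = [0.0, 0.1]
alpha = [[0.5, -0.3, 0.8], [-0.2, 0.4, 0.1]]
beta0 = [0.0, 0.0, 0.0]
beta = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
Z, T, Y = forward(x, alpha0, alpha, beta0, beta)
# Classifier: the class with the largest output (1-based class label).
predicted_class = max(range(len(Y)), key=lambda k: Y[k]) + 1
```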
The set of all network parameters ($\theta$), known as weights, $\{\alpha_{0m}, \alpha_m; m = 1, 2, ..., M\}$ and $\{\beta_{0k}, \beta_k; k = 1, 2, ..., K\}$, is estimated by minimizing the sum of squared errors, $R(\theta) = \sum_{k=1}^{K} \sum_{i=1}^{N} (y_{ik} - f_k(x_i))^2$, where N is the number of training observations, and the corresponding classifier is given as $G(x) = \arg\max_k f_k(x)$. The function is minimized by the gradient descent algorithm, known in this setting as back-propagation (Hastie et al. 2009).
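To make the estimation step concrete, the sketch below implements the sum of squared errors, its gradient by back-propagation, and a plain batch gradient descent update for a network with sigmoid hidden units and softmax outputs. It is an illustration under stated assumptions, not the routine used by the nnet function: the dictionary layout of the weights, the learning rate, the number of iterations and the toy data are all hypothetical.

```python
import math

def forward(x, p):
    """Return hidden units Z and softmax outputs f for one input vector x."""
    Z = [1.0 / (1.0 + math.exp(-(a0 + sum(w * xi for w, xi in zip(row, x)))))
         for a0, row in zip(p["alpha0"], p["alpha"])]
    T = [b0 + sum(w * z for w, z in zip(row, Z))
         for b0, row in zip(p["beta0"], p["beta"])]
    top = max(T)
    exps = [math.exp(t - top) for t in T]
    total = sum(exps)
    return Z, [v / total for v in exps]

def loss(X, Y, p):
    """Sum of squared errors over all observations and classes."""
    return sum((yk - fk) ** 2
               for x, y in zip(X, Y)
               for yk, fk in zip(y, forward(x, p)[1]))

def gradients(X, Y, p):
    """Back-propagation: gradient of the sum of squared errors."""
    g = {"alpha0": [0.0] * len(p["alpha0"]),
         "alpha": [[0.0] * len(row) for row in p["alpha"]],
         "beta0": [0.0] * len(p["beta0"]),
         "beta": [[0.0] * len(row) for row in p["beta"]]}
    for x, y in zip(X, Y):
        Z, f = forward(x, p)
        e = [2.0 * (fk - yk) for fk, yk in zip(f, y)]   # dR/df_k
        ef = sum(ek * fk for ek, fk in zip(e, f))
        d = [fl * (el - ef) for fl, el in zip(f, e)]    # dR/dT_l through softmax
        for l, dl in enumerate(d):
            g["beta0"][l] += dl
            for j, z in enumerate(Z):
                g["beta"][l][j] += dl * z
        for j, z in enumerate(Z):
            back = sum(dl * p["beta"][l][j] for l, dl in enumerate(d))
            s = z * (1.0 - z) * back                    # through the sigmoid
            g["alpha0"][j] += s
            for i, xi in enumerate(x):
                g["alpha"][j][i] += s * xi
    return g

def gradient_step(X, Y, p, rate):
    """One gradient descent update: theta <- theta - rate * gradient."""
    g = gradients(X, Y, p)
    new = {}
    for key in p:
        if isinstance(p[key][0], list):
            new[key] = [[w - rate * gw for w, gw in zip(rw, rg)]
                        for rw, rg in zip(p[key], g[key])]
        else:
            new[key] = [w - rate * gw for w, gw in zip(p[key], g[key])]
    return new

# Hypothetical toy problem: two inputs, two classes coded as one-hot targets.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
params = {"alpha0": [0.1, -0.1], "alpha": [[0.2, -0.3], [-0.1, 0.4]],
          "beta0": [0.0, 0.0], "beta": [[0.3, -0.2], [-0.3, 0.2]]}
initial_loss = loss(X, Y, params)
for _ in range(200):
    params = gradient_step(X, Y, params, rate=0.05)
final_loss = loss(X, Y, params)
```

The analytic gradient can be checked against a finite-difference approximation of the loss, which is a convenient safeguard when writing back-propagation by hand.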
To initialize the training process of the network, i.e., to obtain the weights, it is necessary to define initial values. According to Venables and Ripley (2002), the initial values should be chosen randomly in the range [−LS, LS], whose limits must satisfy LS · max(|x|) ≈ 1, where LS denotes the upper limit of the range and max(|x|) is the largest absolute value in the training data set.
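A minimal sketch of this initialization rule is given below. It assumes the range is the symmetric interval from −LS to LS and that the draws are uniform; the function name `initial_weights`, the seed and the training values are hypothetical.

```python
import random

def initial_weights(X, n_weights, seed=0):
    """Draw n_weights starting values uniformly from [-LS, LS],
    with LS chosen so that LS * max(|x|) = 1."""
    max_abs_x = max(abs(v) for row in X for v in row)
    LS = 1.0 / max_abs_x
    rng = random.Random(seed)
    return LS, [rng.uniform(-LS, LS) for _ in range(n_weights)]

# Hypothetical training inputs (e.g. dry matter yields across three cuttings).
X_train = [[1200.0, 950.0, 1430.0], [800.0, 1105.0, 990.0]]
LS, w0 = initial_weights(X_train, n_weights=10)
```

Scaling the range by the largest input keeps the initial weighted sums near zero, where the sigmoid is close to linear.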
After the network training and test stage, in which a maximum error of 2% was allowed for the test set, the alfalfa genotype data set was presented to the network for classification.
Classification for adaptability was based on assignment to one of the first three classes (Table 1). As to stability, following the concept of Finlay and Wilkinson (1963), a genotype is described as having high stability if its adaptability classification is not altered after linearization, and as having low stability if it is altered. The adaptability and stability of the genotypes were also evaluated by the methodology proposed by Eberhart and Russell (1966).
To evaluate the adaptability and stability of the 92 alfalfa genotypes under study through a neural network, using the concepts of the Eberhart and Russell (1966) methodology, the nnet function of the nnet package (Venables and Ripley 2002), implemented in the R software (R Development Core Team 2010), was used. The code is available at: http://www.det.ufv.br/~moyses/links.php. The analysis by the methodology of Eberhart and Russell (1966) was performed using the Genes software (Cruz 2006).

RESULTS AND DISCUSSION
Significant differences were observed between genotypes (Table 2), indicating the existence of genetic variability for dry matter production. A genotype x cutting interaction (P ≤ 0.01) was also verified, indicating that the genotypes perform differently under the different environmental conditions. This shows the need for further study of the behavior of the cultivars under these variations through adaptability and stability analysis.
Out of the 92 genotypes, 74 were classified as having general adaptability, of which 45 have a mean higher than the general average (1176.84 kg ha⁻¹) and are described as having high predictability. [...] presented, respectively, the same classification for adaptability and stability as under the methodology of Eberhart and Russell (1966) (Table 3). Among the discordant classifications, the Rio genotype was classified by the neural networks in the same way as in the study of Ferreira et al. (2004).
Moreover, the Rocio and WL 612 genotypes, classified as having general adaptability by the network, were classified as having specific adaptability to unfavorable environments by the method of Eberhart and Russell (1966), while in Vasconcelos et al. (2008) these same genotypes were classified as having specific adaptability to favorable and unfavorable environments, respectively.
Nine genotypes (Primavera, Topper, Candombe, WL 516, F 686, Barbara, Lujan, WL 525, Sequel) were classified as having specific adaptability to unfavorable environments and nine (Activa, Aurora, Sundor, Prointa Patricia, Prointa Lujan, Platino, Kern, Key II, Aca 901) as having specific adaptability to favorable environments. Relative to the results of the Eberhart and Russell (1966) methodology, the percentage agreement for adaptability and stability was 89 and 78%, respectively, for genotypes with specific adaptability to unfavorable environments, and 100 and 100% for those with specific adaptability to favorable environments (Table 3). Ventura et al. (2009) calculated the percentage of coincidence between breeding values for weight at 205 days in Tabapuã cattle obtained from neural networks and the values predicted by BLUP. Considering the first hundred animals, the percentage was 66%, and for subsequent classifications the match was even lower (26%). Based on these results, the authors did not recommend the use of neural networks in genetic evaluations when new animals not contained in the training database are to be added later.
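The percentage agreement between the two classifications of the alfalfa genotypes is a simple proportion of matching labels, as the sketch below shows. The label vectors are hypothetical and are not the genotype-level results of Table 3; they only illustrate that, in a group of nine genotypes, eight matching classifications correspond to an agreement of about 89%.

```python
def percent_agreement(a, b):
    """Percentage of positions at which two classifications coincide."""
    matches = sum(1 for u, v in zip(a, b) if u == v)
    return 100.0 * matches / len(a)

# Hypothetical labels for a group of nine genotypes.
by_network = ["unfavorable"] * 9
by_regression = ["unfavorable"] * 8 + ["general"]
agreement = percent_agreement(by_network, by_regression)
```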
Given the high agreement rates for adaptability with the results obtained by the methodology of Eberhart and Russell (1966), the neural networks proved to be an alternative for genotype classification. Regarding stability, the lower percentage of agreement can be explained by the concept of stability used in the network. This concept is based [...] Eberhart and Russell methodology (1966)
Another interesting point is the possibility of simulating genotypes based on different methodologies of adaptability and phenotypic stability for which classes of response can be created.
Despite the satisfactory results obtained by the network, it is important to mention that further studies are needed to evaluate the real efficiency of the technique in such situations. These studies, based on simulation, would clarify if the neural networks are more efficient than other adaptability and stability methodologies.
In future studies we intend to perform simulations for different scenarios in order to verify if the neural networks can be useful to work around problems related to the small number of environments and loss of observations.
In addition, because of their non-linear structure (Bishop 1995), neural networks capture more complex characteristics of the data and do not require detailed information about the process to be modeled. Neural networks therefore have great potential in plant breeding.