Prediction of hybrid means from a partial circulant diallel table using the ordinary least square and the mixed model methods

By definition, the genetic effects obtained from a circulant diallel table are random. However, because of the methods of analysis, those effects have been considered as fixed. Two different statistical approaches were applied. One assumed the model to be fixed and obtained solutions through the ordinary least squares (OLS) method. The other assumed a mixed model and obtained the best linear unbiased estimates (BLUE) of the fixed effects by generalized least squares (GLS) and the best linear unbiased predictions (BLUP) of the random effects. The goal of this study was to evaluate the consequences of considering these effects as fixed or random, using the coefficient of correlation between the responses of observed and non-observed hybrids. Crossings were made between S1 inbred lines from two maize populations developed at Universidade Federal de Goiás, the UFG-Samambaia “Dent” and UFG-Samambaia “Flint”. A circulant inter-group design was applied, and there were five (s = 5) crossings for each parent. The predictions were made using a reduced model. Diallels with different sizes of s (from 2 to 5) were simulated, and the coefficients of correlation were obtained using two different approaches for each size of s. In the first approach, the observed hybrids were included in both the estimation of the genetic parameters and the coefficient of correlation, while in the second a cross-validation process was employed. In this process, the set of hybrids was divided into two groups: one group, comprising 75% of the original set, to estimate the genetic parameters, and a second one, consisting of the remaining 25%, to validate the predictions. In all cases, a bootstrap process with 200 resamplings was used to generate the empirical distribution of the correlation coefficient. This coefficient showed a decrease as the value of s decreased.
The cross-validation method made it possible to estimate the magnitude of the bias that arises when the same hybrids are used both to estimate the genetic parameters and to evaluate the correlation coefficient. The bias was greater when the OLS method was used. When the correlation coefficients between the observed and estimated hybrid means were obtained through the mixed model instead of the fixed model, this decrease was less marked. The selection of hybrids superior to the checks, in terms of grain weight, also differed between the two approaches: 19% of the hybrids were superior to the checks under the fixed model, while only 1.8% were superior under the mixed model.


Introduction
Nowadays, one of the major obstacles to corn breeding programs which aim to develop hybrids is the high cost of field evaluation. The strategy originally adopted in breeding programs was to perform all possible crossings in a group of inbred lines and then make an evaluation of the single hybrids obtained, followed by the selection of the most promising ones. However, as the breeding programs became larger, thousands of inbred lines became available. This made the development and evaluation of all possible hybrids extremely difficult, mainly because of the high cost of the assessment phase. So, there was an urgent need to develop procedures to allow the evaluation of a large number of inbred lines from a small sample of hybrids. The prediction of non-observed hybrid performance became possible through the use of genetic parameter estimation. Consequently, estimators or predictors have been sought, in order to maximize the correlation between estimated or predicted genetic values and parametric genetic values. Diallel tables have been one of the main tools for estimating genetic parameters, not only because they provide great amounts of information, but also because of the flexibility in constructing them. For predictive analysis, the scheme proposed by Kempthorne and Curnow (1961), based on a sample of all possible crossings between a group of parents, referred to as a circulant diallel cross, is noteworthy. Miranda Filho and Vencovsky (1999), using Griffing's model (1956), and Reis (2000), using the model proposed by Gardner and Eberhart (1966), adjusted the circulant design to an interpopulation level. In order to achieve this, it was necessary to obtain the hybrid combinations ps/2, where p is the number of parents and s is the number of combining hybrids in each participating parent. In the second case, evaluation of the parents is also required. 
Compared with complete diallel tables, the reduction in the number of crossings is striking, especially as the value of p increases.
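The size of this reduction follows directly from the counts given above. The short Python sketch below compares the ps/2 crosses of an inter-group circulant diallel with the (p/2)^2 crosses of the complete inter-group table. The function names are illustrative only, and the value p = 68 (two groups of 34 lines) is taken from the experiment described in Material and Methods.

```python
def circulant_crosses(p: int, s: int) -> int:
    """Crosses evaluated in an inter-group circulant diallel: p*s/2."""
    if (p * s) % 2 != 0:
        raise ValueError("p*s must be even")
    return p * s // 2


def complete_intergroup_crosses(p: int) -> int:
    """All possible crosses between two groups of p/2 parents each."""
    if p % 2 != 0:
        raise ValueError("p must be even (two groups of equal size)")
    return (p // 2) ** 2


p = 68  # 34 dent lines + 34 flint lines
total = complete_intergroup_crosses(p)
for s in (2, 3, 4, 5):
    made = circulant_crosses(p, s)
    print(f"s={s}: {made} crosses evaluated, {total - made} left to predict")
```

With s = 5, 170 crosses are evaluated and the remaining 986 of the 1156 possible hybrids have to be predicted.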
One way of evaluating the predictive capacity of a model that uses the estimates from a circulant diallel table is to compute the Pearson correlation coefficient between the predicted and observed hybrid responses. Andrade (1995), using s = 3, found correlation coefficients varying from 0.82 to 0.96. Araújo (2000), using s = 4, found a correlation of 0.86, and Fuzatto (2003) observed correlations between 0.685 and 0.925, using values of s from 6 to 2; the correlation increased as the value of s decreased. All these authors evaluated ear weight. Gonçalves (1987), using s = 3, observed correlations from 0.86 to 0.92 for grain weight. In all these experiments, the genetic parameters (general and specific combining ability) were estimated by the ordinary least squares (OLS) method.
On the other hand, in a circulant diallel table, there is an interest in extrapolating the information obtained from the observed hybrids to a reference population of non-observed hybrids [$(p/2)^2 - ps/2$]. As emphasized by Searle et al. (1992), the main issue is to quantify the performance of a non-realized random variable (non-observed hybrids), given an observation vector (realized observations). Therefore, in this context, according to Henderson (1986), BLUP (Best Linear Unbiased Predictor) would be the most appropriate method to predict the genetic parameters. The use of BLUP in plant breeding has also been advocated by Bernardo (1994, 1995, 1996a, 1996b). In this case, the error variance and the other variance components influence the estimation of the genetic parameters, making it possible to obtain the BLUE (Best Linear Unbiased Estimator) of the fixed effects and the BLUP of the random effects, which is the appropriate approach for mixed linear models (Henderson, 1984). In this method, the known covariances are taken into account not only in the statistical tests, but also in the estimation and prediction of the effects that directly influence the selection of the inbred lines. In general, the corresponding estimators have lower variances than those obtained through OLS, thus resulting in more reliable estimates (Duarte and Vencovsky, 2001). André (1999) concluded that BLUP provides better accuracy than the OLS estimators in predicting general combining ability effects under different levels of heritability. Moreover, when information about co-ancestry between the inbred lines is available, it is also possible to consider additive effects, dominance effects and epistatic interactions. The main restriction on the use of this approach has been its heavy computational requirement, which no longer represents an obstacle.
The purpose of this paper was to evaluate the efficiency of the mixed linear models methodology in analyzing a partial circulant table, with varying sizes of s. This evaluation was performed mainly by correlating the predicted and the observed values of the hybrids.

Material and Methods
Two groups of parents, 34 flint maize S1 inbred lines and 34 dent maize S1 inbred lines, randomly sampled from two populations, the UFG-Samambaia flint and the UFG-Samambaia dent, were used as the experimental material. These populations were developed at Universidade Federal de Goiás (EA-UFG). The crossings were performed according to a partial circulant diallel design, with five crosses for each parent (s = 5) (Table 1), of which 165 of the 170 originally planned hybrids were obtained, to represent the reference population of 1156 possible hybrids between these two groups of inbred lines. These hybrids were evaluated in a randomized complete block design with four replications. Each experimental plot consisted of a single row 5 m long, with rows spaced 0.9 m apart and 25 plants per plot after thinning. The three-way cross hybrid BR-3123 was used as a check, and planting was done on January 6, 1999, in the experimental area at the EA-UFG. Griffing's model (1956) was adopted to describe the observations of the diallel table:

$$y_{ij} = \mu + g_i + g_j + s_{ij} + \varepsilon_{ij} \qquad (1)$$

where: $y_{ij}$ is the phenotypic value of the hybrid between the dent line i (i = 1, 2, ..., I) and the flint line j (j = 1, 2, ..., J); $\mu$ is the mean common to the observations; $g_i$ is the general combining ability effect of the i-th parent from the dent group, assumed to be random with distribution $N(0, \sigma^2_{LD})$; $g_j$ is the general combining ability effect of the j-th parent from the flint group, assumed to be random with distribution $N(0, \sigma^2_{LF})$; $s_{ij}$ is the specific combining ability effect resulting from the crossing between the parents i and j, assumed to be random with distribution $N(0, \sigma^2_{CEC})$; and $\varepsilon_{ij}$ is the random error effect with distribution $N(0, \sigma^2)$.
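The crossing scheme described above can be sketched in a few lines of Python. Table 1 is not reproduced here, so the exact pairing pattern is an assumption: the sketch simply crosses dent line i with s consecutive flint lines (indices taken modulo the group size), which is one common way of building a circulant inter-group design. The names `circulant_design`, `n`, `s` and `shift` are hypothetical.

```python
from collections import Counter


def circulant_design(n: int, s: int, shift: int = 0) -> list[tuple[int, int]]:
    """Return (dent, flint) index pairs for an inter-group circulant diallel.

    Dent line i is crossed with flint lines i+shift, ..., i+shift+s-1
    (indices modulo n), so every parent takes part in exactly s crosses.
    This pairing rule is an assumed pattern, not the authors' Table 1.
    """
    if not 1 <= s <= n:
        raise ValueError("s must lie between 1 and n")
    return [(i, (i + shift + k) % n) for i in range(n) for k in range(s)]


design = circulant_design(n=34, s=5)
dent_counts = Counter(i for i, _ in design)
flint_counts = Counter(j for _, j in design)
print(len(design))                       # number of planned crosses
print(set(dent_counts.values()))         # crosses per dent parent
print(set(flint_counts.values()))        # crosses per flint parent
```

With 34 lines per group and s = 5 this gives the 170 planned crosses mentioned in the text, each parent appearing in five of them.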

Fixed model
In matrix form, the hybrid means can be represented by:

$$y = X\beta + \varepsilon \qquad (2)$$

where y is the vector of treatment means, X is the incidence matrix of the genetic effects, β is the vector of parameters, and ε is the error vector. As X does not have full column rank, X'X is singular and has no unique inverse. Therefore, in order to solve the system of normal equations and to obtain unique solutions, parametric restrictions were adopted, and the OLS solutions (3) were obtained under these restrictions.
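As an illustration of the fixed-model analysis, the following Python sketch fits a reduced model containing only the mean and the two general combining ability effects, and then predicts a cross that was not made. Several points are assumptions rather than the authors' procedure: the restrictions are taken to be the usual sum-to-zero constraints within each group, they are imposed by adding C'C to X'X (which leaves the fitted values unchanged when the design is connected), and the helper names `solve`, `ols_gca` and `predict` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (is the design connected?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_gca(crosses, means, n_dent, n_flint):
    """OLS fit of y_ij = mu + g_i + g_j with sum(g_i) = sum(g_j) = 0.

    crosses: list of (dent index, flint index); means: hybrid means.
    Returns (mu, g_dent, g_flint).
    """
    k = 1 + n_dent + n_flint
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for (i, j), y in zip(crosses, means):
        cols = (0, 1 + i, 1 + n_dent + j)   # non-zero columns of this row of X
        for r in cols:
            xty[r] += y
            for c in cols:
                xtx[r][c] += 1.0
    # Add C'C, where C holds the two sum-to-zero restriction rows.
    for lo, hi in ((1, 1 + n_dent), (1 + n_dent, k)):
        for r in range(lo, hi):
            for c in range(lo, hi):
                xtx[r][c] += 1.0
    beta = solve(xtx, xty)
    return beta[0], beta[1:1 + n_dent], beta[1 + n_dent:]


def predict(mu, g_dent, g_flint, i, j):
    """Predicted mean of the (possibly non-observed) hybrid i x j."""
    return mu + g_dent[i] + g_flint[j]
```

A non-observed hybrid is then predicted from the estimates of its two parents, for example `predict(mu, g_dent, g_flint, 0, 3)` for a cross between dent line 0 and flint line 3 that is absent from the table.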

Mixed linear model
The individual observations can be expressed in matrix form as follows:

$$y = X\theta + Z_1 a_X + Z_2 a_Y + Z d + \varepsilon \qquad (4)$$

where: $y$ is the observation vector; $\theta$ is the vector of fixed effects, which here includes the general mean and the block effect; $a_X$ is the vector of general combining ability effects of the dent inbred lines; $a_Y$ is the vector of general combining ability effects of the flint inbred lines; $d$ is the vector of specific combining ability effects; $\varepsilon$ is the error vector; and $X$, $Z_1$, $Z_2$ and $Z$ are the incidence matrices for vectors $\theta$, $a_X$, $a_Y$, and $d$, respectively.
In this case, applying generalized least squares (GLS) to estimate the fixed effects and best linear unbiased prediction for the random effects, as proposed by Henderson (1984), the solutions are obtained from the mixed model equations (5). Using the expectation maximization-restricted maximum likelihood (EM-REML) algorithm (Dempster et al., 1977) to solve this system, the variance component estimators are given by (6), where $p_1$, $p_2$, $p$, and $s$ are the number of flint inbred lines, the number of dent inbred lines, the total number of inbred lines and the number of crosses for each inbred line, respectively. In (6), r(X) is the rank of X, and Tr is the trace operator. As the inbred lines were considered unrelated and since the two groups are not related to each other, the matrices $A_1$, $A_2$ and $D$ are identity matrices. When co-ancestry between the parents is assumed, matrices $A_1$ and $A_2$ have values equal to 1.0 on the diagonal and the co-ancestry coefficients between parents off the diagonal. The diagonal of matrix $D$ is then also composed of values equal to 1.0, and its off-diagonal values are the products of the co-ancestry coefficients between the parents. An iterative process alternating between (6) and (5) was conducted until convergence, starting from randomly chosen initial values for the variance components. As only the estimates of the variance components, and not their parametric values, were known, the EBLUP (Empirical Best Linear Unbiased Predictor) of the random effects was obtained (Littel et al., 1996). However, for selection based on single traits, the ranking of the candidates for selection is not strongly influenced by errors in the estimation of the variance components (Resende, 2002) when the data are balanced and only one population is considered (Duarte and Vencovsky, 2001).
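To make the structure of Henderson's mixed model equations concrete, the Python sketch below solves them for a deliberately simplified case. It is not the authors' analysis: it works on hybrid means, keeps only the mean as a fixed effect, omits the specific combining ability term (which is then absorbed into the residual), uses identity relationship matrices as stated above, and takes the variance ratios `lam_dent` and `lam_flint` (error variance divided by the GCA variance of each group) as given instead of estimating them by EM-REML. All function and argument names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def blup_gca(crosses, means, n_dent, n_flint, lam_dent, lam_flint):
    """Solve the mixed model equations for y_ij = mu + a_i + a_j + e_ij.

    The coefficient matrix is [X'X X'Z; Z'X Z'Z + lambda*I] with
    X a column of ones and Z the GCA incidence matrix; lambda is the
    assumed ratio of error variance to GCA variance for each group.
    Returns (BLUE of mu, BLUPs of dent GCA, BLUPs of flint GCA).
    """
    k = 1 + n_dent + n_flint
    lhs = [[0.0] * k for _ in range(k)]
    rhs = [0.0] * k
    for (i, j), y in zip(crosses, means):
        cols = (0, 1 + i, 1 + n_dent + j)
        for r in cols:
            rhs[r] += y
            for c in cols:
                lhs[r][c] += 1.0
    for r in range(1, 1 + n_dent):          # shrinkage of dent GCA effects
        lhs[r][r] += lam_dent
    for r in range(1 + n_dent, k):          # shrinkage of flint GCA effects
        lhs[r][r] += lam_flint
    sol = solve(lhs, rhs)
    return sol[0], sol[1:1 + n_dent], sol[1 + n_dent:]
```

The only difference from the normal equations of the fixed model is the ratio added to the diagonal of the random-effect block. That term shrinks the predicted effects toward zero, more strongly when the genetic variance is small relative to the error variance, and it also makes the system non-singular without any parametric restriction.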
Diallels of different sizes were simulated to evaluate the goodness of fit of the models and the way the correlation is obtained when predicting non-observed hybrids. The values of s ranged from 5 to 2, and the correlation was computed in two different ways. First, the correlation was obtained by using the sample of observed hybrids to calculate both the parameters and the coefficient of correlation. Second, a cross-validation procedure was applied to the original data set. This set was randomly divided into two groups: one, constituting 75% of the original set, was used to estimate the genetic parameters, and the other, composed of the remaining observed hybrids (25%), was used to validate the predictions. In all cases, a process of 200 resamplings with replacement (bootstrap) was employed to generate empirical distributions of the correlation coefficient estimates.
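The validation scheme described above can be sketched as follows. The text does not specify how the bootstrap was combined with the 75%/25% split, so the arrangement shown here (one random split, then resampling of predicted/observed pairs with replacement) is only one plausible reading. The function names and the `seed` arguments are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def split_hybrids(hybrids, train_fraction=0.75, seed=None):
    """Randomly split the hybrids into estimation and validation sets."""
    rng = random.Random(seed)
    shuffled = list(hybrids)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def bootstrap_correlation(predicted, observed, n_boot=200, seed=None):
    """Empirical distribution of r from resampling (predicted, observed) pairs."""
    if pearson(predicted, observed) is None:
        raise ValueError("correlation undefined for constant data")
    rng = random.Random(seed)
    n = len(predicted)
    dist = []
    while len(dist) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        r = pearson([predicted[i] for i in idx], [observed[i] for i in idx])
        if r is not None:                            # skip degenerate resamples
            dist.append(r)
    return dist
```

In use, the genetic parameters would be estimated from the first set returned by `split_hybrids`, the hybrids in the second set would be predicted from those estimates, and `bootstrap_correlation` would be applied to the resulting predicted and observed means. The mean and standard deviation of the returned list then summarize the empirical distribution.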

Results and Discussion
When the observed hybrids were used both to estimate the genetic parameters and to compute the correlation coefficient, the correlation coefficient increased as the value of s was reduced. However, the standard deviations associated with those estimates also increased as s decreased. The correlation coefficient fell from 0.916 ± 0.0727 (s = 2) to 0.742 ± 0.0090 (s = 5) using OLS, and from 0.851 ± 0.0217 (s = 2) to 0.733 ± 0.0049 (s = 5) when the mixed model was applied (Table 2).
The opposite result was found when cross-validation was employed, that is, the value of the correlation coefficient increased as the value of s increased. Likewise, the associated standard deviation decreased as s increased. When the OLS method was applied, the correlation coefficient varied from 0.260 ± 0.1217 (s = 5) to 0.100 ± 0.2441 (s = 2), while with the mixed model it ranged from 0.370 ± 0.1063 (s = 5) to 0.120 ± 0.2278 (s = 2) (Table 3). It is interesting to note that the greatest mean value of the correlation coefficient (r = 0.41) was obtained with s = 4, in the analysis made through the mixed model. The empirical distributions of the correlation coefficient estimates for each case are shown in Figure 1. It is important to highlight that the maximum theoretical limit of this correlation is not 1.0, but the square root of the heritability coefficient (Vencovsky and Barriga, 1992). In the present work, this limit was equal to 0.734, which does not seem unrealistic when compared with the correlations found by Bernardo (1996a). This author evaluated 4099 hybrids among several heterotic groups, using the mixed model method associated with co-ancestry data between the parents. In his experiment, the correlations ranged from 0.136 to 0.762, with theoretical maximum limits of 0.554 and 0.864, respectively.
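The limit quoted above can be inverted to recover the heritability it implies. The lines below are only a worked restatement of the relation cited from Vencovsky and Barriga (1992). The symbols $r_{\max}$ and $h^2$ are introduced here for illustration, and the heritability values are back-calculated from the limits given in the text, not reported by the authors.

```latex
r_{\max} = \sqrt{h^2}
\quad\Longrightarrow\quad
h^2 = r_{\max}^2

\text{Present work:}\quad h^2 = 0.734^2 \approx 0.539

\text{Bernardo (1996a):}\quad 0.554^2 \approx 0.307
\quad\text{and}\quad 0.864^2 \approx 0.746
```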
It is relevant to emphasize that the increase observed in the correlation coefficients when the size of s is decreased does not mean that lower values of s allow better predictions. As stated by Gauch and Zobel (1988), it means that the correlation is measuring the postdictive ability of the model, that is, with the decrease in s, the model can better explain the observed data. Moreover, when the correlation coefficients were evaluated through a cross-validation process, they increased as the value of s increased. In this situation, what is evaluated is not only the ability of the model to describe the set of observed data, but also its ability to predict non-realized observations. Thus, it is possible to assess the predictive ability of the model by comparing its predictions with data not included in the analysis, simulating future responses that have not been measured yet.
It is thus clear that a reduction in the value of s also decreases the predictive potential of the model. Furthermore, the correlation coefficient calculated from the same observed hybrids that were used to estimate $\hat{g}_i$ and $\hat{g}_j$ is biased. This bias can be calculated assuming that the average correlation coefficient obtained by using cross-validation is the parametric value for each value of s. In this case, an increase in the bias of the correlation coefficient estimate can be observed when the value of s decreases (Table 4). The bias is of greater magnitude when the OLS method is employed in the analysis, ranging from 0.482 with s = 5 to 0.816 with s = 2. Using the mixed model, the values found were 0.363 with s = 5 and 0.731 with s = 2. Another indicator of this bias can be observed in Figure 1, where the distributions obtained using cross-validation do not exceed the maximum theoretical limit of the correlation coefficient (MC). However, this is not true for the first situation.
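The bias values quoted above can be reproduced from the mean correlations reported earlier in this section. The short Python sketch below subtracts the cross-validation mean (treated as the parametric value, as stated in the text) from the mean obtained with the same hybrids. Only s = 5 and s = 2 are included, because those are the only values given in the text. The dictionary layout and the function name are illustrative.

```python
# Mean correlations quoted in the text:
# (estimate using the same hybrids, estimate from cross-validation)
reported = {
    ("OLS", 5): (0.742, 0.260),
    ("OLS", 2): (0.916, 0.100),
    ("MM", 5): (0.733, 0.370),
    ("MM", 2): (0.851, 0.120),
}


def correlation_bias(same_data_r: float, cross_validated_r: float) -> float:
    """Bias of r when the cross-validation mean is taken as the true value."""
    return same_data_r - cross_validated_r


bias = {key: round(correlation_bias(*pair), 3) for key, pair in reported.items()}
for (method, s), value in sorted(bias.items()):
    print(f"{method}, s={s}: bias = {value:.3f}")
```

The differences match the figures in the text (0.482 and 0.816 for OLS, 0.363 and 0.731 for the mixed model).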
If hybrids with yield mean superior to the check mean were to be selected, considering all possible hybrids in the diallel table (1156 hybrids), and if the prediction was made through OLS analysis with s = 5, 19% of the hybrids would be selected. By using a mixed model, only 1.8% would be selected. However, Spearman's correlation coefficient between the ranks of hybrid means by the two analyses was equal to 0.95.
The mixed model approach was more efficient than OLS in handling this data set, resulting in more accurate estimates of the correlation coefficients between observed and non-observed hybrids. Values of s < 4 yielded poorer predictions for both the mixed model and the OLS analysis. The use of the same data set to estimate the parameters and to evaluate the predictions yielded biased estimates of the correlation coefficient. The circulant crossing method, with s = 5 and s = 4, associated with the mixed model methodology, allowed the prediction of non-observed hybrid means with good reliability, which is very important in the initial stages of the evaluation of inbred lines.

Figure 1 - Empirical distributions of the correlation coefficients between the predicted and observed hybrid means, considering the two following situations: Situation 1: the observed hybrid means are included in the estimation of the genetic parameters and in the calculation of the correlation coefficient. Situation 2: the observed hybrid means were assigned to two different groups, where 75% of them were used to calculate the parameters, and the remaining 25% to estimate the correlation coefficient (MC is the maximum limit of the correlation coefficient, OLS is the ordinary least squares estimation, and MM is the mixed model approach).