Predicting performance of soybean populations using genetic distances estimated with RAPD markers

To verify whether genetic distance (GD) is associated with the population mean (PM), the genetic variance (GV) and the proportion of superior progenies generated by each cross in advanced generations of selfing (PS), the genetic distances between eight soybean lines (five adapted and three non-adapted) were estimated using 213 polymorphic RAPD markers. The genetic distances were partitioned according to Griffing’s Model I Method 4 for diallel analysis, i.e., GDij = GD + GGDi + GGDj + SGDij. Phenotypic data were recorded for seed yield and plant height for 25 out of 28 populations of a diallel set derived from the eight soybean lines and evaluated from the F2:8 to F2:11 generations. For seed yield, no significant correlation was detected between GD and GV, while negative correlations were detected between GD and PM and between GD and PS (r = -0.74** and -0.75**, respectively). Similar results were observed for the correlation between GGDi + GGDj and PM and between GGDi + GGDj and PS (r = -0.78** and -0.80**, respectively). No significant correlation was detected for plant height. The magnitudes of the correlations for seed yield were high enough to allow predictions of the potential of the populations based on RAPD markers.


Introduction
Selection in soybean breeding programs is carried out in endogamic populations obtained by artificial hybridization followed by several generations of selfing. Several endogamic populations are obtained and evaluated annually, but not all of them have sufficient potential to produce genotypes with superior performance. Previous knowledge of the potential of a population may greatly increase the efficiency of plant breeding programs, permitting the early elimination of unpromising populations or even the avoidance of their formation. Several methodologies have been proposed to predict the performance of quantitative traits in endogamic soybean populations, with some of these methodologies being based on estimates of genetic variance in early generations (Toledo, 1987; Triller and Toledo, 1996) or on the mean components obtained from diallel crosses (Pulcinelli, 1997). However, such methodologies can only be employed when the populations have already been obtained and after intermediate endogamy levels have been achieved, so although the predictions are effective they require a significant amount of time and effort to produce.
The genetic diversity among the potential parental lines may supply useful data on derived endogamic populations. The coefficient of parentage (CP) is the most frequent estimator of genetic diversity in soybean breeding programs and measures the quantity of loci occupied by alleles identical by descent. This information is obtained from the genealogies of the lines, but the CP is not a suitable measure when the genealogies of the parental lines are incomplete or even unknown.
Genetic distance (GD) based on DNA markers measures the quantity of loci occupied by specific marker alleles alike in state. It is a wider measure of diversity than the CP and can be obtained for any set of lines. In theory, genetic distance based on molecular markers has a greater potential for predicting the performance of soybean populations, but the literature shows an apparent dependence on the population set used (Helms et al., 1997; Kisha et al., 1997; Manjarrez-Sandoval et al., 1997). The study reported in the current paper was carried out to verify whether the genetic distance based on RAPD markers is useful for predicting the performance of soybean populations derived from two-way crosses for a set of parents that includes adapted and non-adapted inbred lines, in a diallel scheme.
About 500 mg of leaf tissue was used to extract and purify genomic DNA (Rogers and Bendich, 1985), which was quantified in 0.8% (w/v) agarose gels by comparison with known quantities of lambda phage DNA. Amplification reactions were carried out in medium containing PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl), 4 mM MgCl2, 0.2 µM primer, 1.5 units of Taq DNA polymerase, 0.2 mM dNTP and genomic DNA from the soybean lines (Vieira et al., 1997). Ninety-six primers from the A, B, C, D, E, and F kits of a commercial supplier (Operon Technologies) were used. After an initial denaturation at 94 °C for 5 min, forty-five amplification cycles were carried out, each cycle consisting of DNA denaturation at 94 °C for 1 min, primer annealing at 35 °C for 1 min and extension at 72 °C for 2 min, with a final 5 min extension at 72 °C at the end of the 45 cycles. Two concentrations of genomic DNA (40 ng and 60 ng) were used for each genotype/primer combination and only the results confirmed in both reactions were considered in the genotype analysis. Amplification products were separated by electrophoresis in 1.4% (w/v) agarose gel, using TBE buffer (0.09 M Tris, 0.09 M boric acid and 2 mM EDTA), and the gels were stained with ethidium bromide and evaluated under UV light.

Marker analysis
Genetic distance estimates were obtained by evaluating markers polymorphic for at least one line, and the genetic similarity between lines was calculated using the Simple Matching (SM) coefficient (Sneath and Sokal, 1973) and converted to genetic distance (GD), with GD = 1 - SM. Resampling was carried out by bootstrapping to check whether the number of polymorphic markers was sufficient to supply precise estimates of the genetic distances (Tivang et al., 1994). Sample size and mean coefficient of variation were used in the construction of a scatter-plot.
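The two steps described above can be sketched in Python as follows. This is only an illustrative sketch, not the procedure actually used in the study: it assumes the RAPD scores are stored as a 0/1 matrix (lines in rows, markers in columns), and the function names, the number of bootstrap replicates and the random demonstration data are all hypothetical.

```python
import numpy as np

def simple_matching_distance(markers):
    """Return the matrix of GD = 1 - SM for a 0/1 array (lines x markers)."""
    m = np.asarray(markers, dtype=int)
    # Matches = bands present in both lines + bands absent in both lines.
    matches = m @ m.T + (1 - m) @ (1 - m).T
    return 1.0 - matches / m.shape[1]

def bootstrap_mean_cv(markers, sample_size, n_boot=200, seed=0):
    """Mean CV (%) of the pairwise GD over bootstrap samples of markers.

    n_boot and seed are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    m = np.asarray(markers, dtype=int)
    upper = np.triu_indices(m.shape[0], k=1)      # each pair of lines once
    draws = np.empty((n_boot, upper[0].size))
    for b in range(n_boot):
        # Resample marker columns with replacement.
        cols = rng.integers(0, m.shape[1], size=sample_size)
        draws[b] = simple_matching_distance(m[:, cols])[upper]
    mean = draws.mean(axis=0)
    sd = draws.std(axis=0, ddof=1)
    ok = mean > 0                                  # avoid division by zero
    return float(100.0 * (sd[ok] / mean[ok]).mean())

# Hypothetical data: 8 lines scored for 213 markers (random, not real scores).
demo = np.random.default_rng(1).integers(0, 2, size=(8, 213))
for n in (25, 50, 100, 213):
    print(n, round(bootstrap_mean_cv(demo, n), 2))
```

Plotting the printed sample sizes against the mean CV gives a scatter-plot of the kind described in the text.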
The genetic distances (GD) for the 28 combinations were partitioned into a mean component (GD), a general component (general genetic distance, GGD) and a specific component (specific genetic distance, SGD) according to Griffing's Model I Method 4 for diallel analysis, i.e., GDij = GD + GGDi + GGDj + SGDij (Melchinger et al., 1990).
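A minimal sketch of this partition is given below, using the standard least-squares estimators for a half diallel without parents (Method 4). The function name and the symmetric-matrix input layout are assumptions made for illustration; the text does not state how the computation was implemented.

```python
import numpy as np

def partition_distances(gd):
    """Split pairwise distances into mean, general and specific components.

    gd: symmetric (p, p) matrix of pairwise distances; the diagonal is ignored.
    Returns (mean, ggd, sgd) with GDij = mean + GGDi + GGDj + SGDij.
    """
    gd = np.asarray(gd, dtype=float)
    p = gd.shape[0]
    off_diag = ~np.eye(p, dtype=bool)
    x = np.where(off_diag, gd, 0.0)
    line_totals = x.sum(axis=1)        # sum of the p - 1 distances involving line i
    grand_total = x.sum() / 2.0        # sum over the p(p - 1)/2 pairs
    mean = 2.0 * grand_total / (p * (p - 1))
    # Method 4 estimator of the general effect of each line.
    ggd = (p * line_totals - 2.0 * grand_total) / (p * (p - 2))
    # Specific effect: what remains after the mean and both general effects.
    sgd = gd - mean - ggd[:, None] - ggd[None, :]
    sgd[~off_diag] = np.nan            # undefined for a line with itself
    return mean, ggd, sgd
```

With this parameterization the general effects sum to zero across lines, so a line with a positive GGD is, on average, more distant from the other lines than the set as a whole.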

Correlation between population parameters and diversity estimates based on RAPD analysis
Pearson correlation coefficients were calculated for GD, the sum of the general genetic distances (GGDi + GGDj) and the SGD with the population mean (PM), the genetic variance (GV) and the proportion of progenies with the mean above the general population mean (PS), for seed yield and plant height at maturity. The seed yield and plant height phenotypic data were obtained by Pulcinelli (1997) in a population set comprising 25 out of 28 populations derived from two-way crosses among the eight parents (incomplete diallel set, since the crosses 1x4, 2x7 and 3x4 were not available). The populations were derived from F2 plants according to the "bulk within progenies method" for six generations. The evaluation trials were carried out from the F2:8 to F2:11 generations (four years) in a 5x5 triple lattice design, each population being represented by a random sample of 20 progenies. Therefore, plots consisted of 20 progenies, each one represented by one 1.0 m long row, with rows spaced 0.5 m apart and 17 plants per row after thinning. Data were recorded for each row (sub-plot) for the traits seed yield and plant height at maturity. The analysis of variance was performed on a plot-mean basis for each year and combined across the four years. For each cross (population), the population mean (PM), the genetic variance among progenies (GV) and the proportion of superior progenies (PS) were estimated. PS was taken as the percentage of progenies with a mean above the general mean of all crosses (general population mean) from the combined analysis.
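The two summary statistics used here can be sketched as follows. The function names and the example numbers are hypothetical; the sketch only illustrates the definitions given above (PS as the share of progenies whose mean exceeds the general mean of all crosses, and the Pearson coefficient between a diversity estimate and a population parameter).

```python
import numpy as np

def proportion_superior(progeny_means, general_mean):
    """Fraction of progenies of one cross whose mean exceeds the general mean."""
    return float(np.mean(np.asarray(progeny_means, dtype=float) > general_mean))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical example: one value per population (not data from the study).
gd = [0.14, 0.25, 0.33, 0.41, 0.50]
pm = [2.9, 2.7, 2.4, 2.5, 2.0]
print(pearson_r(gd, pm))
```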

Parental line analyses using RAPD markers
Analysis of the eight parental lines with the 96 primers resulted in 213 polymorphic markers out of a total of 1,139 RAPD markers, a polymorphism rate of 18.7%. The mean number of loci sampled per primer was 11.87 and the mean number of polymorphic loci per primer was 2.22. The mode of the number of polymorphic loci sampled by the primers was two, with extremes of zero and seven.
The smallest genetic distance (GD = 0.14) was found between the Gaúcha and OC-79230 lines, while the greatest (GD = 0.50) was found between the BR-80-14853 and PI-165896 lines and between the PI-123439 and PI-165896 lines (Table 1).
Figure 1 shows that the mean coefficient of variation (CV) of the genetic distances between lines decreased as the number of markers increased, as expected. The decrease was at a rate of more than 0.5% up to 103 markers, with the CV reaching about 5% with 213 markers, indicating that this was a sufficient number to obtain reliable GD estimates between the eight lines.
The general genetic distance estimates (GGD) were usually greater than the specific genetic distance estimates (SGD), although the SGD was higher than the GGD in some combinations, e.g., Gaúcha and OC-79230 (Table 2). Since GGD accounted for 74% and SGD for 26% of the GD variability (data not shown), the greater part of the variability measured by the markers was due to the GGD.

Correlation between genetic distances and genetic variances
Table 3 shows that GD and SGD were not good predictors of the genetic variance (GV) of the populations, with non-significant correlation coefficients for seed yield and plant height for most of the generations. There was a weak correlation (p < 0.05) for plant height in the F2:9 generation, but its magnitude (0.44) is too low to be useful in predicting the population genetic variances. Other studies carried out on soybeans show that the magnitude of the correlation between GD as estimated by molecular markers and GV depends on the population set and the environmental conditions under which the GV is estimated (Helms et al., 1997; Kisha et al., 1997; Manjarrez-Sandoval et al., 1997).
The GV estimates showed broad confidence intervals (Pulcinelli, 1997) and, according to Gumber et al. (1999), the error associated with such estimates may cause the lack of correlation with GD. Natural selection in the selfing generations may also have biased the estimates, and it is possible that these errors and biases changed the magnitude of the GV estimates so that the correlation with the GD could not be detected.
It is assumed that the GD based on RAPD markers provides data covering the whole genome (Ferreira and Grattapaglia, 1998) and that GV is composed exclusively of the effects of quantitative trait loci (QTLs) which are segregating in the populations. Since the distributions of RAPD markers and of the QTLs responsible for GV might be different, it is probable that some of the QTLs are not linked to any marker and some of the markers are not linked to any QTL. Theoretical studies have shown that when this occurs there is a reduction in the association between GD and heterosis (Bernardo, 1992; Charcosset and Essioux, 1994), which should be valid for genetic variances as well. It is probable that GV can only be predicted if the diversity estimate is obtained from markers linked to segregating QTLs rather than from a set of markers obtained by a random sampling of the genome (Helms et al., 1991).

Correlation between genetic distances and population means
The correlation between plant height and GD and between plant height and GGDi + GGDj was very low for all generations and in the combined analysis (Table 4). However, for seed yield these correlations were very high and negative (r = -0.74** and -0.78**, respectively, in the combined analysis). Genetic distance tended to be low between adapted lines, intermediate between the adapted lines and the non-adapted lines and high between the non-adapted lines themselves (Table 1). Pulcinelli (1997) found that seed yield generally was greater in populations derived from adapted parents, intermediate when one of the parents was adapted and the other non-adapted, and lower when both parents were non-adapted. The negative correlation with mean seed yield found in the present study may be explained by the two types of parents (adapted and non-adapted) used to develop the populations.
Figure 2A shows the relationship between seed yield and genetic distances (GD), while Figure 2B shows the relationship between seed yield and the sum of the general genetic distances (GGDi + GGDj). The graphs were divided into quadrants based on the median for seed yield and GD (Figure 2A) or the median for seed yield and GGDi + GGDj (Figure 2B). Using the median of the diversity estimates as the selection limit, the selected populations fall in quadrants 1 and 4 and the unselected populations fall in quadrants 2 and 3. Populations with a mean seed yield equal to or higher than the median yield were plotted in quadrants 1 and 2. Considering that it is desirable to keep only those populations that yielded at least the median, the populations that fall into quadrants 2 and 4 would be wrongly discarded or wrongly selected, respectively, if the selection were based on GD. Figure 2A shows 19 populations in quadrants 1 and 3, while Figure 2B shows 21 populations in the same quadrants, indicating that GD and GGDi + GGDj can be used as indicators of the potential of the populations. It therefore seems that selections based on GGDi + GGDj were somewhat superior to those based on GD. Although these differences were small, they were enough to slightly improve the predictive capacity. It should be pointed out, however, that this conclusion is restricted to this population set, and more studies are needed to verify whether or not the use of GGDi + GGDj provides any advantage.
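The quadrant classification can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: populations are treated as "selected" when their diversity estimate is at or below the median (consistent with the negative correlations reported), and ties with the median are assigned to the selected and high-yield sides. The function name and the example values are hypothetical.

```python
import numpy as np

def classify_quadrants(distance, seed_yield):
    """Assign each population to quadrant 1-4 using the two medians.

    1: selected, yield >= median   (correct selection)
    2: unselected, yield >= median (wrongly discarded)
    3: unselected, yield < median  (correct rejection)
    4: selected, yield < median    (wrongly selected)
    """
    d = np.asarray(distance, dtype=float)
    y = np.asarray(seed_yield, dtype=float)
    selected = d <= np.median(d)      # assumption: low distance is selected
    high = y >= np.median(y)          # "equal to or higher than the median"
    quadrant = np.where(selected & high, 1,
               np.where(~selected & high, 2,
               np.where(~selected & ~high, 3, 4)))
    counts = {q: int((quadrant == q).sum()) for q in (1, 2, 3, 4)}
    return quadrant, counts

# Hypothetical values for six populations (not data from the study).
quadrant, counts = classify_quadrants(
    [0.14, 0.22, 0.30, 0.35, 0.42, 0.50],
    [2.9, 2.8, 2.3, 2.7, 2.2, 2.0],
)
print(counts, "agreeing:", counts[1] + counts[3])
```

The number of populations in quadrants 1 and 3 is the count of decisions on which the marker-based selection and the yield-based selection agree.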
Correlations between genetic diversity and the population mean are not usually estimated because it is generally thought that genetic diversity can predict the variance but not the mean. However, when soybean populations are derived from adapted and non-adapted genotypes the mean is often considered more important than the variance in determining the potential of the populations (Aschoener and Fehr, 1979; Vello et al., 1984; Ininda et al., 1996). It seems that there is a strong tendency for the alleles present in adapted lines to increase the mean seed yield of the populations, while alleles present in non-adapted lines seem to decrease yields. In the specific case of soybean in Brazil, the genetic base of the adapted materials is narrow (Hiromoto and Vello, 1986) and the high degree of parentage between the genotypes leads to smaller genetic distances. When both these factors are present it is probable that the estimates of parental diversity and population means are correlated.

Correlation between genetic distances and the proportion of superior progenies
Although no correlations were found for plant height, the proportion of superior progenies for seed yield was highly correlated with GD and GGDi + GGDj, but not with SGD (Table 5). The GD and GGDi + GGDj correlations for seed yield were negative (r = -0.75** and -0.80**, respectively, in the combined analysis) and similar to those obtained with the population means (Table 4), while the SGD correlations were similar to those obtained with the genetic variances (Table 3). This is a strong indication that the population potential was determined almost exclusively by the mean and that the genetic variance had little influence.
The GD and the GGDi + GGDj correlations were high enough for reliable prediction of the seed yield of the populations. Both GD and GGDi + GGDj predict that some populations derived from crosses between adapted and non-adapted parents may have greater potential for breeding purposes than populations obtained exclusively from adapted parents. This is an important characteristic of this population set, since the populations derived from the OC-79230 x PI-239235 (3x8) and BR-80-8858 x PI-239235 (4x8) crosses were among the five populations with the highest proportion of progenies with superior seed yield (Pulcinelli et al., 1997).
Figure 3 shows the relationship between the proportion of superior progenies for seed yield and GD (Figure 3A) and between the proportion of superior progenies and GGDi + GGDj (Figure 3B). The correlation was a little stronger for GGDi + GGDj, since only one population falls outside quadrants 1 and 3, while for GD three populations fall outside these quadrants. These results are similar to those obtained for the correlation between genetic distances and seed yield (Figure 2). Therefore, the small increases in the correlation coefficients which occurred after separating the different genetic distance components were sufficient to increase the predictive capacity for estimating the seed yield potential of the populations.
These results show that genetic distance and general genetic distance estimated from RAPD markers were able to predict the seed yield potential of soybean crosses, in terms of both the population mean and the proportion of superior progenies, but were not able to predict plant height at maturity. It is probable that populations with similar characteristics can have their potential predicted by genetic diversity estimates.

Figure 3 - Relationship between the proportion of superior progenies for seed yield as estimated in the combined analysis with (A) genetic distances (GD) and (B) the sum of general genetic distances (GGDi + GGDj). Quadrants are numbered 1-4 (see text for explanations).
Table 5 - Correlation coefficients between the proportion of superior progenies (PS) and genetic distances (GD), the sum of general genetic distances (GGDi + GGDj) and specific genetic distances (SGD) for seed yield and plant height at maturity, for F2:8 to F2:11 soybean generations and for the combined analysis.

Figure 1 - Mean coefficients of variation of the genetic distances between lines, estimated by the bootstrap procedure for different marker sample sizes.

**Significant at p = 0.01.

Figure 2 - Relationship between population means for seed yield as estimated in the combined analysis with (A) genetic distances (GD) and (B) the sum of general genetic distances (GGDi + GGDj). Quadrants are numbered 1-4 (see text for explanations).

Table 2 - Mean component (GD), general genetic distance (GGDi) for each line and specific genetic distance (SGDij) between lines.

Table 3 - Correlation coefficients between genetic variances (GV) and genetic distances (GD) and specific genetic distances (SGD) for seed yield and plant height, for F2:8 to F2:11 soybean generations and for the combined analysis. *Significant at p = 0.05.

Table 4 - Correlation coefficients between population means (PM) and genetic distances (GD) and the sum of general genetic distances (GGDi + GGDj) for seed yield and plant height, for F2:8 to F2:11 soybean generations and for the combined analysis.