Selection of progenitors for increase in oil content in soybean

Submitted on February 23, 2015 and accepted on July 5, 2016.
1 This work is part of the doctoral thesis of the first author.
2 Federal University of Viçosa, Viçosa, Minas Gerais, Brazil. josianeisabela@gmail.com; kleverantunes@yahoo.com.br; piovesan@ufv.br
3 Agronomic Institute of Paraná, Londrina, Brazil. kleverantunes@yahoo.com.br
4 Catholic University of Brasília, Postgraduate Program in Genomic Sciences and Biotechnology, Brasília, Distrito Federal, Brazil. everaldodebarros@gmail.com
5 In memoriam.
*Corresponding author: josianeisabela@gmail.com


INTRODUCTION
The repeated use of elite germplasm in crosses has kept the genetic base of the soybean [Glycine max (L.) Merrill] varieties cultivated in Brazil narrow. Historically, genetic variability has been little explored by breeding programs. Low genetic diversity limits breeding because genetically similar genotypes share many alleles, which results in little complementarity and low vigor owing to the low levels of heterozygosity in crosses. On the other hand, superior genotypes are most likely to arise from populations with high genetic variability, which result from crosses between genetically distant progenitors with high phenotypic values.
In the 1980s, the soybean varieties cultivated in Brazil were either introductions from the Southern U.S. or derived from hybridization among these North American introductions. Seventy-nine cultivars of that period descended from 26 ancestors, 11 of which accounted for 89% of the gene pool (Hiromoto & Vello, 1986). Another 69 cultivars showed an average relatedness of 0.16 and an effective population size estimated between 11 and 15 (Vello et al., 1988). One hundred cultivars released between 1984 and 1998 had an average relatedness of 0.21 (Bonato et al., 2006a). In addition, the same degree of relatedness and an effective population size of 11 were estimated among 90 elite cultivars (Miranda et al., 2007).
Genetic relationships among Brazilian cultivars based on molecular markers are also reported in the literature. Three hundred and seventeen cultivars released between 1962 and 1998 had similarity indices between 0.17 and 0.97 and an average similarity of 0.6 (Bonato et al., 2006b). Vieira et al. (2009) observed similarities between 0.27 and 0.98 and an average similarity of 0.53 in another group of 53 genotypes. Another 168 cultivars showed similarity indices between 0.01 and 0.9 and an average similarity of 0.42 (Priolli et al., 2010). Taken together, the low effective population sizes, the high estimates of relatedness, and the elevated similarity indices indicate that the improved cultivars are highly similar and that a relatively low level of diversity is maintained in the soybean germplasm of different breeding programs in Brazil.
Molecular markers, in general, can be useful in predicting genetic variability for the development of populations, and greater predictive ability is expected when the diversity estimates consider the positions of QTLs (quantitative trait loci). Such estimates are probably most effective in predicting genetic variability when based on molecular markers located in QTL regions of the target trait rather than in random regions of the genome (Rodrigues et al., 2015). The objective of this work was to identify genetically divergent progenitors with high grain oil content, in order to increase the oil content and broaden the genetic base of soybean breeding programs.

MATERIAL AND METHODS
Genetic material and determination of oil content
Twenty-two soybean genotypes from the germplasm bank of the Soybean Quality Breeding Program of the Federal University of Viçosa, with wide variation in oil content and including cultivars with high oil contents, were cultivated in Viçosa, MG (Dec 2009, 20°45' S, 42°52' W), Visconde do Rio Branco, MG (Feb 2010, 21°00' S, 42°50' W) and São Gotardo, MG (Feb 2010 and Oct 2011) in a randomized block design with three replications. In the trials, fifteen seeds were sown per 1 m row, with 0.5 m spacing between rows. The grains were ground in an industrial mill (model MA020, Marconi) and the soybean flour was analyzed for oil content by infrared spectrometry using an FT-NIR spectrometer (model Antaris II, Thermo Scientific). A combined analysis of variance was performed, considering the effect of genotypes (G_i) as fixed and the effect of environments (E_j) as random. The components of variance were estimated according to Cruz et al. (2014). The percentages of the simple and complex parts of the mean square of the genotype x environment interaction for each pair of environments j and j' (MSGxEjj') were calculated according to Cruz & Castoldi (1991).

Analysis of genetic diversity
The polymorphism information content of the molecular markers was calculated according to Cruz et al. (2011). Three dissimilarity matrices were calculated from the complements of the unweighted and weighted similarity indices and from the Smouse and Peakall d² index. Using the estimates of the complement of the weighted similarity index, the genotypes were grouped by the clustering methods UPGMA, Tocher and modified Tocher (Vasconcelos et al., 2007), and two- and three-dimensional projections were obtained. The estimates of the complement of the weighted similarity index based on the thirty-three microsatellite markers were compared with the estimates obtained by the same index using only the microsatellites of linkage group I, the linkage group most often associated with oil content in the literature (Rodrigues et al., 2010; Li et al., 2011; Qi et al., 2011; Rodrigues et al., 2013; Leite et al., 2016). All analyses were performed with the program Genes (Cruz, 2013).
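As an illustration of how a dissimilarity based on the complement of a weighted similarity index can be obtained from codominant marker data, the sketch below assumes one common formulation: the similarity is half the sum, over loci, of the number of shared alleles weighted by the proportion of all observed alleles that belong to each locus. The text does not state the exact expression used, so this formula, the function names and the example genotypes are assumptions for illustration only.

```python
def shared_alleles(g1, g2):
    """Count alleles shared by two diploid genotypes at one locus (0, 1 or 2)."""
    remaining = list(g2)
    count = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele copy can be matched only once
            count += 1
    return count


def weighted_dissimilarity(ind_a, ind_b, alleles_per_locus):
    """Complement of a weighted similarity index (assumed formulation).

    ind_a, ind_b: lists of (allele, allele) tuples, one tuple per locus.
    alleles_per_locus: number of distinct alleles observed at each locus.
    """
    total_alleles = sum(alleles_per_locus)
    similarity = 0.0
    for ga, gb, a_j in zip(ind_a, ind_b, alleles_per_locus):
        weight = a_j / total_alleles  # loci with more alleles weigh more
        similarity += weight * shared_alleles(ga, gb)
    return 1.0 - 0.5 * similarity


# Hypothetical genotypes at two loci (allele sizes in base pairs),
# with 2 and 6 alleles observed at the first and second locus.
genotype_a = [(100, 100), (200, 210)]
genotype_b = [(100, 100), (220, 230)]
d_ab = weighted_dissimilarity(genotype_a, genotype_b, [2, 6])
```

Under this formulation, two genotypes that are identical at every locus have a dissimilarity of 0, and two genotypes that share no alleles have a dissimilarity of 1.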

RESULTS AND DISCUSSION
The analysis of variance showed differences in the oil contents of the soybean genotypes across the environments, indicating genetic variability, variation among environments and a differential response of the genotypes to the environments (Table 1). The genotype x environment interaction was predominantly complex, except in one pair of environments (Visconde do Rio Branco/Feb 2010 and São Gotardo/Feb 2010) (Table 2). The coefficients of variation (2.84-6.69%) showed good precision in controlling the causes of experimental variation, and the ratio between the largest and the smallest residual mean squares indicated homogeneity of the residual variances (Falconer & Mackay, 1996). The ratio CVg/CVe was greater than 1, indicating a favorable condition for selection (Araújo et al., 2014).
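The CVg/CVe ratio discussed above can be illustrated with a minimal sketch. It assumes a single randomized-block trial with fixed genotypes, in which the genotypic quadratic component is estimated as (MSG - MSE)/r; the estimators actually used in the study follow Cruz et al. (2014) and may differ, and the mean squares below are hypothetical values, not results from this work.

```python
import math


def cv_ratio(ms_genotype, ms_error, n_reps, grand_mean):
    """Return (CVg, CVe, CVg/CVe) for one randomized-block trial."""
    # Assumed estimator of the genotypic quadratic component (fixed genotypes).
    phi_g = (ms_genotype - ms_error) / n_reps
    if phi_g < 0:
        raise ValueError("negative genotypic component; ratio is undefined")
    cv_g = 100.0 * math.sqrt(phi_g) / grand_mean     # genotypic CV, in %
    cv_e = 100.0 * math.sqrt(ms_error) / grand_mean  # experimental CV, in %
    return cv_g, cv_e, cv_g / cv_e


# Hypothetical mean squares and mean oil content (%), for illustration only.
cv_g, cv_e, ratio = cv_ratio(ms_genotype=7.0, ms_error=1.0,
                             n_reps=3, grand_mean=20.0)
```

A ratio above 1 means that the genotypic variation exceeds the experimental variation, which is the condition the text describes as favorable to selection.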
The oil contents across the four field trials ranged from 16.0 to 24.2%. Suprema showed the highest oil content based on the average of the environments (23.01%), followed by CD01RR8384 (22.91%), while BR8014887 presented the lowest oil content (17.28%) (Table 3). The microsatellite loci of QTL regions for oil content proved effective in distinguishing the twenty-two genotypes. A total of 108 alleles were observed in this study. The number of alleles per locus ranged from two to six, with an average of 3.3 alleles per locus. Allele sizes ranged between 100 and 600 base pairs. Altogether, we observed four heterozygous genotypes, which is compatible with the level of inbreeding expected in an autogamous species. The polymorphism information content, which estimates the informativeness of each locus, ranged from 0.08 to 0.77, with an average of 0.44. These estimates are close to the values reported by Mian et al. (2009) and Mulato et al. (2010), who also evaluated microsatellite markers in elite cultivars and accessions of soybean germplasm.
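To make the polymorphism information content (PIC) concrete, the sketch below computes it for one locus from its allele frequencies. It assumes the widely used expression PIC = 1 - sum(p_i²) - sum over i < j of 2 p_i² p_j²; the text only cites Cruz et al. (2011) for the calculation, so the exact formula is an assumption, and the frequencies shown are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphism information content of one locus (assumed formula)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term: sum over allele pairs i < j of 2 * p_i^2 * p_j^2.
    correction = 0.0
    n = len(allele_freqs)
    for i in range(n):
        for j in range(i + 1, n):
            correction += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
    return 1.0 - homozygosity - correction


# Hypothetical loci: two equally frequent alleles, and one nearly fixed allele.
pic_balanced = pic([0.5, 0.5])
pic_skewed = pic([0.95, 0.05])
```

A locus with one predominant allele gives a PIC close to zero, whereas loci with several alleles at similar frequencies give higher values, which is why the PIC is read as a measure of marker informativeness.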
The dissimilarity estimates based on the complements of the unweighted and weighted similarity indices and on the Smouse and Peakall d² index showed high values of Pearson's correlation (> 0.94), indicating agreement among the indices and little practical difference between them. For all indices, the highest dissimilarity was estimated between PI181544 and the cultivar CD224. After this pair of genotypes, the highest estimates for the last two indices were observed between each of the introductions PI371611 and PI371610 and the cultivar Suprema.
Rev. Ceres, Viçosa, v. 63, n.5, p. 661-667, set/out, 2016
The distances based on the complement of the weighted similarity index ranged between 0.06 and 0.81, with an average of 0.61, a wider range and a higher average than those reported by Vieira et al. (2009) with microsatellite markers in soybean. The greatest distance (0.81) was estimated for the pairs of genotypes PI181544/CD224, Suprema/PI371611, PI371610/Suprema, PI371610/CD224, and Garantia/CD225RR, and the smallest distance was estimated between PI371611 and PI371610 (0.06). The pairs of genotypes Suprema/PI371610 (0.81), CD01RR8384/Suprema (0.63) and Suprema/CD219RR (0.63) combined high distances with oil contents greater than or equal to 21.5%.
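The agreement between two dissimilarity indices can be checked by correlating the corresponding pairwise estimates. The sketch below is a minimal illustration, assuming symmetric matrices given as nested lists and using only the values above the diagonal; the matrices are hypothetical and the function names are not taken from the program Genes.

```python
import math


def upper_triangle(matrix):
    """Return the values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical dissimilarity matrices for three genotypes under two indices.
index_1 = [[0.0, 0.2, 0.8],
           [0.2, 0.0, 0.6],
           [0.8, 0.6, 0.0]]
index_2 = [[0.0, 0.3, 0.7],
           [0.3, 0.0, 0.6],
           [0.7, 0.6, 0.0]]
r = pearson(upper_triangle(index_1), upper_triangle(index_2))
```

The same comparison applies to any pair of distance matrices computed for the same set of genotypes, for example distances from all markers and distances from a subset of markers.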
For the mean dissimilarity of each genotype relative to the other twenty-one, the highest value was estimated for PI181544 (0.68), followed by Suprema (0.67) and CD01RR8384 (0.65), and the lowest value was estimated for CD219RR (0.56), followed by CD983321RR, CD226RR and Luziânia (0.57) (Table 2). Despite being the most divergent, the genotype PI181544 showed low oil content, while the cultivars Suprema and CD01RR8384 showed oil contents considered high. When the overall mean dissimilarity was calculated for the plant introductions (PIs) (0.65), for the cultivars with the initials CD (0.61) and for the remaining genotypes (0.60), the PIs formed the most dissimilar group, as expected.
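The mean dissimilarity of each genotype in relation to the others is simply the row mean of the dissimilarity matrix, excluding the diagonal. A minimal sketch follows, with hypothetical labels and values rather than data from this study.

```python
def mean_dissimilarity(names, matrix):
    """Mean dissimilarity of each genotype relative to all the others."""
    n = len(names)
    return {
        names[i]: sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }


# Hypothetical symmetric dissimilarity matrix for three genotypes.
labels = ["G1", "G2", "G3"]
distances = [[0.0, 0.2, 0.8],
             [0.2, 0.0, 0.6],
             [0.8, 0.6, 0.0]]
means = mean_dissimilarity(labels, distances)
most_divergent = max(means, key=means.get)
```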
In the genotype clustering by the UPGMA method, a cut at 95.77% of the dissimilarity (the value indicated by the method of Mojena using k = 1.25) establishes four groups (Mojena, 1977). One group gathers only PIs, another gathers CD01RR8376 and CD01RR8384, and a third gathers the other cultivars, except Suprema, which remains ungrouped. The values of cophenetic correlation, distortion and stress were 0.76, 1.2% and 10.9%, respectively, indicating a good fit between the original and graphic distances and little distortion of the distances in the dendrogram (Figure 1).
The Tocher method establishes six groups with more than one genotype and shows relationships also observed in the dendrogram obtained by the previous method. The cultivars CD01RR8376 and CD01RR8384 are kept together and Suprema again remains ungrouped; these relationships are also observed with the modified method (Table 4).
In the two- and three-dimensional projections, the PIs and the cultivars CD224 and Suprema were relatively distant from most genotypes, despite the poor fit between the original distances and those obtained in the projections. The values of cophenetic correlation, distortion and stress were 0.52 and 0.61, 29.07 and 13.22%, and 42.62 and 31.26%, respectively, for the two- and three-dimensional projections, indicating low effectiveness in representing the distance matrix. Based on these measures, clustering by the UPGMA method was the most effective.
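The cut point used for the dendrogram can be illustrated with the stopping rule of Mojena (1977), in which the threshold is the mean of the fusion levels plus k times their standard deviation. The sketch below assumes the sample standard deviation (n - 1 in the denominator) and uses hypothetical fusion levels; it is not the output of the program Genes.

```python
import statistics


def mojena_threshold(fusion_levels, k=1.25):
    """Cut height: mean of the fusion levels plus k standard deviations."""
    mean = statistics.mean(fusion_levels)
    sd = statistics.stdev(fusion_levels)  # sample standard deviation (assumed)
    return mean + k * sd


def number_of_groups(fusion_levels, threshold):
    """Each fusion above the threshold is undone, adding one group."""
    return 1 + sum(1 for level in fusion_levels if level > threshold)


# Hypothetical fusion levels of a dendrogram with six genotypes.
levels = [0.1, 0.2, 0.3, 0.4, 1.0]
cut = mojena_threshold(levels, k=1.25)
groups = number_of_groups(levels, cut)
cut_percent = 100.0 * cut / max(levels)  # cut as % of the largest fusion level
```

Expressing the cut as a percentage of the largest fusion level corresponds to the way the threshold is reported in the text.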
The microsatellites of linkage group I indicated genetic variability in the main region of the linkage group. When the distances obtained by the complement of the weighted similarity index from the 33 microsatellite markers were compared with those obtained by the same index from the microsatellites of linkage group I only, the correlation between the estimates was r = 0.61. In both cases, the greatest distance was observed for the pairs of genotypes PI371610/CD224, Garantia/CD225RR, Suprema/PI371611 and PI371610/Suprema, and the smallest for the pair PI371611/PI371610. Therefore, there is genetic variation in the main linkage group that controls oil content, and the agreement in distance relationships between the two cases suggests the presence of QTLs of greater effect in this linkage group.
Linkage group I is strongly associated with oil content. QTLs mapped in the main region of this linkage group, the region most frequently reported as involved in oil content in soybean, explained from 6 to 24% of the variation in oil content in the populations studied (Csanádi et al., 2001; Chung et al., 2003; Nichols et al., 2006).
Based on the distance estimates and the clusters obtained, CD224/PI181544, PI371611/Suprema and PI371610/Suprema were the most divergent pairs of genotypes, with PI371610/Suprema showing higher oil contents than the other two pairs. The group of PIs, which showed greater genetic distances in relation to most groups, had oil percentages considered low or normal, except for PI371610, which exhibited a median content. Although they are very similar, PI371610 presented an oil content 1% higher than PI371611. The genotypes Suprema and CD01RR8384, in turn, showed a certain genetic distance from each other and the highest oil contents. These genotypes are promising progenitors, because genetically divergent accessions with high oil contents can simultaneously serve as donors of additive genes for increasing oil content in soybean. According to Cruz et al. (2011), greater gain with selection is expected in populations with higher averages and genetic variances. The population average seems related to the averages of the parents, while the genetic variance seems related to genetic diversity (Rodrigues et al., 2013). Hughes et al. (2008) consider diversity and/or genetic distance as a measure of genetic variability. Genetic diversity estimates can therefore be useful to breeding, since crosses between genetically divergent genotypes are more likely to produce greater genetic variability and heterotic effects in the progenies (Filho et al., 2010; Riaz et al., 2008). Thus, genetic diversity can be considered when predicting the potential of populations at the stage of parent selection, thereby avoiding populations with low genetic variability. In this case, greater predictive capacity of genetic variability for a particular trait is expected when the estimate of genetic distance considers QTL regions of the trait, instead of random regions of the genome (Melchinger et al., 1992; Charcosset et al., 1991).
Molecular markers have been the preferred tools for assessing genetic relationships between cultivars because pedigree information on the accessions is often incomplete, unavailable or not detailed enough, and because molecular markers, unlike most agronomic traits, are not influenced by the environment (Mulato et al., 2010). Moreover, among molecular markers, microsatellites are the most widely used in genetic diversity studies, owing to their abundance, high level of polymorphism, multiallelism, and codominant inheritance (Rodrigues et al., 2015).