QTL mapping for protein content in soybean cultivated in two tropical environments

The objectives of this study were to detect quantitative trait loci (QTL) for protein content in soybean grown in two distinct tropical environments and to construct a genetic map for the analysis of protein content. One hundred eighteen soybean recombinant inbred lines (RIL), obtained from a cross between cultivars BARC-8 and Garimpo, were used. The RIL were grown in two distinct Brazilian tropical environments: Cascavel county, Paraná (24°57'S, 53°27'W), and Viçosa county, Minas Gerais (20°45'S, 42°52'W). Sixty-six SSR primer pairs and 65 RAPD primers were polymorphic and segregated at a 1:1 ratio. Thirty poorly saturated linkage groups were obtained, comprising 90 markers, and 41 markers remained unlinked. For the lines grown in Cascavel, three QTL were mapped in linkage groups C2, E and N, explaining 14.37, 10.31 and 7.34% of the phenotypic variation in protein content, respectively. For the lines grown in Viçosa, two QTL were mapped in linkage groups G and #1, explaining 9.51 and 7.34% of the phenotypic variation in protein content. Based on the mean of the two environments, two QTL were identified: one in linkage group E (9.90%) and the other in group L (7.11%). For future studies to consistently detect QTL effects across environments, genotypes with greater stability should be used.


Pesq. agropec. bras., Brasília, v.43, n.11, p.1533-1541, nov. 2008

Introduction
Most Brazilian soybean [Glycine max (L.) Merrill] cultivars contain 30 to 45% protein, 20 to 25% lipids, 28 to 35% carbohydrates and about 5% ash (Moreira et al., 1990). Theoretically, all these characteristics could be genetically modified by combining suitable genes from the world germplasm. An important aspect of soybean grain quality is the quantity and quality of the protein fraction, because soybean protein is a low-cost source of high nutritional value for human and animal consumption (Wilson, 2004). Soybean breeding programs have therefore emphasized the development of varieties with high protein content, given the economic importance of this trait (Carrão-Panizzi et al., 2008).
Soybean protein content results from the joint action of many genetic loci and their interactions with the environment, which makes the genetic study and analysis of this trait complex (Sudarić et al., 2006). However, it is now possible to break down the genetic variation of quantitatively inherited traits into discrete loci (quantitative trait loci) and to identify those with the greatest effect. DNA markers can be used to construct saturated genetic maps, which in turn allow QTL related to agronomic, physiological and seed composition traits, such as protein content, to be detected and located (Boerma, 2000).
Genetic mapping and QTL detection are promising tools to optimize selection in breeding programs, as they allow a more accurate study of the genetics of quantitative traits. Selection accuracy can be increased by marker-assisted selection (MAS), especially for traits with low heritability or traits strongly influenced by the environment (Moreau et al., 2004). QTL detection also supports gene cloning and characterization, through fine mapping of genomic regions, and the identification of candidate genes related to specific metabolic pathways.
Mapping and QTL detection studies for seed protein content have been reported extensively in the literature (Chung et al., 2003; Fasoula et al., 2004; Hyten et al., 2004; Panthee et al., 2005). However, many reported QTL have yet to be confirmed and their consistency validated (Fasoula et al., 2004). According to Panthee et al. (2005), the QTL reported most consistently for protein content in soybean is located in linkage group MLG I, close to the microsatellite marker Satt292. The same study highlighted the difficulty of validating previously detected QTL, given the need to carry out experiments in distinct environments, with different population structures and different genetic backgrounds.
QTL mapping for protein content involving Brazilian soybean germplasm has not yet been reported. Such a study would allow QTL detection to be assessed and compared under tropical conditions, in addition to involving a different genetic background. Furthermore, the efficient development of breeding procedures depends on understanding the type of gene action and the heritability of quantitative traits. The objective of the present study was to detect and map the QTL that control protein content in soybean, using SSR and RAPD markers, and to begin constructing an intraspecific genetic map for soybean involving genotypes adapted to tropical conditions.

Materials and Methods
A population of 118 soybean recombinant inbred lines (RIL) at the F6 generation, developed by single seed descent (SSD) from a cross between 'BARC-8' and the Brazilian cultivar Garimpo, was used. BARC-8 is a high-protein soybean cultivar (500 g kg-1) developed by USDA-ARS, Beltsville, MD (Leffel, 1992). Cultivar Garimpo has normal protein content (360 g kg-1) and was developed by Embrapa Soja, Brazil. In December 2001, the RIL were planted in a randomized block design with intercalated controls at two locations in Brazil: Viçosa, MG (20°45'S, 42°52'W, altitude of 650 m, annual rainfall of 1,340 mm), and Cascavel, PR (24°57'S, 53°27'W, altitude of 780 m, annual rainfall of 1,971 mm). Each plot consisted of one 3-m row, with 45 cm between rows. The controls were the parental cultivars, intercalated every 20 families. Seeds were harvested and stored for later analysis of protein content, which was determined on a sample of five plants per family by the modified Kjeldahl method (Association, 1984).
The analysis of variance for each location followed the family design with intercalated controls ('BARC-8' and 'Garimpo') (Cruz, 2001). The joint analysis of variance for the two locations used two models: one for the controls and another for the families. For the controls, a factorial model was used to partition the variance into controls, environments and the control × environment interaction: $Y_{ijk} = \mu + T_i + A_k + (TA)_{ik} + e_{ijk}$, in which $Y_{ijk}$ is the value of the trait for the $i$th control in the $j$th replication and the $k$th environment; $\mu$ is the general mean of the controls; $T_i$ is the effect of the $i$th control ($i = 1, 2, \ldots, t$); $A_k$ is the effect of the $k$th environment ($k = 1, 2$); $(TA)_{ik}$ is the effect of the interaction between the $i$th control and the $k$th environment; and $e_{ijk}$ is the random error of the controls, with $e_{ijk} \sim \mathrm{NID}(0, \sigma^2)$.
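As an illustration of the controls' factorial model, the sums of squares for controls ($T$), environments ($A$), their interaction and the error can be computed directly from a table of control values. This is a minimal sketch assuming a balanced layout, with the same number of replications $r$ in every control × environment cell, stored as `y[i][k][j]`; the function name `factorial_anova` and the layout are hypothetical, and the example values are illustrative, not data from this study.

```python
from statistics import mean


def factorial_anova(y):
    """Sums of squares and df for Y_ijk = mu + T_i + A_k + (TA)_ik + e_ijk.

    y[i][k][j]: value of control i, in environment k, replication j
    (balanced layout assumed: every cell has the same number of replications).
    """
    t, a, r = len(y), len(y[0]), len(y[0][0])
    grand = mean(v for ctrl in y for env in ctrl for v in env)
    ctrl_means = [mean(v for env in ctrl for v in env) for ctrl in y]
    env_means = [mean(v for ctrl in y for v in ctrl[k]) for k in range(a)]
    cell_means = [[mean(y[i][k]) for k in range(a)] for i in range(t)]

    ss = {
        "T": a * r * sum((m - grand) ** 2 for m in ctrl_means),
        "A": t * r * sum((m - grand) ** 2 for m in env_means),
        # Interaction: cell deviations not explained by the two main effects
        "TxA": r * sum(
            (cell_means[i][k] - ctrl_means[i] - env_means[k] + grand) ** 2
            for i in range(t)
            for k in range(a)
        ),
        # Error: replications around their own cell mean
        "error": sum(
            (v - cell_means[i][k]) ** 2
            for i in range(t)
            for k in range(a)
            for v in y[i][k]
        ),
    }
    df = {"T": t - 1, "A": a - 1, "TxA": (t - 1) * (a - 1), "error": t * a * (r - 1)}
    return ss, df


# Two controls x two environments x two replications (illustrative values only)
ss, df = factorial_anova([[[50, 52], [54, 54]], [[35, 36], [43, 43]]])
print(ss, df)
```

Each mean square is then SS/df; with two controls and two environments, every effect has a single degree of freedom, and the four sums of squares add up to the total sum of squares.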
The joint analysis of variance for families used a model analogous to a randomized block design, with each environment treated as a block; the family × environment interaction was likewise included in the partition of the variance. The model is $Z_{ik} = \mu + F_i + A_k + (FA)_{ik} + e_{ik}$, in which $Z_{ik}$ is the value of the trait for the $i$th family in the $k$th environment; $\mu$ is the general mean of the families; $F_i$ is the effect of the $i$th family ($i = 1, 2, \ldots, f$); $A_k$ is the effect of the $k$th environment ($k = 1, 2$); $(FA)_{ik}$ is the effect of the interaction between the $i$th family and the $k$th environment; and $e_{ik}$ is the random error of the families, where $e_{ik} = e_{ijk} \sim \mathrm{NID}(0, \sigma^2)$.
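Because each family contributes one value per environment, the family × environment interaction sum of squares in a family-by-environment table is what remains after removing the two main effects. The sketch below assumes, following the statement that $e_{ik} = e_{ijk}$, that the interaction is tested against the error mean square estimated from the replicated controls; the function name `family_joint_anova`, its arguments and the example values are hypothetical.

```python
def family_joint_anova(z, ms_error_controls):
    """Joint analysis for Z_ik = mu + F_i + A_k + (FA)_ik + e_ik.

    z[i][k]: value of family i in environment k (one value per cell).
    ms_error_controls: error mean square taken from the controls' analysis.
    """
    f, a = len(z), len(z[0])
    grand = sum(sum(row) for row in z) / (f * a)
    fam_means = [sum(row) / a for row in z]
    env_means = [sum(z[i][k] for i in range(f)) / f for k in range(a)]

    ss_total = sum((v - grand) ** 2 for row in z for v in row)
    ss_f = a * sum((m - grand) ** 2 for m in fam_means)
    ss_a = f * sum((m - grand) ** 2 for m in env_means)
    # Without replication inside a cell, F x A is the remainder
    ss_fa = ss_total - ss_f - ss_a

    df_f, df_a, df_fa = f - 1, a - 1, (f - 1) * (a - 1)
    ms_fa = ss_fa / df_fa
    return {
        "SS": {"F": ss_f, "A": ss_a, "FxA": ss_fa},
        "df": {"F": df_f, "A": df_a, "FxA": df_fa},
        # Ratio of the interaction mean square to the controls' error mean square
        "F_stat_FxA": ms_fa / ms_error_controls,
    }


# Three families x two environments (illustrative values only)
result = family_joint_anova([[40, 44], [46, 52], [37, 45]], ms_error_controls=0.5)
print(result)
```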
For DNA extraction, leaves from five plants of each family were collected, wrapped in aluminum foil, frozen in liquid nitrogen and stored at -80°C; DNA was extracted following Doyle & Doyle (1990). SSR amplification reactions were carried out in a total volume of 15 µL, containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2 mM MgCl2, 100 µM of each deoxynucleotide (dATP, dTTP, dGTP and dCTP), 0.6 µM of each primer (Research Genetics, Huntsville, AL, USA), 1 U Taq polymerase and 30 ng DNA. Amplifications were conducted in a thermocycler programmed as follows: 7 min at 94°C; 30 cycles of 1 min at 94°C, 1 min at 50°C and 2 min at 72°C; and a final 7 min at 72°C. The amplified microsatellite fragments were separated by electrophoresis at 100 V in 3% agarose gels containing 6 µL ethidium bromide (10 mg mL-1) in TBE buffer (90 mM Tris-borate and 2 mM EDTA, pH 7). The DNA samples were also amplified by the RAPD technique, according to Williams et al. (1990), using decamer primers (Operon Technologies, Alameda, CA, USA). After the run, gels were photographed under ultraviolet light with the Eagle Eye II (Stratagene) photodocumentation system. A total of 567 SSR primer pairs and 1,200 RAPD primers were tested.
The GQMOL program (Cruz & Schuster, 2004) was used to obtain the linkage map, with the Kosambi map function; the presence of QTL and their effects were assessed by multiple regression analysis and by composite interval mapping (Zeng, 1993). Markers were grouped using LOD > 3 and a maximum recombination frequency of 0.30. Markers known to be linked in the consensus map (Song et al., 2004) but unlinked at LOD > 3 were regrouped using LOD > 2. Marker order within each linkage group was obtained with the RCD (rapid chain delineation) algorithm (Doerge, 1996). For the composite interval mapping analysis (Jansen et al., 1993; Zeng et al., 1993, 1994), only markers with P(β) < 0.20 were considered, to avoid a drastic reduction in population size due to missing data.
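The two quantities behind the grouping rule can be sketched as follows: the Kosambi map function, $d = \tfrac{1}{4}\ln\frac{1+2r}{1-2r}$ (in morgans), and a two-point LOD score comparing the estimated recombination frequency with free recombination ($r = 0.5$). This is a simplified sketch, not GQMOL's implementation: it assumes each line can be scored as parental or recombinant for a marker pair and uses the observed recombinant fraction directly. For selfed RIL, that fraction $R$ relates to the per-meiosis frequency $r$ by $R = 2r/(1+2r)$, so mapping software may convert it before applying the map function. All function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for a recombination frequency 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 100 * 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination frequency for a distance in cM."""
    return 0.5 * math.tanh(2 * d_cm / 100)


def _log10_likelihood(r, k, n):
    """log10 of r^k * (1 - r)^(n - k), treating 0 * log10(0) as 0."""
    ll = 0.0
    if k:
        ll += k * math.log10(r)
    if n - k:
        ll += (n - k) * math.log10(1 - r)
    return ll


def two_point_lod(n_recomb, n_total):
    """LOD for linkage at the estimated r versus free recombination (r = 0.5)."""
    r_hat = n_recomb / n_total
    return (_log10_likelihood(r_hat, n_recomb, n_total)
            - _log10_likelihood(0.5, n_recomb, n_total))


def passes_grouping(n_recomb, n_total, lod_min=3.0, r_max=0.30):
    """Grouping rule stated in the text: LOD > 3 and recombination frequency < 0.30."""
    return two_point_lod(n_recomb, n_total) > lod_min and n_recomb / n_total < r_max


print(kosambi_cm(0.3), two_point_lod(10, 100), passes_grouping(10, 100))
```

For example, 10 recombinant lines out of 100 give r = 0.10 and a LOD of about 16, well above the threshold of 3, while r = 0.30 corresponds to roughly 34.7 cM under the Kosambi function.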

Results and Discussion
Protein content in the 118 RIL families showed a continuous, approximately normal distribution in both environments (Figure 1), which is consistent with the polygenic inheritance of protein content in soybean. The average protein content of the parental cultivars varied with the environment: 'BARC-8' averaged 51.77% in Viçosa and 54.10% in Cascavel, while 'Garimpo' averaged 35.58% in Viçosa and 43.18% in Cascavel.
The RIL families and the parents showed significant genetic variability for protein content and a significant genotype × environment interaction (Tables 1 and 2). Protein content heritability, estimated from variance components, was high at both locations (Viçosa: 73.47%; Cascavel: 82.11%) and in the joint analysis (73.16%), indicating that most of the observed variation had genetic causes. The joint analysis of the two environments also indicated a significant genotype × environment interaction for both families and controls, showing that environmental factors have considerable influence on the regulation of genes related to protein accumulation in soybean (Table 2).

Only 66 (11.64%) of the 567 SSR primer pairs tested were polymorphic in the RIL population, segregated at a 1:1 ratio and gave good amplification quality. Of the RAPD primers tested, 127 (10.6%) were polymorphic between the parents, but only 65 (5.41%) showed consistent polymorphism in the RIL population with 1:1 segregation. Ninety markers were linked, forming thirty linkage groups (Figure 2), and 41 markers remained unlinked. The linkage groups obtained were compared with the soybean consensus map (Song et al., 2004): 23 of them were assigned to 16 linkage groups of the consensus map (A1, B2, C1, C2, D1a, D1b, D2, E, F, G, H, J, K, N and O). The remaining seven groups contained only RAPD markers and therefore could not be aligned to the consensus map; they were named #1 to #7.
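The text does not state which test was used to check 1:1 segregation; a common choice is the chi-square goodness-of-fit test with one degree of freedom, sketched below under that assumption. Lines with missing genotypes are assumed to be excluded before counting; the function name is hypothetical, and 3.841 is the 5% critical value of the chi-square distribution with 1 df.

```python
def chi_square_1_to_1(n_a, n_b, critical=3.841):
    """Chi-square test of a 1:1 segregation ratio (1 degree of freedom).

    n_a, n_b: numbers of lines carrying the 'BARC-8' and the 'Garimpo' marker allele.
    Returns the statistic and whether the 1:1 hypothesis is not rejected at 5%.
    """
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    return chi2, chi2 < critical


# Hypothetical allele counts among 118 RIL
print(chi_square_1_to_1(65, 53))  # statistic about 1.22: consistent with 1:1
print(chi_square_1_to_1(80, 38))  # statistic about 14.95: distorted segregation
```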
A partial linkage map was built from SSR and RAPD markers, covering about 829.7 cM, roughly one third of the approximately 2,523.6 cM of the soybean genome (Song et al., 2004). Even so, the partial map obtained in the present study provides a basis for developing a genetic map for tropical soybean genotypes.
In the QTL analysis by multiple regression, 12 markers were identified with a significant effect on protein content in the RIL population. Seven markers had a significant effect on the protein content of plants grown in Cascavel (Table 3); their accumulated adjusted R² explained 34.19% of the variation in protein content in this environment. Two of these markers (OP-AU04 and Satt549) accounted for about 20.73% of the variation, and the others had a mean individual effect of 3.62%. Of the seven markers, two (OP-AU04 and OP-BE13) were not assigned to any linkage group in the present map; the others were located in linkage groups N, G, C2, I and #1 (Table 3). In Viçosa, five markers showed significant associations that together explained 48.79% of the variation in protein content (Table 3), with an average individual contribution of 11.11%. Of these five markers, only OP-AO06 was not placed in a linkage group; the others were mapped to linkage groups A1, C2, H and #1 (Table 3). The multiple regression analysis using the mean protein content of each RIL across the two locations identified 10 markers associated with variation in protein content, located in six linkage groups (A1, C1, C2, K, #1 and #3) (Table 3). Together, these markers explained 41.72% of the variation in mean protein content. In Table 3, * and ** denote significance at 1 and 5% probability, respectively. Although this method cannot determine the exact position of a QTL, it can estimate QTL effects for both linked and unlinked markers, whereas composite interval mapping can only detect QTL within linkage groups. Single-marker mapping and multiple regression analysis are important tools in preliminary studies to detect candidate QTL (Doerge, 2002). As high-density genetic maps become available, specific genome regions can be delimited and markers can be tested in different genetic backgrounds, speeding up QTL detection. Composite interval mapping identified three QTL for protein content in the Cascavel environment and two in the Viçosa environment.
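The multiple regression step can be illustrated with ordinary least squares of protein content on marker genotype codes, reporting $R^2$ and the adjusted $R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ for $n$ lines and $p$ markers. This is a minimal sketch assuming NumPy, markers coded 1 for the 'BARC-8' allele and 0 for the 'Garimpo' allele, and no missing data; it omits the marker-selection procedure, which the text does not describe, and the function name is hypothetical.

```python
import numpy as np


def marker_regression(genotypes, phenotype):
    """OLS of phenotype on marker codes; returns (coefficients, R^2, adjusted R^2).

    genotypes: n x p array-like, 1 = 'BARC-8' allele, 0 = 'Garimpo' allele.
    phenotype: length-n protein contents (%).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, r2_adj


# Six hypothetical lines scored for two markers
geno = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [0, 1]]
protein = [40.5, 41.0, 41.6, 43.2, 42.3, 40.7]
beta, r2, r2_adj = marker_regression(geno, protein)
print(beta, round(r2, 3), round(r2_adj, 3))
```

The adjustment penalizes each added marker, so the adjusted value never exceeds the raw $R^2$; this is why accumulated adjusted $R^2$ is a more conservative summary when many markers enter the model.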
In the Cascavel environment, QTL were identified in linkage groups C2 (Satt422-Satt281), E (Satt384-Sat_112) and N (Satt549-Satt084). These QTL explained 14.37, 10.31 and 7.34% of the variation in protein content in this environment, respectively, with an accumulated R² of 32.02%. All three QTL had positive additive effects, indicating that the alleles from 'BARC-8' increase protein content (Table 4). In Viçosa, two QTL were identified in linkage groups G (Satt199-Satt594) and #1 (OP-AN09-OP-AC02), explaining 9.51 and 7.34% of the variation in protein content, respectively, with an accumulated R² of 16.85%. These QTL had negative additive effects, indicating that the 'BARC-8' alleles at these loci reduced protein content (Table 4).
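The sign convention for the additive effects can be illustrated at a single marker. In RIL, which are essentially homozygous, the additive effect is half the difference between the mean of lines carrying the 'BARC-8' allele and the mean of lines carrying the 'Garimpo' allele, so a positive value means the 'BARC-8' allele increases protein content. Composite interval mapping estimates this effect at the QTL position while conditioning on cofactor markers; the sketch below ignores that step and only shows the sign convention. The allele labels `'B'`/`'G'` and the function name are hypothetical.

```python
def additive_effect(genotypes, phenotype):
    """Additive effect a = (mean_BARC8 - mean_Garimpo) / 2 at one marker in RIL.

    genotypes: 'B' ('BARC-8' allele), 'G' ('Garimpo' allele) or None (missing).
    """
    barc8 = [y for g, y in zip(genotypes, phenotype) if g == "B"]
    garimpo = [y for g, y in zip(genotypes, phenotype) if g == "G"]
    if not barc8 or not garimpo:
        raise ValueError("both allele classes must be present")
    return (sum(barc8) / len(barc8) - sum(garimpo) / len(garimpo)) / 2


# Hypothetical lines; the line with a missing genotype is ignored
a = additive_effect(["B", "B", "G", "G", None], [46.0, 44.0, 40.0, 42.0, 50.0])
print(a)  # 2.0: positive, so the 'BARC-8' allele increases protein content
```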
Considering the mean of the two locations, two QTL were identified in linkage groups E and L (Figure 3), explaining 9.90 and 7.11% of the variation in protein content, respectively (Table 4). Both single-marker analysis and composite interval mapping showed that the QTL close to marker Satt549 (LG N) was consistent in the Cascavel environment (Table 3). A consistent QTL close to marker OP-AN09 (LG #1) was likewise detected by both types of analysis in the Viçosa environment. The linked markers OP-AC02 and OP-AN09 (LG #1) were detected by single-marker analysis in both Cascavel and Viçosa; however, composite interval mapping confirmed only the QTL in Viçosa (Tables 3 and 4). Marker Satt281 was identified by single-marker analysis in Viçosa but was detected by composite interval mapping in Cascavel (Tables 3 and 4). According to SoyBase (2008), 76 QTL related to protein content have been identified and reported. However, little consistency has been observed for QTL expressed in different environments or populations (Brummer et al., 1997; Fasoula et al., 2004; Panthee et al., 2005). In the present study, this may be explained by the strong genotype × environment interaction observed and by the fact that these QTL can be considered environmentally sensitive (Brummer et al., 1997). Although QTL research tends to emphasize validation across environments and populations, in a real selection scheme the genes of interest are often specific to a given environment. In such cases, environment-sensitive QTL can be useful for marker-assisted selection in specific locations.
The QTL in linkage group E detected with the mean data had also been identified for protein content in the Cascavel environment. It appears to be the same QTL, because the confidence intervals for its position practically coincide in the Cascavel and mean analyses. Considering a confidence interval of two LR units (Schuster & Cruz, 2004), the interval for the QTL in linkage group E extends from marker Satt384 to 6.4 cM from it for the Cascavel data, and from Satt384 to 6.9 cM from it for the mean of the two locations (Figure 3). The QTL detected in linkage group L, close to marker OP-AS07, was not identified at either location in the individual analyses. The QTL in linkage groups E and L can therefore be considered candidate stable QTL, although further studies are needed to validate them. Stable QTL are of great value to breeding programs and for MAS, since they may be associated with genes conferring general adaptability or stability of genotypes under selection. Identifying QTL in early breeding generations within the same genetic background means that MAS could be applied even under limited conditions, that is, within the same population. Because complex traits are controlled by several genes, which may be regulated differently in distinct environments, different QTL are expected to be identified in different environments. Likewise, distinct genetic backgrounds may lead to the identification of different QTL. Further studies should be carried out to increase the coverage and saturation of the map and to validate QTL for protein content in tropical environments.
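The two-LR-unit confidence interval can be computed from the LR profile along a linkage group by taking the contiguous stretch around the peak where LR stays within two units of its maximum. This is a minimal sketch assuming LR values evaluated at fixed positions (cM); the function name and the example profile are hypothetical. Because LR = 2 ln(10) × LOD, roughly 4.6 × LOD, a two-LR-unit drop gives a narrower interval than the commonly used one-LOD drop.

```python
def lr_support_interval(positions_cm, lr_values, drop=2.0):
    """Return (left, peak, right) positions of the region where LR >= max(LR) - drop.

    The region is grown outward from the peak and stops at the first position
    on each side whose LR falls below the threshold.
    """
    peak = max(range(len(lr_values)), key=lr_values.__getitem__)
    threshold = lr_values[peak] - drop
    left = peak
    while left > 0 and lr_values[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lr_values) - 1 and lr_values[right + 1] >= threshold:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]


# Hypothetical profile with the peak at the first marker (position 0 cM)
positions = [0, 1, 2, 3, 4, 5, 6, 7, 8]
lr = [14.0, 13.5, 12.8, 12.1, 11.0, 9.5, 8.0, 6.0, 5.0]
print(lr_support_interval(positions, lr))  # (0, 0, 3)
```

A peak located at a marker, as in this example, reproduces the shape of the intervals described above, which start at a marker and extend a few centimorgans to one side.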

Conclusions
1. Genotypes with greater stability must be used to consistently detect QTL effects for different environments.
2. The QTL in E and L linkage groups are probably stable QTL, although more studies should be conducted to validate them.

Figure 1. Distribution of protein content in the 118 RIL grown in Viçosa and Cascavel.

Figure 2. Soybean genetic map based on a population of 118 recombinant inbred lines (RIL) derived from a cross between cultivars BARC-8 and Garimpo. Linkage groups (LG) were obtained with LOD = 3 and r = 0.40. Segments marked with stars were linked at LOD 2. Dotted sections indicate that the two LG segments belong to the same LG of the consensus map (Song et al., 2004). Values on the left are distances between markers (cM); marker names are on the right. The label below each linkage group is the group name. Black bars indicate QTL for protein content.

Figure 3. Mapping of QTL associated with protein content in soybean in linkage groups E and L, based on the RIL means across Viçosa and Cascavel. LR values were calculated by composite interval mapping with the GQMOL program. The horizontal line is the QTL significance threshold, obtained for each chromosome by a permutation test with 1,000 permutations at 5% probability.
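The permutation threshold described in the Figure 3 caption can be sketched as follows: phenotypes are shuffled across lines to break any marker-trait association, the maximum test statistic over the linkage group is recorded for each shuffle, and the 95th percentile of these maxima is taken as the 5% threshold. To keep the example short, the per-position statistic is a single-marker likelihood ratio, LR = n ln(RSS0/RSS1), rather than GQMOL's composite interval mapping statistic; the function names and the random data are hypothetical.

```python
import math
import random


def single_marker_lr(marker, phenotype):
    """LR = n * ln(RSS0 / RSS1) for a marker against the common-mean null model."""
    n = len(phenotype)
    overall = sum(phenotype) / n
    rss0 = sum((y - overall) ** 2 for y in phenotype)
    rss1 = 0.0
    for allele in set(marker):
        group = [y for m, y in zip(marker, phenotype) if m == allele]
        mu = sum(group) / len(group)
        rss1 += sum((y - mu) ** 2 for y in group)
    if rss0 == 0:
        return 0.0
    if rss1 == 0:
        return math.inf
    return n * math.log(rss0 / rss1)


def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the maximum LR over `markers` under permuted phenotypes."""
    rng = random.Random(seed)
    y = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # breaks the association between markers and phenotype
        maxima.append(max(single_marker_lr(m, y) for m in markers))
    maxima.sort()
    return maxima[math.ceil((1 - alpha) * n_perm) - 1]


# Hypothetical linkage group: 5 markers scored on 60 lines, random phenotypes
rng = random.Random(0)
markers = [[rng.randint(0, 1) for _ in range(60)] for _ in range(5)]
protein = [rng.gauss(40.0, 3.0) for _ in range(60)]
print(permutation_threshold(markers, protein, n_perm=1000))
```

Taking the maximum over all positions in each permutation is what controls the error rate for the whole linkage group rather than for a single position.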
Carrão-Panizzi et al. (2008) studied the same population used in the present work, in generations F7 and F8 grown in a greenhouse, and obtained a high heritability for protein content (99.86%). Piovesan (2000) also obtained high heritability values for protein content in a diallel with parents contrasting for this trait. These results indicated good prospects for QTL detection in both environments. Carrão-Panizzi et al. (2008) observed a significant environmental effect on the protein fraction contents of 90 Brazilian soybean cultivars.
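The heritabilities reported in this study (Tables 1 and 2) were estimated from components of variance. The text does not give the formula; a common estimator for unreplicated families evaluated alongside replicated controls, assumed here, takes the error variance from the controls, subtracts it from the phenotypic variance among families to obtain the genetic variance, and divides: $\hat\sigma^2_g = \hat\sigma^2_F - \hat\sigma^2_e$ and $h^2 = \hat\sigma^2_g / \hat\sigma^2_F$. The function name and the example numbers are hypothetical.

```python
def heritability_from_components(var_families, var_error):
    """Genetic variance and broad-sense heritability from variance components.

    var_families: phenotypic variance among families (e.g. their mean square).
    var_error: error variance estimated from the replicated controls.
    """
    var_genetic = var_families - var_error  # can be negative by sampling error
    h2 = var_genetic / var_families
    return var_genetic, h2


# Hypothetical components: 80% of the variance among families is genetic
sigma2_g, h2 = heritability_from_components(10.0, 2.0)
print(sigma2_g, f"{100 * h2:.2f}%")  # 8.0 80.00%
```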

Table 1. Analysis of variance of the experiments in Viçosa and Cascavel, with estimates of the genetic variance (σ²g), heritability (h²) and coefficient of variation (CV).

Table 2. Joint analysis of variance of the experiments in Viçosa and Cascavel, with estimates of the genetic variance (σ²g), heritability (h²) and coefficient of variation (CV).

Table 3. Multiple regression analysis between the molecular markers and the protein content of the families grown in Viçosa, MG, and Cascavel, PR.

Table 4. QTL for protein content in soybean detected by composite interval mapping, from the evaluation of RIL in Viçosa, MG, and Cascavel, PR.