The genetic base of Brazilian soybean cultivars: evolution over time and breeding implications

Genetic diversity is essential for crop breeding, and one way to estimate it is through the concept of genetic base, which can be defined as the number of ancestors and their relative genetic contributions (RGC) to each cultivar. The RGC can be estimated through the coefficient of parentage between the ancestors and cultivars. Previous studies determined that the genetic base of Brazilian soybean was very narrow. The objective of this work was to evaluate the pedigrees of 444 Brazilian soybean cultivars to estimate their genetic base. The cultivars were divided according to their release dates and according to their origin (public or private), and the genetic base of each group was also estimated. We found 60 ancestors, of which the top four (CNS, S-100, Roanoke and Tokyo, in decreasing order of contribution) contribute 55.3% of the genetic base. Only 14 ancestors have an RGC over 1.0%, and together they represent 92.4% of the genetic base. Analysis of the release dates indicated that the number of ancestors has increased over time, but the four main ancestors were the same in all periods and their cumulative RGC increased from 46.6% to 57.6%, indicating a narrowing of the genetic base.


Introduction
Soybean is one of the most important field crops in the world, and Brazil is the second largest producer and exporter of this commodity, behind the United States. The main importers of Brazilian soybean are China (22,885,887 t), Spain (2,155,811 t), the Netherlands (1,036,919 t), Japan (548,339 t), Germany (522,354 t) and France (506,775 t). Soybean was introduced into Brazil around the end of the nineteenth century, gained economic importance during the 1970s, and its importance has grown ever since. In 2011/2012, 25 million ha of soybean were planted, almost half (49.2%) of the total area of field crops in the country, with a total production of 66 million metric tons and a mean yield of 2651 kg/ha (Conab, 2013).
The first soybean cultivars planted during the 1960s and 1970s were introduced from the southern United States, e.g. Bragg, Davis and Lee. As soybean grew in importance, breeders began crossing these cultivars with one another and with other sources, obtaining the first Brazilian cultivars, such as Industrial, Santa Rosa and Campos Gerais.
The frequent crossing of a small number of cultivars can reduce genetic diversity. One way to measure the genetic diversity of a crop is through the concept of genetic base, which can be defined as the number of ancestors and their relative genetic contribution (RGC) to a specific group (Cui et al., 2000a). The term ancestor usually refers to founding stock with no known pedigree (Gizlice et al., 1994). The RGC of an ancestor can be estimated as the mean coefficient of parentage (COP) between that ancestor and all the cultivars. This method has been used in many studies and is applicable when detailed pedigree records are available. The genetic base of Brazilian cultivars was estimated by Hiromoto and Vello (1986), who analyzed 69 cultivars. They found only 26 ancestors, of which the four main ancestors (CNS, Roanoke, Tokyo and S-100) contributed 48.2% of the genetic base, and concluded that it was narrow. They also found that the Brazilian and North American soybean genetic bases shared six important ancestors. The genetic base of soybean cultivars has also been estimated for the United States (Delannay et al., 1983; Gizlice et al., 1994; Sneller, 1994, 2002), China (Cui et al., 2000a), Japan (Zhou et al., 2000) and India (Bharadwaj, 2002). China has the largest genetic base, with 339 ancestors, whereas other countries have smaller ones (Table 1).
Since the study by Hiromoto and Vello (1986), there have been no large-scale attempts to quantify the genetic base of Brazilian soybean cultivars and to analyze its evolution. Therefore, the objectives of this work are to (i) quantify the genetic base of 444 Brazilian soybean cultivars released from 1943 to 2009, using COP obtained from pedigree records, (ii) quantify the changes in the genetic base over five periods (before 1971, 1971-1980, 1981-1990, 1991-2000 and 2001-2009), (iii) compare the genetic base of public and private soybean cultivars, and (iv) compare the Brazilian genetic base with those of other countries.

Material and Methods
The 444 soybean cultivars analyzed in this study were foreign, public or private cultivars that were widely grown by farmers during certain periods. We made no distinction between specialty soybean, such as soybean for human consumption, and commodity soybean. The 444 cultivars include 14 public transgenic cultivars. Pedigree data were obtained from various literature sources, from the Germplasm Resources Information Network (GRIN) and from breeders. Ancestors were defined as founding stock with no known pedigree (Gizlice et al., 1994).
Malécot's COP (f_XY) was estimated with the equation presented by Falconer and Mackay (1996):

$$f_{XY} = \frac{1}{4}\left(f_{AC} + f_{AD} + f_{BC} + f_{BD}\right)$$

where f_XY is the COP between two cultivars X and Y, A and B are the parents of cultivar X, and C and D are the parents of cultivar Y. We started by assembling the complete pedigrees of the Brazilian soybean cultivars and tracing them back to the ancestors, estimating the probable contribution of each ancestor to each cultivar, that is, the COP between an ancestor and a cultivar. The following assumptions were made: (i) all the ancestors are independent, that is, the COP between any pair of ancestors is f = 0; (ii) in a cross, each parent contributes 50% of the genes; (iii) the cultivars, ancestors and lines are all homozygous and homogeneous; (iv) the COP between a plant selection and its antecedent is f = 1; and (v) the COP between a mutant and the original genotype is f = 1. The calculations were performed in Microsoft Excel® (2003). The relative genetic contribution (RGC) of each ancestor to a given cultivar was calculated by partitioning the genetic constitution of that cultivar, using the COP as an estimate. The mean RGC of an ancestor was calculated by averaging its RGC over all cultivars. The accumulated genetic contribution (AGC) is the cumulative sum of the ancestors' mean RGCs, taken in decreasing order of importance. The frequency of an ancestor was defined as the number of cultivars whose pedigree contains it. The mean number of ancestors was obtained by averaging the number of ancestors of each cultivar. The ancestor/cultivar ratio was also determined.
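The pedigree-tracing step can be sketched in a few lines of Python (the study itself used Excel). This is a minimal sketch, assuming a hypothetical toy pedigree stored as a dict in which a two-parent entry is a cross and a one-parent entry is a plant selection or mutant; the names `PEDIGREE`, `ancestor_cop` and `genetic_base`, and the cultivars CultivarA to CultivarC, are illustrative and not taken from the study's dataset. Because an ancestor Z has no recorded parents, the COP between Z and a cultivar X with parents A and B reduces to f_ZX = (f_ZA + f_ZB)/2, which is applied recursively until the ancestors are reached; under assumption (i), distinct ancestors share nothing. The mean RGC, AGC, frequency and mean number of ancestors then follow from the definitions above.

```python
from collections import defaultdict

# Hypothetical toy pedigree (not the study's data). A 2-tuple is a biparental
# cross; a 1-tuple is a plant selection or mutant of its antecedent.
# Any name without an entry is treated as an ancestor (founding stock).
PEDIGREE = {
    "Lee": ("CNS", "S-100"),
    "CultivarA": ("Lee", "Roanoke"),
    "CultivarB": ("CultivarA", "Tokyo"),
    "CultivarC": ("CultivarB",),  # hypothetical selection from CultivarB
}


def ancestor_cop(name, pedigree, memo):
    """Return {ancestor: COP between that ancestor and `name`}."""
    if name in memo:
        return memo[name]
    parents = pedigree.get(name)
    if parents is None:
        # Founding stock: homozygous, so its COP with itself is 1 (assumption iii);
        # its COP with any other ancestor is 0 (assumption i).
        result = {name: 1.0}
    else:
        if len(parents) not in (1, 2):
            raise ValueError(f"{name}: expected 1 or 2 parents, got {len(parents)}")
        # Each parent of a cross contributes 1/2 (assumption ii); a selection or
        # mutant keeps its antecedent's constitution unchanged (iv, v).
        weight = 1.0 / len(parents)
        acc = defaultdict(float)
        for parent in parents:
            for anc, f in ancestor_cop(parent, pedigree, memo).items():
                acc[anc] += weight * f  # paths through both parents are summed
        result = dict(acc)
    memo[name] = result
    return result


def genetic_base(cultivars, pedigree):
    """Ranked (ancestor, mean RGC, AGC), frequency, and mean number of ancestors."""
    memo = {}
    contribs = [ancestor_cop(cv, pedigree, memo) for cv in cultivars]
    n = len(contribs)
    mean_rgc = defaultdict(float)
    frequency = defaultdict(int)
    for contrib in contribs:
        for anc, f in contrib.items():
            mean_rgc[anc] += f / n  # averaged over ALL cultivars (0 where absent)
            frequency[anc] += 1     # number of pedigrees containing the ancestor
    ranking, agc = [], 0.0
    for anc, rgc in sorted(mean_rgc.items(), key=lambda kv: kv[1], reverse=True):
        agc += rgc  # accumulate in decreasing order of importance
        ranking.append((anc, rgc, agc))
    mean_ancestors = sum(len(c) for c in contribs) / n
    return ranking, dict(frequency), mean_ancestors


ranking, frequency, mean_ancestors = genetic_base(
    ["CultivarA", "CultivarB", "CultivarC"], PEDIGREE)
for anc, rgc, agc in ranking:
    print(f"{anc:8s} RGC={rgc:.3f} AGC={agc:.3f} freq={frequency[anc]}")
print(f"mean number of ancestors per cultivar: {mean_ancestors:.2f}")
```

Each cultivar's contributions add up to 1, which is what lets the COP partition its genetic constitution; consequently the final AGC in the ranking is 1 (100%).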
The cultivars were subdivided into groups according to their year of release or year of registration (when available) in Brazil. The timeframe was divided into five periods: before 1971, 1971-1980, 1981-1990, 1991-2000 and 2001-2009. The cultivars were also subdivided according to origin (public or private); the few cultivars developed abroad were excluded from this analysis. All the parameters described above were also calculated for each subgroup.
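Subdividing the cultivars and recomputing the parameters for each group could look like the following sketch. It assumes that each cultivar's ancestral contributions are already available as a dict of ancestor-to-COP values; the records, cultivar names and helper names (`period_of`, `group_summary`) are hypothetical. The ancestor/cultivar ratio is assumed here to be the number of distinct ancestors divided by the number of cultivars in a group, since the text does not give its formula.

```python
from collections import defaultdict

PERIODS = [  # (label, last year included), following the five periods in the text
    ("before 1971", 1970),
    ("1971-1980", 1980),
    ("1981-1990", 1990),
    ("1991-2000", 2000),
    ("2001-2009", 2009),
]


def period_of(year):
    """Map a release (or registration) year to one of the five study periods."""
    for label, last_year in PERIODS:
        if year <= last_year:
            return label
    raise ValueError(f"{year} is after the last period (2001-2009)")


def group_summary(contribs, targets=(0.5, 0.9)):
    """Per-group parameters from a list of {ancestor: COP} dicts, one per cultivar."""
    n = len(contribs)
    mean_rgc = defaultdict(float)
    for contrib in contribs:
        for anc, f in contrib.items():
            mean_rgc[anc] += f / n
    ranked = sorted(mean_rgc.values(), reverse=True)
    # Number of top-ranked ancestors needed for the AGC to reach each target.
    needed = {}
    for target in targets:
        agc = 0.0
        for k, rgc in enumerate(ranked, start=1):
            agc += rgc
            if agc >= target - 1e-12:
                needed[target] = k
                break
    return {
        "cultivars": n,
        "ancestors": len(mean_rgc),
        "ancestors_per_cultivar": sum(len(c) for c in contribs) / n,
        "agc_top4": sum(ranked[:4]),
        "ancestors_for_target": needed,
        # Assumed definition: distinct ancestors / cultivars in the group.
        "ancestor_cultivar_ratio": len(mean_rgc) / n,
    }


# Hypothetical records: (cultivar, release year, origin, {ancestor: COP}).
RECORDS = [
    ("cv1", 1968, "public", {"CNS": 0.5, "S-100": 0.5}),
    ("cv2", 1985, "public", {"CNS": 0.25, "S-100": 0.25, "Roanoke": 0.5}),
    ("cv3", 1987, "private", {"CNS": 0.25, "Tokyo": 0.75}),
    ("cv4", 1995, "foreign", {"Roanoke": 0.5, "Tokyo": 0.5}),
]

by_period, by_origin = defaultdict(list), defaultdict(list)
for _, year, origin, contrib in RECORDS:
    by_period[period_of(year)].append(contrib)
    if origin in ("public", "private"):  # cultivars developed abroad are excluded
        by_origin[origin].append(contrib)

for label, _ in PERIODS:
    group = by_period.get(label)
    if group:
        print(label, group_summary(group))
```

Grouping once and reusing a single summary function keeps the period and origin analyses consistent with each other and with the whole-set calculation.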

Results and Discussion
Pedigrees
Although over 700 cultivars were officially registered in Brazil in 2010 according to CultivarWeb, the database of the National Cultivar Registry of the Ministry of Agriculture, Livestock and Food Supply, we were able to obtain the pedigrees of only 444 cultivars. Because of the Variety Protection Act of 1997, many breeders have not made the pedigrees of their cultivars public, especially for more recent releases. A few pedigrees deserve special attention. First, cultivar FT-Cristalina originated from a natural cross between UFV-1 and an unknown parent; based on morphological data, some breeders suggest that the unknown parent was Davis (Hiromoto and Vello, 1986), and this pedigree was used in the present work. The same authors reported that cultivar BR-9 (Savana) originated from the bulk LoB 74-2, with the probable pedigree Davis x (Hill x PI 240664). It should also be noted that morphological and molecular marker data have not supported the recorded pedigree of cultivar Lincoln (Gizlice et al., 1994; Lorenzen et al., 1995), which appears in the genealogy of 28 Brazilian cultivars. Either the genealogy is incorrect, or the accessions maintained in the germplasm bank are not the true ancestors of Lincoln. We therefore chose to treat Lincoln as an ancestor.

Table 1 notes - Sources: Hiromoto and Vello, 1986; Gizlice et al., 1994; Cui et al., 2000a,b; Zhou et al., 2000; Bharadwaj et al., 2002. AGC, accumulated genetic contribution; COP, coefficient of parentage; a, data not available.

Overview of the genetic base
The 444 cultivars analyzed traced back to 60 ancestors (Table 2). The four main ancestors (CNS, S-100, Roanoke and Tokyo) contribute more than half (55.3%) of the genetic base, and CNS alone contributes almost one-fifth (19.2%). Only the top 14 ancestors have a mean RGC above 1%, and their AGC is 92.4%; the remaining 46 ancestors therefore contribute only 7.6% cumulatively. For comparison, Hiromoto and Vello (1986) reported 26 ancestors for a group of 69 cultivars, with an AGC of 48.2% for the top four ancestors. These were the same top four ancestors found in this study, but in a different order. These results indicate that the genetic base of Brazilian soybean is still narrow, despite the incorporation of new ancestors.
For ancestors that no longer exist, or for which we do not know of a seed source, we provide a first progeny, as defined by Gizlice et al. (1994), which can be screened in place of the missing ancestor. Among the top 20 ancestors, only two are unavailable (PI 60406 and Mogiana).
On average, there were 10.52 ancestors per cultivar (Table 3), ranging from 2 to 23. This mean is high compared with the Chinese (3.79) and Japanese (3.20) values (Cui et al., 2000a; Zhou et al., 2000), but it must be interpreted with caution because it is mainly due to the incorporation of low-contribution ancestors. When only ancestors with RGCs of at least 5% or 10% to a cultivar are considered, the mean drops to 7.11 and 4.08, respectively (Figure 1), demonstrating that many ancestors make only small contributions.
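The threshold-based counts behind the 7.11 and 4.08 figures can be illustrated as follows. This is a minimal sketch, assuming each cultivar's ancestral contributions are available as an ancestor-to-COP dict and that "an RGC of 5%" means a contribution of at least 5% to that cultivar; the function name and the two toy cultivars are hypothetical.

```python
def mean_ancestors_at_least(contribs, threshold=0.0):
    """Average, over cultivars, of the number of ancestors whose contribution to
    the cultivar is positive and at least `threshold` (assumed >= semantics)."""
    counts = [sum(1 for f in c.values() if f > 0 and f >= threshold) for c in contribs]
    return sum(counts) / len(counts)


# Hypothetical contributions for two cultivars; each dict sums to 1.
contribs = [
    {"CNS": 0.5, "S-100": 0.25, "Roanoke": 0.1875, "Tokyo": 0.0625},
    {"CNS": 0.96875, "PI 88788": 0.03125},  # a minor gene donor
]
for t in (0.0, 0.05, 0.10):
    print(f"RGC >= {t:.0%}: {mean_ancestors_at_least(contribs, t):.2f} ancestors per cultivar")
```

For these toy values the mean falls from 3.00 ancestors per cultivar with no threshold to 2.50 at 5% and 2.00 at 10%, illustrating how a minor donor inflates the raw count without adding much to the genetic base.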
The frequency of ancestors also demonstrates how narrow the genetic base is. The main ancestor, CNS, is present in the pedigrees of 435 cultivars (98.0%), and the other top ancestors (S-100, Roanoke and Tokyo) also have very high frequencies. CNS and S-100 are the most common ancestors because their cross produced cultivar Lee and line D49-2491, a sister line of Lee and an ancestor of Bragg. Lee and Bragg were used as parents of many early cultivars developed in Brazil and are present in the pedigrees of 118 and 135 cultivars, respectively. Frequency decreases along with RGC, the two being highly correlated (r = 0.88), to the extreme that 11 ancestors contribute to only one cultivar each. Nevertheless, some ancestors have low RGCs but higher-than-expected frequencies. This usually occurs because the ancestor was used in backcrosses to incorporate simple traits. One clear example is GTS-40-3-2, which was used to incorporate the CP4 EPSPS gene, conferring glyphosate resistance, into the 14 transgenic cultivars analyzed here (Sneller, 2003).
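The correlation between the ancestors' mean RGC and their frequency can be computed as below, assuming r is Pearson's product-moment coefficient (the text does not name the statistic). The function name and the five value pairs are hypothetical placeholders, not the study's data; on Python 3.10 and later, the standard library's `statistics.correlation` returns the same Pearson coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (mean RGC, frequency as a fraction of cultivars) for five ancestors.
mean_rgc = [0.19, 0.14, 0.12, 0.10, 0.001]
frequency = [0.98, 0.95, 0.80, 0.70, 0.01]
print(f"r = {pearson_r(mean_rgc, frequency):.2f}")
```

Ancestors used only as backcross donors would show up here as points with low RGC but comparatively high frequency, pulling r below 1.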
As seen above, most ancestors have very low mean RGCs. We believe this occurs because breeders often use new lines as parents to introduce specific genes, usually through backcrossing. Parents used as gene donors in backcrossing therefore increase the number of ancestors but contribute little to broadening the genetic base of the crop.
A database search of GRIN (Germplasm Resources Information Network) allowed us to identify notable characteristics of some ancestors (Table 4). Some low-contributing ancestors (but not all) may have been incorporated into the genetic base because of these traits. For example, I-Higo-Wase (PI 205085), which has a mean RGC of 0.0036% and contributes to only five cultivars, carries the lx3 allele, which confers absence of lipoxygenase-3 in the seed. This ancestor was used to incorporate this trait into five soybean cultivars developed for human consumption. Another ancestor used in this manner is PI 88788 (mean RGC of 0.0299%), which is resistant to races 3 and 14 of soybean cyst nematode. Shirohadaka (PI 86490), which has small seeds, was used to incorporate this characteristic into a cultivar aimed at producing natto, a fermented soybean dish appreciated in Japan.
The geographical origin of the 60 ancestors is given in Table 2. The primary geographical origins were considered to be the countries where soybean may have originated or that are considered centers of diversity (based on information from GRIN). When the primary geographical origin was unknown, we used the secondary geographical origin, that is, the country where the ancestor was collected or developed. When no information was available, the origin was recorded as unknown. Most of the ancestors are from Asia (78%), mainly from China and Japan. One assumption made in this work is that ancestors are unrelated; however, this may not hold, especially for ancestors originating in the same area (e.g. Jiangsu, China). The relationship between ancestors and cultivars is therefore probably underestimated, and the real genetic base may be narrower than the estimated one (Hiromoto and Vello, 1986; Mikel et al., 2010). Indeed, Gizlice et al. (1993), using multivariate analysis of ten traits, estimated genetic similarity coefficients ranging from 0.00 to 0.88 among 14 ancestors of North American soybean cultivars, showing that some are indeed similar and may be related to some degree.

Period analysis
The results from the different periods (Table 3) [...] not counting cultivars with unavailable pedigrees. We also clearly detected an increase in the number of ancestors over time, supported by an increase in the mean number of ancestors per cultivar from 4.88 to 13.20. However, many of these ancestors make only low contributions (Figure 1). The genetic base of Brazilian soybean cultivars has in fact narrowed over the past few decades (Table 3): the top four ancestors in each period were always CNS, S-100, Roanoke and Tokyo, and their cumulative RGC increased from 46.6% to 57.6%. CNS remained the ancestor with the highest RGC throughout all periods, whereas the ranking of the other three changed. In the final period (2001-2009), CNS alone represented more than one-fifth of the genetic base (20.7%). Delannay et al. (1983), studying the North American genetic base, observed a similar effect: the number of ancestors increased over time, but so did the RGC of the main ancestors. In contrast, Cui et al. (2000a) observed that the Chinese soybean genetic base broadened over time, with the incorporation of new ancestors and a more uniform distribution of RGC among ancestors.

Public vs. private cultivars
Of the 444 cultivars analyzed, 301 were of public origin, 112 of private origin, and 31 of foreign or unknown origin. The public cultivars had 58 ancestors (Table 2), close to the total of 60; only two ancestors, PI 346304 and PI 230979, contributed exclusively to the private cultivars. The top four ancestors of the public cultivars are identical to those of the total group of cultivars, and their AGC is 53.9%. The top 13 ancestors of this group contributed 90.5% of the genetic base and are also the same top 13 ancestors of the total group, with minor changes in rank.
The private cultivars, on the other hand, had only 37 ancestors. This low number may reflect the smaller number of cultivars in this group and the use of a restricted number of cultivars as parents in private breeding programs. However, the genetic base of this group does not differ substantially from those of the public and total groups. For example, the top four ancestors of the private cultivars are the same as those of the public and total groups (CNS, S-100, Roanoke and Tokyo, in this order), but with a slightly higher AGC (57.2%). Also, the top 13 ancestors of this group, which represent 93.1% of the genetic base, are the same as those of the other two groups, with minor changes in rank.

Comparison with other genetic bases
When the results of this study are compared with the soybean genetic bases of other countries (Table 1), it is clear that Brazil still has a very narrow genetic base. Hiromoto and Vello (1986) reported that six of the main ancestors of that time were shared with the North American soybean genetic base. In this study, a total of 26 ancestors were shared with the North American genetic base described by Gizlice et al. (1994). When only the main ancestors of both countries are considered, the Brazilian and North American soybean genetic bases share only six ancestors (CNS, S-100, Roanoke, Tokyo, PI 54610 and Dunfield), the same number reported by Hiromoto and Vello (1986). Moreover, the top five ancestors described here (CNS, S-100, Roanoke, Tokyo and PI 54610) are the same top five ancestors of the soybean genetic base of the southern USA. As Hiromoto and Vello (1986) argue, this is because the Brazilian soybean genetic base was partially derived from cultivars from this part of the USA.

Table 3 - Analysis of all the periods, indicating the number of cultivars and ancestors in each period, the average number of ancestors per cultivar (anc./cv), the AGC of the top four ancestors, the number of ancestors that account for 50% to 90% of the genetic base, and the ratio of ancestors per released cultivar (Anc./cv ratio) in each period.