Genetic variability of Brazilian wheat germplasm obtained by high-density SNP genotyping

The aim of this study was to evaluate the genetic diversity of the Brazilian wheat germplasm using high-density genotyping with SNP markers. A set of 211 wheat varieties genotyped with 35,143 SNP markers was used in the experiment. Genetic distances ranged from 0.013 to 0.471, with most distances lying between 0.31 and 0.40. In the cluster analysis by the UPGMA method, 81% of the varieties were clustered in three groups. Genetic variability in the Brazilian wheat germplasm has remained constant for over 40 years. Mean genetic distances among the varieties developed in each decade ranged from 0.33 to 0.34. A trend toward increasing genetic distance between genotypes from different eras was observed over time as a result of breeding. The results described in this study can help Brazilian wheat breeders to manage the genetic variability in the Brazilian wheat germplasm more adequately.


INTRODUCTION
One of the features of plant breeding in Brazil is the possibility of using commercial varieties in crosses regardless of the intellectual property associated with them (Riede et al. 2001). This is the so-called breeder's right, provided for by the Plant Variety Protection (PVP) Law in Brazil (Law 9456, from 1997). This possibility allows the sharing of genetic variability between the various breeding programs. Varieties developed by one breeding program can be used as a germplasm source by other breeding programs.
Genetic variability is the foundation of breeding. Breeding programs aim to exploit the genetic variability of a species to obtain genetic combinations that result in adapted, high-yielding, disease-resistant, and higher-quality varieties, in addition to other characteristics. Although genetic variability can be increased through the introduction of exotic germplasm, only a fraction of this variability is useful in breeding. Most of the exotic genome is not adapted and must be eliminated, through successive breeding cycles, after being introduced as a source of variability. For this reason, in most cases, breeding programs exclusively use germplasm that has already been improved to generate breeding populations and variability.
Because the genetic variability used in breeding programs is only a fraction of the variability present in the species, knowing the existing variability in the germplasm used by breeding programs is essential for the rational use of this germplasm. Further, it is essential to monitor variability over the years, since short-term monitoring may bias the interpretation of the data towards a reduction of variability.
A Scherlosky et al.
Molecular markers have been one of the main tools employed in studies of genetic variability (Caixeta et al. 2009, Cruz et al. 2014). Among them, Single Nucleotide Polymorphism (SNP) markers stand out for their abundance in the genome of species; their high automation capacity; and the existence of high-density SNP microarrays for many of the species of interest to the breeder. Wheat germplasm evaluation with SNP markers is just beginning, and only a few works have been published using this marker in wheat, e.g. Shavrukov (2014) in Kazakhstan.
In view of the management and preservation of the existing genetic variability in the Brazilian wheat germplasm, the present study was conducted to analyze the genetic variability present in the Brazilian wheat germplasm by using, for the first time, high-density SNP markers. Additionally, this study examined the evolution of genetic variability in the Brazilian wheat germplasm over four decades.

Genetic material
This work was conducted in the biotechnology laboratory of Coodetec, in Cascavel, PR, Brazil. The genetic material was composed of 185 varieties and one elite line of wheat [Triticum aestivum (L.)] developed in Brazil; 11 varieties from Paraguay; seven from Mexico; six from China; and one from Argentina, totaling 211 varieties (Table 1). Among the Brazilian varieties, twenty-one were developed before 1980, thirty-five in the 1980s, twenty-four in the 1990s, sixty-four in the 2000s, and forty-one in the 2010s.

DNA extraction and genotyping of SNP markers
DNA was extracted according to the protocol described by Schuster et al. (2004). DNA samples were genotyped using the Axiom™ WhtBrd-1 Array kit, which contained 35,143 SNP markers, at Affymetrix (Santa Clara, CA, USA). All information pertaining to the SNPs present in the platform can be accessed at http://www.cerealsdb.uk.net.
After genotyping, the data were filtered in an Excel spreadsheet to discard monomorphic markers, markers that lacked one of the homozygous genotypes, markers with a call rate lower than 90% (over 10% of missing data), markers with a minor allele frequency lower than 5%, and markers with over 30% heterozygous genotypes among the 211 wheat varieties used.
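The filtering criteria listed above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure used in the study: the function name `passes_quality_filters`, the genotype coding ("AA", "AB", "BB", `None` for a missing call), and the choice to compute the heterozygous share over all varieties rather than over called genotypes are assumptions made for the example.

```python
def passes_quality_filters(calls, min_call_rate=0.90, min_maf=0.05, max_het=0.30):
    """Return True if one marker's genotype calls pass all quality filters.

    calls: one entry per variety, coded "AA", "AB", "BB", or None (missing).
    The coding and the thresholds' exact application are assumptions.
    """
    n_total = len(calls)
    called = [g for g in calls if g is not None]
    if not called:
        return False
    # Call rate: share of varieties with a non-missing genotype.
    if len(called) / n_total < min_call_rate:
        return False
    n_aa = called.count("AA")
    n_ab = called.count("AB")
    n_bb = called.count("BB")
    # Both homozygous classes must be present; this also removes
    # monomorphic markers.
    if n_aa == 0 or n_bb == 0:
        return False
    # Minor allele frequency from allele counts among called genotypes.
    freq_a = (2 * n_aa + n_ab) / (2 * len(called))
    if min(freq_a, 1 - freq_a) < min_maf:
        return False
    # Heterozygous share, computed here over all varieties (an assumption).
    if n_ab / n_total > max_het:
        return False
    return True


def filter_markers(genotypes):
    """Keep only the markers (dict: marker name -> calls) that pass."""
    return {name: calls for name, calls in genotypes.items()
            if passes_quality_filters(calls)}
```

Applying `filter_markers` to a dictionary of marker calls returns the subset of markers that would be carried forward to the distance analysis.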

Analysis of genetic variability
Genetic distances among the wheat varieties were obtained as 1 - IBS (identity by state), where IBS is defined as the probability that alleles drawn at random from two individuals at the same locus are the same. The distance between an individual and itself is defined as 0.
This estimate is based on the following definition: for a bi-allelic locus with alleles A and B, the probability of IBS is pIBS(AA, AA) = 1, pIBS(AA, BB) = 0, and pIBS(AB, xx) = 0.5, where xx is any genotype other than AB. For two taxa, pIBS is obtained as the average over all loci without missing data. The estimates of genetic distance among the wheat varieties were obtained using Tassel software (Bradbury et al. 2007).
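As a rough illustration of the 1 - IBS distance, the sketch below computes pIBS by drawing one allele from each genotype. It is not Tassel's code. The function names and the genotype coding (two-character strings over "A" and "B", `None` for missing data) are assumptions, and the AB versus AB case, which the definition above does not spell out, is assumed to give 0.5, as follows from the random-draw definition.

```python
def p_ibs(g1, g2):
    """Probability that one allele drawn at random from each genotype matches."""
    # Genotypes are two-character strings over the alleles "A" and "B".
    matches = sum(1 for a in g1 for b in g2 if a == b)
    return matches / 4.0


def ibs_distance(taxon1, taxon2):
    """1 - mean pIBS over the loci where both taxa have a call."""
    probs = [p_ibs(g1, g2) for g1, g2 in zip(taxon1, taxon2)
             if g1 is not None and g2 is not None]
    if not probs:
        raise ValueError("no loci with data in both taxa")
    return 1.0 - sum(probs) / len(probs)


def distance_matrix(taxa):
    """Symmetric matrix of pairwise distances; the diagonal is 0 by definition."""
    n = len(taxa)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = ibs_distance(taxa[i], taxa[j])
    return d
```

The diagonal is set to 0 explicitly rather than computed, because a taxon carrying heterozygous loci would otherwise receive a non-zero distance from itself.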
Genetic distances among the groups of varieties developed in each period were obtained by Rogers' genetic distance estimator:
$$RD = \frac{1}{m}\sum_{i=1}^{m}\sqrt{\frac{1}{2}\sum_{j=1}^{n_i}\left(p_{ij}-q_{ij}\right)^{2}}$$
where $m$ is the number of markers; $n_i$ is the number of alleles in marker $i$; and $p_{ij}$ and $q_{ij}$ are the frequencies of allele $j$ in marker $i$ in the two eras considered in each comparison of groups of genotypes. Rogers' genetic distances were obtained using an Excel™ spreadsheet.
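For readers who prefer code to a spreadsheet, the following sketch computes a between-group distance from allele frequencies. It assumes the standard form of Rogers' (1972) distance, that is, the mean over markers of the square root of half the summed squared allele-frequency differences. The function name and the nested-list input layout are hypothetical.

```python
import math


def rogers_distance(freqs_p, freqs_q):
    """Rogers' distance between two groups (standard 1972 form, assumed here).

    freqs_p[i][j] and freqs_q[i][j] hold the frequency of allele j at
    marker i in the first and second group, respectively.
    """
    if not freqs_p or len(freqs_p) != len(freqs_q):
        raise ValueError("both groups need the same non-empty set of markers")
    total = 0.0
    for p_i, q_i in zip(freqs_p, freqs_q):
        # Half the sum of squared frequency differences at this marker.
        half_sq = 0.5 * sum((p - q) ** 2 for p, q in zip(p_i, q_i))
        total += math.sqrt(half_sq)
    # Average the per-marker terms over the m markers.
    return total / len(freqs_p)
```

For a bi-allelic marker the per-marker term reduces to the absolute difference in the frequency of one allele between the two groups, which gives a quick way to check results by hand.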
The cluster analyses between varieties and between genotypes of different eras were performed by the UPGMA hierarchical method, using JMP software (SAS Institute 2015).
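The UPGMA procedure itself is simple enough to sketch in plain Python. The version below is a minimal illustration, not the JMP implementation: the function name, the input layout, and the decision to report the inter-cluster distance at each merge as the "height" are assumptions.

```python
def upgma(dist, labels):
    """Return the UPGMA merge history as (members_a, members_b, height).

    dist: symmetric distance matrix (list of lists); labels: one name per row.
    height is the average distance between the two clusters when they merge.
    """
    # Each active cluster is a tuple of original row indices.
    clusters = {i: (i,) for i in range(len(labels))}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        na, nb = len(clusters[a]), len(clusters[b])
        # UPGMA update: size-weighted mean of the distances to a and b.
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (na * d_ac + nb * d_bc) / (na + nb)
        # Drop every distance that involves the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        merges.append((tuple(labels[i] for i in clusters[a]),
                       tuple(labels[i] for i in clusters[b]),
                       height))
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges
```

Weighting by cluster size in the update step is what makes the linkage an unweighted average over the original pairs of items, which is the defining property of UPGMA.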

RESULTS AND DISCUSSION
Of the 35,143 markers used, 10,049 met the quality requirements, accounting for 28.6% of the total number of markers present in the SNP panel. In the set of useful markers, the number of markers per chromosome ranged from [...]. Shavrukov (2014) used wheat varieties from Kazakhstan and the Infinium 9k SNP platform (Illumina) for wheat and identified 46% informative markers. In their work, 49% were associated with the A genome, 46% with the B genome, and 5% with the D genome. In addition to being larger, the marker panel used in the present study is better distributed across the three wheat genomes.
The estimates of genetic distances among the 211 varieties ranged from 0.013 (between 'BRS 193' and 'Tucano') to 0.471 (between 'Canindé 13' and 'BR 34'). Of all distances, 71.34% lay between 0.31 and 0.40 (Figure 2). Khan et al. (2015) evaluated the variability existing in a collection of 95 tetraploid and hexaploid varieties of wheat from India and Turkey. The genetic distances obtained among hexaploid varieties varied from 0.02 to 0.29 in India and from 0.05 to 0.58 in Turkey, which is a similar level of variability to that obtained in this study.
In a previous study, we investigated the genetic variability of a set of 36 Brazilian varieties of wheat using microsatellite markers, and identified genetic distances of 0.10 to 0.88 (Schuster et al. 2009). Bered et al. (2001) also evaluated the variability of Brazilian wheat varieties, using RAPD markers, and observed distances from zero to 0.32. In these two studies, a small number of varieties and a small number of markers were used, resulting in higher estimates of variability in one case and lower estimates in the other. Here, the larger number of markers provided broader coverage of the genome, which made it possible to evaluate the real genetic variability of the wheat germplasm more accurately.
Cluster analysis by the UPGMA method (Figure 3) formed 12 groups containing more than one variety. The two largest groups contained 75 and 71 varieties, which together represent 69% of the 211 varieties (Table 2). The third largest group comprised 26 varieties (group 10 in Table 2), so that 172 (81%) of the 211 varieties were clustered in these three groups. The other groups contained two to eight varieties. Five varieties did not cluster with any other (Ocepar 8 Macuco, Ocepar 20, Colonias, Anahuac, and Safira).
No relationships were observed between the groups and the breeding program of origin of the germplasm; i.e., the genetic variability observed in the wheat germplasm in Brazil is equally distributed across the local wheat breeding programs. The varieties developed by the institutions owning the largest number of wheat varieties in Brazil (Coodetec, Embrapa, OR Sementes, Fundacep, Iapar, and Biotrigo) are equivalently represented in the largest groups. This is a consequence of the so-called breeder's right, provided for in the PVP Law in Brazil (Law 9456 of 1997). Article 10 (subsection III) of that law allows breeders to use commercial varieties of any origin to perform crosses and originate new varieties. In this way, germplasm is, in effect, shared across the many breeding programs, and the genetic base of these programs may be similar.
Varieties developed in Paraguay and Mexico were distributed across the groups proportionally to the size of these groups. The exception was the Mexican variety Anahuac, which did not cluster with any other variety. This was an expected result, since the development of wheat varieties in Brazil involved frequent use of the germplasm developed by CIMMYT (Mexico). In Paraguay, wheat varieties are developed using both the CIMMYT germplasm and Brazilian varieties. The six varieties introduced from China were clustered in the two largest groups. This means there is no clear distinction between the wheat germplasms from Mexico, China, and Paraguay, when compared with the Brazilian wheat germplasm.
To evaluate the evolution of genetic variability in wheat over time in Brazil, we considered only the 185 varieties developed in the country. Twenty-one varieties were developed before 1980. Genetic distances among these varieties ranged from 0.05 to 0.40, averaging 0.33. The highest frequency of genetic distances among the varieties developed in Brazil before 1980 was between 0.30 and 0.40 (Figure 2); 78% of genetic distances were above 0.30 and over 93% of them were higher than 0.25.
In the group of Brazilian varieties, 35 were developed in the 1980s. Genetic distances among these varieties varied from 0.02 to 0.45, averaging 0.33. The highest frequency of genetic distances among the wheat varieties developed in the 1980s in Brazil lay in the range of 0.31 to 0.35 (Figure 2). Over 74% of genetic distances among the varieties developed in that period were higher than 0.30 and more than 92% were higher than 0.25.
In the 24 varieties developed in the 1990s, genetic distances ranged from 0.13 to 0.45, averaging 0.34. The highest frequency of genetic distances between the varieties developed in that decade was between 0.36 and 0.40 (Figure 2). More than 83% of the genetic distances exceeded 0.30, and over 90% of them were greater than 0.25.
Genetic distances among the 64 varieties developed in the 2000s varied from 0.07 to 0.45, averaging 0.33. The highest frequency of genetic distances among the varieties developed in the 2000s was between 0.30 and 0.35; however, a high frequency of distances between 0.36 and 0.40 was also observed (Figure 2). Over 78% of the genetic distances among the varieties developed in this period were greater than 0.30, and more than 94% were above 0.25.
For the 41 varieties developed in the 2010s until the year 2014, genetic distances ranged from 0.12 to 0.43, averaging 0.34. The highest frequency of genetic distances observed in these newly developed varieties was between 0.31 and 0.35, but there was also a high frequency of distances between 0.36 and 0.40 (Figure 2). More than 81% of the genetic distances observed in this group of varieties were above 0.30, with over 96% higher than 0.25.
In the last four decades, the average genetic distances among wheat varieties in Brazil have remained between 0.33 and 0.34, and maximum distances between 0.43 and 0.45. Before the 1980s, the maximum genetic distance was 0.40. From the 1990s onwards, there has been a trend towards an increase in minimum genetic distances among the wheat varieties developed in Brazil. This means that genetic variability among the wheat varieties produced in Brazil has shown a slight upward trend for more than 40 years, as observed in the increasing minimum distances among the recently developed varieties.
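The per-decade figures reported above (minimum, maximum, mean, and the share of distances above 0.25 and 0.30) are simple summaries of the pairwise distances within each group. A minimal sketch of that bookkeeping follows. The function name, the output keys, and the index-based way of selecting a decade's varieties are hypothetical and not taken from the study.

```python
from itertools import combinations


def summarize_pairwise(dist, indices, thresholds=(0.25, 0.30)):
    """Summarize the pairwise distances among the varieties in `indices`.

    dist: symmetric distance matrix; indices: rows belonging to one group
    (for example, all varieties released in the same decade).
    """
    values = [dist[i][j] for i, j in combinations(indices, 2)]
    if not values:
        raise ValueError("need at least two varieties in the group")
    summary = {
        "n_pairs": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }
    # Share of pairwise distances strictly above each threshold.
    for t in thresholds:
        summary[f"share_above_{t:.2f}"] = sum(v > t for v in values) / len(values)
    return summary
```

A group of n varieties yields n(n - 1)/2 pairwise distances, so the 21 varieties developed before 1980 contribute 210 distances to their summary.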
Analyses of genetic variability based on pedigree or morphological data usually demonstrate a reduction of genetic variability over time. The narrowing of genetic variability would typically occur in stages: initially, by the replacement of local varieties (landraces) by improved varieties, followed by modern breeding practices, especially intraspecific crossing of a small group of elite varieties.
Recent studies investigating genetic variability at the DNA level have shown maintenance and, in some cases, increases in genetic variability over time. Manifesto et al. (2001) observed that genetic variability was maintained in Argentinean wheat varieties developed between the 1960s and the 1990s. In central and northern Europe, Huang et al. (2007) detected increased genetic variability among wheat varieties developed between 1950 and 1990. The same was noted by Balyan et al. (2008) [...]
Genetic distances among the genotypes of different eras, represented by the varieties developed in Brazil in each of the decades mentioned here, ranged from 0.06 to 0.08. In spite of the small distances, a gradual evolution can be seen across these eras. The gene sets developed in the 2000s and in the 2010s are the closest, and both are closer to the gene set developed in the 1990s than to those of earlier decades. The genotypes of these three eras are slightly farther from the genotypes developed in the 1980s and before (Figure 4). This demonstrates that, although genetic variability was maintained, the current group of genotypes is farther from the older group of genotypes, suggesting that the germplasm is being modified (improved) by breeding programs without having its variability reduced.
In theory, plant breeding causes narrowing of genetic variability, because all breeding programs select new germplasm in the same direction; i.e., high yields, disease resistance, grain quality, a narrow maturity group interval, uniform plant height, and other characteristics. Crossing a small number of elite lines to generate new breeding populations results in increased inbreeding, which means, in theory, a reduction of genetic variability. Genetic distances between individuals estimated by pedigree are based on probability. The probability of similarity of each pair of sisters, for instance, is the same. This means we need to assume that all pairs of sisters have the same genetic distance, which is not true. Assessing the genetic differences between individuals through molecular markers makes it possible to quantify the real number of differences between them. To accurately estimate genetic variability, molecular markers need to adequately cover the genome of the species. High-density genotyping is the best way to cover the genome with molecular markers.
Using estimates of genetic variability based on molecular markers also allows variability to be explored more thoroughly, because molecular markers can reveal variability that cannot be assessed in other ways. In this work, we demonstrate that the wheat germplasm being used in Brazil has a good level of variability, and that this variability has been maintained in the last four decades. Introducing germplasm and using commercial varieties from other companies is part of the strategy used in Brazil, and this needs to be continued to avoid the narrowing of variability in Brazilian wheat.

CONCLUSION
Wheat variability in Brazil has been maintained in the last four decades. The approach used by Brazilian breeders is effective in avoiding a reduction of genetic variability while increasing performance. Germplasm introduction and the possibility of breeders freely using commercial varieties from other companies as a source of variability in crosses could be the main reasons for the maintenance of genetic variability.

Figure 1. Distribution of the 10,049 SNP markers used to estimate the genetic distances across the 21 wheat chromosomes.

Figure 2. Frequency distribution of genetic distances among 211 wheat varieties using 10,049 SNP markers. A. Set of 211 varieties. B. Set of 185 Brazilian varieties, grouped by release decade.

Figure 3. Clustering of 211 wheat varieties, including 186 Brazilian and 25 introduced varieties, obtained by UPGMA.
[...] in wheat varieties developed in India between 1910 and 2006; by Khlestkina et al. (2004) in Europe and Asia in wheat varieties developed from the 1920s to the 1980s (Austria), from the 1940s to the 1990s (Albania), from the 1930s to the 1970s (India), and from the 1930s to the 1970s (Nepal). At CIMMYT, Reif et al. (2005) and Warburton et al. (2006) observed an increase in genetic variability in wheat since the 1990s. Huang et al. (2007) reported an increase in genetic variability of wheat since the 1950s in the United Kingdom, and Hysing et al. (2008) described the same in the United Kingdom since the 1970s. Prasad et al. (2009) reported an increase in genetic variability in the United States since the 1970s. Fu and Somers (2009), on the other hand, observed a reduction in genetic variability over time in wheat varieties developed in Canada between 1845 and 2004.

Figure 4. Genetic distances and UPGMA cluster analysis among Brazilian wheat varieties as a function of the time they were developed.

Table 2. Clustering of 211 wheat varieties by UPGMA analysis based on the genetic distances obtained by high-density SNP marker data