Genetic structure of Argentinean hexaploid wheat germplasm

The identification of genetically homogeneous groups of individuals is a long-standing issue in population genetics and, in the case of crops like wheat, it can provide valuable information for breeding programs, genetic mapping and germplasm resources. In this work we determined the genetic structure of a set of 102 Argentinean bread wheat (Triticum aestivum L.) elite cultivars using 38 biochemical and molecular markers (functional, closely linked to genes and neutral ones) distributed across 18 wheat chromosomes. Genetic relationships among these lines were examined using model-based clustering methods. The analysis identified three subpopulations, which correspond largely to the origin of the germplasm used by the main breeding programs in Argentina.


Introduction
Bread wheat (Triticum aestivum L.) is the principal winter crop grown in Argentina for both internal consumption and export. Genetically improved cultivars and agricultural practices have resulted in an increased average wheat yield during the past 41 years, from 1352 kg/ha in 1969 to more than 3500 kg/ha in 2010. To maintain this rate of wheat productivity, exploring the genetic variability at the molecular level in adaptation and yield components and integrating such information with conventional breeding methods will be critical (Chao et al., 2007). The identification of genomic regions associated with relevant agronomic traits through QTL mapping in bi-parental populations can now be complemented with alternative genetic mapping strategies like Association Mapping, in which an accurate determination of the population genetic structure is important for correctly associating genotype with phenotype (Breseghello and Sorrells, 2006; Peng et al., 2009; Le Couviour et al., 2011).
In wheat, the assessment of genetic structure in different populations has historically been based on qualitative and quantitative traits (Spagnoletti-Zeuli and Qualset, 1987; Van Beuningen and Busch, 1997b) and pedigree records (Burkhamer et al., 1998; Van Beuningen and Busch, 1997a; Bered et al., 2002). However, pedigree records are not always available or detailed enough for this type of analysis, especially when large numbers of breeding lines or cultivars are being assessed. More recently, biochemical markers such as variation in storage protein subunits (Metakovsky and Branlard, 1998; Lerner et al., 2009) and/or molecular markers like RAPDs, RFLPs, AFLPs, SSRs, DArTs and SNPs have become alternative methods of obtaining a large amount of data for precisely calculating genetic relationship estimates (Mukhtar et al., 2002; Parker et al., 2002; Soleimani et al., 2002; Neumann et al., 2011; Chao et al., 2010). In Argentina in particular, the levels and patterns of genetic diversity among local wheat cultivars have been investigated using SSRs (Manifesto et al., 2001) and storage proteins (Lerner et al., 2009).
Frequently, the biochemical and/or molecular information has been analyzed using tree-based methods that calculate the genetic distance between individuals and then group them into clusters with tree construction algorithms such as UPGMA or neighbor joining (Sneath and Sokal, 1973; Saitou and Nei, 1987). An alternative model-based method developed by Pritchard et al. (2000) and implemented in the software Structure aims at delineating clusters of individuals on the basis of their genotypes at multiple loci using a Bayesian approach (Breseghello and Sorrells, 2006; Chao et al., 2010). However, Evanno et al. (2005) demonstrated that in most cases the estimated "log probability of data" used in Structure fails to provide a correct estimate of the number of clusters (K). Hence, they developed an ad hoc statistic, ΔK, based on the rate of change in the log probability of data between successive K values, and found that with this statistic Structure accurately detects the uppermost hierarchical level of structure. Based on these parameters, Earl and von Holdt (2012) developed Structure Harvester, a website and software for visualizing Structure output according to the method of Evanno et al. (2005).
In this report, we determined the genetic structure of a set of Argentinean wheat cultivars using a model-based approach. A total of 102 cultivars representative of the main breeding companies in Argentina were characterized using a set of 38 biochemical and molecular markers, each mapped to a single chromosome location and distributed over 18 of the 21 wheat chromosomes.

Plant material
A set of 102 bread wheat cultivars registered in Argentina was selected to determine its genetic structure based on molecular and biochemical markers. This set included old and recent commercial cultivars selected from the main wheat breeding companies in Argentina. Seed stocks were kindly provided by the Instituto Nacional de Tecnología Agropecuaria (INTA) Marcos Juárez Wheat Germplasm Bank (Marcos Juárez, Argentina).

Glutenin analysis
Glutenins were extracted from single seeds and analyzed by SDS-PAGE according to protocols described by Pflüger et al. (2001). Glu-A1, Glu-B1 and Glu-D1 subunits were analyzed by SDS-PAGE in 8% polyacrylamide gels (16x18 cm) in a Hoefer electrophoresis system (Hoefer Inc. Holliston, MA, USA) at 30 mA/gel for approximately 12 h. The gels were stained with 0.2% (w/v) Coomassie Blue R-250 (Promega), in 5% (v/v) ethanol and 12% (w/v) trichloroacetic acid overnight and destained in tap water for 24 h.

Allele diversity
All cultivars were treated as pure lines. A small proportion of heterozygosity was observed, and the following criteria were used to define the working allele. In the case of SSRs, where cultivars displayed two bands with different intensities, only the stronger band was considered. If the two bands showed similar intensities, the most frequent allele was considered. If neither of these options could be applied, the sample was scored as missing data. In the case of biochemical (glutenins) and functional molecular markers, samples showing heterozygous alleles were scored as missing data. Rare alleles (with frequency lower than 5%) were treated as missing data for population structure. The effective number of alleles per locus was computed on the basis of common alleles as $n_e = 1/\sum_i p_i^2$ (Hartl and Clark, 1997). The estimate $n_e$ represents the number of equally frequent alleles that would give the same probability of drawing two different alleles at random from the population as that observed. It is a measure of variability at the locus that takes into account both allele number and frequency. The polymorphism information content (PIC), a measure of allelic diversity, was calculated according to Nei's coefficient (Nei, 1973), $\mathrm{PIC} = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th polymorphism detected in the germplasm.
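The two diversity measures above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the 5% threshold is applied to the frequency among non-missing scores, and the frequencies of the remaining common alleles are renormalised to sum to one. The text does not state whether the authors handled rare alleles in exactly this way.

```python
from collections import Counter


def common_allele_frequencies(genotypes, rare_threshold=0.05):
    """Frequencies of common alleles at one locus.

    genotypes: one allele label per cultivar, None for missing data.
    Alleles rarer than rare_threshold are dropped (treated as missing)
    and the remaining frequencies are renormalised. Both choices are
    assumptions made for this sketch.
    """
    observed = [g for g in genotypes if g is not None]
    counts = Counter(observed)
    total = len(observed)
    common = {a: c for a, c in counts.items() if c / total >= rare_threshold}
    kept = sum(common.values())
    return {a: c / kept for a, c in common.items()}


def effective_alleles(freqs):
    """n_e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def pic(freqs):
    """PIC = 1 - sum(p_i^2), as defined in the text."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical locus: two common alleles, one rare allele, two missing scores.
locus = ["a"] * 48 + ["b"] * 48 + ["c"] * 2 + [None] * 2
freqs = common_allele_frequencies(locus)
n_e = effective_alleles(freqs)
pic_value = pic(freqs)
```

With these definitions, the two statistics are linked at each locus by PIC = 1 - 1/n_e, so a locus with two equally frequent common alleles has n_e = 2 and PIC = 0.5.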

Population structure
Thirty-eight unlinked or distantly linked marker loci, distributed over all the wheat chromosomes except 6A, 6B and 7B, were used for the assessment of population structure. Population structure was investigated using a Bayesian clustering approach to infer the number of clusters (populations) with the software packages Structure v.2.3.3 (Pritchard et al., 2000) and Structure Harvester (Evanno et al., 2005; Earl and von Holdt, 2012). No prior information was used to define the clusters. The number of subpopulations (K) was set from 1 to 10, without admixture and with correlated allele frequencies, using a burn-in phase of $10^5$ iterations and a sampling phase of $2 \times 10^5$ replicates. Runs for each value of K from 1 to 10 were repeated 10 times (Falush et al., 2003; Breseghello and Sorrells, 2006). This method estimates the proportion of the genome of each individual derived from the different clusters and assigns individuals to subpopulations based on membership probability. We used the run that assigned every cultivar to a single cluster with a probability > 0.50. The degree of differentiation of each subpopulation was measured by a modified $F_{ST}$ parameter (Falush et al., 2003). The program Genetix (Belkhir et al., 1996-2004) was used to compute an overall $F_{ST}$ (Weir and Cockerham, 1984) and to conduct a multiple correspondence analysis with three dimensions.
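As a rough illustration of how the ΔK statistic of Evanno et al. (2005) selects K from replicated Structure runs, the following Python sketch divides the absolute second-order difference of the mean log probability of data by the standard deviation of the replicates at each K. The function name and the log-probability values are hypothetical, and working from replicate means with a sample standard deviation is a simplifying assumption of this sketch. It is not a description of the internals of Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Approximate Evanno's delta-K from replicated runs.

    lnp_by_k maps each K to a list of Ln P(D) values, one per replicate.
    Returns {K: delta_K} for every K that has both neighbours K-1 and K+1.
    """
    ks = sorted(lnp_by_k)
    avg = {k: mean(lnp_by_k[k]) for k in ks}
    sd = {k: stdev(lnp_by_k[k]) for k in ks}
    result = {}
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        if k - prev != 1 or nxt - k != 1:
            continue  # K values must be consecutive
        if sd[k] == 0:
            continue  # delta-K is undefined when replicates are identical
        second_diff = abs(avg[nxt] - 2 * avg[k] + avg[prev])
        result[k] = second_diff / sd[k]
    return result


# Hypothetical Ln P(D) values, two replicates per K (not data from this study).
runs = {
    1: [-5000.0, -5010.0],
    2: [-4500.0, -4510.0],
    3: [-4200.0, -4202.0],
    4: [-4190.0, -4210.0],
    5: [-4185.0, -4215.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

In this toy input the log probability keeps rising slowly after K = 3 while the replicates become more variable, so ΔK peaks at K = 3 even though L(K) itself shows no clear plateau. ΔK cannot be evaluated for the smallest and largest K tested because each lacks one neighbour.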

Results
The genetic diversity of a collection of 102 bread wheat cultivars from Argentina was assessed using 35 molecular markers (13 functional markers, 4 markers closely linked to genes, 17 SSR, and 1 ISBP) and 3 storage proteins (Table 1). A total of 124 alleles was detected in the panel, including 21 rare alleles (with frequencies lower than 5% in the panel) that were discarded in the population structure studies. The number of alleles per locus varied between two and seven, with an average of 3.26. The average numbers of common alleles (excluding rare alleles) and effective alleles ($n_e$) were 2.65 (from 1 to 6) and 2.05 (from 1.01 to 4.76), respectively. Polymorphism information content (PIC) values obtained from the 38 polymorphic markers varied between 0.076 and 0.788, with an average of 0.458. The mean frequency of missing data was 0.43%, or 1.65% when rare alleles were included (Table 1).
We explored the population genetic structure among the accessions using a model-based method, a type of cluster analysis that evaluates genetic similarity among genotypes without using prior information. After a first analysis with the Structure program we could not determine precisely the number of subgroups or subpopulations (K) in the population, as the curve of the Ln probability of data [L(K)] did not reach a plateau within the tested range of K = 1 to 10 (Figure 1A). Therefore, we used the output of Structure as input data for Structure Harvester, which gave a clear peak with the highest ΔK value at K = 3 (Figure 1B). The analysis showed that three subpopulations were optimal, assigning all except 13 cultivars to one of the three clusters with an a posteriori probability > 0.80. The 13 genotypes assigned to individual clusters with an a posteriori probability > 0.50 (but < 0.80) are underlined in Table 2. The three subpopulations K1, K2 and K3 included 17, 48 and 37 cultivars, respectively, with $F_{ST}$ averages slightly higher in K1 (0.1939) than in K2 and K3 (0.1279 and 0.1218, respectively), evidencing a moderate differentiation within subgroups. Furthermore, the $F_{ST}$ value across subpopulations was 0.1485, indicating that differentiation between subgroups was also moderate. Figure 2 shows the projection of the multiple correspondence analysis (MCA) cloud on one orthogonal plane, with different symbols identifying each subpopulation according to the classification from Structure. The cloud was continuous, with three protrusions corresponding to the three subpopulations. In agreement with the $F_{ST}$ estimates, subpopulations 2 and 3 were less dispersed than subpopulation 1.
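The assignment rule described above (a posteriori probability > 0.80 for a firm assignment, between 0.50 and 0.80 for a weaker one) can be sketched in Python as follows. The function name, the labels and the membership vectors are hypothetical and are not taken from Table 2. How a value of exactly 0.80 should be handled is not stated in the text, so treating it as a weak assignment is an assumption of this sketch.

```python
def assign_cluster(membership, strong=0.80, weak=0.50):
    """Assign one cultivar to its most probable cluster.

    membership: a posteriori probabilities for clusters 1..K (summing to ~1).
    Returns (cluster_number, label), where label is "strong" for p > strong,
    "weak" for weak < p <= strong, and "unassigned" otherwise.
    """
    best = max(range(len(membership)), key=lambda i: membership[i])
    p = membership[best]
    if p > strong:
        label = "strong"
    elif p > weak:
        label = "weak"
    else:
        label = "unassigned"
    return best + 1, label


# Hypothetical membership probabilities for three cultivars at K = 3.
panel = {
    "cultivar_A": [0.95, 0.03, 0.02],
    "cultivar_B": [0.10, 0.62, 0.28],
    "cultivar_C": [0.40, 0.35, 0.25],
}
assignments = {name: assign_cluster(q) for name, q in panel.items()}
```

Under this rule, cultivar_A would be counted among the firmly assigned cultivars, cultivar_B among those assigned with a probability between 0.50 and 0.80, and cultivar_C would remain unassigned.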

Discussion
Genetic variability of the panel
The means for allele numbers and PIC values observed across all markers used in this work were 3.26 and 0.458, respectively (Table 1). These values were considerably lower than those observed for SSR loci (9.4 and 0.720; Manifesto et al., 2001) and for storage proteins (5.00 and 0.544; Lerner et al., 2009) for different but overlapping collections of wheat cultivars from Argentina. The mean allele number detected herein was also significantly lower than those revealed by SSR analysis in US wheat germplasm (4.8 and 7.2 alleles), reported by Breseghello and Sorrells (2006) and Chao et al. (2007), respectively. It was also lower than the 6.20 and 7.49 alleles observed by Plaschke et al. (1995) and Le Couviour et al. (2011) in European wheat germplasm, and the 5.4 alleles detected by Dreisigacker et al. (2004) in CIMMYT germplasm. Finally, Balfourier et al. (2007), working with a worldwide wheat collection of 3,942 entries from 73 countries, detected the very high average value of 23.9 alleles per locus. These large differences in the number of alleles detected may be due to differences in the technologies used to detect polymorphism, as well as the type of molecular markers selected for the characterization (and/or the rate and amount of the germplasm evaluated).

Markers associated with traits of agronomic interest
In our study we selected a set of markers (biochemical markers, functional markers and markers closely linked to genes) related to traits relevant for breeding, like growth habit and/or vernalization response (Vrn-A1, Vrn-B1 and Vrn-D1), photoperiod sensitivity (Ppd-D1), plant height (Rht-B1, Rht-D1), grain texture (PinA-D1), starch waxy protein variants (Wx-A1 and Wx-B1), PPO activity (Ppo-A1, Ppo-D1), variants of the Viviparous-1B gene (Vp1-B3) associated with pre-harvest sprouting tolerance (Yang et al., 2007), low molecular weight glutenins (Glu-A3) and high molecular weight glutenins (Glu-A1, Glu-B1, Glu-D1).
The natural variation scanned with markers based on winter/spring allelic variants from the Vrn-A1, Vrn-B1 and Vrn-D1 loci confirmed the spring growth habit as the best adapted for the wheat production area of Argentina (91 of the tested cultivars carried at least one spring allele, considering the Vrn-A1, Vrn-B1 and Vrn-D1 loci, vs. 11 cultivars with the triple winter allele combination). These data agree with previous phenotypic (Appendino et al., 2003) and molecular data (Fu et al., 2005). We also noticed a higher frequency of the photoperiod insensitive (PI) alleles Ppd-D1a and/or Ppd-B1a (74 cultivars) than of the combination of Ppd-D1b and Ppd-B1b alleles associated with photoperiod sensitivity (PS) (28 cultivars), suggesting a better adaptation of photoperiod insensitivity to the environmental conditions in Argentina (between 27° and 38° S). A high frequency of PI alleles was also observed in low latitude regions of Japan (36° N), associated in this case with early flowering to avoid rains at harvest and pre-harvest sprouting (Seki et al., 2011). In contrast, Lanning et al. (2012) evaluated PS and PI spring near-isogenic lines (NILs) and observed better agronomic performance in PS NILs planted at higher latitudes (between 45° and 54° N) at early planting dates, whereas no difference between PS and PI lines occurred for the latest planting date.
These data would support a better adaptation of spring, photoperiod insensitive and semidwarf wheats to the dominant environments in Argentina. However, a fine-tuned evaluation of spring NILs carrying different combinations of vernalization, photoperiod insensitivity and plant height alleles is still pending.
In the case of markers closely linked to Lr genes, the most valuable information is perhaps the relatively high number of cultivars that probably possess the adult plant leaf rust resistance gene Lr34 (20 cultivars), a finding which agrees with Vanzetti et al. (2011). This gene has supported resistance to leaf rust in wheat for more than fifty years and is extensively used in breeding programs worldwide (Krattinger et al., 2009).
Relevant information for bread-making quality is the presence in ten cultivars of the Glu-B1 7oe subunit, which is associated with improved dough strength of wheat (Butow et al., 2004). Valuable alleles for the development of cultivars with superior bread quality, partial waxy wheats, low PPO activity, and pre-harvest sprouting tolerance were also detected in the panel.

K = 3 is associated with the main breeding programs in Argentina
In this work, using a model-based approach, we detected three subpopulations in the collection of 102 Argentinean wheat cultivars. Our hypothesis is that this subpopulation division actually reflects the origin of the germplasm used by the main breeding programs in Argentina. For example, K1 is composed mainly (70.58%) of cultivars from the Nidera and Syngenta breeding programs, and all of the tested cultivars released by these companies were grouped exclusively in K1. Notably, this germplasm (at least the early materials released by Nidera; Bulos et al., 2006) has a European origin, mainly from France, and has been introduced gradually to Argentina since 1999. The K1 subpopulation also includes old cultivars from the Klein breeding program, like Klein 32 (released in 1932), Klein Atlas (1963) and Klein Centauro (1989) (Figure 3). K2 is mainly composed (60.41%) of cultivars from the Klein, INIA and Don Mario breeding programs: 72% of the Klein, 87.5% of the INIA and 80% of the Don Mario tested cultivars were included in K2 (Figure 3). The cultivars grouped in K2 are basically (1) introductions and selections made in CIMMYT, as well as crosses made in Argentina, including CIMMYT material like Bobwhite, Kavkaz, Pastor, Seri, Veery and Weebill, (2) introductions from Brazil (some materials from the Don Mario Breeding Company), and (3), to a lesser degree, materials selected from crosses including traditional germplasm from Argentina. Finally, K3 is composed mostly of cultivars belonging to the Buck Breeding Program (43.24%). INTA and ACA Breeding Programs have an even distribution of their tested cultivars between the K1 and K2 subpopulations (Figure 3). The cultivars grouped in K3 are mostly derived from traditional germplasm from Argentina and, to a lesser degree, from CIMMYT.
A similar type of grouping of cultivars by geographic origin and breeding history using a model-based approach was observed by Le Couviour et al. (2011) working with an elite wheat panel from Europe. They identified four subpopulations comprising cultivars from the UK, Germany and France, the latter divided into two subgroups. They proposed that the separation between French, German and UK cultivars can be explained by geographic origin and that, in the case of France, the formation of two subgroups is due to breeding history. Furthermore, Chao et al. (2007), when using a similar approach to analyze the genetic structure of U.S. wheat cultivars and breeding lines, found four subpopulations and suggested that the genetic diversity existing in this U.S. wheat germplasm was influenced by regional adaptation. Our data would suggest that in Argentina the most important factor explaining the genetic variability of adapted commercial bread wheat cultivars is the different core collections of germplasm used by the main breeding programs, rather than geographic adaptation as observed in Europe and the US.
The results obtained in this paper are a valuable source of information for breeding programs seeking to create novel combinations of alleles from genes involved in adaptation, disease resistance and bread-making quality, among other traits. Additionally, the genetic structure of the panel of cultivars analyzed in this study is being used as the starting point for association studies considering additional phenotypic traits of interest for breeding, such as drought tolerance and yield components like kernel weight, among others.

Supplementary Material
The following online material is available for this article: Table S1 -Molecular markers used in the study. Table S2 -Biochemical and functional molecular markers in Argentinean wheat cultivars. Table S3 -Neutral molecular markers in Argentinean wheat cultivars.
This material is available as part of the online article from http://www.scielo.br/gmb.

Associate Editor: Everaldo Gonçalves de Barros
License information: This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.