Genetic associations in the detection of QTLs for wheat spike-related traits

The objective of this work was to assess the genetic diversity and population structure of wheat genotypes, to detect significant and stable genetic associations, and to evaluate the efficiency of statistical models in identifying chromosome regions responsible for the expression of spike-related traits. Eight important spike characteristics were measured during five growing seasons in Serbia. A set of 30 microsatellite markers positioned near important agronomic loci was used to evaluate genetic diversity, resulting in a total of 349 alleles. The marker-trait associations were analyzed using the general linear and mixed linear models. The results obtained for number of allelic variants per locus (11.5), average polymorphic information content value (0.68), and average gene diversity (0.722) showed an exceptional level of polymorphism in the genotypes, which is the main requirement for association studies. The population structure estimated by model-based clustering distributed the genotypes into six subpopulations according to log probability of data. Significant and stable associations were detected on chromosomes 1B, 2A, 2B, 2D, and 6D, which explained from 4.7 to 40.7% of total phenotypic variation. The general linear model identified a significantly larger number of marker-trait associations (192) than the mixed linear model (76). The mixed linear model identified nine markers associated with six traits.


Introduction
Wheat (Triticum aestivum L.) breeders worldwide invest a great deal of effort into creating cultivars able to meet rising global challenges, such as ongoing climate change and a growing world population. A current tendency in breeding is the introduction of novel techniques and approaches that could improve conventional breeding programs. In particular, advances in molecular biology, namely genetic marker technologies and new statistical approaches, provide powerful tools for the indirect selection of valuable traits through marker-assisted selection (Landjeva et al., 2007). The detection of specific and precisely tagged chromosome regions responsible for the expression of certain agronomic traits could contribute greatly to the selection and generation of new high-yielding wheat varieties. Likewise, knowledge of population diversity and structure is of major importance for the efficient use of elite lines and varieties in the breeding process (Laido et al., 2013).
Spike-related traits are important yield components, which are less environmentally sensitive and exhibit higher heritability than yield per se (Cuthbert et al., 2008). The analysis of the genetic control of spike-related characteristics and of the individual effects of different genes and quantitative trait loci (QTL) could provide specific information and be useful for the indirect determination of yield improvement (Ma et al., 2007). In the last few years, association mapping has been considered one of the most promising methods for exploring the entire genome in search of preferred chromosome regions, QTLs, and desired genes (Liu et al., 2010). The association mapping approach offers greater potential for the identification of targeted QTLs and for the fine-tuning and mapping of genes at a higher resolution than the previously used linkage mapping. Based on linkage disequilibrium, association mapping is applied directly to diverse genetic materials, resulting in a larger number of detected alleles per locus in a more representative genetic background. It also provides higher resolution because of the recombination events accumulated during selection cycles through evolution and historical breeding processes (Haseneyer et al., 2010). Cultivars genotyped with high-density markers and their associations show promise in resolving the genetic basis of complex traits of agronomic and economic importance (Wang et al., 2012). The analysis of complex traits by association mapping is valuable to breeders, since it further facilitates the application of associated markers in the breeding process. One of the first association mapping studies in wheat aimed at identifying significant markers for kernel size and milling quality (Breseghello & Sorrells, 2006). Subsequently, a large number of works used genome-wide association studies (GWAS) to detect marker-trait associations (MTAs) for a wide range of traits, including quality traits in soft wheat (Reif et al., 2011), yield and other agronomic traits in wheat (Liu et al., 2010), and seed longevity in hexaploid wheat (Rehman Arif et al., 2012). In bread wheat, a number of yield-component QTLs were associated with spike-related and adaptive traits (Neumann et al., 2011). The Tassel software (Bradbury et al., 2007) is one of the most sophisticated software programs, with implemented algorithms and methods useful for association studies. The structure association analysis developed by Pritchard et al. (2000) first uses a set of random markers to estimate the population structure (Q matrix) and then incorporates this estimate into a general linear model (GLM) analysis. Yu et al. (2006) developed a new methodology, the mixed linear model (MLM) method, adapted for GWAS, which incorporates both the population structure and the familial relatedness, the so-called "kinship" (K matrix), to avoid false associations. This method is recommended in the absence of available pedigree data for clustering a large dataset into groups with improved statistical power (Zhang et al., 2010).
The objective of this work was to assess the genetic diversity and population structure of wheat genotypes, to detect significant and stable genetic associations, as well as to evaluate the efficiency of statistical models to identify chromosome regions responsible for the expression of spike-related traits.

Materials and Methods
A set of 283 wheat accessions originating from 24 countries was used for phenotype evaluation (Table 1). These varieties are part of the largest Wheat Core Collection in Serbia, which belongs to the Small Grains Department of the Institute of Field and Vegetable Crops in Novi Sad. The genotypes were sown in a randomized complete block design in 1.2 m² plots, each containing six rows spaced 20 cm apart. Field plots were cultivated at Rimski Šančevi (45°20'N, 19°51'E) in Novi Sad, Serbia, by applying standard agrotechnical practices (Malešević et al., 1994). The following spike-related traits were measured and recorded for association analysis during five growing seasons, from 1995 to 1999: spike length, peduncle length, number of spikelets per spike, number of sterile spikelets per spike, spike index, spike weight, grain weight per spike, and grain number per spike.

Genomic DNA from all varieties (approximately ten plantlets per genotype) was isolated from fresh young leaves using the CTAB protocol described by Doyle & Doyle (1990). The wheat population was profiled with 30 of 41 initially tested microsatellite markers; 11 markers were excluded because they gave non-specific PCR products. The sequences of the SSR markers were taken from the GrainGenes database (GrainGenes, 2014) (Table 2). The additional variety Chinese Spring was used as a positive control. Microsatellites were positioned along almost all three genomes and located near previously detected important QTLs.

PCR amplifications were carried out according to the protocols given by Röder et al. (1998). Each 10 µL reaction contained 30 ng of DNA template, 1x buffer solution, 2 mmol L⁻¹ dNTPs, 1.5 mmol L⁻¹ MgCl₂, 10 pmol each of fluorescently labeled forward and unlabeled reverse primers, and 1 unit of Taq polymerase. PCR started with an initial denaturation at 94°C for 5 min, followed by 40 cycles of 94°C for 30 s, 52-62°C for 45 s, and 72°C for 45 s. The final extension was 10 min at 72°C. The PCR amplicons were separated by size using capillary electrophoresis on an ABI Prism 3130 genetic analyzer (Applied Biosystems, Foster City, CA, USA). The 10 µL electrophoresis mixture consisted of 2 µL of mixed differently-labeled PCR products, 0.2 µL of GeneScan 500 LIZ size standard (Applied Biosystems, Foster City, CA, USA), and 7.8 µL of Hi-Di formamide. The dye-labeled products were identified by fluorescence detection, and microsatellite analysis was performed using the GeneMapper software, version 4.0 (Applied Biosystems, Foster City, CA, USA).
The parameters of genetic diversity were calculated with the PowerMarker software, version 3.25 (Liu & Muse, 2005). The population structure based on genetic data was estimated by the Bayesian algorithm implemented in the Structure software, version 2.3.4 (Pritchard et al., 2000). The hypothetical number of clusters (K) was set to range from 1 to 20, whereas the burn-in length and the number of Markov chain Monte Carlo (MCMC) iterations were both set to 100,000. The number of subpopulations was inferred by comparing log probabilities of data, Pr(X|K), and corrections were done according to Evanno et al. (2005). The selection of the most appropriate number of subgroups was a critical step for further association analysis. The internal genetic structure was further examined by principal coordinate analysis (PCoA).
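The correction of Evanno et al. (2005) can be illustrated with a minimal Python sketch. It assumes that replicate Structure runs are available for each K and uses the common summary form, in which ΔK is the absolute second-order difference of the mean log probability divided by the standard deviation of the replicates at K. The function name and the input values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Return {K: delta K} from replicate log probabilities.

    lnp maps each K to a list of ln Pr(X|K) values, one per replicate run.
    Delta K is defined only for K values that have both neighbours (K-1, K+1).
    """
    avg = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in avg and k + 1 in avg:
            # |L(K+1) - 2 L(K) + L(K-1)| computed on the replicate means
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            sd = stdev(lnp[k])
            if sd > 0:  # skip K values whose replicates are identical
                delta[k] = second_diff / sd
    return delta


# Hypothetical log probabilities for three replicate runs per K
example = {
    1: [-100.0, -102.0, -98.0],
    2: [-80.0, -81.0, -79.0],
    3: [-75.0, -76.0, -74.0],
}
delta_k = evanno_delta_k(example)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The K with the largest ΔK is usually taken as the uppermost level of structure, which is why this criterion can point to fewer clusters than the raw log probability of data.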
The marker-trait associations were analyzed in the Tassel software, version 2.1 (Bradbury et al., 2007), using two models: GLM and MLM (Yu et al., 2006). The Q matrix for further association analysis was determined based on the average value of three iterations of log probability of data obtained by the Structure software (Pritchard et al., 2000). In order to define the level of genetic covariance between pairs of individuals, a kinship (K) analysis was carried out using molecular data, converting the distance matrix to a similarity matrix with the Tassel software (Bradbury et al., 2007). The magnitude of QTL effects was expressed by the R² parameter. The descriptive statistics of all phenotypic data were computed in the Statistica software, version 10 (Statsoft, Tulsa, OK, USA).
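How a GLM of this kind attributes phenotypic variation to a marker can be sketched as follows. This is an illustration under stated assumptions, not the Tassel implementation: the trait is first regressed on an intercept and the Q covariates (one Q column dropped, because memberships sum to one), the marker alleles are then added as indicator variables, and R² is the drop in residual sum of squares relative to the total sum of squares. All function names and data values are hypothetical, and the p-value from the F distribution is omitted to keep the sketch dependency-free.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting.

    Assumes a non-singular matrix (no redundant columns in the design).
    """
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_ss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(design, y))


def marker_effect(y, q_rows, alleles):
    """Return (R2, F) for a marker fitted after the Q covariates."""
    n = len(y)
    base = [[1.0] + list(q) for q in q_rows]      # intercept + Q covariates
    levels = sorted(set(alleles))[1:]             # first allele = reference
    full = [b + [1.0 if a == lv else 0.0 for lv in levels]
            for b, a in zip(base, alleles)]
    grand_mean = sum(y) / n
    total_ss = sum((v - grand_mean) ** 2 for v in y)
    ss_reduced, ss_full = residual_ss(base, y), residual_ss(full, y)
    df_marker, df_error = len(levels), n - len(full[0])
    r2 = (ss_reduced - ss_full) / total_ss
    f_stat = ((ss_reduced - ss_full) / df_marker) / (ss_full / df_error)
    return r2, f_stat


# Hypothetical data: 8 genotypes, one retained Q column, one biallelic SSR
trait = [10.2, 9.8, 11.5, 12.1, 8.9, 9.4, 12.8, 11.9]
q_matrix = [[0.9], [0.8], [0.2], [0.1], [0.7], [0.9], [0.3], [0.1]]
calls = ["150", "150", "154", "154", "150", "150", "154", "154"]
print(marker_effect(trait, q_matrix, calls))
```

An MLM would additionally model the kinship matrix as the covariance of a random genotype effect, which this sketch does not attempt.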

Results and Discussion
A total of 349 alleles was detected in 30 SSR loci, and the mean number of alleles per locus was 11.5 (Table 3). This value was higher than that (7.2) found among USA wheat accessions (Chao et al., 2007). Chen et al. (2003) reported extremely low values of mean alleles per locus and other polymorphism parameters as a result of the genotypes' specific region of origin, which led to a narrowing of genetic diversity. The sufficient genetic variation observed in the material evaluated in the present study was confirmed in other studies with materials from the same core collection (Kobiljski et al., 2002). However, since the previous analysis was performed on only 96 genotypes, the mean number of alleles per locus (7.96) was lower than in the present study. The average polymorphic information content (PIC) value was 0.688, indicating a high level of genetic polymorphism. Considering the cosmopolitan origin of the studied varieties (Table 1), the breeding material shows a broad genetic diversity, which proved to be an excellent base for further research.
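For reference, the two diversity parameters reported above can be computed from allele frequencies at a single locus as in the following Python sketch. It uses the standard definitions, gene diversity as 1 − Σp² and PIC as 1 − Σp² − ΣΣ 2p_i²p_j² over allele pairs; PowerMarker may apply small-sample corrections that are not reproduced here. The allele calls are hypothetical fragment sizes, not data from this study.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus."""
    freqs = list(freqs)
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical allele calls (fragment sizes in bp) at one SSR locus
calls = [150, 150, 152, 154, 154, 154, 156, 150]
freqs = allele_frequencies(calls)
print(len(freqs), gene_diversity(freqs.values()), pic(freqs.values()))
```

Averaging these per-locus values over all loci gives summary statistics of the kind listed in Table 3; PIC is never larger than gene diversity for the same locus.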
The population structure distributed genotypes into six subpopulations using log probability of data obtained by the Structure software (Figure 1), whereas the corrections of the number of clusters (ΔK) according to Evanno et al. (2005) indicated the distribution of genotypes into three existing subpopulations (Figure 2). Evanno's corrections generally predicted the existence of two or three subpopulations regardless of the number and diversity of the investigated materials (Vigouroux et al., 2008), which was confirmed in the present study. Classification of the 283 genotypes based on the log probability of data was more effective in discriminating the genotypes. The largest group (Q5) consisted of 114 genotypes, mainly originating from Serbia, whereas the smallest group (Q3) included 18 cultivars, mostly from the USA. The other subpopulations consisted of 37 genotypes (Q1), with diverse geographic origin; 45 genotypes (Q2), mostly from England and France; 30 genotypes (Q4), from Croatia; and 40 genotypes (Q6), from the USA.

Table 2. Microsatellite markers, sequences of forward and reverse primers, annealing temperature (Tm), repeated motif, and expected amplicons in the Chinese Spring variety of wheat (Triticum aestivum), used as a positive control.
The distribution could be explained partly by geographical origin and partly by pedigree data. However, a strict distribution according to origin is difficult to obtain because breeding and elite lines are used by and exchanged among different breeding centers. Even so, the distribution of genotypes originating from the same regions points to a similar selective pressure in wheat breeding during domestication and the subsequent breeding process (Laido et al., 2013). Moreover, the internal genetic structure obtained by PCoA separated the largest subpopulation (Q5) and group Q2, mostly consistent with the grouping by the Structure software (Figure 3). In addition, the groups from Croatia (Q4) and from the USA (Q3) took a particular position in the coordinate system, whereas the remaining two clusters (Q1 and Q6) showed a dispersed distribution. However, certain overlapping within some subpopulations could be a result of the frequent use of certain varieties as parents, as well as of the inclusion of a great number of genotypes in the analysis. Population structure determined by model-based clustering in the Structure software was the most appropriate tool for determining genetic structure and a key component for further association studies (Yu et al., 2006).
The total number of marker-trait associations detected in the five evaluation years was 192 using the GLM method, but decreased to 76 for all analyzed traits and years using the MLM approach (Table 4). The advantage of the MLM approach is that it detects loci truly associated with agronomic traits while reducing false-positive associations (Zhang et al., 2013). Neumann et al. (2011) suggested the usefulness of both models because a great number of associations could be neglected using only the MLM, so that many MTAs might not be recognized as potential loci. This statement is in accordance with Yu et al. (2009), who proposed that new loci detected by GLM are also useful and should be additionally validated to avoid false-positive associations. Furthermore, the differences detected by these two models could be trait-dependent (Neumann et al., 2011).
It is important to highlight that only the stable associations, detected in more than three evaluation years at 1% probability using the GLM and MLM approaches, were reported (Table 5). Four closely located markers (wmc18, wmc167, wmc144, and gwm157) on chromosome 2D were significant for the detection of QTLs for number of spikelets per spike, number of sterile spikelets per spike, and grain number per spike. This observation agrees with the high partial correlations obtained for these traits (Table 6). Besides being a carrier of three key genes, for height reduction (Rht8), photoperiod response (Ppd1), and yellow rust resistance (Yr16), which are essential for adaptation, chromosome 2D contained most of the markers associated with the agronomically important traits.
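The stability criterion described above can be expressed as a simple filter over per-year test results. The following Python sketch assumes a flat list of (marker, trait, year, p-value) records; the record layout, function name, and values are hypothetical and do not reproduce any Tassel output format.

```python
from collections import defaultdict


def stable_associations(records, alpha=0.01, min_years=4):
    """Keep marker-trait pairs significant (p <= alpha) in at least min_years.

    "More than three evaluation years" corresponds to min_years=4.
    Returns {(marker, trait): sorted list of significant years}.
    """
    significant_years = defaultdict(set)
    for marker, trait, year, p_value in records:
        if p_value <= alpha:
            significant_years[(marker, trait)].add(year)
    return {pair: sorted(years)
            for pair, years in significant_years.items()
            if len(years) >= min_years}


# Hypothetical results: marker "mk1" is stable, marker "mk2" is not
records = [
    ("mk1", "spike length", 1995, 0.004),
    ("mk1", "spike length", 1996, 0.009),
    ("mk1", "spike length", 1997, 0.001),
    ("mk1", "spike length", 1998, 0.008),
    ("mk1", "spike length", 1999, 0.030),
    ("mk2", "spike length", 1995, 0.002),
    ("mk2", "spike length", 1996, 0.200),
    ("mk2", "spike length", 1997, 0.006),
]
stable = stable_associations(records)
print(stable)
```

For traits measured in fewer seasons, the threshold can be lowered through the `min_years` argument.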
The region in proximity to the Ppd-1 gene, near gwm484, was responsible for the expression of many yield components and spike morphology, showing its high value for wheat improvement (Dodig et al., 2012). On the integrated genetic map of this chromosome, created with scaffolds and markers in Aegilops tauschii, Jia et al. (2013) identified 33 QTLs or genes. One of them was the QTL for test weight near marker wmc167, which was significant for spike-related traits in the present study. Marker wmc144 showed the highest effect on the phenotypic variation of spikelets per spike, with a mean value of 40.7%. QTLs for grain number per spike and spike length associated with marker gwm294, located on the long arm of chromosome 2A, were reported by Yao et al. (2009). In the present study, this marker showed similar effects on the phenotypic variation of these traits (13 and 5%, respectively) (Table 5). Also, two markers (gwm294 and cfa2086) on chromosome 2A were associated with peduncle length, in addition to the previously detected QTL for this trait on chromosome 6A (Neumann et al., 2011). This trait has attracted great interest in recent studies due to its importance in avoiding ear diseases. Grain number per spike, one of the most important yield components of wheat (Ma et al., 2007), was associated with the largest number of markers evaluated, i.e., five (Table 5). The specific marker for grain number was barc101 (2BL), which has not been previously associated with this trait, indicating the presence of a new QTL. The presence of QTLs near marker gwm11 for a large number of agronomic and adaptive traits was shown by Wang et al. (2009), whereas, in the present study, the only association of this marker was found with sterile spikelets per spike.

(1) Measured only in three years, and significant marker-trait associations in more than two years. ns, nonsignificant.
Only a limited number of QTL studies for sterile spikelet number per spike have been documented (Ma et al., 2007). The coefficient of variation for sterile spikelets per spike obtained by descriptive statistics was extremely high (Table 6), probably due to the selection of a relatively small number of varieties with branched architecture of wheat spikes. Grain weight per spike and spike weight were the only traits without stable associations across multiple evaluation years.
Using a collection of genotypes with a high level of polymorphism for association analysis and finding stable QTLs over the course of multiple years could be useful for the breeding process (Maccaferri et al., 2008). A potential new flowering-time gene on chromosome 6D (psp3200) was detected in similar material from the same core collection under contrasting water regimes (Dodig et al., 2012). However, this region has not shown importance for spike characteristics under field conditions. The unique association between marker wmc333 on chromosome 6A and spike length detected in the present study could indicate the presence of a new potential QTL with minor effect.

Conclusions
1. The evaluated collection of wheat (Triticum aestivum) genotypes shows genetic diversity, and population structure is an important tool for association analysis.
2. A significant number of associations is stable for six spike-related traits.
3. The statistical models evaluated increase the accuracy and power of the association analysis.
4. The new chromosome regions identified as responsible for spike-related traits are useful for wheat breeding programs.

Figure 1. Population structure of 283 wheat (Triticum aestivum) genotypes estimated using the model-based Bayesian algorithm implemented in the Structure software (Pritchard et al., 2000) performed with 30 microsatellite loci. Q1 to Q6, genotype clusters on the Q matrix.

Figure 2. Correction of number of clusters (ΔK) according to Evanno et al. (2005) for the different Bayesian clustering analyses implemented by the Structure software (Pritchard et al., 2000).

Figure 3. Principal coordinate analysis of the 283 wheat (Triticum aestivum) varieties. Each mark represents a sample obtained by the Structure software (Pritchard et al., 2000). Q1 to Q6, genotype clusters on the Q matrix.

Table 1. Wheat (Triticum aestivum) varieties and lines, origin, and distribution of subpopulations (genotype clusters, Q) obtained by the Structure software (Pritchard et al., 2000).

Table 3. Basic parameters of genetic diversity in wheat (Triticum aestivum). PIC, polymorphic information content.

Table 4. Total number of marker-trait associations (p≤0.01) detected with the general linear model (GLM) and the mixed linear model (MLM) methods in three evaluation years.

Table 5. Markers associated (p≤0.01) with spike-related traits in more than three evaluation years using the mixed linear model method, and the mean value of phenotypic variation (%).