Genomic variation and population structure detected by single nucleotide polymorphism arrays in Corriedale, Merino and Creole sheep.

THE AIM OF THIS STUDY WAS TO INVESTIGATE THE GENETIC DIVERSITY WITHIN AND AMONG THREE BREEDS OF SHEEP: Corriedale, Merino and Creole. Sheep from the three breeds (Merino n = 110, Corriedale n = 108 and Creole n = 10) were genotyped using the Illumina Ovine SNP50 beadchip(®). Genetic diversity was evaluated by comparing the minor allele frequency (MAF) among breeds. Population structure and genetic differentiation were assessed using STRUCTURE software, principal component analysis (PCA) and fixation index (FST). Fixed markers (MAF = 0) that were different among breeds were identified as specific breed markers. Using a subset of 18,181 single nucleotide polymorphisms (SNPs), PCA and STUCTURE analysis were able to explain population stratification within breeds. Merino and Corriedale divergent lines showed high levels of polymorphism (89.4% and 86% of polymorphic SNPs, respectively) and moderate genetic differentiation (FST = 0.08) between them. In contrast, Creole had only 69% polymorphic SNPs and showed greater genetic differentiation from the other two breeds (FST = 0.17 for both breeds). Hence, a subset of molecular markers present in the OvineSNP50 is informative enough for breed assignment and population structure analysis of commercial and Creole breeds.


Introduction
Natural and artificial selection processes in domestic animals occurred over a relatively short period of time (10,000 years) (Zeder, 2008), leading to a wide range of phenotypes with large genetic differences between breeds (Kijas et al., 2009). Genomic information and the vast databases of polymorphisms that are currently available provide a unique opportunity to study loci that were under selection between breeds in this recent history (Hayes et al., 2008;Flori et al., 2009). Several genomes have been sequenced and provide the information needed to identify single nucleotide polymorphisms (SNPs) (Ding and Jin, 2009), the most common variations in DNA sequences. The abundance of SNPs, their genome-wide distribution and the availability of a large number of commercial platforms for high-throughput genotyping make SNPs ideal markers for a variety of genomic studies (Ding and Jin, 2009). The OvineSNP50 beadchip ® (OvineSNP50), which was developed by Illumina in collaboration with the International Sheep Genomics Consortium (ISGC), is a genome-wide genotyping array for the ovine genome. This beadchip contains 54,241 SNPs that were chosen for being uniformly widespread across the ovine genome (with a coverage involving an average gap size and distance of 50.9kb and 46kb, respectively) and for their high levels of polymorphism in the more than 75 economically important sheep breeds in which they were evaluated by the ISGC (OvineSNP50 Datasheet -Illumina 2010).
The information provided by the OvineSNP50 has many applications, including identification of quantitative trait loci, genome-wide association studies (GWAS), characterization of genetic variability among breeds, genomic selection and genetic comparison between breeds (Kijas et al., 2009;Alam et al., 2011). Differences in genotyping results reported among breeds  may have unfavourable effects on these applications, e.g., the contribution of population stratification in generating false positive results in GWAS (Kemper et al., 2011). For this reason, many studies have focused on assessing the genetic relatedness and substructure within livestock populations in GWAS (Zenger et al., 2007). As part of current efforts to undertake GWAS in sheep populations, the aim of this study was to use the OvineSNP50 beadchip to investigate the genetic diversity and population structure within and among Corriedale, Merino and Creole breeds.

Selected animals
Data from 228 animals (98 Corriedale and 100 Merino lambs, and 10 sires of each of the three breeds) were analyzed. The Corriedale lambs were from CIEDAG (Centro de Investigacion Doctor Alberto Gallinal), a research station of the Uruguayan Wool Secretariat (SUL); since 1998 this breed has been bred selectively for resistance/susceptible (R/S) to gastrointestinal parasites (GIP). This genetic selection has been based on Expected Progeny Differences (EPD) for fecal worm egg count (FEC), as an indicator of the genetic merit of R/S to GIP (Castells and Gimeno, 2011). Merino lambs were sampled from the Fine Merino Nucleus maintained at the Tacuarembó Experimental Station of the National Agricultural Research Institute (INIA) of Uruguay. These animals had been selected for wool quality improvement (Montossi et al., 2005).
To minimize the possible influence of relationship in the dataset, Merino and Corriedalelambs with an average relatedness coefficient value (AR) < 0.04 were selected. The AR values were calculated using ENDOG v. 4.6 software (Gutiérrez and Goyache, 2005) based on the pedigree of Merino and Corrriedale. Similarly, 10 Corriedale and 10 Merino sires that were unrelated for at least two generations were genotyped. The second selection criterion was the occurrence of extreme values for the FEC of EPD for these two breeds. Finally, 10 Creole sires were selected from the San Miguel National Park in Rocha, Uruguay. Since there are no pedigree records for the Creole flock, the selection criterion was based on maximizing the observed phenotypic variation (wool colour and presence of one or two pairs of horns).
DNA extraction, genotyping and quality control DNA was isolated from blood samples (Medrano et al., 1990) and then genotyped in GeneSeek (Lincoln, NE) using the OvineSNP50 ® beadchip (Illumina, San Diego, CA). Samples with call rates < 80% were not considered for further analysis. Likewise, SNPs with call rates < 90% were excluded. Only autosomal markers were evaluated and Map OAR v. 2.0 was used in this study.

Genetic diversity
Basic indices of genetic diversity, including the percentages of polymorphic markers, observed heterozygosity and expected heterozygosity, were calculated using all autosomal markers in conjunction with GoldenHelix SNP & Variation Suite (SVS) software (GoldenHelix Inc., Bozeman, MT, USA). For each group, the number of animals (n) with a call rate > 80%, the number of autosomal SNPs with a call rate > 90% and the average observed heterozygosity (Ho) and expected heterozygosity (He) were calculated using all SNPs, based on the observed genotype frequencies. All of this information is presented in Table 1.
Allele frequencies were calculated with Golden Helix SVS software for each breed and used to assess the Ovine SNP50 performance. Rare and fixed alleles were determined by the minor allele frequency (MAF), with MAF < 0.01 and equal to 0, respectively. Highly polymorphic alleles were defined as those with MAF values between 0.3 and 0.5. A comparative analysis was used to identity which SNPs were fixed or polymorphic in the genome of each breed.

Genetic population structure
The subset of autosomal SNPs used in these analysis (STRUCTURE, principal component analysis -PCA and the fixation index -F ST ) for each breed were in moderate linkage disequilibrium (r 2 < 0.4, with MAF > 0.01), were not in Hardy-Weinberg equilibrium and were common to the three breeds. These criteria were applied to identify informative molecular markers for each breed.
Subdivisions between and within the populations were evaluated using the linkage model of STRUCTURE software version 2.3.3 (Pritchard et al., 2000), with the number of clusters (K) ranging from two to nine. Ten runs of 10,000 iterations after a burn-in of 10,000 iterations were performed for each K. The most appropriate K was identified by the DK method (Evanno et al., 2005). Population structure was also inferred by PCA of the marker data (Patterson et al., 2006). The genetic differentiation between breeds was calculated using the F ST (Weir and Cockerham, 1984), which assesses the reduction in genotypic heterozygosity. This parameter can range from zero (no genetic divergence between the subpopulations or from the ancestral population) to one (complete isolation of the subpopulations from each other and from the overall population). Moderate and large differentiations had F ST values ranging from 0.05 to 0.15 and 0.15 to 0.25, respectively. The PCA, F ST and marker filters were calculated using Golden Helix SVS software. 390 Grasso et al.

Results and Discussion
Genotypes and quality control Most samples (96.9%) had call rates > 80% (only seven samples had call rates < 80% and were therefore excluded from the analysis). Overall, 107 Corriedale, 104 Merino and 10 Creole samples were used for this study. Despite the differences among the three breeds, our results showed that OvineSNP50 had a coverage similar to that expected in the design of the beadchip (average gap size of 50.9 kb, OvineSNP50 Datasheet -Illumina, 2010), as well as a good genotyping quality as shown by the high sample and marker call rates (Table 1). Although neither Corriedale nor Creole were included among the genotyped breeds in the original beadchip validation (OvineSNP50 Datasheet -Illumina, 2010), the results obtained here were promising.
Not all SNPs present in databases have been validated and their level of polymorphism in different sheep breeds is unknown. However, in this study, the OvineSNP50 showed similar coverage for the three breeds (one SNP per 50.67 kb, 50.77 kb and 50.76 kb for Corriedale, Creole and Merino, respectively). Table 1 shows the three indices of genetic diversity that were calculated for each population. Based on the observed heterozygosity and expected heterozygosity, the Merino and Corriedale populations were more diverse than Creole. The elevated diversity of Merino and Corriedale animals was also seen in the distribution of minor allele frequencies. As shown in Table 2, Creole animals had an excess of low allele frequency SNPs (< 0.01) compared with the other two breeds (> 0.30).

Fixed SNPs
The number of fixed markers varied among breeds with 874, 1449 and 13790 SNPs for Corriedale, Merino and Creole sheep, respectively. Ninety-three percent of Corriedale SNPs with MAF = 0 were shared with Creole sheep and 69% with Merino sheep. In Merino, 90% of the fixed molecular markers were also fixed in Creole sheep. Comparison of the three breeds showed that they shared 585 fixed SNPs (Figure 1). Willing et al. (2012) tested the behavior of three estimators that infer F ST and that are commonly used in population genetic studies. Their results showed that population sample size can be significantly reduced (to as small as n = 4-6) by using an appropriate estimator and a large number of bi-allelic genetic markers (> 1,000). Thus, conservation genetic studies can yield almost the same statistical power as studies done on model organisms using markers developed with next-generation sequencing.
The Venn diagram in Figure 1 shows common and unique SNPs among the three breeds. The number of exclusive (fixed) SNPs was 99, 99 and 11,190 for Corriedale, Merino and Creole, respectively. These molecular markers could be useful when assigning breed identity or assessing the proportion of a certain breed in a given animal. Wiggans et al. (2010) reported that a set of 622 SNPs is being used to determine breed identity as part of the quality control process for the USDA genotype database for dairy cattle (~200 SNP analysis in sheep breeds 391  monomorphic SNPs for each breed). In pigs, Ramos et al. (2011) sought for breed-specific SNPs in five breeds (Duroc, Landrace, Large White, Pietrain and Wild Boar). In this case, a breed-specific marker was defined as a SNP for which one of the alleles was detected only in one breed (a fixed SNP). One hundred and ninety-three SNPs from the PorcineSNP60 beadchip were found to be breedspecific and then validated using 490 animals of the five breeds (Ramos et al., 2011). Four breed assignment validation tests were done and the results indicated that for all methods tested, 99% of the animals were correctly assigned. In an analogous manner, the fixed SNPs found in this study to be breed-specific for Corriedale, Merino and Creole sheep could be used for breed identification after validation in an independent group of animals.

Frequencies of polymorphic and very polymorphic SNPs
The percentage of polymorphic SNPs (MAF > 0.01) present in Merino and Corriedale breeds was 89.4% and 86%, respectively (Table 2). However, only 69% of the SNPs were polymorphic in Creole sheep. Highly polymorphic SNPs (MAF > 0.3) showed a similar trend to that polymorphic SNPs in the three breeds, with Corriedale and Merino animals having 50.1% and 50.9% of highly polymorphic SNPs, respectively; Creole sheep had a lower number of highly polymorphic SNPs than the other breeds (36%; Table 2).
Corriedale and Merino breeds generally had high levels of polymorphism. This result for Merino sheep was expected because this breed was included in the design/validation of OvineSNP50. Moreover, the results obtained for Merino and Corriedale (Table 2) were similar to those observed in the six breeds (Awassi, Merino, Poll Dorset, Romney, Scottish Blackface and Texel) sequenced during the development of OvineSNP50 (Gutiérrez-Gil et al., 2012). The 50 Merino animals that were used in the beadchip design showed 1.3% and 48.5% of rare and highly polymorphic SNPs, respectively. The percentages of highly polymorphic and rare SNPs in the other commercial breeds ranged from 37.8% to 47.1% and 3.2% to 13.9%, respectively (Gutiérrez-Gil et al., 2012), thus supporting the findings described here. Nevertheless, there were more rare SNPs in Uruguayan Merino than in the Merino subsample analyzed by Gutiérrez-Gil et al. (2012). Although not directly considered in the design of OvineSNP50, Corriedale sheep represent a composite breed that resulted from cross breeding between Merino and Lincoln (Mendoza, 1968). Analysis with OvineSNP50 has also revealed high levels of polymorphism in four Spanish breeds, with 95.6-98.5% of the SNPs being polymorphic and 46.5-50.6% being highly polymorphic (Gutiérrez-Gil et al., 2012). These values are slightly higher than those found in Merino and Corriedale, probably because the analysis included all SNPs rather than just autosomal SNPs.
The animals used in the present study were selected using the minimum average relatedness (AR) as a selection criterion, and this may have contributed to the high percentage of polymorphic markers by avoiding highly related and inbred animals. The AR may be a useful tool when designing GWAS in order to maximize the genomic information content and reduce the risk of spurious associations.
Creole sheep had significantly lower percentages of polymorphic and very polymorphic SNPs. The lower level of polymorphism in this case could be explained by the small size of the population to which these sheep belong (n < 200) and the small sample of Creole animals with genomic information. In addition, unknown events in the history of this Creole population, such as a severe population bottleneck, could also have contributed to a lower percentage of polymorphic SNPs.
In summary, the levels of polymorphism in the molecular markers screened with OvineSNP50 would be sufficient for GWAS and genomic selection in local commercial breeds and Creole sheep. Many GWAS in sheep based on a similar number of informative SNPs have been reported recently (Becker et al., 2010;Johnston et al., 2011;Zhao et al., 2011;White et al., 2012;Riggio et al., 2013;Våge et al., 2013). Johnston et al. (2011) conducted a GWAS in non-commercial Soay sheep using 36,000 SNPs from Ovine SNP50 and identified more SNPs in and around the Horn gene (RXFP2) that were useful in understanding the genetic basis of this trait. A total of 49,233 SNPs were used in a multi-breed GWAS to identify genomic regions associated with susceptibility to and control of ovine lentivirus (White et al., 2012), although the number of informative SNPs was variable among breeds (Rambouillet 50,275; Polypay 50,264; Columbia 44,258). Significant findings on the genetics of litter size in sheep were reported by Våge et al. (2013), who included 47,986 SNPs in their analysis (after allowing for 5% missing data per SNP and 1% MAF). These authors found a missense mutation in Growth Differentiation Factor 9 that was strongly associated with the number of lambs born in Norwegian White Sheep.

Population structure
A total of 18,181 autosomal SNPs were present in the three breeds and were informative for the analysis of population structure (Table 3). Figure 2A-C shows the results for population structure provided by STRUCTURE and PCA. Figure 2A shows the clusters provided by STRUCTURE based on Bayesian methods, assuming four hypothetical populations (K = 4) determined by the DK method (Figure 2B). For K = 4, the three breeds were clearly differentiated. Creole animals formed a central cluster (Figure 2A) where as Merino sheep were represented by one main cluster with low levels of admixture, and Corriedale individuals formed two clusters with high levels of admixture.
STRUCTURE methods allow the assignment of individuals to groups based on their genetic similarities, thereby providing information about the number of ancestral populations underlying the observed genetic diversity. Indeed, the representative clusters in Corriedale were also present in Merino with a low ancestry coefficient.A similar pattern was observed in Corriedale, which contained some features of the Merino cluster. The two clusters formed by Corriedale were suggestive of genetic substructuring that resulted from using Corridale animals belonging to divergent selection lines. Figure 2C shows the results obtained by PCA. The three breeds were clearly differentiated by the first two principal components that were sufficient to account for the observed population structure, with findings very similar to those provided by STRUCTURE (Figure 2A). Creole sheep were very tightly clustered, whereas Merino breed was located in a separate, looser cluster. PCA analysis also showed division of the Corriedale cluster into two subpopulations.
Genetic relatedness and substructure within populations should be taken into account in order to avoid spurious associations in GWAS (Zenger et al., 2007). The likelihood of false positive associations can be minimized by correction through population stratification (Lander and Schork, 1994) and the removal of outliers. The results of this study indicate that OvineSNP50 provides useful information for identifying population structures and outliers. Both PCA and STRUCTURE yielded two clusters for animals of the same breed that belonged to divergent selection lines. In addition, one animal that had been wrongly assigned to a breed was clearly detected by both methods using the selected informative SNPs.
Genetic differences in population structure and the variation detected in allele frequency could be explained by the fact that the Corriedale, Merino and Creole breeds have undergone different selection processes. In particular, the Corriedale flock has been selected for resistance to parasites, whilst the breeding programme of the Merino nucleus has focused on improving wool quality (Montossi et al., 2005). In contrast, Creole sheep have not been under artificial selection since their introduction to Uruguay in the XVIII century (Fernández, 2000). More Creole animals should be examined to confirm the findings reported here.
Moderate genetic differentiation was observed (F ST = 0.08) between the two commercial breeds. Low to moderate genetic similarity has also been reported for commercial cattle breeds (Delgado et al., 2011). On the other hand, the degree of differentiation was greater when Creole SNP analysis in sheep breeds 393   (Table 4). Thus, both analyses confirmed that Corriedale and Merino were more related to each other than to Creole (Table 4). This finding agrees with the origin of the Corriedale breed, which was derived by crossing Merino and Lincoln sheep 150 years ago (Mendoza et al., 1968). In contrast, Creole has not been under any defined selection process since its introduction to Uruguay in the 1700s (Camacho Vallejo et al., 2000;Fernández, 2000); this 300-year period with no artificial selection is more remote than the common origin between Merino and Corriedale. Analysis of F ST between Churra and Merino showed low differentiation (F ST = 0.023) when both breeds were compared (Gutiérrez-Gil et al., 2012), as shown here Merino was genetically more distant from Creole. Moradi et al. (2012) found low differentiation between sheep populations, with mean F ST values of 0.024 and 0.027 for the Zel-Lori Bakhtiari and HapMap data sets, respectively. These results agreed with a previous study in which 23 domestic breeds and two wild sheep subspecies were considered and the sheep breeds were found to have generally low differentiation (Kijas et al., 2009). Similar findings have also been reported for Creole cattle compared to commercial cattle (Delgado et al., 2011). Kijas et al. (2012) studied 74 breeds from different parts of the world and assessed the genotype of each animal using nearly 50,000 markers. Their results showed that sheep breeds have maintained high levels of genetic diversity and low genetic differentiation.
In conclusion, the high levels of OvineSNP50 polymorphism in Corriedale, Merino and Creole sheep indicated that the SNPs included in this chip could be used to characterize genetic diversity, identify breeds and detect population structure. This information could be useful for developing specific sets of SNPs for breed assignment (traceability) and parentage testing in commercial settings and for conservation programs. However, the results described here need to be validated with a representative number of animals to ensure accurate results prior to use for commercial or other applications. 394 Grasso et al.