Precise breed identification is a key step in genetic and genomic studies, as accurate breed assignment can improve the accuracy of genomic breeding value estimation, especially when mixed-breed populations are used for developing or applying prediction equations (Kachman et al., 2013; Vandenplas et al., 2016). Moreover, many protected denominations of origin (PDO) and protected geographical indications (PGI) for animal-derived products are directly associated with specific breeds (Dimauro et al., 2015; Mateus & Russo-Almeida, 2015), and proper certification therefore depends on the correct identification of livestock breed. The issuing of PDO and PGI certifications, combined with robust methods to monitor marketed animal products, has contributed to preventing breed extinctions, mainly in Europe (Di Stasio et al., 2017).
Most Brazilian sheep breeds are considered local genetic resources, which currently face the challenges associated with uncontrolled crossbreeding (McManus et al., 2010). Hair sheep breeds (such as Morada Nova and Santa Inês) are found mainly in Northeastern Brazil, a region characterized by high heat stress and associated with lower productivity indices. Wool sheep (such as Crioula) are reared mainly in the Southern part of the country (McManus et al., 2014). Both regions have great potential for the development of PDO and PGI products and depend on inexpensive and accurate methods for breed certification.
As individual animals have low overall value, and sheep farming in Brazil is carried out by small, low-income farmers, the use of low-density SNP panels for breed assignment is highly appealing because it lowers genotyping costs. Therefore, a key goal is to identify a subset of SNPs (up to 96) that can be used for accurate breed assignment.
Vieira et al. (2015) used information generated with the Ovine SNP50 BeadChip (Illumina Inc., San Diego, CA, USA) to identify a subset of SNPs that differentiates between Crioula, Morada Nova, and Santa Inês. These authors applied three prediction methods (least absolute shrinkage and selection operator (Lasso), Random Forest, and boosting) to select a minimum number of SNP markers for sheep breed identification, and defined a set of 18 SNPs able to distinguish samples of these three breeds. However, Vieira et al. (2015) used a small sample of genotypes from only 72 animals (23 Crioula, 22 Morada Nova, and 27 Santa Inês), so validation with an independent dataset remains necessary.
The objective of this work was to verify the usefulness of this subset of SNPs previously reported for breed identification of Crioula (BC), Morada Nova (MN), and Santa Inês (SI) sheep.
Samples from 19 BC, 308 MN, and 261 SI animals were genotyped with the Ovine SNP50 BeadChip (Illumina Inc., San Diego, CA, USA). The full set of genotypes was used to calculate the genomic relationship matrix for each breed, with each marker normalized individually (GCTA method) (Yang et al., 2011). The average relationship between the animals used by Vieira et al. (2015) (reference population) and the animals evaluated in the present study (validation population) was then calculated. The results showed a low relationship between animals from the two datasets (Crioula, 0.029±0.132 (mean±standard deviation); Morada Nova, 0.012±0.049; and Santa Inês, 0.008±0.053).
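The marker-wise normalization mentioned above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of a counted allele, no missing calls, and allele frequencies estimated from the pooled reference and validation animals; the function names and any input data are hypothetical, and this is not the GCTA software itself. Only relationships between different animals are computed, since those are what the cross-dataset average uses.

```python
from statistics import mean, stdev

def allele_freqs(genotypes):
    """Frequency of the counted allele at each marker (genotypes are 0/1/2 counts)."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(g[i] for g in genotypes) / (2 * n_animals) for i in range(n_markers)]

def grm_entry(x_j, x_k, freqs):
    """Relationship between two different animals; each marker is scaled by 2p(1-p)."""
    terms = [
        (a - 2 * p) * (b - 2 * p) / (2 * p * (1 - p))
        for a, b, p in zip(x_j, x_k, freqs)
        if 0.0 < p < 1.0  # monomorphic markers are skipped (assumption)
    ]
    if not terms:
        raise ValueError("no polymorphic markers available")
    return sum(terms) / len(terms)

def cross_set_relationship(reference, validation):
    """Mean and standard deviation of relationships between the two sets of animals."""
    freqs = allele_freqs(reference + validation)  # pooled frequencies (assumption)
    values = [grm_entry(r, v, freqs) for r in reference for v in validation]
    return mean(values), stdev(values)
```

Calling `cross_set_relationship` with one list of genotype vectors per dataset returns a mean and standard deviation in the same form as the values reported per breed above.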
The 18 SNPs selected by Vieira et al. (2015) were extracted from the dataset, and minor allele frequencies (MAF) were determined for each breed (Table 1). Because the minor allele can differ from one breed to another and between the two datasets, minor alleles were compared between breeds and studies. Only one SNP in Santa Inês (s32131) and one in Morada Nova (s69653) differed in minor allele between the reference population (Vieira et al., 2015) and the validation population used in the present study.
Table 1. Minor allele frequency estimates for each SNP marker used in the analyses, for each breed (Crioula, Morada Nova, and Santa Inês) and each dataset: “Reference”, according to Vieira et al. (2015), and “Validation”, from the present study.
Marker | Chromosome | Crioula | | | | Morada Nova | | | | Santa Inês | | | |
 | | Reference | | Validation | | Reference | | Validation | | Reference | | Validation | |
 | | Minor | MAF | Minor | MAF | Minor | MAF | Minor | MAF | Minor | MAF | Minor | MAF |
s03528.1 | 1 | A | 0.435 | A | 0.425 | A | 0.227 | A | 0.460 | G | 0.074 | G | 0.188 |
OAR1_194627962.1 | 1 | ? | 0.000 | G | 0.025 | A | 0.273 | A | 0.387 | G | 0.043 | G | 0.106 |
OAR2_55853730.1 | 2 | C | 0.152 | C | 0.353 | ? | 0.000 | A | 0.047 | A | 0.106 | A | 0.111 |
s20468.1 | 2 | A | 0.152 | A | 0.225 | ? | 0.000 | A | 0.032 | G | 0.277 | G | 0.194 |
OAR3_164788310.1 | 3 | G | 0.217 | G | 0.275 | G | 0.182 | G | 0.268 | A | 0.149 | A | 0.278 |
s16949.1 | 3 | G | 0.152 | G | 0.200 | G | 0.182 | G | 0.252 | A | 0.160 | A | 0.295 |
s69653.1 | 3 | G | 0.087 | G | 0.150 | G | 0.364 | A | 0.311 | A | 0.106 | A | 0.175 |
OAR3_165050963.1 | 3 | A | 0.022 | A | 0.100 | A | 0.068 | A | 0.055 | G | 0.202 | G | 0.322 |
s32131.1 | 4 | A | 0.326 | A | 0.316 | G | 0.023 | G | 0.269 | G | 0.500 | A | 0.381 |
s06182.1 | 5 | A | 0.152 | A | 0.150 | G | 0.068 | G | 0.211 | A | 0.415 | A | 0.423 |
OAR15_45152619.1 | 15 | A | 0.239 | A | 0.300 | G | 0.023 | G | 0.029 | G | 0.053 | G | 0.056 |
s30024.1 | 25 | A | 0.087 | A | 0.100 | C | 0.023 | C | 0.144 | C | 0.277 | C | 0.257 |
s61697.1 | X | C | 0.065 | C | 0.100 | C | 0.045 | C | 0.097 | A | 0.319 | A | 0.222 |
OARX_29830880.1 | X | G | 0.196 | G | 0.200 | ? | 0.000 | A | 0.026 | A | 0.074 | A | 0.157 |
OARX_53305527.1 | X | ? | 0.000 | A | 0.079 | A | 0.091 | A | 0.021 | G | 0.277 | G | 0.226 |
s56924.1 | X | G | 0.022 | G | 0.075 | A | 0.136 | A | 0.197 | A | 0.160 | A | 0.103 |
OARX_78903642.1 | X | G | 0.043 | G | 0.200 | A | 0.068 | A | 0.013 | A | 0.096 | A | 0.098 |
OARX_121724022.1 | X | A | 0.022 | A | 0.150 | C | 0.023 | C | 0.008 | C | 0.085 | C | 0.151 |
Minor: minor allele in each breed and dataset. MAF: minor allele frequency. ?: minor allele not determined (MAF of 0.000). The two changes in minor allele observed between the two datasets are s69653.1 in Morada Nova and s32131.1 in Santa Inês.
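The comparison of minor alleles between datasets can be sketched as follows. This is only an illustration: it assumes genotypes are given as two-letter strings such as "AG" with no missing calls, and the function names are hypothetical. A monomorphic marker is reported as "?" with a frequency of 0.0, mirroring the convention in Table 1.

```python
from collections import Counter

def minor_allele(genotypes):
    """Return (minor allele, MAF) for a list of two-letter genotypes such as 'AG'."""
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return "?", 0.0  # monomorphic: no minor allele observed
    total = sum(counts.values())
    # Ties (frequency of exactly 0.5) are broken alphabetically, an arbitrary choice.
    allele, n = min(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return allele, n / total

def minor_allele_changed(reference, validation):
    """True when both datasets are polymorphic and disagree on the minor allele."""
    ref_allele, _ = minor_allele(reference)
    val_allele, _ = minor_allele(validation)
    return "?" not in (ref_allele, val_allele) and ref_allele != val_allele
```

When a marker sits at a frequency of exactly 0.500 in one dataset, the minor allele label is arbitrary, so a reported change at such a marker may reflect the tie rather than a real shift in allele frequency.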
The Structure software, version 2.3.4 (Pritchard et al., 2000), was used to estimate individual allocation probabilities for each of the three breeds. Clusters were defined with the admixture model, under the assumption that allele frequencies were correlated between breeds. Run parameters were as follows: 588 individuals; 18 loci; no a priori population information; a burn-in period of 10,000 iterations; and 200,000 Markov Chain Monte Carlo (MCMC) repetitions after burn-in. The number of clusters (K) was set to 2, 3, 4, and 5, with five runs for each value of K. Following the method of Evanno et al. (2005), the best K was 3, which agrees with the number of breeds in the data and shows that this extremely small panel is able to identify this structure in the samples. Thereafter, we used the results for K=3 to evaluate the correct classification rate.
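The choice of K by the Evanno approach can be sketched as below. The statistic divides the absolute second-order change in the log probability of the data by its standard deviation across replicate runs. This sketch takes the second difference of the per-K means, which is one common way to compute it and is an assumption here. The function name and the input layout are hypothetical, and the study's own log-probability values are not reproduced.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp maps K -> list of ln P(D) values, one per replicate run.

    Returns K -> delta K for every K that has both K-1 and K+1 available.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # delta K is undefined at the ends of the tested range
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        result[k] = second_diff / stdev(lnp[k])  # assumes runs are not all identical
    return result

def best_k(lnp):
    """K with the largest delta K."""
    scores = delta_k(lnp)
    return max(scores, key=scores.get)
```

With K tested from 2 to 5, the statistic is defined only for K=3 and K=4, because it needs the neighbouring values on both sides.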
The percentage of individuals classified in each cluster was determined from the estimated proportion of each individual's genotype assigned to each cluster. Tests of individual allocation were performed with and without a priori information about the source population of individuals and yielded similar outcomes. Therefore, the results without a priori information were used, as they better represent a real breed assignment scenario, in which there is no previous knowledge about the sample.
Accurate breed assignments (confidence >90%) were observed in 89, 86, and 75% of BC, MN, and SI animals, respectively. Mean cluster allocation values ranged from 90.9 to 93.7% (Table 2). SI has previously been shown to have been formed by crossbreeding of MN, Bergamasca, and Somalis (McManus et al., 2010). MN and SI animals were also observed to have some degree of admixture and an estimated fixation index (Fst) of 6.59% (Genome-wide..., 2012). Therefore, some allocation errors between MN and SI were expected. Nonetheless, high mean levels of correct breed allocation (>90%) were observed.
Table 2. Mean cluster allocation of Crioula (BC), Morada Nova (MN), and Santa Inês (SI) sheep obtained with the Structure analysis of data from 18 SNP markers.
Population | Inferred cluster 1 | Inferred cluster 2 | Inferred cluster 3 | Number of individuals |
BC | 0.929 | 0.012 | 0.059 | 19 |
MN | 0.021 | 0.937 | 0.041 | 308 |
SI | 0.034 | 0.056 | 0.909 | 261 |
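The two summaries reported for the Structure output (the share of animals assigned to their own breed cluster with more than 90% membership, and the mean membership per cluster) can be sketched as follows. The input layout, the function name, and the mapping from breed to cluster index are assumptions for illustration. Structure labels clusters arbitrarily, so that mapping has to be supplied by the user.

```python
from collections import defaultdict

def summarize_allocation(rows, own_cluster, threshold=0.90):
    """rows: (breed, [q1, q2, q3]) pairs; own_cluster: breed -> index of its cluster."""
    by_breed = defaultdict(list)
    for breed, q in rows:
        by_breed[breed].append(q)

    summary = {}
    for breed, qs in by_breed.items():
        k = own_cluster[breed]
        n = len(qs)
        # Share of animals whose membership in their own cluster exceeds the threshold.
        confident_rate = sum(1 for q in qs if q[k] > threshold) / n
        # Mean membership coefficient in each cluster (one row of a table like Table 2).
        mean_q = [sum(q[c] for q in qs) / n for c in range(len(qs[0]))]
        summary[breed] = {"n": n, "confident_rate": confident_rate, "mean_q": mean_q}
    return summary
```

Keeping the two quantities separate matters when reading the results: a breed can have a mean membership above 0.90 while a noticeably smaller share of its animals individually exceed the 0.90 threshold.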
The results obtained here using 18 SNPs were less accurate than those of previous studies, most likely because of the higher information content of microsatellite markers compared with SNPs, and the large difference in the number of SNPs used. SNPS for parentage... (2014) identified a set of 163 SNPs for accurate parentage testing and traceability in many of the world’s main sheep breeds. Mateus & Russo-Almeida (2015) identified 12 microsatellite markers able to correctly classify animals into their respective breeds, while Di Stasio et al. (2017) used 15 microsatellite markers for breed certification in Italian sheep breeds. Other studies (Bertolini et al., 2015; Dimauro et al., 2015) showed that a minimum of 100 SNPs is required for correct and accurate breed assignment of cattle and sheep breeds.
The 18-SNP panel tested showed over 90% correct assignment of the studied breeds, on average. Incorrect assignments ranged from 6 to 9% (Table 2). Ideally, a system for breed certification requires correct allocation close to 100%, with minimal incorrect assignment. The SNP panel tested showed high levels of correct assignment; however, the results obtained are not sufficient to support its widespread use for breed certification.
The construction and validation of a larger panel with additional SNPs could provide higher correct assignment rates (close to 100%), including for other major sheep breeds reared in Brazil, which may contribute to breed identification and certification procedures. Such a tool could then be incorporated into routine inspection services and ongoing genetic improvement and conservation activities.