Overview of the use of genomic data in animal breeding

Overview of the use of genomic data in animal genetic improvement (Portuguese title, translated)

ABSTRACT:

The use of molecular information in breeding programs has contributed to important advances in the improvement of traits of economic interest in livestock production. The advent of single nucleotide polymorphism (SNP) panels applied to genome-wide selection (GWS) and genome-wide association studies (GWAS), along with computational advances (e.g., powerful software and robust analyses), has allowed a better understanding of the genetic architecture of farm animals and increased selection efficiency. In this context, the single-step GBLUP (ssGBLUP) statistical method has been frequently used to perform GWS and, more recently, GWAS, providing accurate predictions and QTL detection, respectively. Nevertheless, in developing countries and in species such as sheep and goats, for which genomic data are more difficult to obtain, data simulation has proven efficient for studying the major factors involved in the selection process, such as training population size, SNP chip density, and genotyping strategies. The effects of these factors are directly associated with the prediction accuracy of genomic breeding values. In this review, we present important aspects of the use of genomics in the genetic improvement of production traits in animals, the main methods currently used for prediction and estimation of molecular marker effects, the importance of data simulation for validating those methods, and the advantages, challenges, and limitations of GWS and GWAS in the current scenario of livestock production.

Key words:
genomic breeding values; GWAS; GWS; simulated data; ssGBLUP

ABSTRACT (Portuguese version, translated):

In animal breeding programs, the use of molecular information has ensured important advances in the improvement of traits of economic interest in animal production. The advent of SNP panel technology applied to genome-wide selection (GWS) and genome-wide association (GWAS), together with computational advances through the use of robust software and analyses, has allowed a better understanding of the genetic architecture of farm animals and, consequently, greater selection efficiency. In this context, the single-step GBLUP statistical method has been frequently used to perform GWS and, more recently, GWAS, enabling accurate predictions and QTL detection, respectively. However, in developing countries and in species such as sheep and goats, for which acquiring genomic data is more difficult, data simulation has proven efficient for studying the main factors involved in the selection process, such as training population size, SNP chip density, and genotyping strategies, whose effects are directly associated with the prediction accuracy of genomic breeding values. This review addresses important points on the use of genomics in the genetic improvement of production traits in animals, the main current methods for prediction and estimation of molecular marker effects, the importance of data simulation for validating these methods, and the advantages, challenges, and limitations of genome-wide selection and association in the current scenario of animal production.

Key words (Portuguese version, translated):
genomic breeding values; GWAS; GWS; simulated data; ssGBLUP

INTRODUCTION:

Information from molecular markers has been used in breeding programs for different species along with phenotypic and pedigree information (SIMIANER, 2016). The advent of single nucleotide polymorphism (SNP) panels and the development of statistical methods for including SNP data in genetic evaluations resulted in the implementation of genome-wide selection (GWS), proposed by MEUWISSEN et al. (2001). In addition to their application in GWS, SNP panels have made it possible to detect QTL (quantitative trait loci) and to better understand the genetic architecture of economically important traits through genome-wide association studies (GWAS) (DEKKERS, 2012).

GWS involves estimating the effects of all SNPs simultaneously, treating them as random (MEUWISSEN et al., 2016), under the assumption that each QTL is in linkage disequilibrium (LD) with at least one SNP. The methods developed for GWS differ essentially in the assumptions made about the distribution of marker effects (DEKKERS, 2012).
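To make "estimating all SNP effects simultaneously as random effects" concrete, the Python sketch below fits SNP-BLUP (ridge regression), in which every marker effect is assumed to come from the same normal distribution, the assumption behind GBLUP-type methods. The simulated genotypes, the variance values, and the function name `snp_blup` are hypothetical, and the variance ratio `lam` is assumed known here rather than estimated (e.g., by REML) as it would be in practice.

```python
import numpy as np

def snp_blup(Z, y, lam):
    """Estimate all SNP effects jointly as random effects (SNP-BLUP / ridge).

    Solves (Z'Z + lam*I) a = Z'(y - mean(y)), where lam is the ratio of the
    residual variance to the variance of a single SNP effect.
    """
    m = Z.shape[1]
    lhs = Z.T @ Z + lam * np.eye(m)
    rhs = Z.T @ (y - y.mean())
    return np.linalg.solve(lhs, rhs)

# Hypothetical simulated data: 300 animals, 500 SNPs.
rng = np.random.default_rng(1)
n, m = 300, 500
p = rng.uniform(0.1, 0.9, m)                        # allele frequencies
M = rng.binomial(2, p, size=(n, m)).astype(float)   # genotypes coded 0/1/2
Z = M - M.mean(axis=0)                              # centre each SNP column
var_snp = 0.01                                      # variance of each SNP effect
true_a = rng.normal(0.0, np.sqrt(var_snp), m)       # every SNP has a small effect
g = Z @ true_a                                      # true genetic values
var_e = g.var()                                     # residual variance, so h2 ~ 0.5
y = 10.0 + g + rng.normal(0.0, np.sqrt(var_e), n)

a_hat = snp_blup(Z, y, lam=var_e / var_snp)
gebv = Z @ a_hat                                    # genomic breeding values
print("accuracy (corr GEBV, true g):", round(np.corrcoef(gebv, g)[0, 1], 2))
```

Bayesian alternatives differ mainly in replacing this single common shrinkage parameter with SNP-specific or mixture priors on the marker effects.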

Most of the methods available for genomic prediction involve multiple steps and are therefore known as multi-step methods, among which Bayesian methods and genomic BLUP (GBLUP) stand out (LEGARRA et al., 2014). More recently, the single-step GBLUP (ssGBLUP) method (LEGARRA et al., 2009; MISZTAL et al., 2009; AGUILAR et al., 2010; CHRISTENSEN & LUND, 2010) has been used for both GWS and GWAS, and it allows genomic breeding values to be estimated for non-genotyped animals as well.

GWAS seeks to understand the genetic architecture of traits by identifying genes and genomic regions associated with production traits. It is based on variations in the genomes of genetically related individuals, known as molecular markers; SNPs are the most widely used because of their efficiency in identifying QTLs that affect the phenotype (WANG et al., 2012; LI et al., 2017). GWAS analyses seek loci associated with complex, polygenic quantitative and qualitative traits (MEUWISSEN et al., 2001). Knowledge of the genes associated with traits of interest has provided great advances in animal production. In this literature review, we discuss the use of genomics in the genetic improvement of production traits in animals, the main methods currently used for prediction and estimation of molecular marker effects, the importance of data simulation for the robustness of these methods, and the advantages, challenges, and limitations of genome-wide selection and association in the current scenario of animal production.

Overview of the use of genomics in animal production

The efficiency of including genomic information for the genetic progress of economically important traits has already been demonstrated in studies of traits directly related to increased production or improved meat quality in beef cattle (MEHRBAN et al., 2019), pigs (SONG et al., 2019), and meat sheep (BRITO et al., 2017), for example. However, genomic selection has brought the greatest benefits in dairy cattle, because this sector has the best-structured production chain and the largest reference populations, which include animals from several countries (MEUWISSEN et al., 2016).

Results from the scientific literature and from breeding programs in practice have shown that genomic selection increases the accuracy of predicted breeding values compared with traditional selection methods (MEUWISSEN et al., 2016). This allows genetically superior animals to be identified and selected more accurately, increasing productivity in production systems, especially for traits of economic interest such as litter size in pigs (FORNI et al., 2011), body weight and breast yield in chickens (CHEN et al., 2011), hot carcass weight in sheep (DAETWYLER et al., 2012), and carcass traits and yearling weight in cattle (MEHRBAN et al., 2019).

Other advantages of GWS include a shorter generation interval and lower costs of keeping animals in the herd and of progeny testing, because genomic merit can be assessed accurately in young animals. Collecting information early in an animal's life allows new breeding strategies aimed at boosting genetic gain, speeding up the selection process, and improving the reliability of the estimates (IBTISHAM et al., 2017). However, strong selection can reduce the genetic variance in the population and increase undesirable genetic correlations among production traits, potentially reducing gains in accuracy over time (HIDALGO et al., 2020).
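The effect of a shorter generation interval can be illustrated with the standard breeder's equation for response per year, ΔG = (i × r × σA) / L, where i is the selection intensity, r the accuracy of selection, σA the additive genetic standard deviation, and L the generation interval in years. The numbers below are hypothetical; they only show that a genomically selected young sire can produce more gain per year than a progeny-tested sire even when its accuracy is lower.

```python
def annual_genetic_gain(i, r, sigma_a, L):
    """Breeder's equation per year: (intensity * accuracy * sigma_A) / generation interval."""
    return i * r * sigma_a / L

# Hypothetical scenarios: progeny-tested sires vs genomically selected young sires.
traditional = annual_genetic_gain(i=1.4, r=0.85, sigma_a=10.0, L=6.0)
genomic = annual_genetic_gain(i=1.4, r=0.70, sigma_a=10.0, L=2.5)
print(f"traditional: {traditional:.2f} units/year, genomic: {genomic:.2f} units/year")
```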

In general, genomic selection offers the greatest advantage for traits that are difficult to measure, such as those related to carcass and meat quality. Therefore, including molecular marker information in genetic evaluation can contribute significantly to the genetic improvement of these traits (NAVAJAS, 2014). Several breeding programs for beef cattle (MEUWISSEN et al., 2016; SARMENTO & SENA, 2017), meat sheep (BRITO et al., 2017), pigs (TOPIGS NORSVIN, 2017), and broilers (WOLC et al., 2016), for example, already include genomic information in routine genetic evaluations of carcass and/or meat quality traits.

Genome-wide association studies have reported numerous genes or genomic regions associated with traits directly linked to carcass yield and meat quality in different species; many of them can be found on the Animal QTLdb platform (https://www.animalgenome.org) (HU et al., 2019). Beef cattle have one of the largest numbers of GWAS results for carcass and meat quality traits (SHARMA et al., 2015). Examples of candidate genes identified by GWAS include GNPDA2 (HAY & ROBERTS, 2018), PLTP, TNNC2, and GPAT2 (SILVA et al., 2019), related to loin eye area in beef cattle; NDUFAF2 for fat thickness and ACACA for muscle depth in meat sheep (HERNANDEZ et al., 2018); and PRKCA and SMN1 for carcass weight in pigs (IQBAL et al., 2015).

In GWAS and GWS, the success of estimating SNP effects and the accuracy of predictions depend, among other factors, on the linkage disequilibrium between markers and QTLs, because the causal mutations responsible for phenotypic variance are usually unknown. Thus, the level of LD between a marker and a causal mutation determines how much of the variance due to that mutation the marker can capture (MEUWISSEN et al., 2016). Other factors, such as effective population size, trait heritability, the genetic structure of the population, and the number of phenotyped and genotyped animals, also affect the power of QTL detection and prediction accuracy (VAN DEN BERG et al., 2013).
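LD between a marker and a QTL is usually summarized by r². The sketch below computes it as the squared Pearson correlation between genotype codes (0/1/2), a common approximation when haplotype phase is unknown; the genotype vectors are hypothetical and the "QTL" is treated as an observed locus only for illustration.

```python
import numpy as np

def ld_r2(snp1, snp2):
    """Squared Pearson correlation between two loci coded 0/1/2.

    Uses unphased genotypes, an approximation of the haplotype-based r^2.
    """
    x = np.asarray(snp1, dtype=float)
    y = np.asarray(snp2, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return 0.0  # monomorphic locus: r^2 undefined, reported as 0
    return float(np.corrcoef(x, y)[0, 1] ** 2)

marker = [0, 1, 2, 1, 0, 2, 1, 1]
qtl_in_ld = [0, 1, 2, 1, 0, 2, 1, 2]     # differs from the marker in one animal
qtl_unlinked = [1, 2, 0, 1, 1, 2, 0, 1]  # uncorrelated with the marker
print(round(ld_r2(marker, qtl_in_ld), 3), round(ld_r2(marker, qtl_unlinked), 3))
```

A marker in high LD with the QTL (r² close to 1) captures most of the QTL variance; an unlinked marker captures essentially none.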

Limitations and challenges for the application of genomics in animal production

Although the benefits of applying genomics are well documented in the literature, some aspects may compromise the feasibility of using genomic information for the purposes described above (selection and association). According to SHARMA et al. (2015), the following measures may help avoid problems with the results of genomic studies: a detailed definition of the study before it is conducted (e.g., choice of animals, statistical methodology, and genotyping panel) and adequate quality control of the data used in genomic analyses. However, most of the limiting factors are structural and/or financial and are therefore beyond the researchers' control.
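As an example of the data quality control mentioned above, the sketch below filters SNPs by call rate and minor allele frequency (MAF). The thresholds (90% call rate, MAF of 0.05) and the function name are hypothetical illustrative choices, not values from the cited studies; real pipelines usually add further filters (e.g., animal call rate, Hardy-Weinberg equilibrium, parentage conflicts).

```python
import numpy as np

def snp_quality_control(M, min_call_rate=0.90, min_maf=0.05):
    """Return a boolean mask of SNPs passing call-rate and MAF filters.

    M: animals x SNPs genotype matrix coded 0/1/2, with np.nan for missing calls.
    """
    call_rate = 1.0 - np.isnan(M).mean(axis=0)   # fraction of non-missing calls
    p = np.nanmean(M, axis=0) / 2.0              # frequency of the counted allele
    maf = np.minimum(p, 1.0 - p)                 # minor allele frequency
    return (call_rate >= min_call_rate) & (maf >= min_maf)

nan = np.nan
snp_a = [0, 1, 2, 1, 0, 1, 2, 1, 1, 0]          # polymorphic, no missing calls
snp_b = [2] * 10                                 # monomorphic (MAF = 0)
snp_c = [0, 1, nan, nan, nan, 1, 2, 1, 0, 1]     # 30% missing calls
M = np.column_stack([snp_a, snp_b, snp_c]).astype(float)
keep = snp_quality_control(M)
print(keep)
```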

The main problem for implementing GWS is the need for large training populations to obtain prediction equations that relate SNPs to phenotypic information accurately (BLASCO & TORO, 2014). Except for dairy cattle and pigs, most farm animal species have much smaller populations, lack consistent phenotype recording and animal control, and face major limitations for investment in genotyping along the production chain (BLASCO & TORO, 2014; SHARMA et al., 2015; MEUWISSEN et al., 2016), especially in developing countries (MRODE et al., 2018, 2019). In species such as goats and sheep, whose individual economic value is low compared with that of cattle, genotyping costs are the most important factor in deciding whether implementing genomic selection in breeding programs is feasible (BLASCO & TORO, 2014; RUPP et al., 2016).
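Why training population size matters can be sketched with a widely used deterministic approximation of genomic prediction accuracy, r = sqrt(N·h² / (N·h² + Me)), where N is the number of phenotyped and genotyped training animals, h² the heritability, and Me the effective number of independent chromosome segments (a population-specific quantity). The values of h² and Me below are hypothetical, and the formula ignores factors such as marker density and relationships between training and selection candidates.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Deterministic approximation: r = sqrt(N*h2 / (N*h2 + Me))."""
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))

# Hypothetical trait (h2 = 0.3) and population (Me = 1000 segments).
for n in (500, 2000, 10000):
    print(n, round(expected_accuracy(n, h2=0.3, m_e=1000), 2))
```

Small populations, typical of sheep and goats, therefore need either large investments in genotyping and phenotyping or traits of high heritability to reach useful accuracies.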

Regarding the feasibility of genome-wide association studies, the same limiting factors reported above apply. According to SHARMA et al. (2015), the main challenges for GWAS include the proper choice of a homogeneous population and accounting for existing population stratification. An interesting alternative to circumvent some limitations of real data when verifying the feasibility of applying genomics is the analysis of simulated data, since simulations require only computational resources and a proper definition of the statistical methods (DAETWYLER et al., 2013).

Genomic studies with simulated data

Data simulation for genomic studies has proven to be an important tool for advances in animal breeding, especially given the limitations of GWS and GWAS, such as the high cost of obtaining and incorporating genomic information from SNP chips in research and breeding programs. Statistical methods for QTL mapping and detection can also be evaluated with simulated data (HICKEY & GORJANC, 2012).

Several studies have incorporated simulated data into their research strategies. PÉRTILE et al. (2016) performed GWAS and GWS analyses using simulated data mimicking a Santa Inês sheep population under different scenarios. The authors concluded that relationship information improved the prediction of genomic breeding values and that higher heritabilities favored the identification of regions associated with traits of interest.

TAKEDA et al. (2020), in a simulation study of genotyped bulls and a cattle base population designed to investigate the power of QTL detection with ssGWAS, reported that QTL detection increased with the amount of progeny information on these bulls, rather than with heritability or the number of QTLs simulated. PICCOLI et al. (2018) compared one-step and two-step GBLUP methods in beef cattle populations and observed that the direct genomic values and GEBVs predicted by the tested GBLUP procedures were very similar, with no significant bias in GEBV accuracies regardless of the number of steps.

ARAUJO et al. (2021) used simulated data mimicking genetically diverse sheep populations to evaluate the accuracy and bias of genomic predictions from ssGBLUP using individual SNPs or haplotypes, different panel densities, and different haplotype block construction methods. The authors concluded that haplotype-based models, which could better capture the LD between SNPs and QTLs, did not improve GEBV predictions.

Simulated data are therefore often used by researchers because they enable replicable, low-cost hypothesis testing based on realistic scenarios and allow the long-term effects of selection, which are not feasible to study with real data, to be investigated (DAETWYLER et al., 2013).
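A minimal sketch of how such a simulation study is organized: scenarios (here two heritabilities) are crossed with replicates, and fixed random seeds make every result exactly reproducible. The simplifying assumptions are hypothetical and much cruder than real simulators: unrelated animals, SNPs in linkage equilibrium, QTL taken as a random subset of the SNPs, and ridge regression (SNP-BLUP) as the prediction method. Accuracy is measured as the correlation between GEBVs and true breeding values (TBV) in a validation set, which is only possible because the TBVs are known in simulated data.

```python
import numpy as np

def simulate(n, m, n_qtl, h2, seed):
    """Simulate genotypes, true breeding values (TBV) and phenotypes."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.05, 0.95, m)
    M = rng.binomial(2, p, size=(n, m)).astype(float)
    Z = M - M.mean(axis=0)
    effects = np.zeros(m)
    qtl = rng.choice(m, size=n_qtl, replace=False)   # QTL positions among SNPs
    effects[qtl] = rng.normal(0.0, 1.0, n_qtl)
    tbv = Z @ effects
    var_e = tbv.var() * (1.0 - h2) / h2               # residual variance for target h2
    y = tbv + rng.normal(0.0, np.sqrt(var_e), n)
    return Z, tbv, y

def validation_accuracy(Z, tbv, y, h2, n_train):
    """Fit SNP-BLUP on the first n_train animals; return corr(GEBV, TBV) in the rest."""
    Zt, yt = Z[:n_train], y[:n_train]
    lam = (1.0 - h2) / h2 * Z.var(axis=0).sum()       # sigma_e^2 / sigma_snp^2
    lhs = Zt.T @ Zt + lam * np.eye(Z.shape[1])
    a_hat = np.linalg.solve(lhs, Zt.T @ (yt - yt.mean()))
    gebv = Z[n_train:] @ a_hat
    return np.corrcoef(gebv, tbv[n_train:])[0, 1]

results = {}
for h2 in (0.1, 0.5):                                 # two hypothetical scenarios
    accs = [validation_accuracy(*simulate(600, 300, 30, h2, seed=rep), h2, n_train=400)
            for rep in range(5)]                      # five replicates per scenario
    results[h2] = float(np.mean(accs))
    print(f"h2={h2}: mean validation accuracy {results[h2]:.2f}")
```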

ssGBLUP

Multi-step procedures, in which only genotyped animals are included in the model (VANRADEN, 2008), can be relatively complex and involve double counting of genomic information when genotypes of parents and progeny are included in the analysis (MISZTAL et al., 2020). LEGARRA et al. (2014) showed that biases can arise during these multiple steps, compromising the usefulness of the results for practical decisions in genomic selection.

Since they were proposed, single-step methods, which incorporate genotype, phenotype, and pedigree information in the same analysis, have made genomic evaluation simpler and have extended genomic information to non-genotyped animals. Because usually only a small proportion of the animals in genetic evaluations is genotyped, combining pedigree and genomic relationships has emerged as an alternative for these scenarios (LOURENCO et al., 2020), made possible through a joint distribution of the breeding values of genotyped and non-genotyped animals (LEGARRA et al., 2009).

The development of a hybrid matrix, the H matrix, which combines the pedigree-based relationship matrix (A), regarded as prior information, with the genomic relationship matrix (G), regarded as the observed relationship, has enabled SNP information to be extended to non-genotyped animals as a projection of genetic merit. H itself is complex, but its inverse, H⁻¹, has a simple form, which requires G to be invertible (AGUILAR et al., 2010; CHRISTENSEN & LUND, 2010). Compatibility between pedigree and genomic information is important to avoid bias and increase accuracy in ssGBLUP evaluations, especially under strong selection (VITEZICA et al., 2011).
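The text does not specify how G is built; a common choice is VanRaden's (2008) first method, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes (0/1/2) centred by twice the allele frequency p. The sketch below assumes that method, estimates p from the genotypes themselves (base-population frequencies are preferred in practice), and uses hypothetical random genotypes. Because centring with sample frequencies makes G singular, it also shows blending with A22; the weight of 0.05 is a frequently used value, assumed here rather than taken from the source.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1); M is animals x SNPs, 0/1/2."""
    p = M.mean(axis=0) / 2.0                  # allele frequencies estimated from M
    Z = M - 2.0 * p                           # centre by twice the allele frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def blend_with_a22(G, A22, alpha=0.05):
    """(1 - alpha) * G + alpha * A22, a common way to make G invertible."""
    return (1.0 - alpha) * G + alpha * A22

rng = np.random.default_rng(3)
M = rng.binomial(2, 0.4, size=(10, 200)).astype(float)  # hypothetical genotypes
G = vanraden_g(M)
G_blend = blend_with_a22(G, np.eye(10))   # A22 = identity assumed (unrelated animals)
print(np.round(np.diag(G), 2))
```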

Thus, single-step genomic BLUP, ssGBLUP (LEGARRA et al., 2009; MISZTAL et al., 2009; AGUILAR et al., 2010; CHRISTENSEN & LUND, 2010), has become established in genomic studies of several species due to its simplicity and accuracy, especially for traits that are not related to milk production (MISZTAL et al., 2020).

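The inverse of the H matrix is usually calculated as proposed by AGUILAR et al. (2010) and CHRISTENSEN & LUND (2010):

$$\mathbf{H}^{-1} = \mathbf{A}^{-1} + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1} \end{bmatrix}$$

where $\mathbf{A}_{22}^{-1}$ is the inverse of the pedigree-based relationship matrix for the genotyped animals, with animals ordered so that non-genotyped animals come first.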
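The formula above translates almost literally into NumPy. The sketch assumes small dense matrices held in memory (real evaluations build a sparse A⁻¹ directly from the pedigree) and an invertible G; the four-animal pedigree (two unrelated parents and two full-sib offspring, the offspring being genotyped) and the G values are hypothetical.

```python
import numpy as np

def h_inverse(A, G, genotyped):
    """H^-1 = A^-1 plus (G^-1 - A22^-1) added to the genotyped-animal block.

    A: pedigree relationship matrix for all animals
    G: genomic relationship matrix for genotyped animals (same order as `genotyped`)
    genotyped: indices of genotyped animals within A
    """
    idx = np.asarray(genotyped)
    A22 = A[np.ix_(idx, idx)]
    Hinv = np.linalg.inv(A)
    Hinv[np.ix_(idx, idx)] += np.linalg.inv(G) - np.linalg.inv(A22)
    return Hinv

# Animals 0 and 1: unrelated parents; 2 and 3: their full-sib offspring (genotyped).
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
G = np.array([[1.05, 0.60],
              [0.60, 0.98]])           # hypothetical genomic relationships
Hinv = h_inverse(A, G, genotyped=[2, 3])
print(np.round(Hinv, 3))
```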

Besides its simplicity, ease of implementation, and higher accuracy than other methods such as multi-step methods (LOURENCO et al., 2014), already mentioned, advantages of ssGBLUP include the simultaneous fitting of genomic information and other effects in the model (LEGARRA et al., 2014), which accounts for pre-selection bias and helps avoid loss of information (LOURENCO et al., 2014). Complex and multiple-trait models can also be fitted (WANG et al., 2014). Following the assumptions of the method is crucial for the genomic study design to remain simple to apply (LOURENCO et al., 2020).

Despite its simplicity, ssGBLUP poses challenges, including defining validation methods that are unaffected by selection, accounting for the reduction in genetic variance caused by selection over time, and obtaining estimates that account for all the genomic information used in selection (MISZTAL et al., 2020). Furthermore, the method has limitations when few animals are genotyped and for traits influenced by large-effect QTLs, because it assumes that all markers explain equal proportions of the genetic variance, an assumption that may not hold for such traits (WANG et al., 2014).

In this context, WANG et al. (2012) proposed weighting the estimated marker effects according to the genetic variance they explain, an approach called weighted single-step genomic BLUP (WssGBLUP). This method estimates different variances for different SNPs, which improves the accuracy of estimates in studies with small numbers of phenotypes and genotypes, as well as for traits affected by large-effect QTLs (WANG et al., 2012). ZHANG et al. (2016) tested five ways of including weights in WssGBLUP and observed improved GEBV accuracies.
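The weighting step can be sketched as follows, assuming the commonly described update in which each SNP weight is proportional to the variance it explains, d_i = â_i² · 2p_i(1 − p_i), and a weighted genomic relationship matrix G = Z D Z' / (2 Σ p(1 − p)). Rescaling the weights to sum to the number of SNPs is an assumption of this sketch (one simple way to keep the scale of G comparable to the unweighted case). In the iterative procedure, these two steps alternate with re-estimating GEBVs and SNP effects; the SNP effects and frequencies below are hypothetical.

```python
import numpy as np

def update_snp_weights(a_hat, p):
    """d_i = a_i^2 * 2 p_i (1 - p_i), rescaled so the weights sum to the number of SNPs."""
    d = a_hat ** 2 * 2.0 * p * (1.0 - p)
    return d * len(d) / d.sum()

def weighted_g(Z, p, d):
    """Weighted genomic relationship matrix G = Z D Z' / (2 sum p(1-p))."""
    return (Z * d) @ Z.T / (2.0 * np.sum(p * (1.0 - p)))   # Z * d scales SNP columns

a_hat = np.array([0.5, 0.05, -0.4, 0.01])   # hypothetical SNP effects
p = np.array([0.3, 0.5, 0.2, 0.4])          # hypothetical allele frequencies
d = update_snp_weights(a_hat, p)
print(np.round(d, 3))                        # large-effect SNPs receive larger weights
```

With all weights equal to one, `weighted_g` reduces to the unweighted G used in standard ssGBLUP.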

ssGBLUP can also be used to estimate SNP effects and to detect QTLs and genes associated with traits of economic importance through single-step GWAS (ssGWAS) (WANG et al., 2012), exploiting the linkage disequilibrium between SNPs and possible causal mutations of relevant traits. SNP weighting can also be applied in this method: weighted ssGWAS (WssGWAS) is an iterative approach that increases the weights of SNPs with large effects and shrinks those with small effects toward the mean (WANG et al., 2014).
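In ssGWAS, SNP effects are typically back-solved from the GEBVs of genotyped animals as â = λ D Z' G⁻¹ û, with λ = 1 / (2 Σ p(1 − p)) and G = λ Z D Z'. The sketch below assumes that relationship, an unblended G, base allele frequencies of 0.5, and hypothetical genotypes and GEBVs; with those assumptions Z â reproduces û exactly, which the test checks. Production software applies the same idea to a blended or scaled G.

```python
import numpy as np

def snp_effects_from_gebv(Z, p, u_hat, d=None):
    """Back-solve SNP effects: a = lambda * D * Z' * G^-1 * u_hat.

    lambda = 1 / (2 sum p(1-p)); G = lambda * Z D Z'; D = identity when d is None.
    """
    m = Z.shape[1]
    d = np.ones(m) if d is None else np.asarray(d, dtype=float)
    lam = 1.0 / (2.0 * np.sum(p * (1.0 - p)))
    G = lam * (Z * d) @ Z.T                        # weighted genomic relationships
    return lam * d * (Z.T @ np.linalg.solve(G, u_hat))

rng = np.random.default_rng(5)
n, m = 8, 50
M = rng.binomial(2, 0.5, size=(n, m)).astype(float)  # hypothetical genotypes
p = np.full(m, 0.5)                                   # assumed base allele frequencies
Z = M - 2.0 * p
u_hat = rng.normal(0.0, 1.0, n)                       # hypothetical GEBVs
a_hat = snp_effects_from_gebv(Z, p, u_hat)
print(np.round(a_hat[:5], 3))
```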

ZHANG et al. (2016) also noted that weighted genomic windows can detect unknown QTLs when few animals are genotyped. The same authors stated that SNP weighting is important mainly for traits affected by few QTLs, whereas traditional ssGBLUP is efficient for most polygenic traits of interest in breeding.

Approaches for QTL detection in genome-wide association studies

SNPs significantly associated with important traits for animal breeding in several species are identified through classical hypothesis tests such as EMMA (efficient mixed-model association), which compute p-values, through the proportion of genetic variance explained by SNPs, or through fixed or sliding windows or blocks of SNPs defined based on linkage disequilibrium, for example (CHEN et al., 2017; WANG et al., 2014). However, many of these findings may be false-positive or false-negative associations, which are considered spurious (LI et al., 2021). In addition, whether associations could have arisen by chance is generally not tested with p-values or probabilities, which can be a problem (AGUILAR et al., 2019). Therefore, GWAS results should be thoroughly analyzed before a putative association is assumed to be a causal or significant effect (FRAGOMENI et al., 2014).
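The simplest form of a p-value-based scan regresses the phenotype on each SNP separately. The sketch below does exactly that; it is a plain fixed-effect single-marker regression swapped in for illustration, not EMMA, which additionally fits a polygenic random effect with a kinship matrix to control for relatedness and population structure. P-values use a normal approximation to the t distribution (reasonable for large samples), and the data, with one causal SNP at index 10, are hypothetical.

```python
import math
import numpy as np

def single_marker_scan(M, y):
    """Regress y on each SNP separately; return two-sided p-values (normal approximation)."""
    n, m = M.shape
    yc = y - y.mean()
    pvals = np.ones(m)
    for j in range(m):
        x = M[:, j] - M[:, j].mean()
        sxx = x @ x
        if sxx == 0:
            continue                                 # monomorphic SNP: nothing to test
        b = (x @ yc) / sxx                           # least-squares slope
        resid = yc - b * x
        se = math.sqrt((resid @ resid) / (n - 2) / sxx)
        pvals[j] = math.erfc(abs(b / se) / math.sqrt(2.0))   # 2 * (1 - Phi(|t|))
    return pvals

rng = np.random.default_rng(11)
M = rng.binomial(2, 0.3, size=(500, 100)).astype(float)   # hypothetical genotypes
y = 1.0 * M[:, 10] + rng.normal(0.0, 1.0, 500)            # SNP 10 is causal
pvals = single_marker_scan(M, y)
print(int(np.argmin(pvals)), pvals.min())
```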

Instead of testing null hypotheses, GWAS studies (particularly ssGWAS) commonly declare a threshold for the proportion of genetic variance explained by adjacent markers. Iterative schemes and the choice of windows of markers can also be used; these work well but are arbitrary and, consequently, harder to interpret and compare across studies (AGUILAR et al., 2019).

SNPs within the same segment can be highly correlated, which makes it difficult to identify the individual effect of each marker, because they may act jointly on the trait. The use of genomic windows to test the significance of GWAS signals has emerged as an alternative (LI et al., 2021). In this approach, a detected effect may be consistent across the population or generations; otherwise, it cannot be extrapolated to other populations or samples because it holds only for the sample studied. Such results are considered unreliable and should be tested in populations with a similar structure (FRAGOMENI et al., 2014).
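A window-based summary can be sketched as the percentage of genetic variance explained by consecutive SNP windows, 100 × var(Z_w â_w) / var(Z â). Using var(Z â) as the denominator, rather than an estimated additive variance, and using non-overlapping windows of a fixed number of SNPs are assumptions of this sketch; the genotypes and SNP effects (a cluster of large effects at SNPs 12 to 14) are hypothetical.

```python
import numpy as np

def window_variance_pct(Z, a_hat, window_size):
    """Percentage of genetic variance explained by consecutive, non-overlapping windows."""
    total = np.var(Z @ a_hat)
    out = []
    for start in range(0, Z.shape[1], window_size):
        w = slice(start, start + window_size)           # last window may be shorter
        out.append(100.0 * np.var(Z[:, w] @ a_hat[w]) / total)
    return np.array(out)

rng = np.random.default_rng(2)
M = rng.binomial(2, 0.5, size=(200, 60)).astype(float)  # hypothetical genotypes
Z = M - M.mean(axis=0)
a = rng.normal(0.0, 0.02, 60)                            # many tiny effects
a[12:15] = [0.8, 0.6, 0.7]                               # one large-effect region
pct = window_variance_pct(Z, a, window_size=10)
print(np.round(pct, 1))
```

Percentages from separate windows need not sum to 100, because SNPs in different windows can be correlated.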

Determining the correct p-value threshold for the significance of genomic associations is one of the main ways to minimize false positives and false negatives in GWAS analyses. Several methods and measures have been used to set the threshold of statistical significance in GWAS, notably the Bonferroni correction and the false discovery rate (FDR); the FDR controls the expected proportion of false positives among rejected null hypotheses and is less conservative than the Bonferroni correction (KALER & PURCELL, 2019).
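The difference between the two corrections can be seen in a small sketch: Bonferroni rejects a null hypothesis when p ≤ α/m, whereas the Benjamini-Hochberg procedure (the usual way of controlling the FDR) rejects the k smallest p-values, where k is the largest rank with p(k) ≤ k·q/m. The ten p-values and the levels α = q = 0.05 are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by increasing p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank                               # largest rank meeting the bound
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

Here the less conservative FDR procedure declares one more SNP significant than Bonferroni.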

AGUILAR et al. (2019) proposed a frequentist methodology for computing p-values within the ssGWAS framework that can be used for complex traits or models, as well as for large populations. Marker effect estimates and genomic breeding values, with their associated p-values, can be obtained from the inverse of the mixed model equations, so that marker p-values are computed in a single GWAS step.
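Assuming that the variance of each SNP effect, var(â), has already been derived from the inverse of the mixed model equations (the computationally demanding part, not shown), the final step reduces to a two-sided z-test, p = 2(1 − Φ(|â| / sd(â))). The sketch below shows only that last step; the SNP effect and variance are hypothetical values.

```python
import math

def snp_pvalue(a_hat, var_a_hat):
    """Two-sided p-value for a SNP effect: 2 * (1 - Phi(|a_hat| / sd(a_hat)))."""
    z = abs(a_hat) / math.sqrt(var_a_hat)
    return math.erfc(z / math.sqrt(2.0))        # erfc(z / sqrt(2)) == 2 * (1 - Phi(z))

p = snp_pvalue(0.3, 0.0234)                     # hypothetical effect and variance
print(f"p = {p:.2e}, -log10(p) = {-math.log10(p):.2f}")
```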

In general, the tests performed in a given study depend on factors specific to each population evaluated, such as the pattern of linkage disequilibrium and the minor allele frequency. Therefore, the appropriate significance threshold in GWAS may vary among populations and species (KALER & PURCELL, 2019).

CONCLUSION:

The contribution of genomics to advances in animal production through animal breeding is undeniable. ssGBLUP and ssGWAS, in their traditional or weighted forms, contribute to genetic progress by reducing the generation interval and increasing the accuracy of genomic predictions, and by detecting QTLs and causal variants, compared with traditional methodologies. Knowledge of the assumptions and limitations of these methods is therefore important to obtain accurate results.

Given the limitations of genotyping with SNP chips and the influence of population size on genome-wide selection and association studies, data simulation has brought new perspectives and possibilities for conducting reliable and replicable studies based on realistic scenarios, allowing the short- and long-term effects of selection to be studied, which is not feasible with real data. Thus, molecular information and its use to understand the genetic structure of farm animals have been of great value for increasing the productivity of economically important traits and for the genetic control of complex traits subject to evolutionary forces, such as those related to carcass and meat quality in cattle, sheep, and goats.

ACKNOWLEDGEMENTS

The authors thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, for the scholarship granted to the first author, and the support and encouragement from Professor José Elivalto Guimarães Campelo and the Universidade Federal do Piauí (UFPI).

REFERENCES

  • CR-2022-0350.R1


Publication Dates

  • Publication in this collection
    20 Mar 2023
  • Date of issue
    2023

History

  • Received
    17 June 2022
  • Accepted
    15 Nov 2022
  • Reviewed
    30 Jan 2023
Universidade Federal de Santa Maria Universidade Federal de Santa Maria, Centro de Ciências Rurais , 97105-900 Santa Maria RS Brazil , Tel.: +55 55 3220-8698 , Fax: +55 55 3220-8695 - Santa Maria - RS - Brazil
E-mail: cienciarural@mail.ufsm.br