Introduction
In Brazil, the pork market stands out, with a production of approximately 3.6 million tons of meat in 2015 (^{ABPA, 2015}). This total represents about 3% of the world production, classifying the country as the fourth largest producer of this type of meat (^{ABPA, 2015}). According to Ministério da Agricultura, Pecuária e Abastecimento (^{Brasil, 2014}), the investments in research and in the genetic evolution of the species over the last 20 years have been responsible for the great increase in meat production and quality, with emphasis on the importance of breeding programs.
Among the several traits evaluated in pig breeding programs, those associated with carcass are important for pig production, because they are requirements of the consumer market (^{Zangeronimo et al., 2009}), which makes them frequently considered as selection criteria. However, the distribution of many of these traits presents asymmetric behavior; in this case, the usual methods of genomic selection, based on conditional expectations, E(Y|X), besides making it impossible to predict all distributions of phenotypic values, may under- or overestimate the marker effects. This occurs because, in these situations, the mean may not be the best measure to represent data distribution and, consequently, the generated estimates may lead to mistakes in the selection of animals.
Although innumerable statistical methods for genomic selection - such as random regression of the best linear unbiased prediction (RR-Blup), Bayesian lasso (Blasso), Bayesian alphabet (Bayes A, Bayes B...), partial least squares, Kernel-based regression, among others - deal with multicollinearity and dimensionality issues, the problem of asymmetry in the distribution of phenotypic values is not commonly considered.
In this sense, ^{Nascimento et al. (2017)}, in a simulation study, proposed the use of regularized quantile regression (RQR) in genomic selection. According to the authors, this method, besides overcoming the problems of multicollinearity and dimensionality, also takes into account the possibility of asymmetry in the distribution of the evaluated phenotype. In addition, it increases the accuracy of marker effects and, consequently, the genomic estimated breeding values for traits whose distributions show asymmetry. Such traits are commonly found in plant and animal breeding, such as, for example, number of days to flowering (^{Tuberosa, 2012}) and hormone concentrations (^{Mathur et al., 2012}).
In a genomic association and prediction analysis, ^{Barroso et al. (2017)} used RQR to estimate single nucleotide polymorphism (SNP) marker effects, considering as traits pig growth curve parameters. The authors identified the genomic regions that showed the most relevant marker effects and also estimated the genetic individual growth trajectory of the animals over time (genomic growth curves) under different quantiles.
RQR allows a more complete study of the relationships between the response (phenotype) and the explanatory (markers) variables. This is possible because, unlike traditional methods based on conditional expectations, it is based on conditional quantiles, which allows assessing these relationships for different phenotype levels and not only regarding their mean value.
The objective of this work was to evaluate the use of regularized quantile regression (RQR) to predict the genetic merit of pigs for asymmetric carcass traits, compared with the Bayesian lasso (Blasso) method.
Materials and Methods
The phenotypic data used in the present study were obtained from the pig breeding farm of the Department of Animal Science of Universidade Federal de Viçosa (UFV) and refer to the period from November 1998 to July 2001. These data came from an F_{2} population composed of 345 individuals, generated by crossing animals of the Piau breed with those of a commercial breed (Large White x Landrace and Pietrain).
DNA was extracted at the animal biotechnology laboratory of the Department of Animal Science of UFV, and the procedures used are described in ^{Peixoto et al. (2006)}. The genotyping of 384 SNPs was performed at the animal genetics laboratory of Embrapa Recursos Genéticos e Biotecnologia, by applying the GoldenGate assay with the VeraCode technology, using the BeadXpress reader from Illumina, as in ^{Hidalgo et al. (2013)}.
The SNPs used for fine mapping were selected according to their spacing between the chromosomes where quantitative trait loci (QTLs) had previously been detected in the population and were distributed over Sus scrofa chromosomes as follows: 85 on SSC1, 71 on SSC4, 84 on SSC7, 42 on SSC8, 36 on SSC17, and 66 on SSCX. From these 384 SNPs, 66 were discarded due to the absence of amplification, and, of the remaining 318 SNPs, 81 were discarded because they presented minor allele frequency lower than 0.05. After these procedures, the SNP markers were distributed as follows: 56 on SSC1, 54 on SSC4, 59 on SSC7, 31 on SSC8, 25 on SSC17, and 12 on SSCX, which corresponds to a total of 237 markers identified in the population (^{Hidalgo et al., 2013}).
The traits carcass yield (CY), bacon thickness (BT), and backfat thickness immediately after the last rib in the dorsal-lumbar line (LRBF) were analyzed. The phenotypic values were corrected for fixed effects of sex, batch, and halothane genotype. In order to evaluate the asymmetry of the distribution of the studied traits, the D’Agostino-Pearson test was used (^{D’Agostino & Pearson, 1973}).
The general model adopted for the prediction of genomic breeding values was proposed by ^{Meuwissen et al. (2001)} and is described as:
where y_{i} is the phenotypic value of the i^{th} animal; μ is the mean of the trait; g_{j} is the j^{th} SNP effect (j=1,2,...,237); x_{ij} are the elements of the incidence matrix of each marker j with 0.1 and 2.0 parameterization; and e_{i} is the vector of random errors, e_{i} ~ N(0,σ^{2}_{e}).
To obtain the genomic estimated breeding values (GEBVs) of the animals, both RQR (^{Li & Zhu, 2008}) considering different quantiles (τ = 0.05 to 0.95) and the Bayesian lasso (Blasso) method (^{de los Campos et al., 2009}) were applied.
With the use of RQR, the estimation of the coefficients in the different quantiles of interest consists in solving the following optimization problem:
where σ^{p}_{j=1} |g_{j}| is the sum of the absolute values of the regression coefficients; λ is the smoothing parameter that controls regularization strength, with n = 345, p = 237; and ρ_{τ} ( ) is the denoted check function (^{Koenker & Bassett Jr., 1978}), defined by:
otherwise τ indicates the quantile of interest.
After estimating the parameters (marker effects), the GEBVs were obtained through the following expression:
where GEBV_{i} (τ) is the genomic estimated breeding value for the i^{th} animal; and ĝ_{j} (τ) is the effect of the j^{th} SNP marker, defined by the functional relationship obtained for 100τ% quantile of interest. The GEBVs were also determined via Blasso by:
To compare the methods, for RQR fit, the values of the shrinkage parameter (λ) estimated from the Blasso method, as well as a grid with values varying from 0 to those provided by Blasso, with an interval of 0.5, were considered.
After obtaining the GEBVs, the studied methods were compared considering accuracy, which was calculated according to the following expression:
where r_{y,ŷ}, is the predictive capacity of the model, given by
and h^{2} is the heritability trait estimated by the restricted maximum likelihood (REML) method using a single-trait model, defined by h^{2} = v_{g}/v_{f}, where v_{g} is the additive genetic variance and v_{f} is the phenotypic variance (^{Resende et al., 2010}).
To check the degree of agreement between the methods for animal classification, Spearman’s correlation coefficient between GEBVs from the RQR and Blasso fitting models was calculated, as well as the percentage of agreement among 10% of the individuals that presented the highest genomic breeding values predicted by RQR and Blasso, using Cohen’s kappa coefficient.
Finally, Manhattan plots were constructed to verify the behavior of SNP marker effects on chromosomes in both methods.
The entire described process was performed with the use of cross-validation. As in ^{Teixeira et al. (2016)}, the pig population was divided into three distinct populations: two were used to estimate the marker effects and one to validate the estimates obtained from the prediction equations. The three possible combinations for this situation were used, and the final result refers to the mean values of accuracy.
The presence of asymmetry was evaluated through the dagoTest function from the fBasics package of the R software (^{R Core Team, 2017}). Manhattan plots were constructed using the mhtp function from the gap package (^{Zhao, 2007}). To estimate the parameters using the quantile regression, the rq function from the quantreg package was applied (^{Koenker, 2016}). The Bayesian model was fitted using the bglr function with 100,000 iterations, 20,000 burn-ins, and thin assuming the value 10, in the BGLR package (^{de los Campos & Rodriguez, 2014}). All functions were implemented in the R software (^{R Core Team, 2017}).
Results and Discussion
The studied traits presented distribution with asymmetric behavior at 5% probability (Table 1). In this case, the median (τ = 0.5) or another quantile (T) is possibly more suitable to explain the functional relationship between markers and the assessed variables, since the mean is not the best measure to represent asymmetric distributions (^{Hao & Naiman, 2007}).
Variable^{(1)} | Asymmetry | Kurtosis | Omnibus test^{(2)} |
CY | -15.82*** | 12.10*** | 396.75*** |
BT | 3.48*** | 1.18^{ns} | 14.29*** |
LRBF | 2.58*** | 2.35** | 12.19*** |
^{(1)}CY, carcass yield; BT, bacon thickness; and LRBF, backfat thickness immediately after the last rib in the dorsal-lumbar line. ^{(2)}Normality test. *, **, and ***Significant at 10, 5, and 1% probability, respectively. ^{ns}Nonsignificant.
The heritabilities of the evaluated traits, estimated via REML, were of low to moderate magnitude, with values equal to 0.20 for CY, 0.34 for BT, and 0.35 for LRBF. These values were similar to those found by ^{Azevedo et al. (2013)} for CY and by ^{Miar et al. (2014)} for BT and LRBF; the two last traits showed heritabilities of 0.45 and 0.38, respectively, considering a dataset of pigs from a Landrace × Large White crossing.
When the best combination between the quantile (τ) and the regularization parameters (λ) for obtaining the GEBVs was considered for each trait (CY, BD, and LRBF), accuracy ranged from 0.39 to 0.54 (Table 2). RQR presented better or equal results, compared with those of the Blasso method, which indicates that RQR is as a good alternative for the analysis. There was an increase of 6.7, 20.0, and 0.0% in accuracy for CY, BT, and LRBF, respectively, considering RQR_{0.15} (τ = 0.15), RQR_{0.45} (τ = 0.45), and RQR_{0.50} (τ = 0.50). These results make sense, because, unlike traditional methods, which are based on conditional means, E(Y|X), RQR allows to fit the model to different parts of the Y distribution and enables a more complete study of the phenomenon of interest (^{Koenker & Basset, 1978}; ^{Cade & Noon, 2003}).
Trait^{(1)} | Model | ||
RQR^{(2)} | RQR^{(3)} | Blasso | |
CY | 0.48 (2.5) | 0.10 (43.22) | 0.45 (43.22) |
BT | 0.54 (14.5) | 0.26 (22.44) | 0.45 (22.44) |
LRBF | 0.39 (11.5) | 0.01 (31.37) | 0.39 (31.37) |
^{(1)}CY, carcass yield; BT, bacon thickness; and LRBF, backfat thickness immediately after the last rib in the dorsal-lumbar line. ^{(2)}Model with shrinkage parameter value from the grid of values. ^{(3)}Model with shrinkage parameter value from Blasso. Values between parentheses were used as shrinkage parameters in the fit of the models.
The estimates of the RQR model obtained with the λ values from the Blasso method show that using identical λ values in the fittings, via RQR and Blasso, is not a good strategy for RQR, because the accuracy values obtained with this regression were lower than those with Blasso and the RQR models presented (Table 2). This result may be explained by the differences in the regularization strengths (shrinkage) between the two methods. In quantile regression, shrinkage causes a greater shortening of the parameter estimates than in Blasso. As an alternative, the λ value can also be defined through cross-validation (^{Silva et al., 2011}) or Bayesian inference (^{Alhamzawi et al., 2012}).
The Spearman’s correlation and Cohen’s kappa coefficients ranged from low to moderate (Table 3). The obtained values for CY and BT were, respectively: 0.57 and 0.38; and 0.35 and 0.32. These results allow inferring that different animals were selected with the used models, since both coefficients were low. According to ^{McHugh (2012)}, Cohen’s kappa coefficient values below 0.39, as observed for CY and BT, are indicative of a minimum agreement between methods. For LRBF, the values of Spearman’s correlation and Cohen’s kappa coefficients were, respectively, 0.69 and 0.54. The greater similarity between these values for LRBF may be due to the better performance of the median (τ = 0.5) in obtaining the GEBVs for this trait. It should be noted that the values for LRBF were higher than those for the other traits and show a greater agreement between the methods in the classification of animals. However, for Cohen’s kappa coefficient, the level of agreement is still considered weak according to ^{McHugh (2012)}.
Trait^{(1)} | r_{s} | k^{(2)} |
CY | 0.57*** | 0.38 |
BT | 0.35*** | 0.34 |
LRBF | 0.69*** | 0.54 |
^{(1)}CY, carcass yield; BT, bacon thickness; and LRBF, backfat thickness immediately after the last rib in the dorsal-lumbar line. ^{(2)}Refers to the agreement among 10% of the individuals that presented the highest genomic breeding values predicted by RQR and Blasso. ***Significant at 1% probability.
The SNP effects were higher in the RQR models than in Blasso for the three evaluated traits (Figure 1). Moreover, the greater strength of the shrinkage parameter on the estimation of SNP effects in quantile regression allows better discriminating these effects, which improves the definition of important regions for QTL search. The results obtained for RQR are indicative that, for CY, a higher marker effect is observed on chromosome 7. This is in alignment with ^{Sanchez et al. (2014)}, who identified a significant QTL in Large White pigs for this trait in the same chromosome.
For the trait BT, it is possible to notice that the SNP with the greatest effect was found on chromosome 1 (Figure 1). ^{Rückert & Bennewitz (2010)} identified two significant QTLs for this trait on the same chromosome in a wide range of F_{2} pig populations. Regarding LRBF, the SNPs with greatest effects were found on chromosomes 1, 4, and 7. Similarly to the present study, a QTL on chromosome 1 was found in a population from a cross between Erhualian and Duroc pigs (^{Ai et al., 2012}), on chromosome 4 in an F_{2} Iberian x Meishan pig population (^{Tomás et al., 2011}), and on chromosome 7 in an F_{2} Pietrain x commercial breed pig population (^{Cherel et al.,2011}).