Genomic growth curves of an outbred pig population

In the current post-genomic era, the genetic basis of pig growth can be understood by assessing SNP marker effects and genomic breeding values (GEBV) based on estimates of these growth curve parameters as phenotypes. Although various statistical methods, such as random regression (RR-BLUP) and Bayesian LASSO (BL), have been applied to genomic selection (GS), none of these has yet been used in a growth curve approach. In this work, we compared the accuracies of RR-BLUP and BL using empirical weight-age data from an outbred F2 (Brazilian Piau X commercial) population. The phenotypes were determined by parameter estimates using a nonlinear logistic regression model and the halothane gene was considered as a marker for evaluating the assumptions of the GS methods in relation to the genetic variation explained by each locus. BL yielded more accurate values for all of the phenotypes evaluated and was used to estimate SNP effects and GEBV vectors. The latter allowed the construction of genomic growth curves, which showed substantial genetic discrimination among animals in the final growth phase. The SNP effect estimates allowed identification of the most relevant markers for each phenotype, the positions of which were coincident with reported QTL regions for growth traits.


Introduction
The success of pig production systems, including the evaluation of alternative management and marketing strategies, requires knowledge of the body weight behavior over time, commonly referred to as the growth curve. This knowledge allows the assessment of growth characteristics in actual production situations and translates this information into economic decisions.
Differences among animal growth curves partly reflect genetic influences, with multiple genes contributing at different levels to the overall phenotype. Hence, selection strategies that attempt to modify the growth curve shape to meet demands of the pork market are very relevant. In the current post-genomic era, understanding the genomic basis of pig growth cannot be limited to simply estimating marker effects using body weight at a specific time as a phenotype, but must also consider changes in body weight over time. According to Pong-Wong and Hadjipavlou (2010) and Ibáñez-Escriche and Blasco (2011) this can be done by estimating the marker effects for parameters of nonlinear regression models that are widely used to describe growth curves.
Regardless of the phenotype used, a major challenge in genome-wide selection (GS) is to identify the most powerful statistical methods for predicting phenotypic values based on estimates of marker effects. Since the seminal GS paper by Meuwissen et al. (2001), several studies have compared the efficiency of simple methods, such as the RR-BLUP (Random Regression Blup) (Meuwissen et al., 2001), with more sophisticated methods, such as Bayesian LASSO (BL) (de los Campos et al., 2009). The main difference between these two very popular GS methods is that the first one assumes, a priori, that all loci explain an equal amount of genetic variation, while the second one allows the assumption that each locus explains its own amount of this variation. Although these two methods have already been compared in other studies, so far there has been no comparison of these methods using a major gene, such as the halothane gene in pigs (Fujii et al., 1991), as a marker. In addition, these methods have not yet been applied to the analysis of growth curves in conjunction with nonlinear regression models.
In this study, we compared the accuracies of RR-BLUP and BL for predicting genetic merit in an empirical application using weight-age data from an outbred F2 (Brazilian Piau X commercial) pig population (Silva et al., 2011). In this approach, the phenotypes were defined by parameter estimates obtained with a nonlinear logistic regression model and the halothane gene was considered a single nucleotide polymorphism (SNP) marker in order to evaluate the assumptions of the GS methods in relation to the genetic variation explained by each locus. Genomic growth curves based on genomic estimated breeding values were constructed and the most relevant SNPs associated with growth parameters were identified.

Material and Methods
The phenotypic data was obtained from the Pig Breeding Farm of the Department of Animal Science, Universidade Federal de Viçosa (UFV), MG, Brazil. A three-generation resource population was created and managed as described by Band et al. (2005). Briefly, two naturalized Piau breed grandsires were crossed with 18 granddams from a commercial line composed of Large White, Landrace and Pietrain breeds, to produce the F1 generation from which 11 F1 sires and 54 F1 dams were selected. These F1 individuals were crossed to produce the F2 population, of which 345 animals were weighed at birth and at 21, 42, 63, 77, 105 and 150 days of age. The use of these animals was reviewed and approved by the Bioethics committee of the Department of Veterinary Medicine (DVT-UFV) in agreement with the Guide to the Care and Use of Experimental Animals of the Canadian Council on Animal Care.
The SNPs used for fine mapping and estimation of marker effects were selected based on their spacing within chromosomes that contained quantitative trait loci (QTL) previously identified in this same population. These markers were distributed as follows: SSC1 (n = 56), SSC4 (n = 54), SSC7 (n = 59), SSC8 (n = 30), SSC17 (n = 25) and SSCX (n = 12), with the average distances (cM) being, respectively, 5.17, 2.37, 2.25, 3.93, 2.68 and 11.0. The animals were genotypes using Golden Gate/VeraCode technology, which provides a robust and flexible platform, in conjunction with a BeadXpress reader from Illumina. A total of 237 markers (236 SNPs plus the halothane gene) were used, with the halothane gene being considered a special marker for reasons explained later.
Five of the nonlinear regression models (Brody, Gompertz, logistic, von Bertalanffy and Richards) most widely used to describe animal growth curves were fitted to the weight-age data using the nls function of R (R Develop-ment Core Team, 2011) software. The usefulness of the models was compared based on the goodness of fit, the adjusted coefficient of determination and the residual standard deviation, which showed a relative superiority of the logistic model. This model is given by (Ratkowsky, 1983): where w ij is the animal weight i at age (t)j, a 1i is the mature weight (kg), a 2i reflects the weight at time t = 0 (birth weight, kg), a 3i is a general growth rate (curve slope at the inflection point), and e ij is a residual term, assumed to be independent and normally distributed.
Once the parameters estimates of the logistic model for each animal ( $ a 1i , $ a 2i and $ a 3i ) were obtained, these values were used as dependent variables in a linear model in order to perform a pre-correction for fixed effects (sex and lot). These pre-corrected observations (residual plus general mean) were used as phenotypes in the GS regression models in order to identify SNP marker effects. This is a well-known two-step procedure (Varona et al., 1999;Pong-Wong and Hadjipavlou, 2010) in which, in the first step, a growth curve is fitted separately to the data of each animal and, in the second step, the growth curve parameter estimates from the previous step are taken as records. The main advantage of this method is its statistical simplicity since the growth model and GS model are fitted independently; however, more sophisticated methods, such as Bayesian joint analysis (Varona et al., 1999) also have been used in GS (Ibáñez-Escriche and Blasco, 2011) and produced good results.
With respect to GS, the phenotypic outcomes (pre-corrected phenotypes for fixed effects), denoted from now on by y i (i = 1, 2, ..., 345), were regressed on marker covariates x ik (k = 1, 2, ..., 237) following the regression model proposed by Meuwissen et al. (2001): where y i is the phenotypic observation of animal I, m is the general mean, b k is the effect of marker k and e i the residual term, e~N i e ( , ) 0 2 s , in which are included the dominance and epistatic effects. In this model, x ik take the values 2 -2p k , 1 -2p k and 0 -2p k for the SNP genotypes AA, Aa and aa at each locus k, respectively, where p k is the allele frequency at the locus k. Using matrix notation, this GS model can be rewritten as: where 1' and I are, respectively, a unit vector and an identity matrix with dimensions 345, where y = [y 1 , y 2 , ..., y 345 ]' 345x1 , X k = [x 1k , x 2k , ..., x 345k ]' 345x1 and e = [e 1 , e 2 , ..., e 345 ]' 345x1 .
The first method used to fit the GS model was RR-BLUP (Meuwissen et al., 2001) (i.e., all loci explain an equal amount of genetic variation). This method was implemented using an equivalent model (called G-BLUP) in the R software (R Development Core Team, 2011) by package rrBLUP (Endelman, 2011) (Habier et al., 2007) it can be shown that: with G being the so-called genomic relationship matrix. With this approach, it is possible to work through the well-known Henderson mixed model equation using the REML estimation method, which provides estimates of variance components for calculating the heritability by 237 237 1 , the estimated marker effects vector can be obtained by the simple normal equation system of $ ( The second used method was BL (de los Campos et al., 2009), which is a more general method because it assumes that each locus explains its own amount or contribution to the overall variation. This method has been used to solve multicollinearity problems and may also be employed in situations where there are more markers (covariates) than observations. The BL is a penalized Bayesian regression procedure, whose general estimator is given by where l is the regularization parameter. When l = 0 there is no regularization and when l > 0 there is a shrinkage of the marker effects toward zero, with the possibility of setting some identically equal to zero, resulting in a simultaneous estimation and variable selection procedure. This characteristic of the BL method is especially useful for F2 populations in which larger haplotype blocks with many redundant SNPs in each block are expected; the latter implies multicollinearity that impairs fitting of the model (Eq. (2)). Thus, the regularization imposed by BL reduces some redundant SNPs to zero, thereby bypassing the problem of multicollinear SNPs in haplotype blocks. The package BLR (de los Campos et al., 2009;Pérez et al., 2010) of R software was used for this analysis. This package assumes that the joint prior distribu- , s e 2 is the residual variance, with a scaled inverse c 2 prior distribution, and t k 2 is the scale parameter related to each marker. The BLR method also assumes that the joint prior distribution for the scale parameters ( , ,... , ) t t t , and that the l prior distribution is Gamma(n 1 , n 2 ). The BL method was implemented using 10,000 MCMC (Markov chain -Monte Carlo) iterations, with a burn-in and thin of 5,000 and 2 iterations, respectively. The plausibility of these values was assessed for each MCMC chain separately using Raftery-Lewis and Geweke convergence diagnostics in boa (Smith, 2007)  As stated earlier, the main difference between RR-BLUP and BL is the assumption related to the equality of marker effects variance, and some studies has used simulated (Meuwissen et al., 2001) and real (Moser et al., 2009) data to verify this assumption. However, to date no studies have applied this assumption to real scenarios in which known major genes are postulated as markers. For this reason, the well-known halothane gene that affects growth traits (Miller et al., 2000) was included in the analysis as a simple marker to provide a real situation of unequal marker effects variances. Of the 345 animals, 291 (84.3%) and 54 (15.7%) had the normal (HAL NN ) and heterozygous (HAL Nn ) halothane genotype, respectively; none of the animals had a double recessive genotype (HAL nn ). The allelic frequencies of N and n were 7 and 93%, respectively.
The leave-one-out cross-validation was used to compare RR-BLUP and BL. For this, the original data set with 345 animals was divided into 345 training data sets (D -i ) of 344 individuals, D -1, D -2 , ..., D -345 , with each containing marker and phenotypic information of all the animals, except for animal -i. With this approach, there was no loss of generality for each method and each phenotype. In these 522 Pig growth and SNP markers analyses, the predicted genomic breeding value of animal i for each trait (parameter estimates $ a 1 , $ a 2 and $ a 2 of the logistic model) was calculated as $ $ u i where r yu $ is the correlation between observed phenotype (y) and $ u, and h 2 is the estimated heritability. A linear regression with y as the dependent variable and $ u as the independent variable was used to screen for bias produced by each method; in this case, a regression coefficient of one indicated an unbiased method.
Once = u u . Subsequently, the "genomic growth curve" for each animal could be estimated using the relationship: where $ y ij is the genomic breeding value of each animal i for weight at each age (t ij ) of interest j (even for no observed ages), m a $ 1 , m a $ 2 and m a $ 3 are the average of each phenotype (parameter estimates for the logistic model) and $ $ u 1i a , $ $ u 2i a and $ $ u 3i a are the GEBVs for these phenotypes. Table 1 shows the performance of BL and RR-BLUP. Overall, the two methods provided highly accurate values (mean overall accuracy: 0.69 ± 0.09), as shown by the strong association between the phenotypic assessment and the true breeding values. To date, there has been no evaluation of the accuracy of genomic selection for growth traits in pigs using real data, although Akanno (2012) reported a simulation study in which a performance trait (average daily gain) was simulated for different populations. In this simulation, the so-called synthetic population, which was similar to the F2 population of the present study, showed an accuracy of 0.61.

Results and Discussion
Comparison of the two methods showed that the BL method yielded more accurate values than RR-BLUP for all of the phenotypes. The mean accuracy of BL (0.70 ± 0.09) was greater than for RR-BLUP (0.62 ± 0.06), indicating that in a real data set in which one locus is known to have a larger effect (in this case, the halothane gene) the BL property of assuming different variances for markers ensures greater efficiency in genomic selection. Although both methods underestimated the breeding values, i.e., the regression coefficient estimates between the observed and predicted phenotypes were slightly greater than 1.0, for the BL method these coefficients were closer to unity for all phenotypes (indicating low bias) when compared to the corresponding values for RR-BLUP. These findings agreed with those of Ogutu et al. (2012), who used a simulated data set to show that LASSO type regressions were more efficient than RR-BLUP for genomic selection because they provided more accurate and less biased predictions.
Table 1 also shows that the accuracy of the differences between the two methods was more pronounced for the trait mature weight (a 1 ) than for the traits birth weight (a 2 ) and growth rate (a 3 ). This finding provides an indirect indication that the genetic architecture of a 1 is possibly more influenced by loci that exert larger effects, such as the halothane gene, than the other two traits. This conclusion reflects the fact that BL works better than RR-BLUP when the assumption of an equal contribution of genetic variance in the latter model is violated, as was the case here with the large-effect halothane gene that was used as a marker.
The effect of the halothane gene on estimates of growth curve parameters has not been examined before and this precludes a direct assessment of the effects of this major gene on mature weight. However, Band et al. (2005) observed a significant effect of the halothane gene on weight at 105 days of age (W105) in animals from this same population (F2 Piau X commercial). These authors also evaluated the weight at birth in relation to parameter a 1 and the average daily gain in relation to parameter a 3 . The halothane gene had a significant effect only on W105, indicating that this gene has a greater influence on final weight than on other performance traits.
In summary, the results in Table 1 indicate a superiority of BL in providing accurate estimates and suggest that this method can be recommended for estimating SNP effects and GEBV vectors. However, before using these vec- Silva et al. 523  tors to select markers and animals it is important to estimate heritabilities and genetic correlations (Table 2) in order to assess whether the three traits (a 1 , a 2 and a 3 ) are really relevant in a breeding program.
The high values of h 2 (estimated with the BL method) in Table 2 indicate that these traits can be a viable alternative for pig breeding in which the aim is to produce efficient high growth animals. However, direct selection for high a 1 implies a selection for low a 3 (as indicated by the high negative genetic correlation, -0.69, between these two parameters), i.e., in general terms such selection will result in less precocious animals (low growth rate) with a larger body size at maturity. On the other hand, direct selection for high a 3 can result in more precocious animals with high a 2 (birth weight) but low maturity weight (a 1 ). This negative correlation is expected and has been reported in most studies of growth curves and animal breeding, e.g., Koivula et al. (2008) in pigs, Mignon-Grasteau et al. (1999) in chickens and Forni et al. (2007) in beef cattle.
In practical terms, in pig breeding it is better to select for high growth rate (a 3 ) since slaughter is occurs at a standard slaughter weight (~65 kg for this F2 Piau X commercial population). As a result, the time to slaughter will be significantly lower with a high growth rate, resulting in lower production costs because the slaughter weight is practically the same for all animals (65 kg), with the difference between them reflecting the time required to reach this standard weight. Thus, animals slaughtered at an early age will generate substantially lower feeding costs. The high positive correlation between a 3 and a 2 means that young pigs do not show a loss in early growth and consequently guarantees good nutrition and health for subsequent growth phases.
As mentioned earlier, the main focus of this paper was to construct genomic growth curves using the genomic estimated breeding values (GEBVs) for each parameter of a logistic growth curve (Eq. (4)). Figure 1 shows the genomic growth curve for each animal of the F2 population.
The sigmoidal behavior of the curves in Figure 1 was ensured by the average estimates of the growth curve parameters (m a $ 1 , m a $ 2 and m a $ 3 ) and the GEBVs ( $ $ u 1i a , $ $ u 2i a and $ $ u 3i a ); the latter parameters do not predict this behavior as they may have positive or negative values and are thus outside the limits of the logistic model. The estimates for m a $ 1 , m a $ 2 and m a $ 3 were, respectively, 118 kg, 1.4 kg and 0.03 kg/day. There are no data in the literature regarding maturity weight (m a $ 1 ) of the present F2 population because the animals are slaughtered at a live weight of 60-65 kg.
One of the advantages of nonlinear growth curve models (such as the logistic curve) is the ability to predict maturity weight using partial weight-age data, i.e., it is possible to estimate the maturity weight before this weight is reached. The value observed here (118 kg) was lower than in other populations such as Yorkshire pigs (201 kg; Koivula et al., 2008) and 4-way-cross pigs (160 kg; Kusec et al., 2007). This lower weight reflects the characteristically small body size of the Piau breed compared to commercial breeds. The estimated growth rate (m a $ 3 ) (0.03 kg/day) was higher than that reported by Koivula et al. (2008) (0.016 kg/day) and lower than that reported by Kusec et al. (2007) (0.05 kg/day) for Yorkshire pigs and commercial crosses, respectively. Kusec et al. (2007) also reported estimates for the time until maximum gain (time to inflection point), which were~120 and 105 days, respectively, for intensive and restrictive feeding systems. In summary, even though the values cited above were obtained in different populations, the comparisons nevertheless provide a useful means of obtaining a general characterization of the growth patterns in the F2 population. Furthermore, as mentioned earlier, there are no data on the growth behavior of this experimental population, which was initially developed to combine the precocity of com-524 Pig growth and SNP markers   Figure 1 shows that the weight differences among GEBVs were accentuated over time such that the genetic variance for weight increased with age. This finding may be related to the greater heritability of mature weight (a 1 ) compared to birth weight (a 2 ) ( Table 2) since this value is directly proportional to genetic variance for a given value of residual variance. Similar curves can be obtained using other approaches, e.g., random regression (Meyer, 1998), but in the present study a different approach was used in which (1) the plotted values were GEBVs that had been estimated by a sophisticated method (Bayesian LASSO) that simultaneously considered regularization and variable selection (SNP markers) and (2) the use of a nonlinear (logistic) growth model allowed estimation of the genetic parameters for traits with direct biological interpretation, e.g., maturity weight and growth rate, that could not been estimated by random regression. In practical terms, Figure 1 shows that the greatest genetic difference (based on GEBV) among animals at slaughter was 10 days since the animals were killed at a live weight of 65 kg.
Since there have been no reports on QTL detection for the three phenotypes considered in this study and since genomic selection is based on SNP marker estimates, the 5% most relevant SNPs (12 SNPs out of 237 SNPs) for each phenotype (a 1 , a 2 and a 3 ) were used to assess whether their chromosomal positions coincided with others already reported in the literature as being QTLs related to general growth traits. Table 3 shows these selected SNP markers and the absolute values of their estimated effects and chromosomal positions in cM.
The SNPs with the greatest effects on a 1 were located mainly on chromosomes SSC6, SSC7, SSC17 and SSCX, with particular emphasis on the magnitude of the effect of the halothane gene (0.9114) on this phenotype, as already mentioned for Table 1. The position of marker ALGA0042216 at SSC7 (60.4 cM) agreed with Yue et al. (2003), who found a significant QTL for weight at slaughter at position 65.2 cM in crosses between Meishan (M), Pietrain (P) and European Wild Boar (W). Pierzchala et al. (2003), using a population similar to those studied by Yue et al. (2003), found a significant QTL for weight at slaughter at position 51.1 cM of SSC17, i.e., close to position 45.2 cM for marker ALGA0095662. In general, in the absence of QTL detection for mature weight (a 1 ), these references to weight at slaughter may be useful for validating the relevance of the identified SNPs for this phenotype.
In relation to trait a 2 , other studies have identified QTLs associated with birth weight at positions close to those of SNPs indicated in Table 3. Su et al. (2002) analyzed data from a Large White (LW) x M intercross population and found a significant QTL at position 153 cM of SSC1, which coincided with the position of marker ALGA0010089. Cepica et al. (2003) used F2 families based on crosses of M, W and P breeds and identified a significant QTL at position 117 cM of SSCX, close to the position of marker MARC0051258 (112.2 cM). Silva et al. 525  With respect to phenotype a 3 , the SNPs positions in Table 3 were compared with significant QTL positions for the trait average daily gain (ADG), which is frequently mentioned in the literature and is highly related to the estimated growth (a 3 ) rate in the present study. The position of marker ALGA0047440 on SSC8 (15 cM) was close to position 10 cM reported by de Koning et al. (2001) who used data from an experimental cross between M and Dutch commercial lines. Sanchez et al. (2006) used data from a backcross M x LW and found a significant QTL at position 143 cM of SSC1, close to marker ALGA0010089 (1553.3 cM).
Five SNPs (ALGA0006708, ALGA0010089, ALGA0029483, ALGA0047440 and MARC0051258) in Table 3 were simultaneously among the most important for a 2 and a 3 , which could perhaps explain the greater genetic correlation between birth weight and growth rate in Table  2. In summary, the relevant SNPs listed in Table 3 were associated with QTL regions for growth traits previously reported in the literature. This finding indicates that these regions can be exploited molecularly to enhance pig breeding.
As a general conclusion, Bayesian LASSO provided more accurate values than RR-BLUP for all of the phenotypes evaluated and worked best when the assumption of an equal contribution to genetic variance was violated, e.g., when the halothane major gene was used as a marker in this study. The GEBV vectors allowed the construction of genomic growth curves that revealed substantial genetic discrimination among animals in the final phase of growth. Estimates of the effects of SNPs allowed identification of the most relevant markers for each phenotype, the positions of which coincided with already well-known QTL regions for growth traits.