Random regression models to estimate genetic parameters for milk production of Guzerat cows using orthogonal Legendre polynomials

– The objective of this work was to compare random regression models for the estimation of genetic parameters for Guzerat milk production, using orthogonal Legendre polynomials. Records (20,524) of test‑day milk yield (TDMY) from 2,816 first‑lactation Guzerat cows were used. TDMY grouped into 10‑monthly classes were analyzed for additive genetic effect and for environmental and residual permanent effects (random effects), whereas the contemporary group, calving age (linear and quadratic effects) and mean lactation curve were analized as fixed effects. Trajectories for the additive genetic and permanent environmental effects were modeled by means of a covariance function employing orthogonal Legendre polynomials ranging from the second to the fifth order. Residual variances were considered in one, four, six, or ten variance classes. The best model had six residual variance classes. The heritability estimates for the TDMY records varied from 0.19 to 0.32. The random regression model that used a second‑order Legendre polynomial for the additive genetic effect, and a fifth‑order polynomial for the permanent environmental effect is adequate for comparison by the main employed criteria. The model with a second‑order Legendre polynomial for the additive genetic effect, and that with a fourth‑order for the permanent environmental effect could also be employed in these analyses


Introduction
Guzerat breed was introduced in Brazil at the end of the nineteenth century, and it proved to be well adapted to the country's various tropical conditions (Egito et al., 2002). This breed stands out among Zebu ones for its high rusticity and potential for double aptitude (meat and milk), which makes it an important livestock resource in tropical areas (Peixoto et al., 2010).
In 1994, the National Breeding Program for the Improvement of Guzerat Dairy Cattle (PNMGuL) was implemented, under the coordination of Embrapa Dairy Cattle Research Center (Embrapa Gado de Leite), in a partnership with the Brazilian Center for Genetic Improvement of Guzerat Breed (CBMG 2 ). The selection has been based on the results of genetic evaluations of production data from progeny tests, and on a multiple ovulation and embryo transfer (MOET) nucleus scheme. The genetic evaluations for milk production have relied on the animal model method, using repeated measures of milk production accumulated in 305 days of lactation.
Guzerat breed is second in population among the Zebu breeds in Brazil, and is widely used in crossbreeding schemes to produce dairy cows and beef calves. The number of herds and animals taking part into the Brazilian Guzerat breeding program are still small, but it is increasing, as many as the use of this breed in dual purpose production systems. This can be attributed to the belief of breeders in the results of genetic evaluations in this program.
The 305-day milk yield used in the genetic evaluations is calculated from the test-day records with a genetic correlation of unity assumed between records repeated over time. However, an appropriate model should account for the mean and covariance structure, which changes with time and should allow of the estimation of required genetic parameters (Mrode, 2005). Therefore, the evaluation of alternative methods is necessary for a better use of the available information, and for a more accurate prediction of breeding values to achieve increased genetic gains.
Among the alternative methods used to analyze milk production data are random regression models (RRMs). These models employ data on traits that are repeated over time, such as milk yield and body weight, which allows of the assessment of the test-day production information. These models consider the information on a particular trait in the form of a curve, supplying the random values as regression coefficients. Therefore, RRMs fit random curves for each individual, expressed as deviations from a mean curve of the entire herd or group of cows. Meyer & Hill (1997) verified the equivalence between RRMs and covariance functions (CF). According to Werf & Schaeffer (1997), these functions allow of a continuous change in the variances and covariances of the traits measured at different points of a trajectory, as lactation for instance, describing the covariances between the measures taken at determined points (days of lactation) as a function of these days. Likewise, the matrix of covariances between the regression coefficients describes the structure of the variances and covariances over time. Kirkpatrick & Heckman (1989) proposed the use of covariance functions with Legendre polynomials, to model the variance structure of longitudinal data, and to estimate the additive genetic and permanent environmental variance components. This way, regression models allow of the estimation of genetic factors for milk production in any lactation period.
Legendre polynomials are also among the main functions used to model the fixed curve and its trajectories for the random effects in RRMs because they have the computational advantage of reducing the correlation of estimated coefficients, thus facilitating convergence (Schaeffer & Jamrozik, 2008). In particular, orthogonal Legendre polynomials have been widely used in random regression models to estimate covariance functions and to fit lactation curves of dairy cows (Olori et al., 1999;El Faro & Albuquerque, 2003;Araújo et al., 2006;Bignardi et al., 2009;Takma & Akbas, 2009).
In general, Legendre polynomials of different orders are used to model the trajectories for random effects. Brotherstone et al. (2000) reported that in situations for which there are computational limits, high-order polynomials are more efficient for modeling random effects. However, according to other authors, third-degree Legendre polynomials are sufficient for fitting lactation curves for genetic and permanent environmental effects (Pool & Meuwissen, 2000).
Other studies applying Legendre polynomials in RRMs also show the need to consider the heterogeneous structure of the residual variance in the model for good results (Olori et al., 1999;El Faro & Albuquerque, 2003;Takma & Akbas, 2009). RRMs that consider the heterogeneity of the residual variances fit data better because they do not overestimate the additive variance which occurred in early studies with RRMs (Jamrozik & Schaeffer, 1997).
The objective of this work was to compare random regression models for the estimation of genetic parameters for Guzerat milk production, using orthogonal Legendre polynomials.

Materials and Methods
Data were obtained from the Programa Nacional de Melhoramento do Guzerá para Leite, conducted by Embrapa Gado de Leite, in cooperation with the Centro Brasileiro de Melhoramento Genético do Guzerá and the Associação Brasileira dos Criadores de Zebu. Data comprised 20,524 first-lactation test-day records from 2,816 cows, calving from 1987 to 2009 in 28 herds, in Southeastern and Northeastern Brazil. Cows were daughters of 371 sires and were 23 to 65 months old at their first calving. The pedigree file included data from 10,753 animals.
The test-day milk yield records used in the analyses included data from the 6 th to the 305 th day after calving, distributed in categories, according to the month after calving. For better data consistency and correct estimation on the genetic parameters, cows with unknown cause for the cease of lactation and less than four records in different categories were excluded from the analyses. This procedure had previously been adopted by Santos et al. (2013b), so that animals with 90 to 120 days in lactation had all the monthly records, and those with lactations less than 90 days, whose cause of drying is unknown, were discarded.
Contemporary groups were formed according to farm, year and season at calving, considering the dry season (April to September) or rainy season (October to March). Contemporary groups with less than three animals were excluded from the analyses. The number of records, means, standard deviations and coefficients of variation for test-day milk yield records, according to the monthly lactation categories, are described in Table 1.
Random regression analyses were conducted using a single-trait model which included the random effects of animal, permanent environment and error, and the fixed effects of contemporary group, age at calving (linear and quadratic regressions), and mean population lactation curve. All variance and covariance components were estimated by the restricted maximum likelihood (REML) method, using Wombat, version 1.0 (Meyer, 2007b).
The average lactation curve was fitted by regression on lactation days, with Legendre polynomials of the third order. Animal additive genetic and permanent environment random effects were fitted by regressions on lactation days, with Legendre polynomials tested for different orders, ranging from 2 to 5. The error covariance structure was tested as homogeneous or heterogeneous, with the latter categorized in ten, six or four steps. For the 10-step models, each lactation month was considered as an error variance category. For the 6-step and 4-step models, categories were defined according to the variance proximity between lactation months. Thus, the error-variance categories, according to lactation months, were 1, 2, 3-5, 6, 7, and 8-10, for the 6-step models, and 1, 2-5, 6-7, and 8-10 for the 4-step models.
The model was (El Faro et al., 2008): y = Xb + Za + Wpe + e, in which y is a vector containing N test-day milk yield records from N a animals; b is a vector of solutions for the contemporary groups and the fixed regressions for mean lactation curve (linear and quadratic), on age at calving; a is a vector of solutions for random regressions for the animal additive genetic effect; pe is a vector of random regressions for the permanent environment effect; e is a vector of errors; X is the incidence matrix relating records to the fixed effects; and Z and W are the incidence matrices containing orthogonal polynomials relating, respectively, the animal additive genetic effect and the permanent environment effect for lactation days.
Random effects were assumed to be normally distributed with expectation equal to zero and the following covariance structure (El Faro et al., 2008): Pesq. agropec. bras., Brasília, v.49, n.5, p.373-383, maio 2014 DOI: 10.1590/S0100-204X2014000500007 in which: K a and K pe are the variance and covariance matrices, between the additive genetic coefficient and the permanent environmental regression coefficient, respectively; A is the relationship matrix of order equal to the number of animals in the pedigree; I Nd is an identity matrix of dimension Nd; and R is a diagonal matrix of residual variances. R is a diagonal matrix that accommodate from 1 to n classes of residual variances for the different test-days; and σ 2 a and σ 2 pe are the animal additive genetic and permanent environment, respectively.
Comparisons among different models were made using Akaike's information criterion (AIC) (Akaike, 1973), Schwarz Bayesian information criterion (BIC) (Schwarz, 1978), and the likelihood ratio test (LRT) (Wolfinger, 1993). The information criteria can be represented with the following equations: AIC = -2log L + 2p and BIC = -2log L + plog (N -r), in which: p is the number of parameters in the model; N is the total number of records; r is the rank of the incidence matrix (X) for the systematic effects in the model; and log L is the logarithm of the restricted maximum likelihood function. The likelihood ratio test (LRT) was used for the comparison of different error-variance structures (Huelsenbeck & Bull, 1996). The significance of the differences between models was obtained using a chi-square test of the log L values, in which the degrees of freedom equaled the number of parameters from the models.
Another analysis was also conducted on the same data file, including all test-day milk yield records in a finite dimensional model (FDM), for comparison with the random regression models. The FDM were then divided into monthly classes, according to days after calving, for a total of 10 classes. In this model each monthly class was considered one trait, and the 10 traits were analyzed together. Random effects for animal additive genetic and error effects were determined along with the fixed effects of contemporary group, age at calving (linear and quadratic regressions) and days in milk (linear regression).
The notation for the models fitted with orthogonal Legendre polynomials followed the pattern "Leg. K a .K pe .R", in which: K a and K pe indicate the numbers of random regression coefficients used to fit the additive genetic effect and the permanent environment effect, respectively, and R indicates the number of categories for the error-variances.

Results and Discussion
When models 1 to 4 were compared to verify the best residual variance structure, the LRT indicated that the model which considered homogenous variance of residuals produced the worst fit, which suggests that the residual variances have different behavior during the lactation period (Table 2). Similar results with Legendre polynomials were found by El Faro & Albuquerque (2003) for the Caracu breed, by Bignardi et al. (2009) for the Holstein breed, and Costa et al. (2005) for the Gyr breed. Among the models that considered different residual variance classes, the model with six classes fit data best according to AIC, BIC and LRT. The model with six classes and that with ten were not different according to the LRT ( Table 2). The results suggest that six heterogeneous residual variance classes are sufficient to model the residual variance structure during lactation in this population.
After choosing the residual variance structure, the models were compared by combining different orders of the polynomials for the additive genetic effect and the permanent environmental effect. When compared, it was observed that the models with the same polynomial order to model the additive genetic effect, the values of log L, AIC, and BIC improved with the increase of the polynomial order to model the permanent environmental effect. The same tendency was reported by El Faro et al. (2008) and Bignardi et al. (2009).
The Leg.6.6.6 model attained the best values of log L and AIC, while the Leg.3.6.6 model had the best BIC. The BIC is considered a more rigorous criterion than the AIC, and tends to penalize parameterized models by selecting more parsimonious models (Meyer, 2007a). The Leg.6.6.6 model, however, contained 48 parameters, 15 more than the Leg.3.6.6 model, suggesting that the Leg.6.6.6 model is overparameterized. For the comparisons of the estimated genetic parameters, the following was chosen: the Leg.6.6.6 model (best AIC -least rigorous), Leg.3.6.6 model (best BIC -most rigorous), and Leg.3.5.6 model (second-best BIC and alternative parsimonious model due to the smaller number of parameters).
The phenotypic variances estimated by the Leg.3.5.6, Leg.3.6.6 and Leg.6.6.6 models were very near to each other and had the same tendency as those estimated by the FDM, showing a decreasing behavior with the progress of lactation, but the RRMs showed a slight increase in the last month ( Figure 1). This typical U-shape was reported in studies by Bignardi et al. (2009) and Miglior et al. (2009) with Holstein breed, and by Pereira et al. (2013a) with Gyr breed. Possibly, the high estimate for the phenotypic variance, in the first and last months, was due to the smaller number of records in these months (in particular, in the last month not all the cows had test-day records because of the high frequency of short lactations in the breed) (Meyer, 1999). However, according to Bohmanova et al. (2008), the U-shape can also be due to nonincluded factors in the model, such as the effect of pregnancy and preferential treatment between herd variations in the shape of the lactation curve.
Regarding the genetic variances, the Leg.3.5.6 and Leg.3.6.6 RRMs showed similar tendencies for the entire lactation period. The Leg.6.6.6 model showed higher estimates from the second to the fourth month of lactation than the other RRMs and FDM (which did not obey a structuring or a function in the estimation process). This result indicates that the obtained variances with Leg.6.6.6 model are possibly overestimated, due to the overparameterization of this model. Thus, lower order polynomials are preferred to model the genetic effect by RRM, irrespectively of data size. Starting in the second half of the lactation period, the RRMs tended to show similar tendencies and values. The difference observed in the initial months for the additive genetic variance estimates possibly resulted from using a polynomial of higher order to model the additive genetic effect. This indicates that the modeling in the initial lactation phase is susceptible to variation in the polynomial's order, for this effect, suggesting a greater difficulty of RRMs to model the variation in this phase. The RRMs and FDM showed similar tendencies and mutually close estimates in the five last months of lactation, also in the third month. However, they showed different tendencies in the initial months, with the RRMs producing estimates of higher magnitude for the genetic variance.
The estimates of the environmental variances resulting from the sum of the permanent and temporary environmental variances (σ 2 ap + σ 2 e or σ 2 e ) were very near to each other for all the RRMs and the FDM. The heritability estimates obtained by the RRMs (Leg.3.5.6, Leg.3.6.6 and Leg.6.6.6) and by the FDM are shown in Figure 3. Among the three RRMs, the Leg.6.6.6 model obtained the highest estimates for heritability between the second and fourth month. In this period, this model estimated higher values for the additive genetic variances, probably due to the higher order of the polynomial utilized to model the additive genetic effect. The heritability estimates obtained by the TDMO ranged from 0.15 (first month) to 0.27 (third month), and the estimates in the initial lactation phase were lower than those obtained by the RRMs. These heritability estimates are similar to those obtained by Santos et al. (2013b) for the Guzerat breed, and by Ledic et al. 2002 for the Gyr breed, which ranged from 0.16 to 0.24 and from 0.14 to 0.24, respectively. However, Herrera et al. (2008) found the highest heritability estimates, from 0.14 to 0.34 for Gyr breed, while Bignardi et al. (2008) found the lower estimates, which ranged from 0.07 to 0.19 for the Holstein breed.
The heritability estimates obtained by the Leg.3.5.6 and Leg.3.6.6 models were very near to each other, from 0.32 (first month) to 0.19 (eighth month), while for the Leg.6.6.6 model the heritability estimates varied from 0.35 (second month) to 0.18 (eighth month). Possibly, the high heritability estimates obtained by the RRMs in the initial months were due to the difficulty of modeling this period together with the lower number of records in these months. The heritability increase in the estimates, in the last month, can be attributed only to the lower number of records in this month, since the estimates obtained by the FDM, which do not use covariance functions, were also higher in this period. Some authors worked with Legendre polynomials and reported the same tendency of the estimates (El Faro & Albuquerque 2003). However, these authors attributed the results, at the end of lactation, to both the difficulty of fitting the Legendre polynomials and to the lower number of records. , and residual σ 2 e or (σ 2 pe + σ 2 e ) variances, and heritability estimated by Leg.3.5.6, Leg.3.6.6, Leg.6.6.6 random regression models, and by finite dimensional models (FDM). The magnitude of the heritability estimates found in the present study is similar to those reported by Olori et al. (1999), Cobuci et al. (2005), and Bignardi et al. (2009) for the Holstein breed. Similar results (0.20 to 0.33) were found by Santos et al. (2013a), who used RRM with parametric functions for Guzerat breed. When RRMs were used with parametric functions in Gyr dairy cows, Herrera et al. (2008) found similar estimates to those found here. Also Pereira et al. (2010) observed the similar estimates as the ones of the present study, but in a lager range (0.09 to 0.33). Costa et al. (2005), using Legendre polynomials for the Gyr breed, found higher heritability values, which ranged from 0.71, at the start, to 0.21 at the end of lactation, while Pereira et al. (2013aPereira et al. ( , 2013b found lower estimates, which ranged from 0.12 to 0.24 and from 0.12 to 0.26, respectively. These last authors used Spline function for estimating heritabilities, which ranged from 0.16 to 0.29 . El Faro et al. (2008) and Araújo et al. (2006) employed RRMs for Caracu breed and Holstein breed, respectively, and reported heritability estimates at the end of lactation lower than those obtained in the present study. The obtained heritability estimates were of medium magnitude and are consistent to those of the literature, indicating the suitability of RRM for the genetic evaluation of Guzerat milk yield.
The genetic correlation estimates obtained with the RRMs were high and near the unity for the adjacent records, with the values declining as the interval between the test-day records increased (Tables 3). The same tendency was reported by Bignardi et al. (2009) for the Holstein breed, by Kettunen et al. (2000) for the Ayrshire breed, by Herrera et al. (2008), and Pereira et al. (2013b) for the Gyr breed, and Santos et al. (2013a) for the Guzerat breed, all of them using parametric function. The three RRMs (Leg.3.5.6, Leg.3.6.6 and Leg.6.6.6) showed negative genetic correlation estimates between milk yield in the first and in the last test-day records, but of small magnitudes.
The Leg.6.6.6 model tended to produce lower genetic correlation estimates for milk yield, between the first lactation month and the other months than did the other RRMs, obtaining a negative value near zero (-0.003) for the genetic correlation estimate between the first and ninth month. Like heritability, the negative genetic correlation estimates can be attributed to the difficulty of RRMs to model test-day milk yield at the beginning and end of lactation (because of the smaller number of records), since the estimate between the TDMY at the start and end of lactation obtained by the FDM was positive and moderate in magnitude (0.59) ( Table 4).
Even with this difficulty, the estimated genetic correlations were of low magnitude and close to zero, so that the use of RRM would not cause major problems for different stages of lactation by correlated response to selection for any test-day milk yield. Negative correlations were also reported by El Faro et al. (2008) and Bignardi et al. (2009) using Legendre polynomials. Pereira et al. (2013b), using Legendre polynomials for Gyr dairy cows, found estimates of additive genetic correlations between daily milk yields closed to 0.99 between adjacent test days, decreasing to about 0.40 between records obtained at the beginning and at the end of lactation. When the parametric function was used in Gyr data, Herrera et al. (2008) did not estimate negative genetic correlations, while Pereira et al. (2010) reported negative genetic correlations between test-day records at the beginning and at the end of lactation. For Guzerat breed, Santos et al. (2013a) found -0.15 and -0.03 for genetic correlation between records at the beginning and at the end of lactation, with Wilmink and Ali & Schaeffer function respectively. However, Santos et al. (2013b), using a multi-trait model, estimated values which ranged from 0.91 to 0.56 between all monthly milk yields.
The permanent environmental correlations estimated by the RRMs were near to each other, with the Leg.6.6.6 model tending to show slightly higher estimates for the correlations between milk yields in the first month and the other months than the other models (Tables 5).
The correlations among the regression coefficients for the additive genetic and permanent environmental effects were small and very similar for the three RRMs (Table 6). For all the RRMs, the coefficient matrix eigenvalues for the permanent environmental effects were higher than those for the additive genetic effects. This indicates that the polynomial orders used to model the permanent environmental effect should be higher than those used to model the additive genetic effect (El Faro et al., 2008). These results are similar to those found by Brotherstone et al. (2000) for the Hosltein breed, and by El Faro et al. (2008) studying Caracu cows.
For the additive genetic effect -obtained with the Leg.6.6.6 (Table 6), Leg.5.6.6 and Leg.4.6.6 models -, the last eigenvalue of the random regression coefficient matrix was equal to zero, indicating that the addition of these parameters (referring to K a ) did not contribute to the total variation of this effect. This result suggests that a third-order polynomial is sufficient to model the additive genetic effect for the data in the present study, as also observed in other studies (Olori et al., 1999;Kettunen et al., 2000;Takma & Akbas, 2009). The eigenvalue magnitudes of the random regression coefficients for the additive genetic effect obtained in the three models were near to each other, with the first eigenvalue representing more than 75% of the total variation. For the permanent environmental effect, the sum of the three eigenvalues, obtained with the three RRMs, was also similar, representing 96% of the total variation of this effect.
The eigenfunctions were estimated from eigenvectors of the random regression coefficient matrix for the additive genetic effect. The Leg.3.5.6 had the same pattern for the entire period as the Leg.3.6.6. The first eigenfunction was positive for the entire lactation period, tending to decline slightly at the end of the period (Druet et al., 2003). Olori et al. (1999), Araújo et al. (2006) andEl Faro et al. (2008) also reported that the first eigenfunction was positive and constant during the entire lactation period, indicating Table 5. Permanent environmental correlation estimates between monthly milk yields obtained with the Leg.3.5.6, Leg.3.6.6 and Leg.6.6.6 random regression models. that practically the same genetic factors influence the genetic variance throughout the lactation period. The second eigenfunction was negative until the fifth month and positive thereafter, and the corresponding eigenvalue explained 21% of the total variation. This contrast indicates that the eingenfunction is possibly related to the persistence of lactation (Olori et al., 1999). The rank correlation between the sires breeding values predicted by the Leg.3.5.6 and Leg.3.6.6 was very high (99.88%), while by Leg.3.5.6, Leg.3.6.6, and Leg.6.6.6 values were around 90%. This result suggests that sires would be ranked practically in the same order if Leg.3.5.6 or Leg.3.6.6 were applied for genetic evaluations. However, despite correlations are also high by Leg.6.6.6, they indicated that there are some changes in the rank between bulls in relation to other evaluated models. This small difference is possibly due to overfitting of the genetic effect by this model.

Conclusions
1. It is necessary to consider the heterogeneity of the residual variances to fit test-day milk yield curves for the Guzerat breed.
2. The random regression model using a second-order Legendre polynomial for the additive genetic effect, and a fifth-order polynomial for the permanent environmental effect, with six residual variance classes, is the most suitable to fit the data. Table 6. Estimates of variances (diagonal), covariance (below the diagonal), and correlations (above the diagonal) between random regression coefficients and eigenvalues (λ), and percentage explanation (%) of the coefficient matrix obtained by Leg.3.5.6, Leg.3.6.6, and Leg.6.6.6 random regression models. 3. The random regression model using a second-order Legendre polynomial, for the genetic effect, and a fourth-order one, for the permanent environmental effect is sufficient to obtain similar results from the model with the fifth-order polynomial for the permanent environmental effect.