Genetic parameters for test day milk yield of first lactation Holstein cows estimated by random regression using Legendre polynomials

Data comprising 263,390 test-day (TD) records of 32,448 first parity cows calving in 467 herds between 1991 and 2001 from the Brazilian Holstein Association were used to estimate genetic and permanent environmental variance components in a random regression animal model using Legendre polynomials (LP) of order three to five by REML. Residual variance was assumed to be constant in all or in some classes of lactation periods for each LP. Estimates of genetic and permanent environmental variances did not show any trend due to the increase in the LP order. Residual variance decreased as the order of LP increased when it was assumed constant, and it was highest at the beginning of lactation and relatively constant in mid lactation when assumed to vary between classes. The range for the estimates of heritability (0.27 0.42) was similar for all models and was higher in mid lactation. There were only slight differences between the models in both genetic and permanent environmental correlations. Genetic correlations decreased for near unity between adjacent days to values as low as 0.24 between early and late lactation. A five parameter LP to model both genetic and permanent environmental effects and assuming a homogeneous residual variance would be a parsimonious option to fit TD yields of Holstein cows in Brazil.


Introduction
The use of test day (TD) measurements instead of 305 days lactation records has been stimulated by the possibility to improve breeding value estimation of dairy cattle (Jensen, 2001).Different approaches have been used for the analysis of TD records.The simplest approach considers TD records as repeated measures of the same trait assuming a genetic correlation of unity between yields along lactation.Another approach considers TD yields as different traits and a multivariate model is used for the analysis.An intermediate approach, more refined than the repeatability model, less parameterized and probably computationally less expensive than the multivariate model, is the random regression model (Schaeffer & Dekkers, 1994).An autoregressive TD model (AR) is an alternative approach by assuming that test day yield is the product of the expression of the same set of genes throughout the productive life of the female (Carvalheira et al., 2002).
The random regression (RR) approach allows to fit sub models for adjusting the lactation curve, assumes a structure for genetic and environmental variation specific to individual TD yields and variable correlation between TD yields (Jamrozik & Schaeffer, 1997;Meyer, 1998b;Swalve, 2000).Since Kirkpatrick et al. (1994) demonstrated the use of orthogonal Legendre Polynomials (LP) in modeling the covariance structure of TD records of dairy cows, RR have been used to model milk production along lactation (Olori et al., 1999;Brotherstone et al., 2000;Pool et al., 2000).
The choice of sub model for fitting additive genetic and permanent environment effects is the focus in finding an optimal RR model.Orthogonal polynomials are most appropriate than parametric lactating curve functions for the covariates in RRM.Orthogonal polynomials of time have much lower correlations among the coefficients and provide estimates of the covariance matrices that tend do be more robust over different data sets (Schaeffer, 2004).
The use of TD records greatly increases the amount of data to be analyzed, thus requiring the selection of a more parsimonious model with respect to the number of parameters to be estimated.Olori et al. (1999) concluded that critical issues in fitting a random regression model include the order of the polynomial used to model the lactation curve at both the fixed and random level, and the grouping of observations into residual variance subclasses.
This study aimed to investigate the use of LP for modeling the lactation curve of Holstein cows in Brazil and to determine the most appropriate genetic covariance structure among daily milk yields required for genetic evaluation.

Material and Methods
Data provided by ABCBRH (Brazilian Holstein Association) comprised 263,930 milk yield TD records of first lactation from 32,449 cows calving in 467 herds between 1991 and 2001.TD yields from 5 to 305 days of lactation, time interval between successive tests less than 45 days and age of calving between 18 and 48 months were required.Contemporary groups were defined as herd-year-test month (HTM) and were required to have at least four records.Average and standard deviation for test day milk yield were 22.53 and 6.87 kg, respectively.Cows with records were daughters of 1,955 sires.The pedigree file including parental identifications defined the relationship matrix A.
Variance components were estimated by animal model REML using the "DXMRR" program (Meyer, 1998c).Nine analyses were performed using orthogonal Legendre polynomials (LP) of order 3, 4 and 5 as sub models to fit additive genetic and permanent environment effects.Two methods of accounting for residual effect (measurement error = ME) were analyzed for each RR model.First, it was assumed that the residual variance was constant (homogeneous) throughout the lactation or ME = 1.Alternatively the residual variance was assumed to be constant for TD records within, but different (heterogeneous) between the followings groups of lactation period: [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30] for ME = 29.The models were designated by LPmMEr, where m is the order of the additive genetic and permanent environment effects and r is the number of ME classes.Then, LP5ME1 is a model with LP of order 5 for additive genetic and permanent environment effects and one ME class.
The random regression equation, assuming the same sub model to fit fixed, genetic and permanent environment effects, was: where Y ijzl is the l th observation on animal z, at days in milk (DIM) t, age at calving X of animal z, in months, as both linear and quadratic covariate (p=1,2), in season j and HTM i; β m are the fixed regressions coefficients for an average population curve; α zm the random genetic and γ zm the random environmental regression coefficients for animal z, respectively, m the order of the polynomial fit, Φ m (t) the m-th orthogonal polynomial of DIM t (standardized in the range -1 to +1 representing days 5 to 305), f and k are respectively the number of coefficients of the fixed and random effects of the polynomial and (e ijz ) r is the temporary measurement error associated with the specific test day record, where r is the number r of ME classes (r=1, 4 or 29).
Details of the log likelihood, assumptions and estimation of variance components modeled with orthogonal polynomial regressions have been presented (Meyer & Hill, 1997;Meyer, 1998a).The convergence criterion was set to 10 -4 .Fit of different models was compared by examining estimated variances, maximum likelihood and Akaike Information Criterion (Meyer, 2001) for each analysis.Variance components and heritability were also compared with estimates obtained using a repeatability model and bivariate analyses of TD milk yields (Melo et al., 2005).

Results and Discussion
Values of Log-likelihood functions, number of parameters and Akaike information criterion (AIC) obtained from fitting LP assuming both homogeneous and heterogeneous residual variances are shown in Table 1.Log-likelihood functions increased slightly with increasing order of polynomials.However differences between loglikelihood functions due to increasing number of ME within each polynomial fit decreased from 391.93 to 81.94 as the order of LP increased.In terms of AIC, the model with the best fit was the polynomial of order five with 29 ME classes (Table 1).
Estimates of residual variance decreased as the order of the model increased and were estimated as 5.41, 4.79 and 4.41 kg 2 respectively for orders 3 to 5 when it was assumed constant during lactation.Generally residual variance was larger in the beginning and decreased along lactation when assumed to vary between classes of lactation period or ME=4 (Table 2).
Looking at the heterogeneous residual variances models for LP5, differences were not large taking into account the expressive increase in number of parameters.Also, the range of residual variance (RV) estimates along lactation did not differ between ME4 and ME29 in the first half of the lactation period (Figure 1).In the second half, variation in RV was not continuous for ME29.RV estimates for ME4 were in the range (3.93-4.79) of values observed for ME29.According to Lopez Romero et al (2003) the heterogeneity of RV is related to the lactation stage and it is larger at the extremes of lactation.This is likely due a set of no specified factors in the model equation (days open, pregnancy status, body condition at calving, etc.) that make the temporary ME larger and highly variable at the beginning and at the end of the lactation.The inclusion of all these effects in the model equation might be difficult, mainly because the lack of information.
The RV estimates in this study were significantly smaller than that obtained by fitting a repeatability animal model for TD yields (RV=8.55kg 2 ) or than those from fitting individual TD yields with univariate models (between 17.86 and 15.57kg 2 ) as reported by Melo et al. (2005).A summary of the variance components estimated by different models is shown in Table 3. Estimates of the different components of variance depended on the lactation stage and the order of the LP fitted, but differences were not large.Except for the extremes of the lactation, estimates of genetic variances were smallest for LP3ME4 (Figure 2).Similarly, estimates of permanent environmental variances were smaller for LP3 than for the other models.Similar values and trends were observed for LP4 and LP5.Small differences were observed at the beginning and end of the lactation period.Meyer (1999) observed that some problems may occur with the fit of random regressions at the extremes of the ages in the data.In part that can be explained by small numbers of records and sampling variance in partitioning of total variance.Furthermore, this is likely to be due to the large effect that the values furthest from the mean have in a regression analysis.
Genetic variances from this study were slightly larger than those obtained by Melo et al. (2005).Their reported estimates were 7.21 kg 2 for a repeatability TD model and ranged between 4.8 and 7.88 with univariate models fitting TD milk yields.Estimates with TD in mid lactation were the most similar between these studies.Except for higher estimates at the extremes, the same trend was observed in both studies for phenotypic variances.
Similar estimates of genetic and permanent environmental variances were obtained for LP5 for models ME29, ME4 and ME1.Olori et al. (1999) found that different models to fit RV resulted in different estimates but variation on genetic variances was small.They observed that accurate identification of lactation stages with similar error variance (into the same ME classes) was more important than the number of ME classes specified in the RR model.Variance components estimated by fitting the model LP5ME1 are shown in Figure 3.
No specific trend was observed for estimates of covariance components and correlation for random regression coefficients.Genetic correlation estimates were in general not large and ranged from -0.54 (a 1 ,a 2 ) to 0.34 (a 3 ,a 4 ) for LP5ME1 (Table 4).All models gave similar range for the estimates of heritability (0.27-0.42), which were higher about mid lactation.Estimates for models LP3-5 and ME1 are on Figure 4 and for LP5 and ME1-29 are on Figure 5. Heritability estimates along the lactation trajectory showed similar shapes, but were less extreme at the beginning and end of lactation, because of larger permanent environmental variances.These trends were similar to results of Pool et al. (2000), but contrary to those reported by Olori et al. (1999) and Brotherstone et al. (2000).Overall LP heritabilities were larger than the estimates of individual TD yields from univariate models reported by Melo et al. (2005), which varied from 0.23 to 0.33.This is due to larger values for GV and smaller values for RV obtained by the LP models fitted in this study.
Estimates of genetic and permanent environmental covariances did not show any trend due to increase in the order of LP.The pattern of genetic correlation estimates between TD was consistent over all models.Estimates for LP5ME1 are given in Table 5.The genetic effects at the beginning of the lactation were less correlated than those at the end of the lactation.The genetic effects for the TD yields in the middle of lactation were highly correlated with correlations higher or equal 0.90 between the genetic effect at DIM 165 and genetic effects of a large part of the lactation (from 125 to 285 DIM).Genetic correlation was equal to 0.25 for the extreme parts of the lactation.
Permanent environmental correlation estimates followed the same pattern of genetic correlation estimates, declining from near unity between adjacent yields to values as low as 0.11 between the beginning and the end of lactation.A graphic illustration of genetic and permanent environmental correlations between test day milk yields for LP5ME1are shown in Figure 6.
Genetic correlation estimates are in agreement with those reported by Brotherstone et al. (2000) and Olori et al. (1999).Jamrozik & Schaeffer (1997), Kettunen et al. (2000) and Rekaya et al. (1999) reported negative correlation estimates for the extreme parts of the lactation.López-Romero & Carabaño (2003) investigated 22 alternative RR models using Holstein data in Spain.Differences in variances among models were more mostly observed at the extremes of lactation.All the models assumed homogeneous RV along lactation.The model of choice was the most complex one, a LP of order six.However, they pointed out that submodels with a lower order polynomial for the genetic than for the permanent environmental component would be sufficient to account for the variability of these effects.In addition, in a following study, Lopez-Romero et al. (2003) observed that increasing the order of regression of permanent environmental effect resulted in correction for RV.In conclusion, the assumption of homogeneity of RV was the most plausible specification when the number of random regression coefficients was set to five.Similar results regarding the higher order of LP for the permanent environmental than for the genetic effect led Liu et al. (2006) to conclude that the currently Legendre polynomial of order five (for both additive genetic and permanent environment effects), after no consensus on the best model among different evaluation criteria, was the optimal model for multiple trait genetic evaluation for production of Holstein cattle in Canada.
Random regression models have been widely studied and evaluated for genetic evaluation at national level in many countries.RR models have the advantage of flexibility to account for the environmental and genetic components of the shape of the lactation curve.Currently, eleven countries are using TD records in their genetic evaluation systems for production in dairy cattle.Among them, eight countries use RR models (Interbull, 2006).
Genetic evaluations for production traits of Holstein cattle in Brazil are based on first lactation records adjusted to 305 days (Costa et al., 2006).Results from this and the previous study of Melo et al. (2005) suggest TD records may replace with advantages lactation data for genetic evaluations of milk yield.In addition RR models are better models than the repeatability model to fit test day milk yields of Holstein cattle in Brazil.Also it would be an opportunity to implement routine evaluations for persistency which may have greater economic value under subtropical conditions.Parameters estimated in RR test day models may be used to calculate genetic measures of persistency (Jakobsen et al., 2002;Cobuci et al., 2004).

Conclusions
The orthogonal Legendre Polynomial of order five to model animal genetic and permanent environmental effects best fitted the data.Residual variances were not homogeneous, but no significant differences were observed for genetic and permanent environmental variances when fitting different classes of ME across lactation.
Results from this study showed that using a five parameter LP to model both genetic and permanent environmental effects and assuming a homogeneous residual variance would be a parsimonious option to fit TD milk yields.
Further research is still needed to compare ranking of animals and expected genetic gain to decide whether replacing 305d records by TD records for breeding value estimation for production traits of Holstein cows in Brazil.

Figure 2 -
Figure 2 -Genetic variance estimates of test-day milk yields along lactation for LP models.

Figure 4 -
Figure 4 -Estimates of heritability of test day milk yield obtained by models LP3ME1, LP4ME1 e LP5ME1.

Figure 6 -
Figure 6 -Graphical illustration of additive genetic (above) and permanent environmental (bellow) correlation estimates for test day milk yields along lactation with model LP5ME1.

Table 3 -
Additive genetic, permanent environmental and phenotypic variances for daily milk yield at different days in milk (DIM) using LP models