Reliability of breeding values between random regression and 305-day lactation models

The objective of this work was to verify the gain in reliability of estimated breeding values (EBVs), when random regression models are applied instead of conventional 305-day lactation models, using fat and protein yield records of Brazilian Holstein cattle for future genetic evaluations. Data set contained 262,426 test-day fat and protein yield records, and 30,228 fat and protein lactation records at 305 days from first lactation. Single trait random regression models using Legendre polynomials and single trait lactation models were applied. Heritability for 305-day yield from lactation models was 0.24 (fat) and 0.17 (protein), and from random regression models was 0.20 (fat) and 0.21 (protein). Spearman correlations of EBVs, between lactation models and random regression models, for 305-day yield, ranged from 0.86 to 0.97 and 0.86 to 0.98 (bulls), and from 0.80 to 0.89 and 0.81 to 0.86 (cows), for fat and protein, respectively. Average increase in reliability of EBVs for 305-day yield of bulls ranged from 2 to 16% (fat) and from 4 to 26% (protein), and average reliability of cows ranged from 24 to 38% (fat and protein), which is higher than in the lactation models. Random regression models using Legendre polynomials will improve genetic evaluations of Brazilian Holstein cattle due to the reliability increase of EBVs, in comparison with 305-day lactation models.


Introduction
The breeding objectives for Holstein cattle are defined by a list of economically relevant traits, such as milk, fat, and protein yields (Banga et al., 2014). In Brazil, genetic evaluations of Holstein cattle for these yields have been carried out using a 305-day lactation model (Ferreira et al., 2003; Costa et al., 2009; Biassus et al., 2011). Alternatively, several other approaches, such as repeatability, autoregressive, or random regression models (Melo et al., 2007; Costa et al., 2008, 2009; Bignardi et al., 2011), have been proposed that use the test-day records directly in test-day models (TDM) instead of lactation models (Jensen, 2001).
The main advantages of the alternative approaches based on test-day models are that they make it possible to model the shape of the lactation curve (Schaeffer et al., 2000) and to account more accurately for environmental factors that affect test-day records of cows at different stages of lactation (Jensen, 2001). Among test-day models, random regression models have been proposed for genetic evaluations in the literature (Kirkpatrick et al., 1994; DeGroot et al., 2007; Bignardi et al., 2009). In fact, Germany, Canada, the United Kingdom, and Italy have already adopted random regression models in their national genetic evaluations, using Legendre polynomials of third, fourth, or fifth orders (Muir et al., 2007; Yamazaki et al., 2013). Costa et al. (2008), Biassus et al. (2011), and Cobuci et al. (2011) studied the use of random regression models with Legendre polynomials, in order to determine the best order for genetic evaluation of Holstein cattle in Brazil, and to substitute the current 305-day lactation model. However, there are few studies on milk and its components, such as fat and protein, in tropical countries such as Brazil, using random regression models in the genetic evaluations of the Holstein breed (Costa et al., 2008; Biassus et al., 2010). In general, random regression models using Legendre polynomials of fourth, fifth, and sixth orders have been indicated as a good option for conducting genetic evaluations of Holstein cattle; however, no studies on Brazilian Holsteins have compared random regression and lactation models with respect to the gain in reliability of breeding values (Costa et al., 2008; Biassus et al., 2010, 2011).
The objective of this work was to verify the gain in reliability of estimated breeding values (EBVs), when random regression models are applied instead of conventional 305-day lactation models, using fat and protein yield records of Brazilian Holstein cattle to be used in future genetic evaluations.

Materials and Methods
Data consisted of fat and protein yield records collected by the technicians of the milk control and genealogy service of the Associação Brasileira de Criadores de Bovinos da Raça Holandesa (ABCBRH, Brazilian association of Holstein breeders) and its state affiliates between 1990 and 2011. The data set comprised test-day milk yield and 305-day milk yield records. At first, pedigree data were checked for inconsistencies. Records were retained for first-lactation cows aged 18 to 48 months with a minimum of six test-day records obtained between 6 and 305 days in milk. Abnormal yield values or outliers were checked by graphical techniques, such as normal probability plots and boxplots, as well as by median, mean, mode, skewness, and kurtosis values. Test-day records were removed if fat and protein yields were out of the range from 258.4 to 1,510 g, and from 312.0 to 1,314.8 g, respectively. The records of 305-day fat and protein yields were deleted if they were out of the range from 102 to 392 kg, and from 106 to 349 kg, respectively.
Four classes of age at calving (18 to 25, 26 to 27, 28 to 29, and 30 to 48 months) and four calving seasons (January through March, April through June, July through September, and October through December) were combined to produce 16 age-season classes. Contemporary groups of herd-year-season of calving (305-day yield records) and herd-year-month of test (test-day records) with fewer than four records were eliminated, as were records of progeny of bulls that did not have at least two daughters in two different herds.
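The class definitions above can be illustrated with a minimal Python sketch. The function names and the numbering of the 16 classes (1 to 16, age class varying slowest) are assumptions made only for this example; the text does not state how the classes were coded.

```python
def age_class(age_months):
    """Return the age-at-calving class (0 to 3) for ages of 18 to 48 months."""
    if not 18 <= age_months <= 48:
        raise ValueError("age at calving outside the 18 to 48 month range")
    if age_months <= 25:
        return 0  # 18 to 25 months
    if age_months <= 27:
        return 1  # 26 to 27 months
    if age_months <= 29:
        return 2  # 28 to 29 months
    return 3      # 30 to 48 months


def season_class(calving_month):
    """Return the calving season (0 to 3): Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec."""
    if not 1 <= calving_month <= 12:
        raise ValueError("calving month must be between 1 and 12")
    return (calving_month - 1) // 3


def age_season_class(age_months, calving_month):
    """Combine age and season classes into one of 16 codes (hypothetical coding)."""
    return 4 * age_class(age_months) + season_class(calving_month) + 1
```

For instance, under this hypothetical coding a cow calving in April at 26 months of age falls in class 6.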
After data editing, 262,426 test-day fat and test-day protein yield records were used to fit the random regression models, and 30,228 lactation records were used to fit the 305-day lactation models. The milk traits considered in the present analyses were 305-day fat yield (305F) and 305-day protein yield (305P).
Test-day fat and test-day protein yield records were used in single-trait random regression animal models, named RRF4 and RRF5 for fat yield, and RRP4 and RRP5 for protein yield, fitted by fourth- and fifth-order Legendre polynomials, respectively. The random regression model used to estimate genetic parameters and EBVs for 305F and 305P was as follows: [...] classes; $HYM_i$ is the fixed effect of herd-year-month of testing; $u_{jk}$ and $pe_{jk}$ are the kth random regression coefficients that describe, respectively, the additive genetic effects and the permanent environmental effects on cow j; $\phi_k(d_t)$ are the Legendre polynomials for the test-day record of cow j, made on day t, in which k is the nth parameter of the Legendre polynomial coefficients for the 4th or 5th order; and $e_{ijkl}$ is the random residual. Orthogonal Legendre polynomials were calculated as shown by Kirkpatrick et al. (1994).
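As a rough illustration of how Legendre covariates for a given day in milk can be obtained, the sketch below standardizes days in milk to the interval from -1 to 1 and evaluates normalized Legendre polynomials by the usual three-term recurrence. Several points are assumptions for this example rather than details confirmed by the text: that a polynomial of "order" k has k coefficients (degrees 0 to k-1), that the standardization uses days 6 and 305 as limits, and that the normalization factor is sqrt((2k+1)/2). The function names are hypothetical.

```python
import math


def standardize_dim(t, t_min=6, t_max=305):
    """Map days in milk t onto the interval [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_covariates(t, n_coef, t_min=6, t_max=305):
    """Return n_coef normalized Legendre covariates evaluated at day t."""
    x = standardize_dim(t, t_min, t_max)
    p = [1.0, x]  # unnormalized P_0(x) and P_1(x)
    for k in range(1, n_coef - 1):
        # Recurrence: (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    # Normalize so that each polynomial has unit norm on [-1, 1]
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(n_coef)]
```

Calling `legendre_covariates(155, 4)` would return the four covariates for a record taken near mid-lactation under these assumptions.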
It was assumed that: [...] in which: G and P are covariance matrices of the random regression coefficients; $R = I\sigma_e^2$ is a diagonal matrix (residual); ⊗ is the Kronecker product between matrices; and I is an identity matrix.
Records of 305-day fat and protein yields were used in single-trait 305-day lactation models, named LMF (fat) and LMP (protein), which included the effects of herd-year-season of calving and age at calving (linear covariable) as fixed effects, and additive genetic animal and residual as random effects. The model used to estimate genetic parameters and EBVs for 305F and 305P was $Y_{ij} = HYS_i + b_n \chi_{ij} + a_{ij} + e_{ij}$, in which: $Y_{ij}$ is the 305-day fat or protein yield record of animal j, in herd-year-season of calving i; $HYS_i$ is the effect of herd-year-season of calving i; $b_n$ is the linear regression coefficient for 305-day yield, as a function of age at calving (linear covariable); $\chi_{ij}$ is the age of the cow at calving, in months; $a_{ij}$ is the additive genetic effect of animal j in herd-year-season of calving i; and $e_{ij}$ is the residual effect.
The analyses were performed with the REMLF90 software (Misztal et al., 2014), using restricted maximum likelihood (REML), to estimate the solutions and the covariance matrices of the random regression coefficients.
The estimated breeding values (EBVs) of random regression models were obtained by multiplying covariance matrices and vectors containing covariates specific for each animal. The EBV of animal i for test-day t was calculated by $EBV_{it} = z'_t \hat{\alpha}_i$, in which $\hat{\alpha}_i$ is a vector ($k_a \times 1$) of the estimates of additive genetic random regression coefficients specific to animal i, and $z'_t$ is a vector ($k_a \times 1$) of Legendre polynomial coefficients evaluated at day t, which may be illustrated for a fifth-order Legendre polynomial: [...]. The sum of EBVs at 305 days for animal i was obtained by summing the EBVs from day 6 to 305: $EBV_{305,i} = \sum_{t=6}^{305} z'_t \hat{\alpha}_i$. The standard error of prediction (SEP) of estimated breeding values (EBVs) for 305F and 305P was supplied by the REMLF90 software as the square root of the prediction error variance (PEV) (Misztal et al., 2014). Reliability of EBVs was derived from SEP as $r^2 = 1 - (SEP^2/\sigma_a^2)$, in which: $\sigma_a^2$ is the additive genetic variance for the trait; and $r^2$ is the squared correlation between the true and the estimated breeding values (Misztal & Wiggans, 1988).
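The computation described above (a daily EBV as the product of Legendre covariates and the animal's coefficient estimates, summed from day 6 to day 305, and reliability derived from SEP) may be sketched as follows. This is only an illustration under stated assumptions: a polynomial of order k is taken to have k coefficients, days in milk are standardized between days 6 and 305, and all function names are hypothetical rather than part of REMLF90.

```python
import math


def legendre_covariates(t, n_coef, t_min=6, t_max=305):
    """Normalized Legendre covariates at day t (assumed standardization)."""
    x = -1.0 + 2.0 * (t - t_min) / (t_max - t_min)
    p = [1.0, x]
    for k in range(1, n_coef - 1):
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(n_coef)]


def ebv_test_day(coefs, t):
    """EBV at day t: inner product of covariates and coefficient estimates."""
    z = legendre_covariates(t, len(coefs))
    return sum(z_k * a_k for z_k, a_k in zip(z, coefs))


def ebv_305(coefs, first_day=6, last_day=305):
    """Sum of daily EBVs from first_day to last_day (inclusive)."""
    return sum(ebv_test_day(coefs, t) for t in range(first_day, last_day + 1))


def reliability(sep, var_a):
    """Reliability as 1 - SEP^2 / additive genetic variance."""
    return 1.0 - sep ** 2 / var_a
```

With hypothetical values, `reliability(5.0, 100.0)` gives 0.75, that is, a prediction error variance of 25 against an additive genetic variance of 100.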
The models were compared using the values of residual variance (RV), Akaike's information criterion AIC = -2logL + 2p, and Schwarz's Bayesian information criterion BIC = -2logL + p log(λ), where p is the number of parameters in the model. Using REML, λ = N - r(X), in which: N is the number of test-day records; r(X) is the rank of the fixed effects incidence matrix; and 2logL is provided as default by REMLF90. The best model is indicated by the lowest values of AIC and BIC. A log-likelihood ratio test (LRT) was applied to test for significant differences between models with different orders of Legendre polynomials (LP). The null hypothesis (H0) was that the restricted likelihood functions of the models did not differ when the number of parameters increased. The calculated LRT value was compared with the tabulated chi-square (χ²) value with ten degrees of freedom, at 5% probability.
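The criteria above can be sketched in a few lines of Python. The sketch assumes that -2logL is available for each model, that the logarithm in BIC is the natural logarithm, and that a polynomial of order k implies k(k+1)/2 covariance parameters for each of the additive genetic and permanent environmental effects plus one residual variance; under that assumption, the fourth- and fifth-order models differ by ten parameters, which agrees with the ten degrees of freedom mentioned above. The critical value 18.307 is the tabulated chi-square value for ten degrees of freedom at 5%. Function names and any input values are hypothetical.

```python
import math

# Tabulated chi-square critical value for 10 degrees of freedom at alpha = 0.05
CHI2_CRIT_10DF_5PCT = 18.307


def n_parameters(order):
    """Covariance parameters for G and P (order x order each) plus residual."""
    per_matrix = order * (order + 1) // 2
    return 2 * per_matrix + 1


def aic(minus_2logl, p):
    """Akaike's information criterion: -2logL + 2p."""
    return minus_2logl + 2 * p


def bic(minus_2logl, p, n_records, rank_x):
    """Schwarz's criterion: -2logL + p log(lambda), with lambda = N - r(X)."""
    lam = n_records - rank_x
    return minus_2logl + p * math.log(lam)


def lrt_statistic(minus_2logl_reduced, minus_2logl_full):
    """Likelihood ratio statistic between a reduced and a full (nested) model."""
    return minus_2logl_reduced - minus_2logl_full


def lrt_is_significant(statistic, critical_value=CHI2_CRIT_10DF_5PCT):
    """True when the statistic exceeds the tabulated critical value."""
    return statistic > critical_value
```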

Results and Discussion
AIC, BIC, and 2LogL had the lowest values (highlighted) for the RRF4 and RRP4 models (Table 1). Residual values decreased by 4% (fat) and 5% (protein) as the order of the Legendre polynomials increased. The LRTs between RRF4 and RRF5, and between RRP4 and RRP5, indicated significant differences (p<0.05) between the fourth- and fifth-order polynomials. The reduction of the 2LogL, AIC, and BIC values as the Legendre polynomial order decreased, together with a significant change in the log-likelihood, indicated the random regression models with fourth-order Legendre polynomials as the best fit, whereas residual values indicated the fifth-order model as the best. In some studies in the literature, AIC, BIC, and RV indicated models with a larger number of parameters (Biassus et al., 2011; Aliloo et al., 2014). The present results may be a consequence of overparameterization of the models, since the AIC and BIC criteria favor simpler models because of the penalty term.
The first eigenvalue (λ) accounted for 89.5% of the total additive genetic covariance matrix in RRF4 and RRF5, and for 88.7 and 89.0% in RRP4 and RRP5; the first three eigenvalues explained 99.99% of it (Table 2). For permanent environmental effects, the first eigenvalue accounted for about 70% of the total permanent environmental variance in all models, and the first four eigenvalues explained 98% of it. According to Aliloo et al. (2014), the choice of the best model is not an easy task because different tests may indicate different models. Although AIC, BIC, 2LogL, and residual variances showed conflicting results in the present study, the use of different approaches may help to explain the differences between random regression models. The analysis of the eigenvalues indicates the decreasing importance of adding more parameters (Aliloo et al., 2014). The first three additive eigenvalues explained a sufficiently large proportion of the variances for the models RRF4 and RRP4. The higher values of the permanent environmental eigenvalues suggested that the permanent environmental effect, in contrast to the genetic effect, could be modeled with a fifth-order Legendre polynomial. Although these results suggested different Legendre polynomial orders for additive genetic and permanent environmental effects, there is no consensus in the literature (Araújo et al., 2006; Costa et al., 2008; Biassus et al., 2011; Aliloo et al., 2014). Aliloo et al. (2014) reported that the models with higher orders for genetic and permanent environmental effects were the best fit in Iranian Holsteins. However, Mohammadi & Alijani (2014), also using Holstein data from Iran, indicated lower orders of Legendre polynomials for genetic effects, and higher ones for permanent environmental effects. In Brazil, Araújo et al. (2006) and Biassus et al. (2011), who used the same orders for genetic and permanent environmental variances, suggested that the best random regression models should be fitted by at least a fourth-order Legendre polynomial, in order to estimate genetic parameters and breeding values for milk, fat, and protein yields.
For 305-day yield trait (Table 3), the heritabilities obtained from 305-day lactation models were 0.24 (LMF) and 0.17 (LMP); and random regression models showed heritability estimates of 0.21 (RRF4 and RRF5) and 0.20 (RRP4 and RRP5).The heritability estimates obtained from random regression models were very similar to those obtained from 305-day lactation models for 305-day fat and protein yields.In the literature, the estimates of heritability at 305 days, applying random regression models to analyze Holstein cattle, ranged from 0.29 to 0.41 for fat, and from 0.29 to 0.41 for protein yields (Bohmanova et al., 2008;Biassus et al., 2011;Kheirabadi & Alijani, 2014).Heritability estimates using 305-day lactation models were 0.13 for fat, and 0.12 for protein yield (Kim et al., 2009).
The heritability for test-day fat and protein yields, obtained from the different random regression models, showed very similar trajectories across days in milk (Figure 1). The heritability of test-day values ranged from 0.13 to 0.23 for fat (RRF4 and RRF5), and from 0.10 to 0.23 for protein (RRP4 and RRP5). Trajectories of additive genetic variances of test-day fat and protein yields were similar between the different random regression models. Additive genetic variances of test-days ranged from 4,954.4 to 7,115.6 g² for fat (RRF4 and RRF5), and from 2,625.8 to 4,857.3 g² for protein (RRP4 and RRP5). Permanent environmental variances of test-days ranged from 12,136.6 to 32,909.8 g² for fat (RRF4 and RRF5), and from 9,093.8 to 19,702.0 g² for protein (RRP4 and RRP5). Heritability of the 305-day fat yield, obtained from 305-day lactation models, was similar to the heritability of test-day fat yield in mid-lactation, but it was higher than the values at the extremes of lactation obtained from random regression models (RRF4 and RRF5). For protein yield, heritability of test-days, obtained from random regression models (RRP4 and RRP5), was higher in mid-lactation (90 to 270 days in milk) than that for 305-day yield estimated by 305-day lactation models. Kim et al. (2009) carried out a similar comparison and found quite similar results in a Holstein population in Korea; they reported that random regression models showed higher test-day heritabilities across days in milk than the 305-day lactation models for fat and protein yields. In previous studies with a Brazilian Holstein population from Minas Gerais state, Biassus et al. (2011) found values ranging from 0.03 to 0.21 and from 0.09 to 0.33 for fat and protein yields, respectively, estimated from random regression models. Rzewuska & Strabel (2013) and Abdullahpour et al. (2013) reported very close average heritability values, from 0.17 to 0.22 and from 0.14 to 0.23, for fat and protein yields, respectively. Trajectories of permanent environmental variances of test-day fat and protein yields were quite constant between 30 and 240 days in milk, in all models, and coincided with the highest heritability of test-day yields in mid-lactation. The lowest values of additive genetic variance and heritability of test-days, and the highest values of permanent environmental variance, were found at the extremes of the lactation curves. That pattern was also found by Biassus et al. (2011).
Spearman correlations of EBVs of bulls for 305-day yield, between 305-day lactation models and random regression models, increased from 0.86 to 0.97 (fat) and from 0.86 to 0.98 (protein) with the increase in the classes of progeny size (Table 4). Spearman correlations of EBVs of cows for 305-day yield increased from 0.83 to 0.89 (fat) and from 0.81 to 0.86 (protein) for groups of cows with 6 to 10 test-days. These correlations may be interpreted in two different ways. On the one hand, Spearman correlations between EBVs estimated by LMF and RRF4, and between EBVs estimated by LMP and RRP4, increased with the amount of information provided by the progeny size of bulls. This suggests that random regression and 305-day lactation models would rank bulls very similarly if a large amount of information were available. On the other hand, as bulls' progeny size and the number of test-days of cows decreased, the Spearman correlations of EBVs of bulls and cows also decreased. These results make evident the increased re-ranking of animals as the amount of information decreased.
The average gain in reliability of EBVs of bulls increased from 3 to 16% when RRF4 was compared to LMF, and from 6 to 26% when RRP4 was compared to LMP, as the classes of progeny size decreased (Table 5). However, the increase in the number of test-days of cows did not show a pattern of increase or decrease in the gain in reliability of EBVs, when random regression models were compared with 305-day lactation models. For cows, the average gain in reliability of EBVs ranged between 24 and 26% when RRF4 was compared to LMF, and between 38 and 40% when RRP4 was compared to LMP. Although the increase in the bulls' progeny size was followed by a decrease in the gain in reliability of EBVs of bulls, random regression models estimated EBVs with more reliability than the conventional 305-day lactation models. Considering bulls in the progeny size class of 200 to 399, the gain in reliability of EBVs for 305-day yield ranged from 1 to 10% (3% on average) for fat, and from 2 to 18% (6% on average) for protein. The highest gains in reliability of EBVs of bulls were found when classes of lower progeny size were considered. Considering bulls in progeny size classes below 200 to 399, the average gain in reliability was from 5 to 16% for fat, and it was higher for protein (between 8 and 26%). Those gains in reliability ranged from 2 to 39% for fat, and from 3 to 63% for protein, when considering the range in parentheses instead of the average gain of the classes.
Pesq. agropec. bras., Brasília, v.51, n.11, p.1848-1856, nov. 2016 DOI: 10.1590/S0100-204X2016001100007
The number of bulls decreased with the increase in the progeny size class, which means that some bulls have been used more intensively than others. The most relevant gain in reliability of EBVs was for bulls with a lower number of progeny, which suggests that young bulls could have their EBVs estimated more precisely using a random regression model. The average percentage of gain in reliability of EBVs for 305-day yield of cows, between random regression and 305-day lactation models, was similar for classes of 6, 7, 8, 9, or 10 test-days in lactation. This suggests that, in the conditions of the present study, a group of cows with a lower number of test-days (6, for instance) may show an average reliability and an average gain in reliability similar to those of cows with complete lactation records, which have 10 test-days in lactation. The gain in reliability of breeding values in the genetic evaluations may bring benefits to selection-based breeding programs, if a larger number of cows and bulls is considered for genetic evaluations instead of being deleted because of a low number of records. In Brazil, one of the main reasons for the low number of recorded test-days of cows has been the increased recording costs, besides the low milk prices and the lack of financial support from the government.
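The two comparisons discussed above, the rank correlation of EBVs from two models and the percentage gain in reliability, can be illustrated with a minimal pure-Python sketch. The proportional definition of the gain, 100 × (reliability from the random regression model − reliability from the lactation model) / reliability from the lactation model, is an assumption for this example, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks from 1 to n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def reliability_gain_pct(rel_lactation, rel_random_regression):
    """Proportional gain in reliability, in percent (assumed definition)."""
    return 100.0 * (rel_random_regression - rel_lactation) / rel_lactation
```

For two hypothetical EBV lists such as `[1, 2, 3, 4, 5]` and `[2, 1, 4, 3, 5]`, in which two pairs of animals swap positions, `spearman` returns 0.8.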
(1) Average and standard deviation of EBV reliability values of 305-day yield estimated from 305-day lactation models. (2) Average percentage of proportional gains of reliability compared to 305-day lactation models, with range (in parentheses). LMF, 305-day lactation model for 305-day fat yield; LMP, 305-day lactation model for 305-day protein yield; RRF4 and RRP4, random regression models fitted by fourth-order Legendre polynomial, for fat and protein yield, respectively.
EBVs for 305-day fat and protein yields by year of birth of cows and bulls showed positive trends ranging between 0.16 and 0.42 kg per year (RRF4 and LMF), and between 0.47 and 0.76 kg per year (RRP4 and LMP), for cows born from 1990 to 2008 and for bulls born from 1979 to 2001 (Table 6). All trends of EBVs by year of birth of bulls were significantly different from zero at p<0.01, except for the RRF4 model; for cows, all trends were significantly different from zero at p<0.001. Genetic trend values in the literature ranged between 0.25 and 0.60 kg per year for 305-day fat yield, and were around 0.45 kg per year for 305-day protein yield (Durães et al., 2001; Sawalha et al., 2005). The genetic trends in the present study accounted for approximately 0.07% (fat) and 0.18% (protein) of the average phenotypic yield, which implies that fat and protein yields have improved only slightly in Brazilian Holsteins. Such genetic trends might be attributed to the emphasis on selection for increased milk yield by the Brazilian dairy industry. Moreover, the Brazilian Holstein population structure is based on imported genetic material from the USA, Europe, and Canada (Silva et al., 2016), and breeding programs in Brazil, Canada, and Europe have different selection objectives.
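A genetic trend of the kind discussed above is commonly estimated as the slope of a simple linear regression of EBV on year of birth. The sketch below shows that calculation in plain Python; whether the regression in this study was fitted to individual EBVs or to yearly averages is not stated, so the input layout, the function name, and the example values are assumptions.

```python
def genetic_trend(birth_years, ebvs):
    """Ordinary least squares regression of EBV on year of birth.

    Returns the slope b (trend per year), the intercept, and R^2.
    """
    n = len(birth_years)
    mx = sum(birth_years) / n
    my = sum(ebvs) / n
    sxx = sum((x - mx) ** 2 for x in birth_years)
    syy = sum((y - my) ** 2 for y in ebvs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(birth_years, ebvs))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared
```

For example, hypothetical average EBVs of 0.0, 0.4, 0.8, 1.2, and 1.6 kg for cows born from 1990 to 1994 give a slope of about 0.4 kg per year.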

Conclusions
1. The adoption of a random regression model using Legendre polynomials will increase the reliability of estimated breeding values in the genetic evaluations.
2. A random regression model using the fourth-order Legendre polynomial is the most recommended model for genetic evaluations of fat and protein yields of Brazilian Holstein cattle.

Figure 1. Heritability (h²), additive genetic (AG), and permanent environmental (PE) variances estimated from random regression models fitted by Legendre polynomials of fourth and fifth orders for fat (RRF4 and RRF5) and for protein (RRP4 and RRP5).
*Significant at 5% probability.RRF4 and RRF5, random regression models for fat yield fitted by fourth and fifth-order Legendre polynomials; RRP4 and RRP5, random regression models for protein yield fitted by fourth and fifth-order Legendre polynomials; RV, residual value; LRT, likelihood ratio test.

Table 2. Eigenvalues (λi) of the additive genetic variance and covariance matrix, and the respective percentage of total variance estimated from random regression models.

Table 3. Estimates of heritability of 305-day fat and protein yields estimated from conventional and random regression models.

Table 4. Spearman correlations of EBVs for 305-day fat and protein yields between LMF and RRF4, and between LMP and RRP4, of bulls and cows, according to classes of progeny size and number of test-days during lactation.

Table 5. Average percentage of gain in reliability (range in parentheses) of EBVs for 305-day yield estimated by random regression models, in comparison to 305-day lactation models.

Table 6. Estimates of breeding value trends (b) of bulls and cows by year of birth, standard errors (SE), and coefficients of determination (R²), for cows born between 1990 and 2008, and bulls from 1979 to 2001.
LMF, 305-day lactation model for 305-day fat yield; LMP, 305-day lactation model for 305-day protein yield; RRF4 and RRP4, random regression models fitted by fourth-order Legendre polynomial, for fat and protein yield, respectively. *,**Significant at 1 and 0.1%, respectively.