Editing and modeling of milk production data for genetic evaluation of Murrah buffaloes

The objective of this work was to assess the effect of editing and modeling of milk production data for genetic evaluation of Murrah buffaloes. Six strategies for evaluating milk production were analyzed: observed milk production (OMP); adjustment of milk production data to 305 (MP305) and 270 (MP270) days of lactation; removal of the 5% (MP5%) and 10% (MP10%) shortest lactation periods; and milk production along the lactation period as a linear covariate (MPCO). Genetic parameters were estimated using Bayesian inference, with heritability estimates of 0.19 to 0.23 and repeatability estimates of 0.35 to 0.36. Sire classifications by OMP were highly correlated with those by the other models; however, correlations with MP270, MP305 and MPCO decreased when considering only the best 20% of sires. OMP showed greater differences in absolute mean deviations when compared with MPCO, MP270 and MP305. The strategies of analysis had similar heritabilities and stabilities. However, changes in the ranking of the best-classified sires, due to overestimation of genetic values, as occurred in the models MP305, MP270 and MPCO, may lead to a decrease in the genetic progress of the herd.


Introduction
The number of buffaloes (Bubalus bubalis) has been increasing in Brazilian livestock due to their rusticity, longevity, high adaptive capacity to adverse conditions (Joele et al., 2013; Barros et al., 2016; Santos et al., 2016) and resistance to diseases and to endo- and ectoparasites (Hurtago-Lugo et al., 2013).
Murrah is the main buffalo genetic group for milk production in Brazil, with an average milk production of 1,496 to 2,130 kg per lactation (Marcondes, 2011; Tonhati et al., 2013). However, lactation periods of buffalo cows vary widely, from 150 to 301 days (Tonhati et al., 2007; Baldi et al., 2011; Marcondes, 2011; Malhado et al., 2013; Rangel et al., 2014), which reinforces the importance of assessing strategies for genetic evaluations of dairy buffaloes in order to improve predictions of the genetic values of sires.
The adjustment of milk production data to 305 days of lactation is commonly used for genetic evaluations of Brazilian dairy cattle (Costa et al., 2012; Panetto et al., 2015; Silva et al., 2015). Only three genetic evaluation summaries of buffaloes were published in Brazil and in the first two, milk productions were adjusted to 270 and 305 days of lactation (Ramos, 2001; Ramos et al., 2004). This adjustment was also used by Rassi et al. (2009) and Malhado et al. (2007) to correlate partial and total milk production and estimate genetic parameters and trends, respectively. The adjustment of milk production data to 305 days of lactation was used in the third summary (ABCB, 2006). Other studies estimated genetic parameters and values in buffaloes using milk production as a covariate (Tonhati et al., 2007; Rodrigues et al., 2010; Baldi et al., 2011).
However, no studies assessing effects of editing and modeling of milk production for estimating genetic parameters and values of buffaloes are found in the literature. Moreover, several authors have reported the effect of data editing in genetic evaluations (Cardoso et al., 2009; Facó et al., 2009; Malhado et al., 2013).
The objective of this work was to assess the effect of editing and modeling of milk production data for genetic evaluation of Murrah buffaloes, to assist in genetic research and evaluations.

Material and Methods
The experiment was performed in 2015 with lactation data of 2,952 buffalo cows of the Murrah breed from the Promebul Buffalo Breeding Program. Ten herds were evaluated, which were part of ten farms in the States of Bahia, São Paulo, Ceará, Pará, Minas Gerais, Rio Grande do Norte and Paraná. The mean and standard deviation of total milk production and lactation period were 1649.99±659.54 kg and 271.34±46.59 days, respectively.
Contemporary groups were divided by year of birth (21), calving season (May to July, August to October, November to January and February to April) and herd (10). The strategies proposed to evaluate milk production of buffalo cows were: observed milk production (OMP) without adjustment, used as the baseline for comparison with the edited data; adjustment of milk production data to 305 (MP305) and 270 (MP270) days of lactation, performed by interpolation; removal of the 5% (MP5%) and 10% (MP10%) shortest lactation periods, delimited by the quantiles at 5% (199 days) and 10% (217 days), i.e., exclusion of lactations shorter than 199 days (5%, representing 152 animals) and 217 days (10%, representing 302 animals); and milk production along the lactation period as a linear covariate (MPCO).
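The removal of the shortest lactations can be illustrated with a minimal sketch. It assumes that the cutoff is the lactation length of the first record retained after sorting, and that records strictly shorter than the cutoff are dropped; the function names and the toy records are hypothetical and are not data from this study.

```python
def shortest_lactation_cutoff(lengths, fraction):
    """Return the lactation length (days) that delimits the shortest `fraction` of records."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(lengths)
    k = int(fraction * len(ordered))  # number of records targeted for removal
    return ordered[k]


def drop_short_lactations(records, fraction):
    """Keep only records whose lactation length is at or above the cutoff.

    With tied lengths at the cutoff, fewer than `fraction` of the records
    may be removed, since only strictly shorter lactations are dropped.
    """
    cutoff = shortest_lactation_cutoff([r["days"] for r in records], fraction)
    return [r for r in records if r["days"] >= cutoff]


# Hypothetical toy records: lactation length in days and milk yield in kg.
toy = [(150, 900), (199, 1200), (217, 1350), (250, 1500), (270, 1650),
       (280, 1700), (290, 1800), (300, 1900), (305, 2000), (310, 2100)]
records = [{"cow": i, "days": d, "milk_kg": m} for i, (d, m) in enumerate(toy)]

kept_10 = drop_short_lactations(records, 0.10)  # analogous to MP10%
```

In this toy example, the 10% strategy removes only the 150-day lactation. Other quantile definitions (for instance, with interpolation between records) can give slightly different cutoffs.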
The statistical model used to describe milk production was

$$Y = X\beta + Za + Wpe + e$$

wherein Y is the vector of observations of the dependent variable, milk production (OMP, MP305, MP270, MP5%, MP10% and MPCO); $\beta$ is the vector of fixed effects (contemporary groups, calving order and number of milkings), which also includes the linear covariate for lactation period ($\beta_{MPCO}$) in the model MPCO, related to Y by the incidence matrix X; a is the vector of random effects of additive genetic value of the animal, related to Y by the incidence matrix Z; pe is the vector of random permanent environmental effects, related to Y by the incidence matrix W; and e is the vector of random error effects.
The statistical models were determined and the densities of the variance components were obtained from samples generated by Markov chain Monte Carlo (MCMC), using the Gibbs sampler in the program GIBBS3F90 (Misztal, 2012). Chains of 220,000 or 500,000 iterations were used, with a discard (burn-in) of 20,000 or 50,000 samples and a thinning interval of 50 samples. The convergence diagnosis was performed following the Geweke method, using the BOA (Bayesian Output Analysis) package of the R program (R Core Team, 2008).
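The chain settings above determine how many samples remain for inference. The sketch below computes that number and a simplified version of the Geweke statistic. It assumes that the shorter chain was paired with the shorter burn-in (the text does not state the pairing), and it uses plain sample variances, whereas the full Geweke method uses spectral density estimates that account for autocorrelation. It is not the BOA implementation.

```python
import math


def retained_samples(iterations, burn_in, thin):
    """Number of samples kept after discarding the burn-in and thinning."""
    return (iterations - burn_in) // thin


def _mean(values):
    return sum(values) / len(values)


def _sample_variance(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: compares the mean of the first 10% of the
    chain with the mean of the last 50%.

    Assumption: segment variances ignore autocorrelation, so this is only
    an approximation of the published diagnostic.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    standard_error = math.sqrt(_sample_variance(a) / len(a)
                               + _sample_variance(b) / len(b))
    return (_mean(a) - _mean(b)) / standard_error


print(retained_samples(220_000, 20_000, 50))  # 4000
print(retained_samples(500_000, 50_000, 50))  # 9000
```

Values of |z| well above 2 suggest that the early and late parts of the chain differ, that is, that the chain has not yet converged.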
The models were compared by cross-validation, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
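For reference, the two information criteria can be sketched with their standard definitions, AIC = -2 log L + 2k and BIC = -2 log L + k log n. How the log-likelihood was obtained from the Gibbs samples is not described here, so the inputs below are hypothetical placeholders and not values from the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def rank_models(scores):
    """Return model names sorted from the lowest (best) to the highest score."""
    return sorted(scores, key=scores.get)


# Hypothetical log-likelihoods and parameter counts for two models.
scores = {
    "model_a": aic(log_likelihood=-100.0, n_params=3),
    "model_b": aic(log_likelihood=-98.0, n_params=6),
}
best_first = rank_models(scores)
```

Both criteria penalize extra parameters, and BIC penalizes them more strongly as the number of observations grows.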

Pesq. agropec. bras., Brasília, v.52, n.12, p.1261-1267, dez. 2017. DOI: 10.1590/S0100-204X2017001200015

Spearman correlations between the bull classifications were calculated in the R program to compare the estimates of genetic values of the six models. First, all sires were subjected to these analyses, then only the best sires (20%) with at least five progenies with milk production, and finally the eight best sires, which were used to calculate the differences of the absolute mean deviations (AMD) of the adjusted models in relation to OMP, using the model:

$$AMD = \frac{\sum |\hat{a}_i - \hat{a}_j|}{n}$$

wherein $\hat{a}_i$ is the genetic value estimated by OMP, $\hat{a}_j$ is the genetic value estimated by the models (MP270, MP305, MP5%, MP10% and MPCO), and n is the sample size. The lower the estimated value of the AMD, the better the adjustment.
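The two comparison measures can be sketched as follows. Spearman correlation is computed here as the Pearson correlation of average ranks, and the AMD is taken as the mean of the absolute differences between the genetic values of the same sires under OMP and under another model. The genetic values in the example are hypothetical and are not results from this study.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation between two sets of genetic values."""
    return pearson(average_ranks(x), average_ranks(y))


def absolute_mean_deviation(reference, other):
    """Mean absolute difference between genetic values of the same sires."""
    return sum(abs(a - b) for a, b in zip(reference, other)) / len(reference)


# Hypothetical genetic values (kg) for five sires under two models.
omp = [300, 250, 200, 150, 100]
adjusted = [320, 240, 260, 150, 90]

rho = spearman(omp, adjusted)                     # approximately 0.9
amd = absolute_mean_deviation(omp, adjusted)      # 20.0
```

In this toy case, the second and third sires swap positions, which lowers the rank correlation even though the largest individual shift in genetic value is 60 kg.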
Results and Discussion
Lower values of the Akaike information criterion (AIC) and Bayesian information criterion (BIC) denote better fitting of the model for genetic evaluations (Aliloo et al., 2014). Thus, from best to worst fit, the models ranked as MP10%, MP5%, MP270, MPCO, MP305 and OMP (Table 1). However, the models MP5% and MP10% may have been favored by the data exclusions at the 5% (lactation periods shorter than 199 days) and 10% (lactation periods shorter than 217 days) quantiles.
Discarding information by data editing has been contested, since it may decrease genetic variability in population evaluations (Cunha & Melo, 2012). Therefore, the exclusion of short lactations can bias estimates, affecting the actual variability of the characteristic and the genetic differences between animals (Madalena et al., 1992). However, according to Facó et al. (2009), elimination of short lactations does not decrease genetic variability, but contributes to a substantial decrease of residual variance.
The heritability coefficients of the models ranged from 0.19 to 0.23, denoting a small variation among the six models evaluated (Table 2). The models showed similar repeatability (0.35 to 0.36). According to the credibility intervals, the estimates of the models had no significant differences in heritability and repeatability.
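For reference, in a repeatability animal model with additive genetic, permanent environmental and residual variances, heritability and repeatability are usually defined as the variance ratios below. These are the standard definitions and are assumed here to match those used in this study; the symbols $\sigma^2_a$, $\sigma^2_{pe}$ and $\sigma^2_e$ denote the variances of a, pe and e in the statistical model.

```latex
h^2 = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_{pe} + \sigma^2_e},
\qquad
r = \frac{\sigma^2_a + \sigma^2_{pe}}{\sigma^2_a + \sigma^2_{pe} + \sigma^2_e}
```

Under these definitions, repeatability is never lower than heritability, which is consistent with the estimates of 0.35 to 0.36 and 0.19 to 0.23 reported above.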
The heritabilities of the models denote the possibility of genetic gain for milk production through selection. These results confirm those reported in other studies, which found heritabilities ranging from 0.20 to 0.25 (Malhado et al., 2007; Rodrigues et al., 2010; Tonhati et al., 2013). The repeatability of the models was low, which limits the prediction of future production from the first lactations. Similar or higher estimates of repeatability were found by Ramos et al. (2013).
The heritabilities of the models MP270 (0.22) and MP305 (0.23) were similar to those found in the literature for buffaloes with adjustments for 270 days (Malhado et al., 2007; Rodrigues et al., 2010) and for 305 days (Aspilcueta-Borquis et al., 2010).
The lowest estimates of heritability, found for the models MP10% (0.19) and MP5% (0.20), were due to the exclusion of lactations that were short for genetic reasons. The variance due to permanent environmental effects decreased when excluding the 10% shortest lactations, also denoting that environmental variations can permanently affect milk production of buffalo cows. Baldi et al. (2011) evaluated milk production of Murrah buffaloes with exclusions of lactations shorter than 90 and 150 days and found higher variance of permanent effect when using exclusions. Madalena et al. (1992) recommended not excluding short lactations, to avoid decreasing variance and biasing estimates.
The estimate of heritability of the model MPCO was 0.23, similar to that of MP305. Other authors found no decreased variability with this covariate, but a decreased heritability compared with models using multiplicative correction factors and adjusted to 270 and 305 days of lactation (Tonhati et al., 2007; Baldi et al., 2011).
Correlations of bull classification by OMP with those by the other models were high and significant (Table 3). This result denotes high maintenance of classification rankings in relation to OMP. However, correlations of OMP with MP305, MP270 and MPCO decreased when considering only the classification of the 20% best sires (Table 3). This result may lead to inappropriate choices of sires for breeding, especially due to the overestimation of genetic values by the models MP305, MP270 and MPCO. The models MP5% and MP10% had slightly increased correlations, which was expected, since the exclusion of the 5% or 10% shortest lactation periods can alter results of the worst sires, with little or no effect on the best sires.
Sires classified by OMP were similar to those classified by the other models for the two best sires (Table 4).The greatest variations in the classification were observed in the subsequent positions, especially in the models MPCO, MP305 and MP270, confirming the greater differences of the absolute mean deviations (AMD) of the eight best sires, with differences of 45.37 (MPCO), 48.62 (MP270), 34.75 (MP305), 31.50 (MP5%) and 28.00 (MP10%) in relation to OMP.
2. Significant changes in the genetic values of the Murrah buffaloes due to the data editing and modeling cause significant changes in the classification of the best sires, since the genetic values are overestimated or underestimated, which can lead to an inappropriate choice of breeding sires, compromising the genetic progress.

Table 1. Average estimates by the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).

Table 2. Estimates of the covariance components, heritability, repeatability and descriptive analyses of milk production of Murrah buffaloes through six statistical models.
(1) OMP, observed milk production; MP305, adjustment of milk production data to 305 days; MP270, adjustment of milk production data to 270 days; MP5%, removal of the 5% lower milk productions; MP10%, removal of the 10% lower milk productions; and MPCO, milk production along the lactation period as linear covariate. (2) C, interval of credibility at 95% for the components of variance; and SE, standard error.