## Arquivo Brasileiro de Medicina Veterinária e Zootecnia

*Print version* ISSN 0102-0935; *On-line version* ISSN 1678-4162

### Arq. Bras. Med. Vet. Zootec. vol.61 no.4 Belo Horizonte Aug. 2009

#### https://doi.org/10.1590/S0102-09352009000400026

**ZOOTECNIA E TECNOLOGIA E INSPEÇÃO DE PRODUTOS DE ORIGEM ANIMAL**

**Genetic evaluation for large data sets by random regression models in Nellore cattle **

**Avaliação genética para grandes massas de dados por meio de modelos de regressão aleatória em gado Nelore **

**P.R.C. Nobre ^{I}; A.N. Rosa^{II}; L.O.C. Silva^{II}**

^{I}Fundapam - Embrapa Gado de Corte - Geneplus - Campo Grande, MS

^{II}Embrapa Gado de Corte - Campo Grande, MS

**ABSTRACT**

Expected progeny differences (EPD) of Nellore cattle estimated by a random regression model (RRM) and by a multiple trait model (MTM) were compared. The genetic evaluation data included 3,819,895 records of up to nine sequential weights of 963,227 animals, measured at ages ranging from one day (birth weight) to 733 days. Traits considered were weights at birth, ten to 110-day old, 102 to 202-day old, 193 to 293-day old, 283 to 383-day old, 376 to 476-day old, 467 to 567-day old, 551 to 651-day old, and 633 to 733-day old. Seven data samples were created. Two of them were chosen because their parameter estimates were biologically more consistent: one with 84,426 records and another with 72,040. Records preadjusted to a fixed age were analyzed by an MTM, which included the effects of contemporary group, age of dam class, additive direct, additive maternal, and maternal permanent environment. Analyses were carried out by REML, with five traits at a time. The RRM included the effects of age of animal, contemporary group, age of dam class, additive direct, permanent environment, additive maternal, and maternal permanent environment. Legendre polynomials of different degrees were used to describe the random effects. The MTM estimated covariance components and genetic parameters for weight at birth and the sequential weights, and the RRM for all ages. Because the correlations between the EPD estimated by the MTM and by all the tested RRM were not equal to 1.0, RRM cannot be recommended for the genetic evaluation of large data sets.

**Keywords:** beef cattle, multiple trait, random regression

**RESUMO**

Compararam-se as diferenças esperadas nas progênies (DEPs) de gado Nelore, estimadas por meio de um modelo de características múltiplas (MTM), com um modelo de regressão aleatória (RRM). Foram utilizados 3.819.895 dados de peso corporal sequenciais para a avaliação genética de 963.227 animais, coletados do nascer aos 733 dias de idade. As características consideradas foram: peso ao nascer e pesos dos 10 aos 110, dos 102 aos 202, dos 193 aos 293, dos 283 aos 383, dos 376 aos 476, dos 467 aos 567, dos 551 aos 651, e dos 633 aos 733 dias. Sete amostras foram geradas. Duas amostras resultaram em estimativas de parâmetros mais consistentes do ponto de vista biológico, sendo, portanto consideradas representativas da população em estudo. A primeira amostra constituiu-se de 84.426 medidas, e a segunda, de 72.040. Os pesos pré-ajustados para as idades fixas foram analisados por meio de um MTM, com cinco características por processamento, no qual se incluíram efeito de grupo contemporâneo, classe de idade da vaca, aditivo direto, aditivo materno e ambiente materno permanente, utilizando-se a metodologia de máxima verossimilhança restrita (REML). Diferentes graus dos polinômios de Legendre foram utilizados em um RRM, para os efeitos aleatórios. As correlações entre as DEPs estimadas por meio do modelo para características múltiplas e de regressão aleatória não foram iguais a 1,0, portanto, não se recomenda a utilização dos modelos de regressão aleatória para avaliação genética para grande massa de dados.

**Palavras-chave:** bovino de corte, características múltiplas, regressão aleatória

**INTRODUCTION**

Recently, there has been increased interest in so-called random regression models (RRM) for traits that are recorded repeatedly per animal, such as longitudinal data. RRM are similar to multiple trait models (MTM) in that a number of correlated additive genetic effects, namely regression coefficients, are estimated for each individual. Estimates of the genetic random regression coefficients provide a complete trajectory of genetic merit, and expected progeny differences (EPD) for any point on the longitudinal scale can be obtained by evaluating the regression equations at that point (Tier and Meyer, 2004). However, the memory required by RRM increases with the number of covariates in the model (Nobre et al., 2003c).

A complete MTM with the number of traits equal to the number of ages would result in a highly overparameterised analysis. As a consequence, this would be likely to impose unnecessary computational demands. RRM could be useful in beef cattle genetic evaluation because weights at any age can be used, and EPD can be estimated for any age. In contrast, MTM provide estimates only for given points (Albuquerque and Meyer, 2001).

Analyzing weights as a longitudinal trait may increase the accuracy of evaluation, because it eliminates the need for preadjustment and incorporates all weights with appropriate covariances. Meyer (2002) estimated, using simulated data, that RRM increased the accuracy of EPD by up to 6%. However, actual gains with field data sets are unknown; RRM may result in lower accuracy than MTM if the parameters for RRM are poor or the computations are inaccurate.

Models in beef cattle may be more complicated than in dairy cattle because of correlated direct and maternal effects. Tsuruta et al. (2001) developed a computer program that supports large data sets by using an iteration-on-data technique with the preconditioned conjugate gradient (PCG) algorithm. Its memory requirements are modest enough to support national genetic evaluations.

Robbins et al. (2005) indicated that longitudinal models can be implemented effectively in beef cattle growth evaluations. According to these authors, RRM give practical and more flexible evaluations, while providing a more theoretically sound alternative to the MTM with relatively small cost of implementation.

The objectives of this study were to implement the genetic evaluation of weights for a large population of beef cattle using RRM and to compare the EPD from reduced RRM with those from an MTM.

**MATERIALS AND METHODS**

Data were collected by the Brazilian Zebu Breeders Association (ABCZ) and provided by the Brazilian Agricultural Research Corporation (EMBRAPA). The data consisted of records on 963,227 Nellore animals, progeny of 15,446 sires, and 376,818 dams raised under Brazilian pasture conditions. The records were collected from 1975 to 2001.

Traits considered were weight at birth (WB), weight at ten to 110-day old (W1 or weight at 60-day old), weight at 102 to 202-day old (W2 or weight at 152-day old), weight at 193 to 293-day old (W3 or weight at 243-day old), weight at 283 to 383-day old (W4 or weight at 333-day old), weight at 376 to 476-day old (W5 or weight at 426-day old), weight at 467 to 567-day old (W6 or weight at 517-day old), weight at 551 to 651-day old (W7 or weight at 601-day old), and weight at 633 to 733-day old (W8 or weight at 683-day old).

Edits included eliminating records of animals outside the range of three standard deviations from the overall mean for each weight, and eliminating records outside of the range in age classes provided above. Table 1 summarizes characteristics of the data.
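The two edits described above can be sketched in a few lines of Python. This is only an illustration: the record layout (a list of `(age_days, weight_kg)` tuples), the function name `edit_records`, and the order in which the two filters are applied are assumptions, not details given in the text.

```python
import statistics

# Age windows (days) for the sequential weights, as listed in the text.
AGE_WINDOWS = {
    "W1": (10, 110), "W2": (102, 202), "W3": (193, 293), "W4": (283, 383),
    "W5": (376, 476), "W6": (467, 567), "W7": (551, 651), "W8": (633, 733),
}

def edit_records(records, trait, n_sd=3.0):
    """Keep records of `trait` inside its age window and within n_sd SD of the mean.

    `records` is a list of (age_days, weight_kg) tuples (hypothetical layout).
    The age-window filter is applied first; that ordering is an assumption.
    """
    lo, hi = AGE_WINDOWS[trait]
    in_window = [(a, w) for a, w in records if lo <= a <= hi]
    weights = [w for _, w in in_window]
    mean = statistics.fmean(weights)
    sd = statistics.pstdev(weights)
    # Discard weights farther than n_sd standard deviations from the trait mean.
    return [(a, w) for a, w in in_window if abs(w - mean) <= n_sd * sd]
```

Because adjacent age windows overlap slightly (for example, days 102 to 110 fall in both W1 and W2), the sketch takes the trait label as an input rather than inferring it from age.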

Dams in the data were 2.0 through 20 years of age at calving. Classes of age of dam were defined every two years. The season of measurement was defined every three months, i.e., October to December; January to March; April to June; and July to September.
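As a small illustration of these class definitions, the helpers below map a dam's age and a month of measurement to class codes. The exact class boundaries (two-year classes starting at 2 years, with 20-year-old dams falling in the last class) and the numbering of seasons are assumptions made for the sketch; the text gives only the class width and the month groupings.

```python
def age_of_dam_class(age_years):
    """Two-year classes: [2, 4) -> 1, [4, 6) -> 2, ..., [18, 20] -> 9 (boundaries assumed)."""
    if not 2.0 <= age_years <= 20.0:
        raise ValueError("age of dam outside the 2 to 20 year range")
    return min(int((age_years - 2.0) // 2) + 1, 9)

def season_of_measurement(month):
    """Oct-Dec -> 1, Jan-Mar -> 2, Apr-Jun -> 3, Jul-Sep -> 4 (numbering assumed)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # Shift the calendar so that October is month 0, then cut into quarters.
    return ((month - 10) % 12) // 3 + 1
```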

Seven sample data sets were formed by randomly sampling herds; two were chosen, mainly because their parameter estimates were biologically more plausible than those of the others. The sample data sets were drawn from herds with more than 500 birth weight records, an average contemporary group size greater than ten within each herd, and at least five records per contemporary group; 3.0% and 1.5% of the remaining herds were then sampled for the first and second samples, respectively. The pedigree files contained 26,087 and 20,413 animals, respectively. Both samples are described in Table 2.

Two models (MTM and RRM) were used for analyses. The multiple trait model (MTM) was:

$$y = X\beta + Z_{1} d + Z_{2} m + Z_{3} mp + e$$

in which y was a vector of records preadjusted to fixed age; β was a vector of fixed effects (contemporary group and age of dam class); *d* was a vector of additive direct genetic random effects of the animal; *m* was a vector of additive maternal genetic random effects; *mp* was a vector of maternal permanent environment random effects; *X* was the incidence matrix for fixed effects; Z_{1} , Z_{2} , and Z_{3} were incidence matrices for animal, maternal, and maternal permanent environmental effects, respectively; and *e* was the vector of residual random effects.

The variances and covariances were defined as follows:
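A covariance structure consistent with the symbol definitions that follow can be written as below. It is a reconstruction under that assumption, not the authors' own display; in particular, the order of the Kronecker factors depends on how records and effects are sorted, which the text does not state.

$$
\mathrm{Var}\begin{bmatrix} d \\ m \end{bmatrix} = G_{0} \otimes A, \qquad
\mathrm{Var}(mp) = MP_{0} \otimes I_{c}, \qquad
\mathrm{Var}(e) = R_{0} \otimes I_{n}
$$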

in which G_{0} was the covariance matrix of the random genetic effects; MP_{0} was the covariance matrix of the maternal permanent environmental random effects; R_{0} was the covariance matrix of the random residual effects; A was the additive genetic relationship matrix; I_{c} was an identity matrix whose order was the number of dams; and I_{n} was an identity matrix whose order was the number of animals.

(Co)variance components were estimated for five traits at a time. Parameters presented here were based on the average from analyses of models that contained that particular parameter.

The random regression model (RRM) was defined as follows:
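A scalar form consistent with the term-by-term definitions that follow is sketched here. It is a reconstruction from those definitions: the summation limits (the polynomial order fitted for each effect) are left generic because they differ between effects and between the models tested.

$$
y_{ijklm} = \sum_{d} \beta_{d} z_{d} + \sum_{d} cg_{di} z_{di} + \sum_{d} cad_{dj} z_{dj} + \sum_{d} d_{dk} z_{dk} + \sum_{d} p_{dk} z_{dk} + \sum_{d} m_{dl} z_{dl} + \sum_{d} mp_{dl} z_{dl} + \sum_{d} r_{dm} z_{dm} + \varepsilon_{ijklm}
$$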

where y_{ijklm} was the observation in contemporary group *i*, age of dam class *j*, animal *k*, dam *l*, and record *m*; β_{d} was the fixed regression coefficient d for age of animal; cg_{di} was the fixed regression coefficient d for contemporary group i; cad_{dj} was the fixed regression coefficient d for age of dam class j; d_{dk} and p_{dk} were random regression coefficients d for additive direct and permanent environmental effects of animal k; m_{dl} and mp_{dl} were random regression coefficients d for additive maternal and maternal permanent environmental effects of dam l; r_{dm} was the random regression coefficient d for the residual effect of record m; z_{d}, z_{di}, z_{dj}, z_{dk}, z_{dl}, and z_{dm} were Legendre polynomials; and ε_{ijklm} was the residual effect. The purpose of the error effect r was to model heterogeneous residual variance indirectly (Van der Werf and Schaeffer, 1997); the available software did not allow modelling this directly.

The random regression model could be written in matrix notation as:

$$y = X\beta + Z_{1} d + Z_{2} p + Z_{3} m + Z_{4} mp + Z_{5} r + e$$

in which y was the vector of records; *β* was the vector of fixed regressions; d, p, m, mp, and r were vectors for additive direct genetic, permanent environment, additive maternal genetic, maternal permanent environment, and residual effects, respectively; X was the incidence matrix for fixed effects; and Z_{1}, Z_{2}, Z_{3}, Z_{4}, and Z_{5} were incidence covariate matrices for additive direct genetic, permanent environment, additive maternal genetic, maternal permanent environment, and error effects, respectively; and e was a vector of constant residual effects.
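To make the covariate matrices concrete, the sketch below builds Legendre covariates for a given age and evaluates a set of regression coefficients at that age. Several details are assumptions rather than statements from the text: ages are standardized to [-1, 1] over the 1 to 733 day range, the polynomials are normalized in the way commonly used in random regression work, and the function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

AGE_MIN, AGE_MAX = 1.0, 733.0  # age range of the data (days)

def legendre_covariates(age_days, order=3):
    """Rows of normalized Legendre covariates (degree 0..order) for each age."""
    ages = np.asarray(age_days, dtype=float)
    # Standardize age to the interval [-1, 1] on which the polynomials are defined.
    x = 2.0 * (ages - AGE_MIN) / (AGE_MAX - AGE_MIN) - 1.0
    basis = legendre.legvander(x, order)  # columns P_0(x) .. P_order(x)
    # Normalization sqrt((2k + 1) / 2) for degree k (an assumed convention).
    norm = np.sqrt((2.0 * np.arange(order + 1) + 1.0) / 2.0)
    return basis * norm

def evaluate_at_age(rr_coefficients, age_days):
    """Evaluate a regression (e.g. an animal's genetic trajectory) at given ages."""
    coef = np.asarray(rr_coefficients, dtype=float)
    return legendre_covariates(age_days, order=len(coef) - 1) @ coef
```

For example, `evaluate_at_age(coefficients, [243.0, 426.0])` returns the value of the fitted trajectory at 243 and 426 days, which is how an RRM yields predictions at ages that were never used as fixed adjustment points.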

The variances and covariances were defined as follows:
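A structure consistent with the definitions that follow is sketched below as a reconstruction. The symbol \(\sigma^{2}_{e}\) for the constant residual variance is an assumed name, and the order of the Kronecker factors depends on how the effects are sorted.

$$
\mathrm{Var}\begin{bmatrix} d \\ m \end{bmatrix} = G_{0} \otimes A, \quad
\mathrm{Var}(p) = P_{0} \otimes I_{k}, \quad
\mathrm{Var}(mp) = MP_{0} \otimes I_{l}, \quad
\mathrm{Var}(r) = R_{0} \otimes I_{m}, \quad
\mathrm{Var}(e) = I_{n}\,\sigma^{2}_{e}
$$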

in which G_{0} was an 8 x 8 covariance matrix of random regression coefficients for the genetic effects; P_{0}, MP_{0}, and R_{0} were 4 x 4 covariance matrices of random regression coefficients for the permanent environment, maternal permanent environment, and residual effects, respectively; the variance of e was assumed constant; A was the additive genetic relationship matrix; I_{k} was an identity matrix whose order was the number of animals; I_{l} was an identity matrix whose order was the number of dams; and I_{m} and I_{n} were identity matrices whose order was the number of records.

For observation m containing trait t, and with Legendre polynomials corresponding to the age for trait t, the residual effect in the MTM is approximately equivalent to the sum of the residual, permanent environment, and error effects in the RRM (Nobre et al., 2003a).

For better numerical properties of mixed model equations, the coefficients can be orthogonalized and their variances diagonalized as described by Nobre et al. (2003b). In the transformation, some eigenvalues may be very close to zero. Regression coefficients corresponding to those covariables have values close to zero. Consequently, these coefficients can be dropped from the model with a negligible decrease in accuracy but at noticeable savings in computations.
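The diagonalization step can be illustrated with an eigendecomposition of a covariance matrix of regression coefficients. This is a minimal sketch, not the procedure of Nobre et al. (2003b) in detail: the function name and the default threshold (dropping eigenvalues that explain less than 1% of the total) are assumptions chosen for illustration.

```python
import numpy as np

def diagonalize(cov, min_share=0.01):
    """Eigendecompose a covariance matrix and drop negligible eigenvalues.

    Returns the retained eigenvalues (largest first), the matching
    eigenvectors, and the share of total variance for every eigenvalue.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = eigvals / eigvals.sum()
    keep = share >= min_share
    return eigvals[keep], eigvecs[:, keep], share
```

With the retained eigenvectors `v`, the original covariates `z` become `z @ v`, and the transformed regression coefficients are uncorrelated with variances equal to the retained eigenvalues, so fewer equations per animal need to be solved.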

In the RRM, the error was modeled as a fixed residual as described by Nobre et al. (2003b). Then, when BLUP software supports weights, as in the case of this study, the effect r can be eliminated at a considerable saving in computations.

Covariance components for MTM and RRM were estimated by program REMLF90 (Misztal, 2005).

The EPD were obtained by program BLUPF90 (Misztal, 1999), with solutions obtained by the sparse-matrix factorization package FSPAK90 (Misztal and Perez-Enciso, 1998) and by BLUP90IOD, which uses an iteration on data with the PCG solver (Tsuruta et al., 2001). The first program computed exact solutions in the absence of numerical errors, but it required much higher computing resources. The second program was iterative and computed increasingly more accurate solutions as the iteration progressed. The convergence criterion for that program was defined as the relative average squared differences between consecutive solutions; two criteria were used: 10^{-10} (called lower accuracy) and 10^{-12 } (called higher accuracy).
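The iterative solver and its stopping rule can be sketched as follows. This is not the BLUP90IOD code: it holds a small dense coefficient matrix in memory instead of iterating on data, it assumes a diagonal (Jacobi) preconditioner, and it implements the convergence criterion as the squared change between consecutive solutions relative to the squared solution, following the description above.

```python
import numpy as np

def pcg(a, b, tol=1e-12, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite system.

    Stops when sum((x_new - x_old)^2) / sum(x_new^2) < tol.
    Assumes b is not the zero vector.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    m_inv = 1.0 / np.diag(a)          # Jacobi preconditioner (assumed choice)
    x = np.zeros_like(b)
    r = b - a @ x                     # initial residual
    z = m_inv * r
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        ap = a @ p
        alpha = rz / (p @ ap)
        x_new = x + alpha * p
        r = r - alpha * ap
        change = np.sum((x_new - x) ** 2) / np.sum(x_new ** 2)
        x = x_new
        z = m_inv * r
        rz_new = r @ z
        # Stop on the relative-change criterion, or when the residual vanishes.
        if change < tol or rz_new == 0.0:
            return x, it
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

Calling `pcg(a, b, tol=1e-10)` and `pcg(a, b, tol=1e-12)` corresponds to the lower and higher accuracy settings; the tighter criterion costs extra iterations but leaves less iteration error in the solutions.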

Initially, EPD were obtained by programs BLUPF90 via FSPAK90 and BLUP90IOD via PCG with MTM and RRM using a sample of the data as mentioned above. Due to computing limitations, the MTM was a five-trait model using W1, W2, W3, W5, and W7. The RRM used all weights available. Solutions by RRM were calculated before and after diagonalization and with lower and higher accuracy for BLUP90IOD. Subsequently, the computations were repeated for the complete data set, but only with program BLUP90IOD because the computing requirements for BLUPF90 were excessive. Correlations between solutions from various runs were separately computed for five traits and direct and maternal effects.
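The comparison between runs reduces to one Pearson correlation per trait and effect. The sketch below assumes a hypothetical layout in which each run's solutions are stored as an animals-by-traits array with animals in the same order; the function name is likewise an assumption.

```python
import numpy as np

def correlations_by_trait(epd_run_a, epd_run_b, traits):
    """Pearson correlation between two runs, computed separately for each trait.

    Both inputs are animals x traits arrays with rows in the same animal order.
    """
    a = np.asarray(epd_run_a, dtype=float)
    b = np.asarray(epd_run_b, dtype=float)
    if a.shape != b.shape or a.shape[1] != len(traits):
        raise ValueError("runs must have the same shape, one column per trait")
    return {
        trait: float(np.corrcoef(a[:, j], b[:, j])[0, 1])
        for j, trait in enumerate(traits)
    }
```

The same function would be called once with direct-effect solutions and once with maternal-effect solutions, since the two effects were compared separately.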

**RESULTS AND DISCUSSION**

Table 3 summarizes the mean covariance components estimated at different ages for both samples in the MTM analyses.

Table 4 summarizes the mean covariance components estimated at different ages for both samples in the reduced RRM analyses, which fitted the additive direct, permanent environment, additive maternal, and maternal permanent environment effects with cubic, cubic, quadratic, and linear polynomials, respectively.

Three other models were studied. The first included cubic polynomials for the additive direct genetic, permanent environment, additive maternal genetic, and maternal permanent environment effects. The second included cubic polynomials for the additive direct genetic, permanent environment, and additive maternal genetic effects and a quadratic polynomial for the maternal permanent environment effect. The last included cubic polynomials for the additive direct genetic and permanent environment effects and quadratic polynomials for the additive maternal genetic and maternal permanent environment effects.

Higher-order polynomials are undesirable mainly because of their excessive computing cost, which is why these four models were studied. The less parameterized model, whose results are presented in Table 4, did not differ from the others when compared by the likelihood ratio test (Rao, 1973).

Components of parameters of growth in beef cattle include direct and maternal variances across ages, correlations among ages for the direct effect, the same correlations for the maternal effect, and correlations between the direct and maternal effects along ages. Additional parameters include variances and correlations for environmental effects and variance for the residual effect. When some records are missing, the variances associated with ages of most missing records become erratic, and all correlations fluctuate. When connections between the direct and maternal effects are weak, the correlations between the direct and maternal effects become more negative. Random regression models are more susceptible to artifacts due to data problems than multiple trait models. If a random regression model is to be used for genetic evaluation, genetic parameters estimated by random regression models may not be satisfactory.

In this study, the covariance estimates with RRM were similar to those with MTM from WB (one day) through weight at 601-day old. Parameters estimated via RRM are susceptible to large sampling errors and estimation artifacts for data points along the trajectory that have small amounts of data (Van der Werf and Schaeffer, 1997). Records were missing sequentially: all animals had records on WB, but only 4% had records on W8. The reason for the discrepancy at 683 days may be that the small number of records reduced the degrees of freedom.

The results with RRM are in agreement with those reported by Nobre et al. (2003a, b). It has been indicated that parameter estimates obtained by fitting polynomials could be affected by sparse data and extremes of trajectories, especially at older ages.

Table 5 presents the eigenvalues of the covariance matrices for each effect in the RRM.

For the genetic effects, the first two eigenvalues explained 92% of the genetic variance, and the last eigenvalue was close to zero. Also, for permanent environment effects, the first two eigenvalues explained 95% of variance. For maternal permanent environment effects and residual effects, those eigenvalues explained 96 and 82% of variances, respectively. Nobre et al. (2003b) reported that small eigenvalues indicated that parameters of RRM were indeed poorly conditioned and also indicated potential of reducing the number of effects in the model.

When diagonalization was done for all effects, low correlations remained until the covariance matrices were recreated as VD*V', in which D* was like D but with the small eigenvalues set to zero, and the convergence criterion was set to 10^{-12}. This indicates that the diagonalization and the convergence criterion were essential in obtaining accurate EPD from RRM. However, when the error effect was replaced by weights and the effects corresponding to very small eigenvalues were dropped from the model, the correlations remained the same.

Because replacing the error effect by weights did not change the correlations, four less parameterized models were studied to avoid unnecessary computing demands. Estimated correlations between EPD from the MTM and from the reduced RRM, obtained with the complete data by BLUP90IOD with the convergence criterion set to 10^{-12}, are reported in Table 6.

The correlations between MTM and all RRM were not perfect (were not equal to 1.0). There is no age variability for WB; therefore, given numerically accurate solutions and functionally identical parameters, the MTM and RRM should provide identical results. Differences for WB could be due to larger numbers of fixed effects to estimate in RRM. However, Nobre et al. (2003b) dropped all but constant terms for covariables in RRM and did not change the correlations. So, assuming numerically accurate computations, differences between MTM and RRM were largely due to differences in parameter estimates used in both models.

Nobre et al. (2003c) concluded that RRM are more susceptible to artifacts due to data problems than MTM and recommended that genetic parameters estimated by RRM may have to be adjusted based on estimates from MTM, if a RRM is to be used for genetic evaluations.

The correlations for the additive direct effect were the same for all RRM; however, those for the additive maternal effect were smaller for RRM^{4}. This suggested that important differences in EPD between MTM and RRM for maternal direct effects existed due to factors beyond the models. Nobre et al. (2003b) reported that the absence of some traits in MTM made a small difference for animals with a large number of records. These authors also reported that the maternal direct effect correlations were lower for sires with ≥ 100 progeny and concluded that differences in EPD between RRM and MTM existed due to factors beyond the fewer traits used in MTM.

Nobre et al. (2003b) reported that genetic parameters estimated by RRM can be inaccurate for various reasons: data size, data selection, model, and methodology applied. Misztal et al. (2000) and Pool and Meuwissen (2000) reported that the parameters can be estimated more accurately after improvements in methodologies that make computations more reliable and less expensive.

Covariance functions of Tabapuã cattle were estimated by RRM using Legendre polynomials by Dias et al. (2006). They concluded that the best model to describe the covariance functions of their database fitted the additive direct genetic, additive maternal genetic, permanent environment, and maternal permanent environment effects with cubic, quadratic, fourth-order, and linear polynomials, respectively, and the residual variances with a fifth-order variance function.

Computing costs decreased when random regression coefficients with eigenvalues that explained less than 1% of the variance were dropped; however, the correlations for the additive maternal effect also decreased. This result is in agreement with Bohmanova et al. (2005). These authors concluded that even though the eigenvalue corresponding to the eliminated maternal direct variance component accounted for less than 1%, it explained a large portion of the variance at the first ages. Similarly, the reduction of the maternal effect decreased variance at early ages and caused almost no change at older ages.

**CONCLUSIONS**

Diagonalization is necessary in RRM with Legendre polynomials to achieve a better convergence rate and adequate performance. Implementing genetic evaluation by RRM in beef cattle requires testing to ensure that neither numerical problems nor inaccurate parameters decrease the accuracy of EPD. The less parameterized RRM were unable to provide reliable covariance estimates for genetic evaluation in beef cattle; as a consequence, the correlations between the reduced RRM and the MTM were not perfect.

**ACKNOWLEDGEMENT**

This work was carried out under a project financially supported by the Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT, Termo de Outorga 051/03) and registered in the Sistema Embrapa de Gestão.

**REFERENCES**

ALBUQUERQUE, L.G.; MEYER, K. Estimates of covariance functions for growth from birth to 630 days of age in Nellore cattle. *J. Anim. Sci*., v.79, p.2776-2789, 2001.

BOHMANOVA, J.; MISZTAL, I.; BERTRAND, J.K. Studies on multiple trait and random regression models for genetic evaluation of beef cattle for growth. *J. Anim. Sci*., v.83, p.62-67, 2005.

DIAS, L.T.; ALBUQUERQUE, L.G.; TONHATI, H. Estimação de parâmetros genéticos para peso do nascimento aos 550 dias de idade para animais da raça Tabapuã utilizando modelos de regressão aleatória. *Rev. Bras. Zootec*., v.35, p.1915-1925, 2006.

MEYER, K. Accuracy of genetic evaluation of beef cattle for growth fitting a random regression model in genetic evaluation. *J. Anim. Sci*., v.80, suppl.1, p.49, 2002. (Abstract).

MISZTAL, I. Complex models, more data: Simpler programming? In: INTERBULL WORKSHOP COMPUTATION CATTLE BREED, Tuusula, Finland. *Proceedings... Interbull Bull.,* v.20, p.33-42, 1999.

MISZTAL, I. REMLF90 manual. Available on: <ftp://nce.ads.uga.edu/pub/ignacy/blupf90/2005>. Accessed on: Jan. 30, 2005.

MISZTAL, I.; PEREZ-ENCISO, M. A Fortran 90 interface to sparse matrix package FSPAK with dynamic memory allocation and sparse matrix structure. In: WORLD CONGRESS GENETIC APPLIED LIVESTOCK PRODUCTION, 6., Armidale, Australia. *Proceedings*... Armidale, NSW: University of New England, 1998. p.77-78.

MISZTAL, I.; STRABEL, T.; JAMROZIK, J. et al. Strategies for estimating the parameters needed for different test-day-models. *J. Anim. Sci*., v.83, p.1125-1134, 2000.

NOBRE, P.R.C.; LOPES, P.S.; TORRES, R.A. et al. Analyses of growth curves of Nellore cattle by Bayesian method via Gibbs sampling. *Arq. Bras. Med. Vet. Zootec*., v.55, p.480-490, 2003a.

NOBRE, P.R.C.; MISZTAL, I.; TSURUTA, S. et al. Analyses of growth curves of Nellore cattle by multiple trait and random regression models. *J. Anim. Sci*., v.81, p.918-926, 2003b.

NOBRE, P.R.C.; MISZTAL, I.; TSURUTA, S. et al. Genetic evaluation of growth in Nellore cattle by multiple trait and random regression models. *J. Anim. Sci*., v.81, p.927-932, 2003c.

POOL, M.V.; MEUWISSEN, T.H.E. Reduction of the number of parameters needed for a polynomial random regression test day model. *Livest. Prod. Sci.*, v.64, p.133-145, 2000.

RAO, C.R. *Linear statistical inference and its applications*. 2.ed. New York: John Wiley, 1973. p.417-420.

ROBBINS, K.R.; MISZTAL, I.; BERTRAND, J.K. A practical longitudinal model for evaluating growth in Gelbvieh cattle. *J. Anim. Sci*., v.83, p.29-33, 2005.

TIER, B.; MEYER, K. Approximating prediction error covariances among additive genetic effects within animals in multiple-trait and random regression models. *J. Anim. Breed. Genet*., v.121, p.77-89, 2004.

TSURUTA, S.; MISZTAL, I.; STRANDÉN, I. Use of preconditioned conjugate gradient algorithm as a generic solver for mixed-model equations in animal breeding applications. *J. Anim. Sci*., v.79, p.1166-1172, 2001.

Van der WERF, J.; SCHAEFFER, L.R. Random regression in animal breeding. Course notes. CGIL, University of Guelph, Canada. 1997. Available on: <http://cgil.uoguelp.ca/pub/notes/.html>. Accessed on: Jan. 30, 2005.

Received on February 28, 2008

Accepted on May 26, 2009

E-mail: geneplus@cnpgc.embrapa.br