INTRODUCTION:

The persistency of milk production is defined as the capacity of a cow to maintain its milk production after peak lactation. Thus, persistency is related to the shape of the lactation curve as a function of days in milk. Persistency is directly associated with the economics of dairy farming, since improved persistency may reduce the costs of the production system (^{TEKERLI et al., 2000}; ^{JAKOBSEN et al., 2002}), improve reproductive efficiency (^{MUIR et al., 2004}) and reduce the need for concentrate for milk production (^{SÖLKNER & FUCHS, 1987}).

Currently, the most suitable method for measuring and representing persistency is based on the prediction of estimated breeding values (EBVs) for components of the lactation curve using random regression models (RRM). These models make it possible to predict EBVs for lactation persistency, since each animal has a set of EBVs for the additive genetic regression coefficients that describe its lactation curve. It is therefore possible to predict EBVs for yields on each day of lactation and consequently for partial cumulative periods (^{LIN & TOGASHI, 2002}). Some countries, especially in the northern hemisphere, already use lactation persistency in their genetic evaluations (^{GENGLER, 1996}; ^{GROSSMAN et al., 1999}).

To obtain genetic gains in milk yield, it is desirable to select animals with high EBVs for both milk yield and lactation persistency. The existence of genetic variation permits grouping genetically similar cows using multivariate cluster analysis based on EBVs for milk yields on each test day or on components of the lactation curve. Division into groups would make it possible to select animals according to the genetic pattern of the milk yield curve desired by the breeding program. The objective of cluster analysis is to divide animals into groups based on certain traits, maximizing homogeneity within groups and heterogeneity between groups (^{HAIR et al., 2009}). There are no reports in the literature that use cluster analysis to explore the genetic profile and shape of the lactation curve in dairy Zebu cows. The aim of this study was to explore the pattern of genetic lactation curves of Guzerá cattle using cluster analysis.

MATERIALS AND METHODS:

The database used in this study contained 34,193 records of monthly test-day milk yields (TDMY) for first lactations of 5,274 purebred Guzerá cows with calving records comprising the period from 1987 to 2012. Cows were born to 628 sires and belonged to 101 herds. The data were obtained from the National Dairy Guzerá Cattle Breeding Program (PNMGuL), coordinated by Embrapa Gado de Leite in cooperation with the Brazilian Center for the Genetic Improvement of Guzerá Cattle (CBMG²) and the Brazilian Association of Zebu Breeders (ABCZ). The TDMY records were divided into 10 monthly classes ranging from day 6 to day 305 of lactation. EBVs were estimated by random regression using a single-trait animal model that included milk yield on each test day as the response variable. The fixed effects of contemporary group, defined by herd and test-day year and season, age of cow at calving as covariate (linear and quadratic effect), and random direct additive genetic, permanent environmental and residual effects were considered. The fixed lactation curve of the population was modeled using fourth-order orthogonal Legendre polynomials. The random additive genetic and permanent environmental effects were modeled using fourth- and fifth-order orthogonal Legendre polynomials, respectively. Residual variance was considered to be heterogeneous, with seven classes of variance.

This model was chosen after six different models had been compared using the Akaike information criterion (AIC) and the Schwarz Bayesian information criterion (BIC) (^{WOLFINGER, 1993}). The matrix representation of the model is:

y = Xb + Za + Wpe + e

where: y = vector of TDMY; b = vector of solutions for fixed effects, which included the solutions for contemporary group and for the covariates age at calving and days in milk; a = vector of solutions for additive genetic random effects; pe = vector of solutions for permanent environmental random effects; e = vector of the random residual effect; X, Z and W = incidence matrices for the fixed effects, the random additive genetic effect and the random permanent environmental effect, respectively.

The model assumptions were: E(a) = 0, E(pe) = 0 and E(e) = 0; V(a) = *k*_{A} ⊗ *A*, V(pe) = *k*_{PE} ⊗ *I*_{Nd} and V(e) = R, where *k*_{A} and *k*_{PE} are (co)variance matrices between additive genetic and permanent environmental random regression coefficients, respectively; *A* is the relationship matrix between individuals; *I*_{Nd} is the identity matrix of dimension *Nd*; ⊗ is the Kronecker product between matrices, and R is a block diagonal matrix containing residual variances. The variance components were estimated by the restricted maximum likelihood (REML) method using the WOMBAT statistical package (^{MEYER, 2006}).
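
As a sketch of how these (co)variance matrices translate into day-specific parameters, the usual covariance-function expressions of random regression models can be written as below. The symbols φ_{a,t} and φ_{pe,t} (vectors of Legendre covariates for day t, of the orders used for the additive genetic and permanent environmental effects) and σ²_{e}(t) (residual variance of the class containing day t) are introduced here for illustration and are not part of the original notation.

```latex
\begin{aligned}
\sigma^2_{a}(t)  &= \boldsymbol{\phi}_{a,t}^{\top}\, k_{A}\, \boldsymbol{\phi}_{a,t}, \\
\sigma^2_{pe}(t) &= \boldsymbol{\phi}_{pe,t}^{\top}\, k_{PE}\, \boldsymbol{\phi}_{pe,t}, \\
h^2(t) &= \frac{\sigma^2_{a}(t)}{\sigma^2_{a}(t) + \sigma^2_{pe}(t) + \sigma^2_{e}(t)}
\end{aligned}
```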

Additive genetic random regression coefficients predicted by the RRM for each animal were used to predict EBVs on each day of lactation. The EBV for milk yield of animal j on day t of lactation was calculated as follows: EBV_{tj} = ebv_{0(j)} × f_{int} + ebv_{1(j)} × f_{1} + ebv_{2(j)} × f_{2} + ebv_{3(j)} × f_{3} + ebv_{4(j)} × f_{4}, where ebv_{0(j)} to ebv_{4(j)} are the EBVs for the random regression coefficients of animal j, and f_{int} to f_{4} are the Legendre polynomials evaluated at day t of lactation (6 to 305 days). The EBVs for milk yield on each day of lactation were used to calculate the cumulative milk yield at 305 days (MRA_{305}), peak yield (PS_{peak}) and partial yields (PS_{100}, PS_{200} and PS_{300}, corresponding to early lactation, mid-lactation and late lactation, respectively), as well as three measures of persistency (PS_{1}, PS_{2} and PS_{3}).
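
The calculation of daily EBVs from the regression coefficients can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes that days in milk are standardized to the interval [-1, 1] and that the Legendre polynomials are normalized by sqrt((2k + 1)/2), as is common in random regression models. The function names and the coefficient values are hypothetical.

```python
import math


def legendre_covariates(day, first_day=6, last_day=305, order=4):
    """Return normalized Legendre covariates phi_0..phi_order for one day in milk."""
    # Standardize days in milk to the interval [-1, 1] (assumption).
    x = -1.0 + 2.0 * (day - first_day) / (last_day - first_day)
    # Bonnet recurrence for the unnormalized polynomials P_0..P_order.
    p = [1.0, x]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    # Normalization commonly used in random regression models (assumption).
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]


def daily_ebvs(coefficients, first_day=6, last_day=305):
    """Map the coefficient EBVs (ebv0..ebv4) of one animal to {day: EBV}."""
    order = len(coefficients) - 1
    ebvs = {}
    for day in range(first_day, last_day + 1):
        covariates = legendre_covariates(day, first_day, last_day, order)
        ebvs[day] = sum(c * f for c, f in zip(coefficients, covariates))
    return ebvs


# Hypothetical coefficient EBVs for one animal, for illustration only.
example = daily_ebvs([1.2, -0.4, 0.1, 0.05, -0.02])
```

The result is one EBV per day from day 6 to day 305, which is the input needed for the cumulative, partial and persistency measures.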

The EBV for 305-day cumulative milk yield was obtained as the sum of the animal's EBVs on each day of lactation: MRA_{305} = Σ_{t=6}^{305} EBV_{t}.

Peak yield, defined as the period when maximum production is reached (^{COBUCI et al., 2004}), was determined as the average EBV between 6 and 30 days of lactation, since the breed exhibits an atypical average lactation curve: PS_{peak} = (1/25) Σ_{t=6}^{30} EBV_{t}.

In this study, three measures of lactation persistency were used. The first measure of lactation persistency (PS_{1}), proposed here, was calculated as the mean EBV between 30 and 270 days of lactation: PS_{1} = (1/241) Σ_{t=30}^{270} EBV_{t}.

Proposed by ^{JAMROZIK et al. (1997}), the second measure of persistency (PS_{2}) uses specific EBVs from two periods of lactation and was calculated as: PS_{2} = EBV_{270} - EBV_{30}. The third measure of persistency (PS_{3}), adapted from ^{PEREIRA et al. (2012}), takes into consideration the sum of deviations of EBVs between 30 and 270 days of lactation in relation to the EBV predicted for peak yield and can be written as follows:

Measures of the partial periods of lactation correspond to cumulative milk yields in the first (6 to 100 days), second (101 to 200 days) and last third of lactation (201 to 300 days) and were calculated using the following formulas, respectively: PS_{100} = Σ_{t=6}^{100} EBV_{t}, PS_{200} = Σ_{t=101}^{200} EBV_{t} and PS_{300} = Σ_{t=201}^{300} EBV_{t}.
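
The measures described in this section can be computed from the daily EBVs of one animal as in the sketch below. The day ranges follow the verbal definitions given in the text, with inclusive endpoints assumed. The form used for PS_{3} (sum of EBV_{t} minus PS_{peak} over days 30 to 270) is an assumption about the sign and reference value of the deviations, not a confirmed reproduction of the original formula. The function name is hypothetical.

```python
def lactation_measures(ebv):
    """Compute production and persistency measures.

    ebv: dict mapping day in milk (6..305) to the EBV for milk yield on that day.
    """
    def total(first, last):
        # Sum of daily EBVs over an inclusive range of days.
        return sum(ebv[t] for t in range(first, last + 1))

    def mean(first, last):
        return total(first, last) / (last - first + 1)

    ps_peak = mean(6, 30)
    return {
        "MRA305": total(6, 305),
        "PSpeak": ps_peak,
        "PS100": total(6, 100),
        "PS200": total(101, 200),
        "PS300": total(201, 300),
        "PS1": mean(30, 270),
        "PS2": ebv[270] - ebv[30],
        # Assumed form of PS3: summed deviations from the peak EBV.
        "PS3": sum(ebv[t] - ps_peak for t in range(30, 271)),
    }


# A flat (perfectly persistent) hypothetical curve, for illustration only.
flat_curve = {t: 2.0 for t in range(6, 306)}
flat_measures = lactation_measures(flat_curve)
```

For a flat curve, PS_{2} and PS_{3} are both zero under these assumptions, while a curve that declines after the peak gives negative values for both.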

Simple (Pearson) and rank (Spearman) correlations between EBVs for the measures were estimated using the CORR procedure (PROC CORR) of the SAS statistical package (^{SAS, 2008}).
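
For readers without access to SAS, the two types of correlation can be illustrated with a short standard-library sketch. This is not the procedure used in the study. The function names are hypothetical, and ties are handled with average ranks, which is an assumption about the desired tie treatment.

```python
import math


def pearson(x, y):
    """Simple (Pearson) correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A rank correlation close to 1 between two measures means that they order the animals in nearly the same way, even if the simple correlation between their values is lower.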

Hierarchical and non-hierarchical cluster analyses were used to group the animals based on the EBVs obtained for the production measures reported above. Hierarchical cluster analysis was applied to choose the number of clusters into which the population studied could be divided. The Euclidean distance was used as a measure of dissimilarity between animals and the clustering algorithm of ^{WARD (1963}) was used to create the groups. After definition of the number of groups, non-hierarchical cluster analysis by the k-means method was performed to explore the genetic profile of milk production. In the cluster analyses, the EBVs for MRA_{305}, PS_{peak}, PS_{1}, PS_{2}, PS_{3}, PS_{100}, PS_{200} and PS_{300}, as well as EBVs at 30, 60, 90, 120, 150, 180, 210, 240, 270 and 305 days of lactation, were used to divide the individuals into groups. In a second step, only the measure of persistency showing the best behavior, in addition to EBVs for PS_{peak}, PS_{100}, PS_{200}, PS_{300} and MRA_{305}, was used to evaluate the genetic profile of the groups. The PROC CLUSTER procedure of the SAS program (^{SAS, 2008}) and the STATISTICA 8.0 program (^{STATSOFT, Inc., 2008}) were used for cluster analysis.
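
The non-hierarchical step can be illustrated with a minimal k-means sketch (Lloyd's algorithm) using squared Euclidean distances. It is not the SAS or STATISTICA implementation used in the study, and it takes the number of groups as given rather than deriving it from Ward's method. The function names and the small data set (two hypothetical standardized EBVs per animal) are assumptions for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct animals chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assign each animal to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized EBVs (production level, persistency) of six animals.
animals = [
    [1.0, -0.5], [1.1, -0.4], [0.9, -0.6],
    [-1.0, 0.5], [-1.1, 0.4], [-0.9, 0.6],
]
labels, centroids = k_means(animals, k=2)
```

The centroids returned for each group play the role of the mean EBV profiles that are compared between groups.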

RESULTS AND DISCUSSION:

Peak yield was observed in the first month of lactation, with a mean milk yield of 9.04±4.62kg. Production remained at a similar level in the second month of lactation (8.98±4.14kg) and decreased after the third month until the end of lactation (5.44±2.53kg). The heritabilities of TDMY (Table 1) were higher than those reported by ^{SANTOS et al. (2013}) for the same breed. These authors estimated a higher heritability in the second month (0.35) and a lower heritability in the eighth month of lactation (0.18). However, the trajectory and magnitude of the estimates were similar in the two studies after the second test day. Genetic correlations (Table 1) between monthly milk yields ranged from -0.03 to 0.95, and the lowest or negative estimates occurred between yields at the beginning and end of lactation. Similar results have been reported by ^{SANTOS et al. (2013}) for Guzerá cattle. Phenotypic correlations (Table 1) were all positive and ranged from 0.18 to 0.81. The same was observed for permanent environmental correlations, which ranged from 0.48 to 0.98, with lower correlations between more distant TDMY.

Simple correlations of EBVs for MRA_{305} with PS_{peak}, PS_{100}, PS_{200} and PS_{300} were positive and high (0.76, 0.90, 0.95 and 0.76, respectively). Higher correlations with milk yields in the first and second third of lactation (PS_{100} and PS_{200}) indicated that they are strongly associated with the milk production level during lactation.

Simple correlations between EBVs for measures representative of the beginning of lactation (PS_{peak} and PS_{100}) and the measure representative of the end of lactation (PS_{300}) were considered to be unfavorable. The correlation between PS_{peak} and PS_{300} was 0.22 and the correlation between PS_{100} and PS_{300} was 0.40. PS_{300} expresses the EBVs for milk yields during late lactation. As is known, Zebu breeds exhibit a relatively shorter lactation length than specialized taurine breeds. During this period, many animals have already ceased lactation mainly because of low yields, especially in the case of first lactations. Rank correlations (Spearman) between EBVs for MRA_{305} and partial measures indicated high agreement in animal ranking, with the highest correlation between MRA_{305} and PS_{200} (0.94). Greater errors in animal ranking were obtained when complete lactation was correlated with late lactation (0.69).

According to ^{DEKKERS et al. (1998}), measures of persistency that show low genetic correlations with milk yield until 305 days of lactation are the most suitable for selection. Otherwise, studies designed to select for lactation persistency would not be justified, since selecting for total milk yield would be sufficient to increase persistency as a consequence (^{COBUCI, 2002}). Within this context, PS_{1} was not a good measure of persistency since the simple correlation of this trait with total milk yield during lactation was close to unity. Measures PS_{2} and PS_{3} showed simple correlations of -0.55 and -0.42 with MRA_{305}, respectively, indicating an unfavorable association between milk production during lactation and persistency. These results suggest the use of either PS_{2} or PS_{3} in selection programs since, although calculated differently, both showed a low correlation with milk production during lactation, were highly correlated with each other and ranked practically the same animals. The correlation between these criteria was high because both corresponded to the post-peak period up to 270 days. In studies on Holstein cattle using RRM, the genetic correlations between different measures of persistency ranged from -0.49 to 0.57 (^{JAKOBSEN et al., 2002}; ^{COBUCI et al., 2004}).

PS_{3} was chosen as the only measure of persistency for multivariate cluster analysis because its correlation with MRA_{305} (-0.42) was the lowest in magnitude among the three measures. However, the antagonistic association between the two measures should be a warning to breeders who use only the increase in cumulative milk yield during lactation as a selection criterion. Hierarchical analysis using the EBVs of all animals was performed to visualize the genetic profile of the animals based on the selection criteria proposed. This analysis generated a dendrogram (Figure 1), which revealed the possibility of dividing the population into two groups (clusters). Non-hierarchical cluster analysis was then performed with these two groups using all measures of persistency proposed, partial periods, peak yield, total milk yield and measures obtained at intervals of 30 days in order to verify the results of the correlations discussed above (Figure 2). This analysis demonstrated that the persistency measure based on mean EBVs between days 31 and 270 (PS_{1}) is not necessary, given the high similarity between its EBV and the EBVs for total milk yield (MRA_{305}) and for yields at 30-day intervals. High similarity was also observed between PS_{2} and PS_{3}, indicating that either measure can be chosen for the following discussion. Figure 2 illustrates the results of non-hierarchical cluster analysis obtained by the k-means method using the EBVs for peak yield, partial yields (100, 200, and 300 days), persistency measures, complete lactation, and milk yields at 30, 60, 90, 120, 150, 180, 210, 240, 270 and 305 days of lactation. In this division into two groups, group 1 exhibited EBVs above the average for partial periods of lactation and peak yield, but an EBV below the average for lactation persistency. Group 2 presented negative EBVs for partial periods and peak yield, but an EBV above the average for persistency, indicating that these animals could be more persistent despite the lower production level.

With respect to partial measures (Figure 2), the EBVs were higher at the beginning of lactation, decreasing thereafter and reaching the lowest values at the end of lactation. These results support those obtained by correlation analysis in this study, in which higher correlations were observed between measures of the first two-thirds of lactation. The genetic profiles of animals of group 1 obtained by hierarchical cluster analysis indicated the existence of different shapes of the genetic curves within the same group, although these animals exhibited EBVs above the average for partial measures, peak yield and complete lactation. A second multivariate cluster analysis was therefore performed, which included only animals of group 1. The division of animals of group 1 of the first analysis into two subgroups (Figure 3) showed positive values closer to the average in subgroup 1, except for persistency, which was negative. The genetic profile of subgroup 2 was characterized by higher values and also a negative EBV for PS_{3}. However, the EBV for lactation persistency (PS_{3}) was lower for animals of subgroup 2 than for those of subgroup 1. Animals of subgroup 1, whose production levels are lower but still above the population average, can be used in simpler production systems in which the animals are kept on pasture throughout the year. Animals of this subgroup can also be used in dual-purpose production systems. Production systems of dairy Zebu breeds, which have not been selected as intensively for high milk yields as taurine breeds, should still emphasize the milk production level, and animals of group 1, subgroup 2, meet these premises. However, this group of animals had lower EBVs for persistency, a fact resulting in lower total yields at the end of lactation. Thus, lactation persistency should be included in a selection index that combines both criteria, MRA_{305} and persistency.

According to ^{HARDER et al. (2006}), cows with smoother production curves can be more persistent than cows that have the same milk production but reach high peak yields followed by a sudden decline in production after peak lactation. From an economic point of view, the adoption of these more persistent animals in selection programs would be indicated. According to ^{DEKKERS et al. (1998}), cows with higher lactation persistency require less feed and have lower healthcare and reproduction costs. Consequently, these animals are more profitable for the breeder. However, these proposals were made for taurine breeds, animals that have been extensively selected for high milk yield.

In the case of dairy Zebu breeds in which milk production levels are still low, especially the Guzerá breed which has dual-purpose (beef and milk) lineages, milk production level continues to be the main selection criterion. However, certain emphasis is given to persistency of lactation in an attempt to correct long-term problems related to lower lactation persistency and shorter lactation length in animals of this breed. Nevertheless, studies broadening the knowledge of the production systems used by producers of these breeds, in conjunction with this proposal, are needed since they are generally pasture-based production systems.

CONCLUSION:

The persistency measure PS_{3} was the most suitable for genetic evaluations of lactation persistency, mainly because of its lower correlation with MRA_{305}. Cluster analysis identified animals belonging to different groups according to milk production level and lactation persistency. The study of EBVs of the population using cluster analysis makes it possible to identify superior animals for the desired traits, as well as to distinguish the genetic levels of the population. Selection of animals of this breed should emphasize the level of production, which is still low, while including lactation persistency as a selection criterion.