Linear and nonlinear models to describe the lactation curve of Girolando cows

– The objective of this work was to compare the main linear and nonlinear models used to describe lactation curves and to evaluate the nonlinearity of the nonlinear models, in order to obtain the most adequate model to describe the lactation curves of the Girolando breed. Data from 165 lactations of 89 3/4 Holstein + 1/4 Gyr cows were used, and average yield was calculated every 20 days up to 310 days of lactation. Seventeen models of lactation curves available in the literature were compared. The selection of the best model was based on the curvature measures of Bates & Watts, the bias of Box, adjusted coefficient of determination, Akaike’s information criterion, and residual standard deviation. The linear model of Cobuci estimated a yield peak of 16.7 kg at 40 days of lactation, whereas the nonlinear model of Wood estimated a yield peak of 16.8 kg at 41 days of lactation and a persistence of 6.82. Nonlinearity measures were the most appropriate for selecting the most suitable nonlinear model for the description of lactation curves. To describe the lactation curves of the Girolando breed, the most suitable linear model is that of Cobuci and the nonlinear model is that of Wood.


Introduction
Cow milk production is expressive worldwide. According to Food and Agriculture Organization (FAO), in 2020, milk yield was 718.03 million tons, of which 227.85 million tons were produced only in Europe (FAO, 2022). Currently, Brazil is the third largest milk producer, yielding 36.50 million tons in the same year, only behind USA and India (FAO, 2022).
Milk yield can be represented graphically during cow lactation by what is called a lactation curve. Glória et al. (2010) showed that the study of these curves is important because it allows of estimating total yield from partial yields, which makes it possible to carry out early culling and the evaluation of sires based on the incomplete lactations of their daughters. The study of lactation curves also facilitates the estimation of the milk volume needed by calves and aids in the strategic planning of calve nutrition and supplementation, from lactation to finishing for meat production (Henriques et al., 2011). However, the shape of the lactation curve can be modified by environmental factors that may interfere with milk yield, such as herd, calving year, calving season, and cow age at calving (Cobuci et al., 2000).
Lactation curves have been used to model the milk yield of different breeds and species, both through linear and nonlinear regression models. In linear models, estimation is exact, whereas, in nonlinear models, it is obtained through linear approximations. Despite this difference, nonlinear models have an excellent quality of fit and provide parameters with biological interpretation (Maia et al., 2009). However, according to Fernandes et al. (2015), the greater the nonlinearity of a model, the further from linear approximation it will be, which makes inferences about the studied parameters less reliable.
Expressions used to assess the adequacy of linear approximations and their effects on inferences are known as nonlinearity measurements, among which stand out the curvatures of Bates & Watts (1988) and the bias of Box (1971). Fernandes et al. (2015) concluded that measuring the nonlinearity of a nonlinear regression model is fundamental to assess the reliability of parameter estimates; however, these authors did not measure the nonlinearity of nonlinear lactation-curve models.
Regarding dairy cattle, Facó et al. (2002) pointed out that the Holstein x Gyr crossbreed combines the most relevant characteristics between two breeds, i.e., the great ability to adapt to challenging environments of the Gyr breed and the rusticity and high milk yield of Holstein cows, resulting in an animal superior to those of other dairy crossbreeds. Therefore, it is important to compare the lactation curves of these two breeds to improve their crossbreed.
The objective of this work was to compare the main linear and nonlinear models used to describe lactation curves and to evaluate the nonlinearity of the nonlinear models, in order to obtain the most adequate model to describe the lactation curves of the Girolando breed.

Materials and Methods
The used data were obtained from 165 lactations of 89 3/4 Holstein + 1/4 Gyr (Girolando) cows reared in a milk production system on pasture with supplementation. In this herd, the number of lactations varied from one to three or more per cow.
Milk was weighed monthly between November 2012 and February 2018 at the Santa Lúcia farm, located in the municipality of São Simão, in the state of São Paulo, Brazil. A full lactation was considered as four weighings in sequential months, totaling 16 weighings. Therefore, only the weighings of a same cow that reached the total of 16 were used to describe the lactation curves, in order to avoid the loss of information when, for example, cows were sold. The average yield was calculated every 20 days until 310 days of lactation. Seventeen lactation-curve modelsnamed after the authors of each study -were used to analyze the behavior of the lactation curves (Table 1), as adapted from Calvo Cardona et al. (2015).
Each model was partially derived in relation to its parameters and then, according to the result of these derivatives, were classified into linear or nonlinear. To estimate the parameters of the linear and nonlinear models, the 'lm' and 'nls' functions of the R statistical software (R Core Team, 2020) were used, respectively.
To assess the quality of fit of the model, the curvature measures of nonlinearity of Bates & Watts and the bias of Box were obtained using the IPEC package of the R software (R Core Team, 2020).
Nonlinearity measurements are based on the geometric concept of curvature, being decomposed Pesq. agropec. bras., Brasília, v.57, e02678, 2022 DOI: 10.1590/S1678-3921.pab2022.v57.02678 into two components: intrinsic nonlinearity and nonlinearity due to the effect of parameters (Bates & Watts, 1988). Zeviani et al. (2012) concluded that intrinsic nonlinearity measures the lack of flatness of the expected surface, whereas parametric nonlinearity measures the nonexistence of uniformity of the surface coordinate system near the neighborhood of the solution's location. The authors highlighted that Box's bias allows of indicating which model parameters lead further away from a linear behavior. Concomitantly to nonlinearity measures, quality estimators -i.e., adjusted coefficient of determination (R 2 aj ), Akaike's information criterion (AIC), and residual standard deviation (RSD) -were used, as follows: where R 2 is the coefficient of determination; n is the number of measurements; p is the number of parameters; i is related to the fit of the curve intercept, being equal to 1 if there is an intercept in the model and equal to 0 if there is no intercept; ln is the natural logarithmic operator; L( )  θ is the maximum point of the maximum likelihood function of the models; and QME is the residual mean square. The best model was considered the one with the highest R 2 value and the lowest AIC and RSD values.
To verify the assumptions of normality, homogeneity, and independence of the models, the tests of Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson were used, respectively, with their functions in R, at 1% probability. If p-value > 0.01, these tests are nonsignificant, i.e., the residuals present normality and constant variance and are also independent.
Confidence intervals were constructed for the estimates of the models selected as the most adequate to describe the lactation curves of Girolando cows. According to Draper & Smith (1998), when calculating the y-value corresponding to a given x-value in a linear model, the (x 0 ; y 0 ) pair is obtained and the confidence interval for y 0 is given by: where x 0 is the line of the X incidence matrix of the linear model corresponding to the value of the independent variable (days in lactation) for which the Y estimate is being obtained,  β j is the estimate of the j th parameter, and t (n-p,α/2) is the upper quantile of the t-distribution. Draper & Smith (1998) added that, in a nonlinear regression model, the approximate confidence interval for the y 0 -value estimated for the x 0 -value is given by:  For both the linear and nonlinear models, parameter estimation was performed using the least squares method as described in Seber & Wild (1989). In the case of the nonlinear models, before the ordinary least squares method, the Gauss-Newton convergence algorithm or method of linearization, in which a Taylor series expansion is used to approximate the nonlinear regression model to linear terms, was applied.
As in most iterative methods, the first step was to assign initial values to the vector of parameters, searching for estimates obtained for the lactation curves of other breeds in the available literature (Cobuci et al., 2000;Jacopini et al., 2016;Daltro et al., 2018).

Results and Discussion
The models classified as linear, whose partial derivatives did not depend on any parameter, were those of: Dave The nonlinearity measurements of the nonlinear models were also evaluated as recommended by  (Table 2). According to Fernandes et al. (2015), lower values of nonlinearity measures indicate a better fit of the model as more reliable estimates are obtained.
The model of Rook, France & Dhanoa and that of Dijkstra presented parametric nonlinearity measures (c θ ) with very high values (Table 2), indicating a low reliability of the parameter estimates. The high values obtained can be explained either by the arrangement of the parameters in the model or by the lack of suitability of the model for this data set (Fernandes et al., 2015(Fernandes et al., , 2019, suggesting that a possible reparametrization of those models can be more efficient. Box's bias was higher for the models with a number of parameters equal to four since the greater the bias of the parameter, the greater the deviation from linearity (Fernandes et al., 2019). Of the models with two parameters, the one that presented the lowest values for Box's bias was that of Brody, Ragsdale & Turner, and, of the models with three parameters, that of Wood. Considering all quality-of-fit criteria, the model of Wood showed better R 2 aj , RSD, and AIC values than that of Brody, Ragsdale & Turner.
Therefore, considering the results of all qualityof-fit criteria and also the measurements of the curvature of Bates & Watts and the bias of Box, the best nonlinear model to describe the lactation curve of Girolando cows is that of Wood. In alignment with the present study, Fernandes et al. (2019) highlighted that these two measurements are the most used to evaluate the nonlinearity of a nonlinear regression model. Furthermore, Oliveira et al. (2020) observed that the model of Wood shows the rigor and precision needed for the selection of cows based on their performance as to parametric estimates of lactation quality, such as peak milk yield, average milk yield, and ascending and descending rates.
The R 2 aj values were mostly above 0.94 (Table 2), emphasizing the good quality of fit obtained in the present work since R 2 is generally not very high in studies on lactation curves. Pereira et al. (2016), for example, found an R 2 aj value equal to 0.86 while using the model of Bianchini Sobrinho to describe the lactation curve of the Bos taurus x Bos indicus cross. Daltro et al. (2019) Table 2); one of the reasons for this is that the expression of the model corresponds to a decreasing straight line, which is not the pattern of a lactation curve. Similarly Torquato et al. (2017) did not observe a peak in the lactation curve of Gyr daughters of the 1/2 Holstein-Gyr breed. Lazzari et al. (2013) also found that the lactation curve for Zebu cows tended not to peak or to peak only in the first few weeks.
The models of Wilmink, of Rook, France & Dhanoa, and of Dijkstra have an R 2 aj above 0.95 (Table 2); the first two were the nonlinear models with the highest R 2 aj . The lowest RSD and AIC values were found for the models of Wilmink and of Sikka, respectively. Considering only the R 2 aj , RSD, and AIC evaluators of quality of fit, the model of Wilmink would be the most adequate to describe the lactation curves of Girolando cows.
Most studies on lactation curves use either R 2 , R 2 aj , AIC, or Bayesian information criteria (BIC) as criteria to determine a model's quality of fit. Jacopini et al. (2016) and Hossein-Zadeh (2019), for example, used only R 2 aj and R 2 , respectively, as a criterion, whereas Daltro et al. (2018) and Hossein-Zadeh (2019) applied AIC, BIC, and root mean square error. However, the literature on lactation curves is usually not concerned with the nonlinearity of the studied models, which are selected mistakenly via R 2 , i.e., have unreliable parameter estimates. The results of the present work confirm that it is extremely important to consider nonlinearity measures in studies using nonlinear models as reported by Fernandes et al. (2015), Diel et al. (2019), andFernandes et al. (2019).
Of the evaluated models, that of Wood is also used to describe the lactation curve of other bovine breeds, such as Holstein, Guzerá, Caracu, Holstein x Guzerá, Holstein x Nellore, Holstein x Zebu, Taurino x Zebu, as well as of other species, as buffaloes and goats. The model of Cobuci is used to characterize the lactation curves of the Guzerá and Holstein breeds.  Lazzari et al. (2013) and Cobuci et al. (2000) compared the adjustments of the models of Cobuci and Wood for the Holstein and Guzerá breeds, respectively. The authors concluded that the model of Wood, with a R 2 of 0.95, was better adjusted to the Holstein breed and that of Cobuci to the Guzerá breed.
As it is a nonlinear model, the parameter estimates of the model of Wood have practical interpretations: milk yield of 13.90 kg at the beginning of lactation (parameter a), a phase of increase in milk yield of 0.07 at the beginning of lactation (parameter b), and a phase of decline in milk yield of 0.001 after its peak (parameter c). Based on the estimates of the model of Wood, according to Glória et al. (2010), it is also possible to estimate lactation peak (p = (a*(b/c)^b)exp(-b)), peak time (d=b/c), and persistence (P=-(b+1)*ln(c)). In the present study, yield peak was 16.8 kg, peak time occurred approximately on day 41, and persistence was 6.82. Jacopini et al. (2016) obtained similar results for Girolando cows of the 3/4 Holstein + 1/4 Gyr genetic group in the first lactation, with a peak of 17.13 kg, approximate peak time on day 49, and persistence of 6.66. Oliveira et al. (2020) found a peak of 17.2 for Holstein cows in the first lactation and of 18.2 for the Jersey breed.
Although linear models do not have parameters with a practical interpretation, peak day and yield can be easily obtained through certain calculations, considering peak milk yield is the maximum peak of the lactation curve. By deriving the model of Cobuci with respect to t, the expression f(t) = (1/t)-c was obtained and, then, by equaling to zero and substituting the estimate of parameter c = 0.0252, it was used to determine peak day, which was approximately on the fortieth day. Moreover, by substituting the respective parameter in the expression of the model of Cobuci, peak yield was 16.7 kg. If the aim is only to determine the quality of fit and the predictive capacity of a model, the linear model, which has more easily estimated parameters and optimal properties, may be sufficient, but it makes it difficult to infer practical parameters of the lactation curve, such as persistence.
The tests of Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson were nonsignificant at 1% probability (p-value >0.01) for almost all lactation curve models, except for that of Papajcsik & Bodero, i.e., all other models showed no violation of residual assumptions, indicating that the residuals presented normality, constant variances, and independence. Therefore, in the model of Papajcsik & Bodero, it is necessary to model and incorporate the residual autocorrelation.
The confidence intervals of the model of Cobuci, as highlighted by Draper & Smith (1998), are exactly obtained, without the use of approximations. In this model, from the first 10 days of lactation to 50 days of lactation, there was a slight increase in milk yield volume, ranging from an average of 16.0824 kg on the tenth day to 16.6831 kg on the fiftieth day. To facilitate the understanding of data, the lower and upper limits of the confidence intervals of 98, 80, and 60% were obtained (Table 3).
Based on the estimates, the approximate confidence interval of the model of Wood showed that, for 98% of the cows, milk yield was between 15.1767 and 16.9208 kg in the first 10 days of lactation, with an estimated average of 16.0487 kg. The same interpretation is valid for the confidence interval limits of 80 and 60% (Table  3). As the model of Wood model is nonlinear and interval estimates are only approximately obtained (Draper & Smith, 1998), its confidence interval limits are greater than those of the model of Cobuci. Through the graphic representation, it was possible to visualize the behavior of the curve that adjusts milk yield data over the period under analysis (Figure 1). In this sense, the actual yield data are arranged around the average of the estimated values, showing the quality of fit of the model of Cobuci. In addition, the amplitude of the 98% confidence interval is greater for the model of Wood, compared with that of Cobuci, since the former is a nonlinear model and its intervals are obtained in an approximate way as pointed out by Seber & Wild (1989  Conclusions 1. To describe the lactation curves of the Girolando breed, the most suitable linear model is that of Cobuci and the nonlinear model is that of Wood. 2. Nonlinearity measures are the most appropriate for selecting the most suitable nonlinear model for the description of the lactation curves of the Girolando breed.
3. The parameters of the nonlinear models allow of estimating the persistence in the lactation curves of the Girolando breed.