Fitting mathematical models to lactation curves from holstein cows in the southwestern region of the state of Parana , Brazil

The objective of this study was to evaluate four mathematical models with regards to their fit to lactation curves of Holstein cows from herds raised in the southwestern region of the state of Parana, Brazil. Initially, 42,281 milk production records from 2005 to 2011 were obtained from “Associação Paranaense de Criadores de Bovinos da Raça Holandesa (APCBRH)”. Data lacking dates of drying and total milk production at 305 days of lactation were excluded, resulting in a remaining 15,142 records corresponding to 2,441 Holstein cows. Data were sorted according to the parity order (ranging from one to six), and within each parity order the animals were divided into quartiles (Q25%, Q50%, Q75% and Q100%) corresponding to 305day lactation yield. Within each parity order, for each quartile, four mathematical models were adjusted, two of which were predominantly empirical (Brody and Wood) whereas the other two presented more mechanistic characteristics (models Dijkstra and Pollott). The quality of fit was evaluated by the corrected Akaike information criterion. The Wood model showed the best fit in almost all evaluated situations and, therefore, may be considered as the most suitable model to describe, at least empirically, the lactation curves of Holstein cows raised in Southwestern Parana.


INTRODUCTION
The lactation curve is a graphic representation of the milk production from an animal throughout a defined period (Yadav et al. 1977).Accurate fitting of lactation curves contributes towards making decisions about management and breeding of animals that can be adjusted according to predictions of milk production at some stage of lactation (Cobuci et al. 2000(Cobuci et al. , 2012)).
The first mathematical representation of the lactation curve of cattle was made by (Brody et al. 1923).This first model (equation 1) describes a declining curve whose peak lactation is expected to occur on the day of parity.Therefore, this model is not suitable to describe the entire lactations of cows ABÍLIO G.T. FERREIRA et al. that are highly specialized in milk production, but it is useful to outline only the declining phase of the curve (Brody et al. 1924, Cobby andLe Du 1978): (1) wherein: M = the milk flow in time "t" after the start of the process; A = theoretical value of maximum milk flow; k 1 = constant of declining or specific reaction rate (t −1 ).
Subsequently, Brody et al. (1924) proposed another model (equation 2) with two phases: an initial rising phase proceeded by a declining phase, which Brody and co-workers assumed to occurs after the second month of lactation: (2) wherein: k 2 = constant of decline of the rising phase of the curve of milk secretion; the subtraction A -B indicates the milk flow at the time of parturition.
Wood (1967) proposed the use of the incomplete gamma function to describe the lactation curve and obtained a better fit of the ascending and descending phases of the curve, than the models used until that time.According to Guimarães et al. (2006), the incomplete gamma function, as suggested by Wood (1967), is the most common model used to estimate the lactation curves.The wide use of this model is, probably due to its ability to fit relatively well to a broad kind of data such as animals of different breeds with diverse production potential, raised in varied production systems and in several locations (Cobby and Le Du 1978, Papajcsik and Bodero 1988, Dijkstra et al. 1997, Landete-Castillejos and Gallego 2000, Cobuci et al. 2000, Pollott 2000, Cunha et al. 2010, Gloria et al. 2012).Another important feature of Wood's model is its simplicity and a reduced number of parameters (Wood 1967).
Despite the good fit of the model presented by Wood (1967), its parameters have no direct biological explanation.These models are known as empirical models, because they only describe the relationships between the variables without establishing cause-and-effect relationships.Over the last decades, improvements on computer processing power has allowed the development of more complex models whose parameters may represent biochemical and physiological aspects of the processes to be modeled.These models are called mechanistic models and, unlike empirical models, are formulated in order to establish the cause-andeffect relationships between the variables.
The first model of the lactation curve with parameters that have direct physiological explanation and whose numerical values can be estimated with data obtained in field situations were proposed by Dijkstra et al. (1997).This mechanistic model was developed to describe the growth pattern of the mammary glands during pregnancy and lactation.The model consists of a single pool (cell of mammary tissue) that is represented by DNA accumulation during the periods of pregnancy and lactation.The cell proliferation and death represents influxes and effluxes from the single pool, respectively.The authors adjusted the model to the data of mammary gland development (represented by the accumulation of DNA) in several species (mouse, rat, guinea pig, hamster and goat) and then made some redefinitions of parameters of the original model to represent the lactation curve in cattle.
Another mechanistic model has been proposed by Pollott (2000) based on the fit of two logistic curves to represent the major biological processes involved in the lactation process such as parenchyma cell proliferation, differentiation into cells with ability to secrete milk, and reduction in number of cells due to apoptosis (Pollott and Gootwine 2000).According to Pollott (2000), in addition to the advantage of biological meaning of LACTATION CURVES OF HOLSTEIN COWS the parameters, the model can be modified by using additional terms that allow predicting the effect of factors that change the course of milk production, such as improvements in the cow nutrition or the beginning of pregnancy.However, Pollott's model is more complex than Dijkstra's model and has a larger number of parameters.
Several authors have assessed the quality of fit of the mentioned models at different scenarios (Papajcsik and Bodero 1988, Dijkstra et al. 1997, Cobuci et al. 2000, Val-Arreola et al. 2004, Guimarães et al. 2006, Cunha et al. 2010, Souza et al. 2014).The results from these tests varied, mainly due to differences in the level of milk production and the order of parity, thus the ranking of models according to quality of fit differed significantly from one study to another (Papajcsik and Bodero 1988, Dijkstra et al. 1997, Cobuci et al. 2000, Val-Arreola et al. 2004, Guimarães et al. 2006, Cunha et al. 2010).
Some authors evaluating the model of Brody (Papajcsik and Bodero 1988, Cobuci et al. 2000, Guimarães et al. 2006, Cunha et al. 2010) represented the lactation curve using the following equation: (3) Equation (3) predicts milk yield equal to zero at day of parity, as stated in Papajcsik and Bodero (1988).However, the original model of Brody (Equation 2) does not produce this result.Simulations of lactation curves using the values from the parameters of equation (2) estimated by Brody et al. (1924) with data from Holstein cows milked two, three or four times daily, predicted initial yield of 5.9, 7.3 and 18.1 kg day −1 , respectively.Therefore, the evaluated equation (3) actually differs from that originally published by Brody et al. (1924) and this mistake has been repeated for two decades at least!Therefore, the question of which model would apply to all situations cannot be answered with the results obtained until now, and possibly, this model does not exist.However, the models must be evaluated in each condition in which it is intended to be used.A particular situation is the evaluation of the model of Brody et al. (1924), which has, so far, been described erroneously and certainly invalidates the results described in literature.In this sense, a data set of milk production obtained in the southwestern region of Parana was used to evaluate four (Brody et al. 1924, Wood 1967, Dijkstra et al. 1997, Pollott, 2000) of the five mathematical models described previously, to establish which one would be more suitable to be used in selection programs of dairy cows in the region, which is currently the largest milk producer in the state of Parana (SEAB 2012), but still lacks technical information to improve productivity per cow (IPARDES 2009).

MATERIALS AND METHODS
Data provided by "Associação Paranaense de Criadores de Bovinos da Raça Holandesa" (APCBRH) from dairy herds located in the southwestern region of the state of Parana, with 42,281 records between the years 2005-2011, were obtained.The records contained milk production at the date of daily control, recorded monthly, the animal identification, dates of birth of the cow in addition to parity, order of parity, partial production and total production at 305 days of lactation.Information from animals that did not have the date of dry off and the total production at 305 days were excluded from the data set to be analyzed, resulting in 15,142 records concerning 2,441 Holstein cows (Table I).
The data were ranked according to the order of parity of the cows (which ranged from one to six) and within each parity order, the animals were divided into quartiles (Q25%, Q50%, Q75% and Q100%) according to total milk yield at 305 days.This classification was performed using the univariate procedure of SAS (version 9, SAS Institute Inc., Cary, NC, USA).Four mathematical models were fitted to each quartile belonging to ABÍLIO G.T. FERREIRA et al.  each parity order: two empirical models (Brody and Wood) and two mechanistical models (Dijkstra and Pollott).The Marquardt's algorithm implemented in the NLIN procedure was used to fit the models to the data (version 9, SAS Institute Inc., Cary, NC, USA).

Milk production at
The description of the models used in this work, and their parameters, were as similar as possible to the ones found in the original papers.b and c must be smaller than 1; t = time in days (in the original paper, time was given in weeks).
Model of Dijkstra: The quality of fit were verified by the Akaike information criterion (Akaike 1974, Burnham andAnderson 2004) using the recommendations suggested by Vieira et al. (2012).
After the estimation of parameters, the total milk production over the lactation was estimated, by means of the definite integral of milk production over the time interval [0,305] for each mathematical model in each order of parity (OP1 to OP6) and quartile (Q25% to Q100%) as described in the general integral function below: The yield at the peak and the time of peak production were also calculated for the situations described above, by determining the point of maximum of the lactation curves and the value of t in the point of maximum.

RESULTS
The Pollott model used in this study was modified to reduce the number of parameters from seven to five, according to the suggestions of Pollott (2000) and Pollott and Gootwine (2000).Despite the simplification procedure, the model did not fit the data in any of the parity orders (OP) or quartiles (Q).
The model of Brody (1924) obtained values of Δr ⊂ [0, 2] in 11 cases (Table II).Brody's model was similar to that of Wood's and Dijkstra's in six situations (OP1 Q50%; OP2 Q25%; OP3 Q25%; OP3 Q50%; OP3 Q75% and OP4 Q50%) to reproduce data behavior and minimize loss of information (Table II).Brody's model was similar exclusively to the model of Dijkstra in two circumstances (OP1 Q25% and OP1 Q75%) and in two other cases it was similar only to the model of Wood (OP5 Q50% and OP6 Q75%).The model of Brody failed to reproduce the behavior of the observed data (Δr > 10 -Table II) in only one situation (OP1 Q100%).However, observing the values of w r (Table II -all less than 0.8) the model of Brody was not considered a likelihood representation of reality in any situation.However, that model has an intermediate probability (w r ⊂ (0.5, 0.8]) to represent reality in two cases (OP1 Q75% and OP5 Q75%).Brody model was considered the best choice among the models tested (ER r = 1 -Table II) in OP1 Q75% and OP5 Q75% and the worst choice (ER r > 20 -Table II) in OP1 Q100%, OP2 Q75% and OP3 Q100%, presenting an intermediate performance in other cases, with the exception of OP4 Q75%, OP4 Q100% and OP6 Q50% in which the model failed to converge.
The model of Dijkstra et al. (1997) presented similar performance to the model of Brody et al. (1924).The model of Dijkstra et al. (1997), obtained Δr ⊂ [0, 2] in ten cases.This model was similar to both models (Brody and Wood) in mimicking observed data (Table II) in six situations (OP1 Q50%; OP2 Q25%; OP3 Q25%; OP3 Q50%; OP3 Q75%; OP4 Q50%), being similar exclusively to the model of Brody in two other cases (OP1 Q25% and OP1 Q75%), and exclusively to the model of Wood in another two (OP4 Q75% and OP4 Q100%).Just like the model of Brody, the model of Dijkstra failed to emulate the observed data in Q100% OP1 (Δ > 10 -Table II) and was not considered a likelihood representation of reality in any studied situation (w r < 0.8 -Table II).Only in OP4% Q75 the model of Dijkstra presented intermediate probability w r ⊂ (0.5, 0.8] for representing the reality.This model was the best choice to represent the data in OP1 Q25% and OP4 Q75% (ER r = 1).The model did not converge in OP5 Q50%, OP5 Q75% and OP6 Q75%.
The model of Wood was the only model capable of representing the data with likelihood, i.e., values of w r > 0.8.This occurred at OP1 Q100%, OP2 Q75%, OP2 and OP3 Q100% (Table II).Moreover, the Wood's model was considered the best choice (ER r = 1) in 20 occasions (Table II) and the worst choice in only one case (OP1 Q75% -Table II).
The functions to estimate total milk yield at 305 days, peak production and time to peak production (days) for the model of Wood is represented by equations ( 8), ( 9) and ( 10) respectively: in which M max is the maximum milk production and t p is the day of peak of lactation.The other parameters were previously defined.
The same estimates can be obtained for the model of Dijkstra by equations ( 11), ( 12) and ( 13): (11) And for the model of Brody by equations ( 14), ( 15) and ( 16): ( 14) Estimates of peak production and time of peak (days) obtained with the three models (Brody, Wood and Dijkstra) were fairly similar (Table III), as well as the estimate of total milk yield at 305 days (Table III).The values estimated by Brody and Dijkstra models were always closer to one another than the values estimated by the model of Wood (Table III).
The data of 1606 cows were used to estimate the persistency factor according to the exponential law that described the lactation curve between the peak production and the mid-gestation (Lucas, 1960).This data was used to estimate the correlation between persistency, M max , M tot , and M 1 (milk production in the first day of lactation).
High correlation values (Pearson correlation coefficients) were estimated between M max and M tot , M max and M 1 and M tot and M 1 (Table IV), but low values for M max and persistency, M tot and persistency, and M 1 and persistency (which was the only relationship that had a negative and very low correlation value).Nevertheless, all the correlations tested were significant (P<0.05).
Figures 1, 2 and 3 present the three lactation curves generated by the three models (Brody, Wood and Djikstra) for data of two different order of parity (OP1 and OP3), the curves obtained by the models of Djikstra and Brody (Fig. 1 and 3) are very similar, whereas the model of Wood produced slightly different curve shapes (Fig. 2).

DISCUSSION
According to Burnham and Anderson (2004), the Akaike information criterion (AIC) has solid statistical and philosophical basis for assessing the quality of fit of mathematical models and for assisting in the selection of the most suitable model.The AICc r (corrected AIC) is a version of the AIC to be used in practice because it can be used even when the number of observations (n) is not sufficiently large (unlike the AIC) and AICc r converges to AIC as "n" increases.The AICc r is determined from the sum of squares of residuals, the number of observations and the number of model parameters (including the parameter of the error variance).The individual value of AICc r does not have an interpretation.To select the best models, we must calculate the Δ r (the difference among AICc r ), the likelihood probabilities (w r ) and the relative likelihood (ER r ), which is the ratio between the maximum w r and the w r of the model in concern (Burnham andAnderson 2004, Vieira et al. 2012).Δ r values between 0 and 2 indicate that the models are similar in reproducing the observed data behavior 1 M max = milk production at peak, kg day -1 ; M tot = total milk yield at 305 days, kg; Persist = persistency factor; M 1 = Milk production at the first day of lactation kg day -1 .* Significant (P < 0.05), *** Significant (P<0.001).
and reduce the loss of information (in this case the model with the fewest number of parameters must be preferred).Δ r values greater than 2 and less than or equal to 10 mean that the model performance was not as good and values greater than 10 indicate that the model either failed to reproduce the data and minimize the loss of information (Burnham andAnderson 2004, Vieira et al. 2012).Regarding w r , when its value exceeds 0.8 the model can be considered a plausible representation of the reality; for a w r greater than 0.5 and less than or equal to 0.8 the model in not so much representative; and LACTATION CURVES OF HOLSTEIN COWS   2010) it is possible that mechanistic models, such as the Dijkstra and Pollott, do not fit well to lactation curves of animals characterized by low milk production potential.However, in the present study the parameters of the model of Pollott did not converge even to curves of high production level cows (Q100%) and the model of Dijkstra fitted well to some groups of low production level (Q25% -Table II), being the best option for the group OP1 Q25% (Table II).Therefore, the assumption of Cunha et al. ( 2010) is not supported by our results.
The estimates for peak day of lactation calculated by the models of Brody et al. (1924), Wood (1967) and Dijkstra et al. (1997) fitted by Cunha et al. ( 2010) are quite divergent, for example, the peak day calculated by the model of Brody was closer to the beginning of lactation, followed by Wood, and the later peak is estimated by the model of Dijkstra (Cunha et al. 2010).But, as previously mentioned, these authors, as well as several others, described the model of Brody differently from the original paper, resulting in mistaken inferences about it.Estimates made in the present work regarding the day of milk production peak, calculated with the three models (Brody et al. 1924, Wood 1967, Djikstra 1997) were quite similar (Table III).The models of Brody and Dijkstra had the most similar values (Table III and Figs. 1 and 3), and in almost all situations the day of peak milk yield calculated with the model of Wood was closer to the parity date than the other two models (Table III and Fig. 2).
None of the models analysed in the present study showed a best fit to some specific situations, such as, a better fit to data of high production animals (as found by Cunha et al. 2010 for the model of Djikstra) or to a smoother peak production curve shape (as the result found by Dijkstra et al. 1997 for the model of Wood).The most relevant and general results, in summary, were: the model of Pollott was not a good fit to any data, regardless of the parity order (OP) or the production level (Q); the model of Wood was the best choice in most situations (Table II) and the models of Brody and Dijkstra presented intermediate and similar quality of fit (Table II), with estimates of milk production peak, day of peak and total milk yield at 305 days, very close (Table III).The number of parameters of each model (five to Pollott, four to Brody and Dijkstra, and three to Wood) was the only aspect that influenced the results obtained by the Akaike criterion; apparently, the quality of fit was inversely proportional to the number of parameters.And, although the model of Wood presented the best performance in almost all studied situations, and had the additional advantage of fitting very well to the data with a reduced number of parameters (only two), this model did not have parameters with biological meaning.
The model of Wood was the best choice to fit the data in most circumstances evaluated in the present work, probably due to its simplicity and smaller number of parameters.Therefore, when the biological description of parameters is not required for inferences, it is the most recommended to fit the lactation curve in practical situations.Nevertheless, functions of the parameters may be of interest and have biological interpretation, such as those described by equations ( 8), (9), and (10).A very important point is that the quality of fit of available models must always be evaluated according to the different situations.Palavras-chave: gado leiteiro, produção de leite, seleção de modelos matemáticos, modelos não lineares.

M
= At b e − ct (4) M = milk production; A, b, c = constants (as described by Wood, 1967); production in time "t" (d −1 ); M 0 = theoretical initial milk production (kg day −1 ); µ T = specific rate of cell proliferation at parturition; k 2 = decay parameter; λ = specific rate of cell death; t = time in days.Model of Pollott:M = daily milk production; LACTATION CURVES OF HOLSTEIN COWS P 0 = the total proportion of parenchymal cells which become active during early lactation since the first day of lactation; G = relative growth rate in the number of active cells; t = time in days; Q 0 = proportion of the total parenchyma cells that are dead, dying on the first day of lactation; D = mortality rate for cells; MS = total milk secretion potential during lactation.

TABLE II Difference among AICc r values (Δ r ), likelihood probabilities (w r ), evidence ratio or relative likelihood (ER r ) and root of mean square prediction error
(RMSPE) for each parity order (OP) and production level quartile (Q).* kg day