MULTIVARIATE APPROACH IN THE EVALUATION OF THE PRODUCTION AND COMPOSITION OF BUFFALO MILK IN NORDESTINE SEMIARID

The objective was to evaluate the production, chemical composition and hygienic -sanitary parameters of the milk of Murrah buffaloes raised in the northeastern semi-arid region. Data from 13,752 observations of lactations collected from the years of 2013 to 2016 from a herd in Rio Grande do Norte were used. The milk quality through chemical composition (contents of fat, protein, lactose, and total solids), somatic cell counts (CCS), and milk production was evaluated. Data were submitted to Pearson correlation, factorial analysis, multivariate analysis of variance per year, and per period (dry, rainy and transition) and canonical discriminant analysis by year and period. A high and positive correlation was observed for total solids and fat (0.91) and positive correlation for protein and total solids, yield, and lactose. Through factor analysis, four factors were selected that explained 88% of the total variation. The first factor was considered "milk quality factor, " the second factor as "milk production factor," and the third factor as " sanity factor." There was significant effect based on year but not significant effect based on period for the studied variables. In the canonical discriminant analysis per year, the discriminant power variables were (CCS), milk production and total solids. For periods were protein, production, lactose, fat, and total solids. Multivariate analysis was efficient in evaluating the production and chemical composition of buffalo milk and can be safely used in future studies.


INTRODUCTION
The buffaloes have been highlighting out and gaining space in Brazilian livestock sector. The northeast region has also advanced in the production of buffalo in recent years; the herd had a herd of 83,018 heads, with Maranhão being the state with the most significant number of heads, with a total of 57,305 buffalo, followed by Bahia with 16,033, Pernambuco with 5,239 and Rio Grande do Norte with 1,311 buffalo heads (IBGE, 2017). This growth in the region is mainly characterized by the excellent adaptability of these animals to the edaphoclimatic conditions of the northeastern semi-arid region, being used for meat and milk production. (ARAÚJO et al., 2011).
Buffalo milk is considered to be of high nutritional quality, presenting with high levels of lipids, protein, caloric value, vitamin A and calcium, in addition to the presence of levels of conjugated linoleic acid, which helps in the fight against diseases, such as obesity and diabetes characteristics that confirm the great nutritional richness of this raw material (FIGUEIREDO; JÚNIOR; TORO, 2010). However, this nutritional configuration may be variable considering the processes of environmental, genetic, and nutritional influence. (SOARES et al., 2013;SOUZA et al., 2010).
Another crucial parameter to evaluate the milk quality is the somatic cell count (SCC), where high rates of this count may indicate the presence of pathogens in the udder, usually causing mastitis and leading to reduced milk production, and it may interfere in cheese processing that can lead to production losses (CASTRO et al., 2014).
Significant correlations are found in these cases where milk yield, chemical composition, and SCC levels are influenced by food management, times of the year, and increase of daily management dirt (BARRETO et al., 2010;VARGAS et al., 2014).
It is common to observe studies in the literature that evaluate the production and quality of bovine milk in the northeast region, but for buffalo, the studies are still scarce. Given the multivariate nature of the data, it is necessary to use multivariate analysis to evaluate the milk production characteristics.
In this context, this study aimed to evaluate the production of buffalo milk; it is chemical composition and somatic cell count in the northeastern semi-arid region, using multivariate analyses.

Study area
The data used came from the production, control, and milk quality records of the buffalo herd of an agricultural company, which has been engaged in agribusiness for 27 years, where one of its main activities is the production of buffalo milk and its derivatives over 15 years.
The farm is located in the municipality of Taipu, 50km from Natal, and in the Agreste region of the state of Rio Grande do Norte. The region has a warm semi-arid climate (BSh), with a dry season from August to January and a rainy season from February to July. The region's average rainfall is approximately 962 mm per year, with November being the driest month with 7 mm and April having the highest rainfall with an average of 180 mm. The average temperature on the property is 25.8 °C, with a maximum and minimum of 31.6 °C and 19.7 °C, respectively, with an average relative humidity of 79% (CLIMATE-DATA.ORG, 2018).
The production system adopted on the property is semi-intensive, with animals managed in rotational grazing using pasture type Panicum maximum cv. Massai and Brachiaria brizantha, with supplemented utilization, in the dry season, of forage palm, associated with other sources of roughage and concentrate in the trough, as an alternative for improving the nutritional status of animals during periods of low nutrient availability.

Sample and data collection
Data were obtained from 2013 to 2016, totaling four years of collections and 13,752 observations of lactations of species Bubalus bubalis, belonging to the Murrah breed. Information from an average of 273 monthly observations was used, with a 12-month average of 3,276 observations for 2013; there was a monthly average of 311 and a 12-month average of 3,732 for 2014; 304 and 3,648 for 2015 and 258 and 3,096 for 2016. Milking was performed twice a day using mechanical milking machines.
The collection of the milk samples was zonce a month during the 12 months of the year, over the four years. All samples were taken directly from the property's cooling tank and placed in sterile plastic vials containing a Bronopol reagent pill (2-Bromo-2nitropropane-1-3-diol) to keep milk characteristics intact. Subsequently, the samples were sent to the milk analysis laboratory of the Federal Rural University of Pernambuco in the Department of Zootechny, where the chemical composition analyses of the milk were performed; this included protein (PROT), fat, lactose (LAC) and total solids (TS), in addition to somatic cell count (SCC), a parameter that has a high relationship with the hygienic safety of milk and the health status of the mammary gland of animals. Milk yield (MP) was evaluated according to the monthly average. There were 3 determined periods the dry period, represented by the months of October to December, the rainy period from March to July and the transition period, which included the months of January, February, August and September ( Figure 1) used to verify the effect of rainfall on yield, milk quality and sanitary hygienic profile of milk of buffaloes raised in the northeastern semi-arid region by year and period.

Data analysis
The data were submitted to Pearson correlation analysis, factorial analysis, multivariate analysis of variance (MANOVA), and canonical discriminant analysis. Pearson correlation analysis was used to verify the correlations between rainfall, milk yield and milk composition. The principal component-based factor analysis was applied to summarise the original set of variables in a few factors and to indicate the variables that most contribute to the evaluation of milk yield and composition. Multivariate analysis of variance (MANOVA) was used to compare the vectors of the mean for variables evaluated by year (2013 and 2016) and period (dry, rainy, and transition) using the Wilks lambda test (MORRISON, 1976). Canonical discriminant analysis was adopted to verify the existence of differences between the years and periods, based on the evaluated parameters and to find functions of the observed variables that could explain these differences. The discriminant process used year and period as group variables to analyze production and milk composition.
The most discriminate traits were choose based on the F statistic (HAIR JR et al., 2009) in a stepwise method.
Data were analyzed using Statistical Analysis System® (SAS) version 9.0 statistical software using PROC CORR procedures for Pearson correlation analysis, PROC FACTOR for factor analysis using Varimax orthogonal rotation, PROC MANOVA H for multivariate analysis of variance and PROC DISCRIM for canonical discriminant analysis. For the extraction of factors, the methodology of Jolliffe (1972Jolliffe ( , 1973, whose extracted eigenvalues must be above 0.70, was considered. For the formation of the canonical discriminant graph and factor analysis, the Statistica® version 6.0 statistical program was used.

RESULTS AND DISCUSSION
High positive and significant correlations were found only between solids and fat percentage. Negative and significant correlations were found between fat and lactose, lactose and SCC, SCC and protein, SCC and milk yield, and SCC and rainfall (Table 1). Almost zero correlations were verified for the other parameters, highlighting milk yield and total solids (-0.01), which corroborates the results found by Soares et al. (2013) in their evaluating the composition and milk yield of buffaloes and finding low correlations between milk and protein production, protein and lactose and total solids and rainfall.
The high correlation found between fat and total solids variables can be explained by the fact that the variation in total solids content is mostly dependent on the variations in milk fat content, a fraction that has the largest range of variation (GONZÁLEZ et al., 2001) and proportion, oscillating as a consequence of several factors, including genetic, nutritional and environmental (SOARES et al., 2013;SOUZA et al., 2010). Thus, the amount of total solids present in milk tends to be more dependent on the amount of fat than on the amount from other solid components of milk.
Following the effects of positive linearity, as milk fat increases, total solids are likely to increase in the same direction and proportion. In contrast, negatively correlated variables will vary in the opposite direction and be proportional to the value of the correlation obtained.
Through factor analysis, it was possible to form 6 factors that represented 100% of the accumulated variance, of which were selected 3 factors, by the Jolliffe method that explained 85% of the total data variation; that is, 50% of the factors explained most of the variation, allowing a reduction of the sample space by 50%, with a loss of explanation of only 15% (Table 2). 1 It is observed that the commonalities (Table  3) ranged from 0.72 to 0.97, with lactose and fat being the variables with the lowest and highest contribution, respectively, to explain the total variability of the extracted factors.
The commonality is an index of total variability and represents the proportion of variance for each variable used in the analysis-that is, how much a given variable contributed to explaining the total variance of the factors considered (MORRISON, 1976;JOHNSON;WICHERN, 2007;MINGOTI;. Usually, the minimum acceptable commonality value is 0.50, with a low commonality between a set of variables, indicating that they are not linearly correlated and thus cannot be included in the factor analysis (HAIR JR et al., 2009). All variables presented commonalities above 0.70, which suggests a good fit of the model. The fat variables (0.97) and total milk solids (0.94) presented higher eigenvectors (Factor 1 - Table 3), and this factor explained 39.62% of the total data variation. High correlation variables tend to share the same factor. The component fat is the one that most varies within the fraction of milk total solids; these variables have a high correlation (0.91) ( Table 1).
The components present in milk are responsible for its nutritional value, being that fat is the most critical fraction for the yield in the manufacture of derivative products. (NASCIMENTO et al., 2016). Factor 1 was then called "milk quality factor," due to the highlighted variables being directly related to milk quality.
Factor 2 explained 31% of the total variation of the data, and milk yield was the variable with the highest weight or factorial charge in this factor, denominated "milk production factor." Another variable that also stood out in the second factor was lactose (-0.70), a variable directly linked to the amount of milk produced since lactose is related to the regulation of osmotic pressure in the mammary gland. In the process of milk synthesis, lactose attracts water to the mammary epithelial cells, so the higher lactose production determines higher milk production. (GONZÁLEZ et al., 2001;ROSA et al., 2012). Milk production is responsible for the income generation of hundreds of producers and also for their high participation and their derivatives in the essential human nutrition and, consequently, in the indexes that calculate inflation (GOMES, 2017).
Milk protein was the variable with the highest factor load in the third factor (0.84), and together with SCC (-0.75), explains 14.65% of the total data variability, being termed as "health factor." The relationship between increased somatic cell count and total milk protein has variable results. This occurs when the somatic cell count (SCC) is elevated, then the protein composition of the milk changes, and the true protein synthesis, especially casein, is reduced as the secretory tissue is compromised. However, total protein levels are maintained by the presence of blood proteins, which are released into the alveolus mammary due to increased vascular permeability. (AULDIST; HUBBLE, 1998). This result is confirmed by the correlation value observed in this study (-0.37). As the amount of SCC increases, the amount of protein decreases.
Plasmin, an endogenous protease enzyme naturally present in milk, presents a significant increase in milk when there are infectious processes, which in turn increases the number of somatic cells in milk. These enzymes rapidly degrade the proteins, α1-and β-caseins, lowering protein concentration and increasing somatic cell concentration in milk (CRUZ et al., 2016).
Based on the multivariate analysis of variance (MANOVA), it was observed that there was a significant difference between the vectors of mean, both the Wilks test and the other tests (Pillai, Hotelling-Lawley) (Table 4). Thus, it can be inferred that there was a difference between the different years based on the evaluated characteristics of production, fat, protein, lactose, total solids, and SCC. The linear functions generated from the discriminant analysis are shown in Table 5. The results obtained through the stepwise method indicate that the most discriminant variables were the somatic cell count (SCC), milk yield and total solids. Only 2016 was classified fully (100%) in its respective group, indicating a certain homogeneity of data for that year. The year 2013 was classified in its group. The years 2014 and 2015 were the years of explaining the most variation: 42% and 55%, respectively. It has been indicated that these years present different behavior when compared to the other years.  1 TS-total solids, MP-milk production, SCC-somatic cell count.
It is possible to observe that the variables of more considerable significance or higher discriminatory power (CCS, MP and ST) are those associated with milk production, milk quality and sanitary hygienic quality, indicating that these parameters are essential to be included in characterization studies in the production of buffalo milk, taking into consideration the year and the periods of the year (Table 5).
Somatic cell count has been considered as a standard measure of milk quality because it is related to milk composition, industrial yield, and food safety of milk. It is also of high relevance to producers because it indicates the health status of the mammary glands of cows and may signal significant losses in production and changes in milk quality (BUENO et al., 2005;SABEDOT et al., 2011).
As shown in Figure 2 graphs, the differences between the years were evaluated based on milk yield, milk composition, and somatic cell count through canonical discriminant analysis. The first two canonical variables (CAN1 and CAN2) were sufficient to explain 83.14% and 11.73% of the total data variation, respectively. These results allow us to study the behavior of the variables through 2 canonical variables with information security, since the two linear combinations explain most of the total variation (95%), with only 5% loss of explanation of the studied phenomenon. The higher the proportion of variation explained in the first two canonical variables, the more efficient the analysis and the better the explanation of the multivariate phenomenon. In the canonical representation of the different years of production studied, it turns out that there was a distinction between 2013 and 2016, indicating differences in the behavior of production variables, milk quality (protein, fat, lactose and total solids) and CCS in the different years studied. Already, the years 2014 and 2015 were close to each other, without clarity in the separation, which is a reflection of the considerable variation that occurred in these years for the parameters evaluated in the study, generating a low percentage of correct classification (Table 5) Through multivariate analysis of variance (MANOVA), it was observed that there was no significant difference between the vectors of mean by the Wilks test and other tests (Table 6). Thus, it is hypothesized that the vectors of mean for the parameters evaluated in the different periods studied are equal. The variables with the highest discriminatory power (p < 0.001) among the dry, rainy, and transition periods were protein, production, lactose, fat, and total milk solids (Table 7). A higher percentage of classification in the rainy season was observed, followed by those evaluated in the dry and transition periods with 68%, 50%, and 33%, respectively.  1 PROT-protein, TS-total solids, MP-milk production, LAC-lactose.
It is possible to observe that the variables that presented higher discriminating power between the evaluated periods were those related to milk production and quality. The SCC was not included in the linear discriminant function, indicating that this variable had little contribution to the discrimination of the evaluated periods (Table 7).
It is common to observe fluctuations in production and chemical quality in milk throughout the year. The higher forage supply in the rainy season probably affords higher volumes of milk. In the driest periods, there is a decrease in the amount and quality of available food and thermal discomfort, among other factors, affecting the producer income due to the lower supply of milk to the market.
However, by decreasing the amount of milk produced, in some cases, the nutritional quality of the milk is increased, and, in the case of buffalo, this concentration is even higher than that of cattle. Araújo et al. (2011) evaluated the influence of seasons of the year on the composition of buffalo milk and found higher fat, lactose, and total solids content in milk produced in the dry season.
The canonical discriminant analysis based on the studied variables of production, quality and SCC in the different periods allowed the formation of 2 canonical variables (CAN1 and CAN2), representing 93.01% and 6.99% of the total variation, respectively (Figure 4). The two combinations represent 100% of the total variation, with no loss of explanation, which demonstrates that the first two axes are sufficient to explain the total variation between the periods studied. It is possible to observe, through the canonical representation (Figure 4), that there is no clear separation of the evaluated periods, which is a result already expected when observing the percentage of classification within the evaluated periods (Table 7). This result may also be related to the considerable rainfall variation in the evaluated periods, and variation in milk characteristics may or may not occur.
Rain irregularity in Northeastern Brazil is common (COSTA; SIMÕES; RIET-CORREA, 2009;MARENGO, 2006), representing a constant obstacle to the development of agricultural activities. This phenomenon affects the quantity and quality of forage (LUNA et al., 2012), often forcing the dairy farmer to use supplementation during periods of food shortages (FERREIRA et al., 2009), increasing the risks of ectoparasite and endoparasite infestation in the highest rainfall months (COSTA; SIMÕES; RIET-CORREA, 2009). All these factors can be considered as determinants for the variation in milk production and, consequently, for the compounds present in milk.

CONCLUSION
Factor analysis was useful for identifying a set of 3 latent constructs to characterize the buffalo milk production system. Total milk fat and solids are crucial in studies of this nature since these characteristics have more significant similarities and greater factor loading in the first factor.
The discriminant analysis allowed the verification of the differences existing by year and period of the year, based on the evaluated parameters. SCC, milk yield and total solids were the most discriminating variables per year, and protein, milk production, lactose fat, and total solids were the most important for discrimination by period of the year.

ACKNOWLEDGMENT
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -Brasil (CAPES) -Finance Code 001.
To Francisco de Assis Veloso Junior and Claudio Coutinho Bartolomeu for their research support.