Milk fraud by the addition of whey using an artificial neural network

The adulteration of milk by the addition of whey is a problem that concerns national and international authorities. The objective of this research was to quantify the whey content in adulterated milk samples using artificial neural networks, employing routine analyses of dairy milk samples. The analyses were performed with different concentrations of whey (0, 5, 10, and 20%), and samples were analyzed for fat, non-fat solids, density, protein, lactose, minerals, and freezing point, totaling 164 assays, of which 60% were used for network training, 20% for network validation, and 20% for neural network testing. The Garson method was used to determine the importance of the variables. The neural network technique for the determination of milk fraud by the addition of whey proved to be efficient. Among the variables of highest relevance were fat content and density.


INTRODUCTION
The adulteration of milk by the addition of whey is a problem that concerns authorities; it takes place because whey has a low commercial value and is thus little used in the manufacture of dairy products. Milk that conforms to the regulations may be confused with adulterated milk, depending on the evaluation technique employed. The average composition, range, and properties of milk as shown in table 1 must be taken into account (IBANEZ et al., 2001).
The regulations for the industrial and sanitary inspection of products of animal origin consider milk fraud to occur when its characteristic components are substituted by other aggregates. This includes the addition of substances of any nature to conceal alterations, deficiencies in the quality of the raw material, or defects in the elaboration, or to increase the volume or weight of the product (BRASIL, 2017).
The main substances used to defraud milk customers include water; constituents such as sugar, flour, and whey; preservatives such as chlorine, hypochlorite, hydrogen peroxide, or formaldehyde; and neutralizing agents such as sodium hydroxide. Although legislation requires daily testing for these substances, milk evaluation in the industry is routinely performed using only physicochemical analyses such as density and freezing point (AFZAL et al., 2011). However, these frauds are often calculated to evade detection by non-specific routine tests.
The whey from cheese manufacture can be classified as sweet or acidic. It is sweet when enzymatic coagulation occurs in the 5.9-6.3 pH range; an example of this type is the whey from the manufacture of mozzarella or cheddar. Whey is acidic when casein precipitation is caused by adjusting the pH to 4.6 with organic acids (SADAT et al., 2006; DI DOMENICO et al., 2017; ZHAO et al., 2017; LOPES et al., 2018).
Fraud by adding whey to milk is extremely widespread because it increases the volume of milk marketed without significantly changing the percentage of proteins or causing sensory changes noticeable to most people. These advantages stimulate the ambition of stakeholders to increase profits rapidly and illicitly, and also of fraudsters who benefit from a lack of proper oversight, flaws in the legislation, and the resulting impunity (ROCHA et al., 2015).
Depending on the amount of cheese whey added to milk and on the schemes used to mask this type of fraud, the official analytical procedures routinely applied in the industry lack the necessary sensitivity and often lead to misjudgments of milk quality (FUENTE & JUAREZ, 2005).
There are several standard techniques used to detect fraud, and new methodologies are being developed in response to new types of fraud. Whey intentionally added to milk is commonly detected and quantified by determining the casein macropeptide (CMP) content. CMP is a hydrophilic fragment of κ-casein that remains soluble in whey and should be absent in milk. It is released by the action of chymosin during the enzymatic coagulation of milk. Brazilian legislation uses the quantitative determination of CMP as a criterion for assessing milk quality resulting from the proteolytic action of enzymes and considers milk with CMP concentrations above 75 mg.L⁻¹ as not suitable for human consumption (BRASIL, 2006a, 2006b).
An alternative method for classifying samples is the use of artificial neural networks (ANNs). They can operate as predictive models that describe the functional relationship between the input and output variables of a system. The ANNs have several advantages over traditional or empirical models. They map input and output variables that can be used to predict system output parameters (VALENTE et al., 2015).
An artificial neural network is a computational model consisting of simple processing elements (artificial neurons) that apply a given mathematical function to the data, generating a single response. The neurons are arranged in layers and linked together. These connections are usually associated with values called weights, which are adjusted in a process called training or learning. This process is also responsible for extracting data characteristics and storing the network's knowledge. The application of a network consists of a generalization process, in which a trained network is used to respond to previously unseen data (ROCHA et al., 2015).
The ANNs have been defined as parallel systems consisting of simple processing units, also called artificial neurons or nodes, connected in a specific way to perform a mathematical function that is usually nonlinear. Artificial neurons are simplified mathematical models of biological neurons; they process the information received, weighted by synaptic weights, and provide a single output (VALENTE et al., 2014b). The most frequently used networks have three layers: an input layer, in which the data are introduced to the network; an intermediate layer, where the entire calculation process occurs; and an output layer, where the result is obtained. The use of ANNs has become an alternative means to pinpoint fraud by the addition of whey to milk (VALENTE et al., 2014a).
The Garson method is used to classify the relative importance of the input variables. This method mainly involves dividing the hidden-output connection weights of each hidden neuron into components associated with each neural network input. In contrast to the general method of connection weights, this method uses the absolute values of the connection weights (VALENTE et al., 2014b).
The objective of this research was to evaluate the use of the Garson method to classify the relative importance of the variables for the indication of milk whey fraud, using ANNs.

MATERIALS AND METHODS
The procedure for analyzing the relative importance of the variables was performed according to the sequence presented in figure 1.
The 164 milk samples were collected over 6 months on different production days. The routine analyses at the dairies were performed using an ultrasonic milk analyzer (Master Complete, Akso®). The parameters measured were fat content (FAT), non-fat solids (SNF), density (DST), protein (PTN), lactose (LCT), minerals (MNL), and freezing point (FP).
The whey used to adulterate the milk was from dairy cheese production. A 50 mL portion of milk was used for the analyses. Control samples were analyzed without adulteration. For the adulterated samples, whey was added at concentrations of 5, 10, and 20% relative to the amount of milk. The data obtained in the analyses were normalized so that the scales of the variables were equalized, using equation 1:
$$X_i = \frac{X - \min(x)}{\max(x) - \min(x)} \qquad (1)$$
where $X_i$ is the normalized value, $X$ is the value observed for each variable, $\max(x)$ is the maximum value of the variable, and $\min(x)$ is the minimum value of the variable.
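The normalization described above can be illustrated with a minimal Python sketch. The function name and the fat readings are hypothetical and are not data from this study; the handling of a constant column is an added assumption.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the [0, 1] range (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical fat readings (%), not values measured in the study.
fat = [3.6, 3.4, 3.2, 2.8]
fat_norm = min_max_normalize(fat)
print(fat_norm)
```

After this step the largest observation of each variable maps to 1 and the smallest to 0, so variables with different units (for example density and freezing point) contribute on the same scale.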
The samples were randomly divided with 60% being assigned for the training of the artificial neural network, 20% for validation, and 20% for the test (VALENTE et al., 2014b). The network training function was TRAINLM, which applies Levenberg-Marquardt optimization.
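A random 60/20/20 division of the 164 assays might be sketched as follows. This is only an illustration in Python: the seed, the rounding of the group sizes, and the function name are assumptions, since the text does not report how the partition was implemented.

```python
import random


def split_60_20_20(n_samples, seed=0):
    """Return shuffled index lists for training, validation, and test groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(0.6 * n_samples)
    n_val = round(0.2 * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remaining samples
    return train, val, test


train_idx, val_idx, test_idx = split_60_20_20(164)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because 164 is not divisible into exact 60/20/20 parts, the group sizes depend on the rounding rule chosen; the rule used here is one possibility.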
The number of neurons in the input layer was defined by the variables under investigation: fat content, non-fat solids, density, protein content, lactose content, mineral content, and freezing point. The output layer was the percentage of whey in the sample (control, 5, 10, and 20%). The number of neurons in the hidden layer was defined by trial and error, using the correlation coefficient (r) for the training, validation, and test data groups and the smallest mean squared error. The relative importance of each input variable was obtained with the Garson equation (equation 2):
$$I_j = \frac{\sum_{m=1}^{N_h}\left(\dfrac{|W^{ih}_{jm}|}{\sum_{k=1}^{N_i}|W^{ih}_{km}|}\,|W^{ho}_{mn}|\right)}{\sum_{k=1}^{N_i}\left\{\sum_{m=1}^{N_h}\left(\dfrac{|W^{ih}_{km}|}{\sum_{k=1}^{N_i}|W^{ih}_{km}|}\,|W^{ho}_{mn}|\right)\right\}} \qquad (2)$$
where $I_j$ corresponds to the relative importance of the jth input variable in the output variable; $N_i$ and $N_h$ are the numbers of neurons in the input and hidden layers, respectively; $W$ is the weight of the connections; the superscripts i, h, and o refer to the input, hidden, and output layers, respectively; and the indexes k, m, and n refer to neurons of the input, hidden, and output layers, respectively.
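The Garson calculation of relative importance can be sketched in Python for a network with a single output neuron. The function name and the small weight matrices are hypothetical and are not the weights of this study; the sketch assumes that every hidden neuron has at least one non-zero input weight.

```python
def garson_importance(w_ih, w_ho):
    """Relative importance of each input for a single-output network.

    w_ih[k][m]: weight from input neuron k to hidden neuron m.
    w_ho[m]:    weight from hidden neuron m to the output neuron.
    """
    n_inputs, n_hidden = len(w_ih), len(w_ho)
    contributions = [0.0] * n_inputs
    for m in range(n_hidden):
        # Sum of absolute input weights arriving at hidden neuron m.
        col_sum = sum(abs(w_ih[k][m]) for k in range(n_inputs))
        for k in range(n_inputs):
            # Share of input k in hidden neuron m, scaled by |hidden-output weight|.
            contributions[k] += abs(w_ih[k][m]) / col_sum * abs(w_ho[m])
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical weights: 3 inputs, 2 hidden neurons (illustration only).
w_ih = [[1.0, -1.0],
        [2.0, 1.0],
        [-1.0, 2.0]]
w_ho = [2.0, -4.0]
importance = garson_importance(w_ih, w_ho)
print([round(v, 3) for v in importance])  # [0.25, 0.333, 0.417]
```

Only absolute values enter the calculation, so the result indicates how strongly each input influences the output but not the direction of that influence; the importances sum to 1.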

RESULTS AND DISCUSSION
According to the analyses performed on the samples of adulterated milk (5%, 10%, and 20% of whey) and unadulterated milk, the mean was obtained for each variable, as shown in table 2.
The values for both the adulterated and unadulterated milk are in agreement with table 1; according to established standards, these are at least 7.9% for SNF, 1.028 to 1.034 g.mL⁻¹ for density, a minimum of 2.3% for protein, and a minimum of 0.57% for salts.
Only the cryoscopy index (freezing point) of the sample adulterated with 20% whey was outside the parameters recommended by Brazilian legislation. According to table 2, the salt content in the milk decreased by 10% and the whey by 20%. Ribeiro-Santos et al. (2015) reported that the whey salt content is lower than that of milk.
The best network had 15 neurons in the hidden layer (Figure 2). A feed-forward ANN was used, that is, a network that carries out the entire calculation process from the input layer to the output layer until it reaches a result with the lowest possible error.
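The calculation carried out by a feed-forward network with 7 inputs, 15 hidden neurons, and 1 output can be sketched in Python as shown below. The activation functions are not stated in the text, so a hyperbolic tangent in the hidden layer and a linear output are assumed here; the weights are random and untrained, and the sample values are hypothetical.

```python
import math
import random


def forward(x, w_ih, b_h, w_ho, b_o):
    """One forward pass: inputs -> hidden layer (tanh) -> single linear output."""
    hidden = []
    for m in range(len(b_h)):
        # Weighted sum of all inputs arriving at hidden neuron m, plus its bias.
        z = sum(x[k] * w_ih[k][m] for k in range(len(x))) + b_h[m]
        hidden.append(math.tanh(z))
    return sum(h * w for h, w in zip(hidden, w_ho)) + b_o


N_INPUTS, N_HIDDEN = 7, 15  # layer sizes reported in the text
rng = random.Random(42)
# Untrained random weights, used only to show the shapes involved.
w_ih = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_INPUTS)]
b_h = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_ho = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
b_o = 0.0

# One hypothetical sample, already normalized to [0, 1]:
# FAT, SNF, DST, PTN, LCT, MNL, FP
sample = [0.8, 0.6, 0.7, 0.5, 0.6, 0.4, 0.3]
output = forward(sample, w_ih, b_h, w_ho, b_o)
print(output)
```

Training (here, by Levenberg-Marquardt optimization) would adjust the weights and biases so that the output approaches the known whey percentage of each training sample; that step is not shown in this sketch.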
The amount of whey in the samples could be simulated by the ANN from the laboratory's routine analyses, as shown in table 3. The result calculated by the ANN is the mean of each assay used in the study. According to table 3, there is a relative error, so a precise assessment method is required to confirm fraud. The quality management records of a company allow the process to be monitored. The construction of an artificial neural network that uses the historical data of a company will make it possible to establish a more reliable network for detecting fraudulent food production, thereby providing an opportunity for companies and supervisory bodies to manage possible fraud.
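One common way to express the relative error between the whey content added and the value calculated by the network is sketched below in Python. The formula is an assumption, since the text does not state how the error was computed, and the pairs of values are hypothetical rather than taken from table 3.

```python
def relative_error_percent(actual, predicted):
    """Relative error in percent; returns None when the actual value is 0."""
    if actual == 0:
        # The relative error is undefined for the control (0% whey).
        return None
    return abs(predicted - actual) / abs(actual) * 100.0


# Hypothetical (added, calculated) whey percentages, for illustration only.
pairs = [(0.0, 0.4), (5.0, 5.5), (10.0, 9.0), (20.0, 21.0)]
errors = [relative_error_percent(a, p) for a, p in pairs]
print(errors)
```

For the control samples an absolute error (calculated minus added content) would have to be reported instead, because the division by the added content is not defined.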
The weights of the connections between the layers are listed in table 4. The relative importance of each input variable in the output of the ANN can be evaluated with the Garson equation. The order of relative importance of the variables analyzed to establish the whey content in the samples is shown in figure 3. According to figure 3, the variables most affected when whey is added to milk are the fat content and density. Although the fat content is the most significant, the relative importances of the variables are very close when whey is added in low proportions. This frequently prevents whey fraud from being identified. Bovine whey contains approximately 0.5% fat, which explains the decline in fat percentage when whey is added to milk (ANTUNES, 2003). However, the change in fat content alone does not always allow fraud by the addition of whey to be identified; instead, the variations that co-occur in all variables are more revealing.
It is challenging to detect whey-adulterated milk because these two substances have similar physical and chemical characteristics. Traditional methodologies for monitoring this fraud are based on caseinomacropeptide analysis (MENDES et al., 2016). The use of neural networks to identify possible whey fraud from routine dairy analyses relies on minimal changes in these characteristics. The electrical conductivity of milk was described by NIELEN et al. (2010), who also discussed its use as a mastitis detection tool. Systems that combine multiple data and perform multifactorial analyses will be of interest to the dairy industry. Therefore, in combination with the attributes analyzed in this [...] developed a methodology for detecting bovine milk adulteration by applying electrical impedance measurements. This parameter allows samples of raw and UHT milk to be characterized when adulterated with different proportions of drinking water, deionized water, hydrogen peroxide (H₂O₂), sodium hydroxide (NaOH), and formaldehyde. The samples were analyzed by electrical impedance spectroscopy. A classification of the results using the k-nearest neighbor algorithm, which allows the samples of pure and adulterated milk to be assessed quantitatively, was proposed.
In another study, a sequential strategy was proposed to detect adulterants in milk using a technique based on mid-infrared spectroscopy and soft independent modeling of class analogy (GONDIM et al., 2017). Models were established with low target levels of adulterations, including formaldehyde, hydrogen peroxide, bicarbonate, carbonate, chloride, citrate, hydroxide, hypochlorite, starch, sucrose, and water.
The use of different techniques may contribute to the development of a methodology to identify whey fraud using ANNs or even to quantify the whey content added to milk. MENDES et al. (2016) proposed a new approach to detect and quantify this fraud using the fatty acid profiles of milk and whey. Fatty acids C14:0, C16:0, C18:0, C18:1, C18:2, and C18:3 were selected by gas chromatography associated with discriminant analysis to differentiate milk and whey, as they are present in significantly different amounts. These six fatty acids were rapidly quantified by capillary zone electrophoresis in a set of adulterated milk samples. The technique was useful for the evaluation of milk adulterated with whey. However, it requires more sophisticated equipment that is not ordinarily available to the dairy industry in Brazil. One suggestion is that the 5% whey limit is probably not low enough; therefore, it is necessary to conduct research with less added whey and to include other milk properties. A combination of artificial intelligence with routine industrial analyses, together with the incorporation of analyses such as color, electrical conductivity, and impedance, could be an efficient strategy for screening and for reducing the number of samples subjected to confirmatory analysis. The distinctive contribution of this article is the combination of rapid and routine techniques with ANNs to identify whey fraud.

CONCLUSION
The addition of whey to milk decreases the levels of some components, with fat and density most affected by this addition, but constituent levels continue to meet legislative standards. A neural network architecture with 15 neurons in the hidden layer was found to be able to indicate possible fraud by the addition of whey to milk. The assessment of the relative importance of the input variables by the Garson method showed that the analyzed parameters with the highest relative importance were fat content and density.