Predicting Partition Coefficients of Migrants in Food Simulant / Polymer Systems using Adaptive Neuro-Fuzzy Inference System

Food contamination through the migration of low molecular weight additives into industrially processed foods can result from direct contact between the packaging and the food. The concentration of additive that migrates from the packaging material into the food is related to the structural properties of the additive as well as to the nature of the packaging material. The aim of this study is to develop a QSPR model using the adaptive neuro-fuzzy inference system (ANFIS) to predict the partition coefficient K in packaging/food systems. To this end, 44 partition coefficients were investigated in various systems made up of 4 food simulants, 6 migrants and 2 packaging polymers. A set of 6 molecular descriptors, representing various characteristics of the food simulants (2 descriptors), the migrants (3 descriptors) and the polymers (1 descriptor), was used as the data set for this study. The data set was divided into three subsets: training, test and prediction. The ANFIS modeling technique was applied for the first time in this field of food/packaging studies. The resulting model gave an RMSE of 0.0006, and the correlation coefficient (R) for the prediction set was 0.9920.


Introduction
Continuous efforts in food matrix preservation, distribution and marketing are being made worldwide to supply consumers with high quality products and foods. To avoid contamination from food packaging, one should first find the source of contamination. Various interactions between food and packaging materials can contaminate food. Migration of low molecular weight additives from packaging materials into foodstuffs can also contaminate them. 1 The types and levels of solvents and monomers migrating from polymers into foods are important factors in food contamination. 3,4 The migration of low molecular weight compounds from a food into a polymer has also been the subject of considerable attention. 5 The thermodynamic equilibrium (partition) of the migration process can be defined as an exchange of mass/energy between the packaging material and the food. 6
For quality control of food packaging, the partition coefficients between the polymer packaging and the food matrix should be known. Quantitative structure-property relationships (QSPR) based on computational methods have made it possible to calculate these partition coefficients. QSPRs are predictive models derived by applying statistical tools that correlate a chemical property, such as the partition coefficient, with descriptors representative of molecular structure and/or properties. The success of any QSPR model depends on the accuracy of the input data and on the selection of appropriate descriptors and statistical tools. 7 Finally, the developed model is subjected to a validation step. The validation strategies check the reliability of the developed model for its possible application to a new data set, so that the confidence of prediction can be judged. In the current work, we validated the model using three techniques: leave-one-out cross-validation, leave-multiple-out cross-validation and the Y-randomization test.
The objectives of the present paper are twofold: i) to explore the structure-property relationships of the partition coefficient in diverse systems and ii) to compare the developed ANFIS model with the quadratic model reported previously. 8

Theory

Adaptive neuro-fuzzy inference system
The proposed neuro-fuzzy model in ANFIS is a multilayer neural network-based fuzzy system. 9,10 Its topology is presented in Figure 1. As shown, the system has a total of five layers. In this connectionist structure, the input (layer 0) and output (layer 5) nodes represent the descriptors and the response, respectively. In the hidden layers, there are nodes functioning as membership functions (MFs) and rules. This architecture eliminates the disadvantage of a normal feed-forward multilayer network, which is difficult for an observer to understand or to modify. ANFIS simulates a Takagi-Sugeno-Kang fuzzy rule 11 of type 3, in which the consequent part of the rule is a linear combination of the input variables and a constant. For simplicity, we assume here that the examined fuzzy inference system has two inputs, $x$ and $y$, and one output. For a Sugeno fuzzy model, a common rule set with fuzzy if-then rules is as follows:

If $x$ is $A_i$ and $y$ is $B_i$, then $f_i = p_i x + q_i y + r_i$    (1)

The ANFIS contains five layers, as shown in Figure 1.
Layer 1. The fuzzy part of ANFIS is mathematically incorporated in the form of membership functions (MFs). A membership function $\mu_{A_i}(x)$ can be any continuous and piecewise differentiable function that transforms the input value $x$ into a membership degree, that is, a value between 0 and 1. The most widely applied membership functions are the generalized bell function (gbell MF) and the Gaussian function (equations (2) and (3), respectively). The generalized bell function is described by three parameters, $a$, $b$ and $c$; the Gaussian function uses only $a$ and $c$. Layer 1 is therefore the fuzzification layer, in which each node represents a membership function:

$$\mu_{A_i}(x) = \frac{1}{1 + \left| \dfrac{x - c_i}{a_i} \right|^{2 b_i}} \tag{2}$$

$$\mu_{A_i}(x) = \exp\left[ -\left( \frac{x - c_i}{a_i} \right)^{2} \right] \tag{3}$$

As the values of the parameters $\{a_i, b_i, c_i\}$ change, the bell-shaped functions vary accordingly, exhibiting various forms of membership functions on the linguistic label $A_i$. Parameters in this layer are referred to as premise parameters.
Layer 2. Every node in this layer is a fixed node labeled Π, whose output is the product of all the incoming signals:

$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y) \tag{4}$$

The membership values $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are multiplied to obtain the firing strength of the rule in which the variables $x$ and $y$ have the linguistic values $A_i$ and $B_i$, respectively.
Layer 3. This layer is the normalization layer, which normalizes the strength of all rules according to equation (5):

$$\bar{w}_i = \frac{w_i}{\sum_j w_j} \tag{5}$$

where $w_i$ is the firing strength of the ith rule, computed in layer 2. Node i computes the ratio of the ith rule's firing strength to the sum of all rules' firing strengths. For convenience, outputs of this layer are called normalized firing strengths.
Layer 4. Every node in this layer is an adaptive node with a node function:

$$O_i^{4} = \bar{w}_i f_i = \bar{w}_i \left( p_i x + q_i y + r_i \right) \tag{6}$$

where $\bar{w}_i$ is a normalized firing strength from layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set for this node. Parameters in this layer are referred to as consequent parameters.
Layer 5. The single node in this layer is a fixed node labeled Σ, which computes the overall output as the summation of all incoming signals:

$$O^{5} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{7}$$

Thus we have constructed an ANFIS system that is functionally equivalent to a Sugeno fuzzy model, which was used in the present QSPR study because of its transparency and efficiency.
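The five layers can be traced in a few lines of code. The following is a minimal Python sketch of the forward pass only, assuming the two-input first-order Sugeno system described above with generalized bell membership functions. The function names (`gbell`, `anfis_forward`) and all parameter values are hypothetical and chosen purely for illustration; the models in this study were built with the MATLAB Fuzzy Logic Toolbox, and no parameter training is shown here.

```python
def gbell(x, a, b, c):
    """Generalized bell membership function (three premise parameters a, b, c)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def anfis_forward(x, y, premise_a, premise_b, consequents):
    """Forward pass through layers 1-5 for rules 'if x is A_i and y is B_i'.

    premise_a, premise_b: one (a, b, c) tuple per rule (premise parameters).
    consequents: one (p, q, r) tuple per rule (consequent parameters).
    """
    # Layer 1: fuzzification, membership degree of each input
    mu_a = [gbell(x, *params) for params in premise_a]
    mu_b = [gbell(y, *params) for params in premise_b]
    # Layer 2: firing strength of each rule (product of memberships)
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: each rule's linear consequent weighted by its normalized strength
    weighted = [wb * (p * x + q * y + r)
                for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output is the sum of all incoming signals
    return sum(weighted)


# Hypothetical parameter values, not taken from the fitted model.
premise_a = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]
premise_b = [(1.0, 2.0, 0.0), (1.0, 2.0, 2.0)]
consequents = [(1.0, 1.0, 0.0), (2.0, -1.0, 3.0)]
output = anfis_forward(1.0, 1.0, premise_a, premise_b, consequents)
print(output)
```

With these illustrative values, the point x = y = 1 lies midway between the two membership centers, so both rules fire equally and the output is the plain average of the two rule consequents (2 and 4).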

Cross-validation techniques
The consistency and reliability of a method can be explored using the cross-validation technique. 12 Two different strategies, leave-one-out (LOO) and leave-multiple-out (LMO), can be employed. In the LOO strategy, a number of models are produced by deleting one object at a time from the training set. The number of models produced by the LOO procedure is therefore equal to the number of available samples (n), e.g., n = 44. The prediction error sum of squares (PRESS) is a standard index for measuring the accuracy of a modeling method based on the cross-validation technique. Based on the PRESS and SSY (sum of squares of deviations of the experimental values from their mean) statistics, $Q^2$ can be calculated by equation (8):

$$Q^2 = 1 - \frac{PRESS}{SSY} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$

where $y_i$ is the experimental value, $\hat{y}_i$ is the value predicted for the left-out sample and $\bar{y}$ is the mean of the experimental values. A high value of this statistical parameter is considered proof of the high predictive ability of the model. 13 However, several authors suggest that a high value of $Q^2_{LOO}$ is necessary but not sufficient. 14 For this reason, we also used the LMO cross-validation technique. In LMO, M represents a group of randomly selected data points that is left out at the beginning and then predicted by the model developed using the remaining data points. The M molecules are thus treated as a prediction set. $R^2_{LMO}$ can be calculated by equation (9):

$$R^2_{LMO} = 1 - \frac{\sum_{i=1}^{M} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{M} (y_i - \bar{y})^2} \tag{9}$$

This algorithm is shown in Figure 2. It is common to leave out 10-30% of the total number of molecules. In the present work, the calculation of $R^2_{LMO}$ was based on 1000 random selections of groups of 8 and 12 samples. A higher value of $Q^2_{LOO}$ or $R^2_{LMO}$ indicates a higher predictive power of the model.
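The two cross-validation statistics can be sketched in Python as follows. This is an illustration under several assumptions, not the procedure used to produce the reported values: a one-variable least-squares line (`fit_line`) stands in for the ANFIS model, the data are synthetic, the LMO score is taken as the average over all random groups, and the mean in the LMO denominator is the mean of all experimental values. All function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Stand-in model (NOT ANFIS): least-squares line, returned as a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def q2_loo(xs, ys, fit=fit_line):
    """Q^2 = 1 - PRESS/SSY, leaving out each sample exactly once."""
    n = len(ys)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict sample i
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - model(xs[i])) ** 2
    ssy = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ssy


def r2_lmo(xs, ys, m, repeats=1000, fit=fit_line, seed=0):
    """Average R^2 over `repeats` random groups of m left-out samples."""
    rng = random.Random(seed)
    n = len(ys)
    mean_y = sum(ys) / n  # assumption: mean of all experimental values
    scores = []
    for _ in range(repeats):
        held = set(rng.sample(range(n), m))
        model = fit([xs[i] for i in range(n) if i not in held],
                    [ys[i] for i in range(n) if i not in held])
        press = sum((ys[i] - model(xs[i])) ** 2 for i in held)
        ssy = sum((ys[i] - mean_y) ** 2 for i in held)
        scores.append(1.0 - press / ssy)
    return sum(scores) / repeats


# Synthetic data with 44 samples, matching only the size of the data set.
xs = [float(i) for i in range(44)]
ys = [0.5 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
q2 = q2_loo(xs, ys)
r2_l8o = r2_lmo(xs, ys, m=8)
r2_l12o = r2_lmo(xs, ys, m=12)
print(q2, r2_l8o, r2_l12o)
```

Groups of 8 and 12 out of 44 samples correspond to roughly 18% and 27% of the data, which falls within the 10-30% range mentioned above.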

Data set and descriptors
The equilibrium distribution of migrants is affected by the partitioning behavior of compounds between the polymer packaging and the food matrix. The nature of the food simulant, polymer and migrant is therefore important for avoiding food contamination. The data collected by Tehrany et al. 8 were used to develop a QSPR model using the ANFIS method. The total data set consists of 44 simulant/polymer/migrant systems together with their partition coefficients (K), which were used as the dependent variable in our QSPR study. Because the equilibrium distribution of migrants depends on the nature of the food simulant, polymer and migrant, each system in the data set includes three components: (i) food simulant; (ii) polymer; (iii) migrant. To simplify the notation, a code (I) was defined for each system by the following equation: 8

$I = 100 \times L_{Food} + 10 \times L_{Polymer} + L_{Migrant}$

where $L_{Food}$, $L_{Polymer}$ and $L_{Migrant}$ are the levels for the food, polymer and migrant components, respectively. These levels are given in Table 1 for each compound. As an example, a system with I = 224 consists of 10% ethanol/PA/IP. A set of six molecular descriptors was used. The values of all descriptors are listed in Table 2. As this table shows, the descriptors are polymer polarity, food simulant polarity, simulant molecular weight, migrant molecular weight, migrant LUMO (lowest unoccupied molecular orbital) energy and migrant HLB (hydrophilic-lipophilic balance).
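Because each level occupies one decimal digit, the system code can be built and split back into its three levels with integer arithmetic. The short Python sketch below illustrates this; the function names are hypothetical, and the only level assignment taken from the text is the example I = 224 (10% ethanol/PA/IP). The remaining assignments are in Table 1.

```python
def encode_system(l_food, l_polymer, l_migrant):
    """Build the system code I = 100*L_Food + 10*L_Polymer + L_Migrant."""
    return 100 * l_food + 10 * l_polymer + l_migrant


def decode_system(code):
    """Split a system code back into (L_Food, L_Polymer, L_Migrant)."""
    l_food, rest = divmod(code, 100)
    l_polymer, l_migrant = divmod(rest, 10)
    return l_food, l_polymer, l_migrant


# The example from the text: I = 224 is food level 2, polymer level 2, migrant level 4.
levels = decode_system(224)
print(levels)
```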

Model development by ANFIS
To develop the ANFIS model, the data set was divided into three subsets: training, test and prediction. All molecules were randomly placed in these sets. The training set consisted of 22 molecules used to generate the model. The test set, containing 11 molecules, was used to guard against overtraining. The prediction set, comprising 11 molecules, was used to evaluate the model.
The compounds included in each set are specified in Table 1. The six simulant/polymer/migrant descriptors were used as inputs for development of the ANFIS model. Model building involves two stages: structure identification and parameter identification. The former is related to finding a suitable number of rules and a proper partition of the feature space. The latter is concerned with the adjustment of system parameters, such as the MF (membership function) parameters and the linear coefficients. Increasing the number of MFs per input increases the number of rules accordingly. In the first stage of ANFIS modeling, grid partitioning was used to partition the feature space. The number and type of membership functions were optimized using the RMSE of the test set as the criterion. All ANFIS models were produced using MATLAB 7.0 Fuzzy Logic Toolbox (MATLAB, Mathworks Inc. software, Natick, USA, 2008).
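How quickly the rule base grows under grid partitioning can be checked with a short calculation. The Python sketch below assumes the same number of MFs on every input, three premise parameters per MF (as for the generalized bell function) and a first-order Sugeno consequent with one coefficient per input plus a constant. The number of MFs actually used in this study is not stated here, so the values passed in are hypothetical. A random 22/11/11 split is included as well; the seed and function names are illustrative only.

```python
import random


def grid_partition_sizes(n_inputs, mfs_per_input, params_per_mf=3):
    """Return (rules, premise parameters, consequent parameters) for grid partitioning."""
    n_rules = mfs_per_input ** n_inputs          # one rule per combination of MFs
    n_premise = n_inputs * mfs_per_input * params_per_mf
    n_consequent = n_rules * (n_inputs + 1)      # linear coefficients plus a constant
    return n_rules, n_premise, n_consequent


def random_split(n_samples=44, sizes=(22, 11, 11), seed=0):
    """Randomly assign sample indices to training, test and prediction sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    train = indices[:sizes[0]]
    test = indices[sizes[0]:sizes[0] + sizes[1]]
    prediction = indices[sizes[0] + sizes[1]:]
    return train, test, prediction


# Six descriptors with a hypothetical two or three MFs per input.
two_mfs = grid_partition_sizes(6, 2)
three_mfs = grid_partition_sizes(6, 3)
train, test, prediction = random_split()
print(two_mfs, three_mfs, len(train), len(test), len(prediction))
```

With six inputs, going from two to three MFs per input raises the rule count from 64 to 729, which illustrates why the number of MFs has to be chosen with care.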

Statistical parameters of ANFIS model
Prediction results of the ANFIS model for all data sets are shown in Table S1 (available as Supplementary Information). The statistical parameters of the resulting model are given in Table 3. In this table, the model is also compared with the quadratic model previously reported for the same data set by Tehrany et al., 8 in which $x_1$ is the polarity of the food simulant, $x_2$ is the polarity of the polymer, $x_3$ is the molecular weight of the migrant, $x_4$ is the LUMO energy, $x_5$ is the molecular weight of the food simulant and $x_6$ is the HLB of the migrant.
The RMSE of prediction improved from 0.0248 for the quadratic model to 0.0006 for the ANFIS model; that is, the prediction error of the ANFIS model is about 41 times smaller (0.0248/0.0006 = 41.3). In other words, this nonlinear model is able to account for the variance of the partition coefficients. The correlation between the experimental and calculated values of the partition coefficients is shown in Figure 3. The residuals of the calculated partition coefficients are plotted against the experimental values in Figure 4. The distribution of the residuals on both sides of the zero line indicates that no systematic error exists in the proposed QSPR model.
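For reference, the RMSE and the ratio quoted above can be reproduced with a few lines of Python. The `rmse` helper is a generic definition written for this illustration, and the short lists passed to it are made-up numbers; only the two RMSE values 0.0248 and 0.0006 come from the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between experimental and calculated values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Made-up values, only to show how the function is called.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Ratio of the reported prediction RMSE values (quadratic model / ANFIS model).
ratio = 0.0248 / 0.0006
print(round(ratio, 1))
```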

Evaluation of ANFIS models
The models were also tested against validity criteria. Cross-validation techniques, namely leave-one-out (LOO-CV) and leave-multiple-out (LMO-CV), were used to demonstrate the consistency of the model. In particular, the leave-one-out (LOO), leave-eight-out (L8O) and leave-twelve-out (L12O) procedures were applied in this work to both the ANFIS and quadratic models. The results are shown in Table 4. Note that the calculations of $R^2_{L8O}$ and $R^2_{L12O}$ were based on 1000 random selections of groups containing eight and twelve samples from the original training set. The high values of $R^2$ for LOO, L8O and L12O indicate that the proposed model is reliable.
Moreover, to assess the robustness of the ANFIS method, the Y-randomization test was applied. The dependent variable vector K was randomly shuffled and a new QSPR model was developed using the original descriptor matrix. The new QSPR model is expected to show low values of $R^2$ for prediction and of $Q^2_{LOO}$. Several random shuffles of the K vector were performed; the results are shown in Table 5. These results indicate that the ANFIS model is not due to a chance correlation or to structural dependency in the training set.
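The logic of the Y-randomization test can be sketched in Python as follows. This is an illustration under stated assumptions, not the ANFIS procedure: a one-variable least-squares line stands in for the model, the data are synthetic, and the fitted $R^2$ is used as the score in place of the prediction $R^2$ and $Q^2_{LOO}$ reported in Table 5. The function names are hypothetical.

```python
import random


def r_squared(xs, ys):
    """R^2 of a least-squares line fitted to (xs, ys); stand-in for the model score."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def y_randomization(xs, ys, n_shuffles=10, seed=0):
    """Refit after shuffling the response vector; descriptors stay unchanged."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        shuffled = ys[:]          # copy so the original response is preserved
        rng.shuffle(shuffled)
        scores.append(r_squared(xs, shuffled))
    return scores


# Synthetic data with 44 samples, matching only the size of the data set.
xs = [float(i) for i in range(44)]
ys = [0.5 * x + 1.0 + (0.1 if i % 2 == 0 else -0.1) for i, x in enumerate(xs)]
original_score = r_squared(xs, ys)
shuffled_scores = y_randomization(xs, ys)
print(original_score, max(shuffled_scores))
```

If the scores obtained after shuffling stay far below the score of the original model, the original correlation is unlikely to be a chance result.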

Conclusions
Quantitative structure-property relationships (QSPR) were developed for the calculation of K values based on molecular descriptors. Our model was based on six molecular descriptors: polarity of the food simulant, polarity of the polymer, molecular weight of the migrant, LUMO (lowest unoccupied molecular orbital) energy of the migrant, molecular weight of the food simulant and HLB (hydrophilic-lipophilic balance) of the migrant. Forty-four different food/migrant/packaging systems were predicted using these descriptors. ANFIS, a powerful nonlinear tool, was used to develop a model relating the descriptors to the K values. We validated our model using leave-one-out and leave-multiple-out cross-validation and the Y-randomization test. The calculated partition coefficients showed a good correlation between the physico-chemical properties and the structure of the molecules. In conclusion, ANFIS produced a substantially better model than the quadratic model reported recently. 8

Figure 2. Scheme of the leave-multiple-out algorithm used in this study.

Figure 3. Plot of the ANFIS calculated partition coefficient vs. the experimental values for the training, test and prediction sets.

Figure 4. Plot of residuals vs. experimental values of partition coefficient for the ANFIS model.

Table 2. Physico-chemical properties of food simulant, polymer and migrant. a Data extracted from Tehrany et al. 8 b Hansen polarity: $\delta$ / MPa$^{1/2}$.

Table 3. Statistical parameters of MLR (multilinear regression) and ANFIS models. The MLR model was developed by Tehrany et al. 8

Table 4. Statistics using LOO-CV and LMO-CV methods to compare the results of the ANFIS method with the quadratic method for prediction of distribution constants. a Calculation of $R^2_{LMO}$ was based on 1000 random selections of groups of 8 and 12 samples. $Q^2$ was calculated by equation 8.

Vol. 22, No. 8, 2011

Table 5. $R^2$ and $Q^2_{LOO}$ values after several Y-randomization tests.