Preliminary investigation of Terahertz spectroscopy to predict pork freshness non-destructively

mczhao@njfu.edu.cn

Abstract
Freshness, a very important criterion for pork quality control, is normally assessed by the K value index. In this paper, terahertz (THz) spectroscopy was employed to predict the K value of pork nondestructively. THz spectra (0.2~2.0 THz) of 80 pork samples of differing freshness were acquired in the attenuated total reflectance (ATR) mode. In parallel, their K values were determined by high performance liquid chromatography (HPLC). A back propagation artificial neural network (BP-ANN) prediction model of the K value was established. The precision of BP-ANN was further improved after optimization with the adaptive boosting (AdaBoost) algorithm; the resulting model had a root mean square error of prediction (RMSEP) of 9.89% and a correlation coefficient (R_P) of 0.84 in the prediction set, indicating that the non-linear models (BP-ANN and BP-AdaBoost) were superior to the linear principal component regression (PCR) model. The neural network topology was much more suitable for analyzing the complicated regression relationship between K value and THz spectra. It can be concluded that THz spectroscopy coupled with the BP-AdaBoost algorithm is capable of predicting the pork K


Introduction
Pork is delicious and rich in nutrients. It is the most widely consumed type of meat product, accounting for 37% (Food and Agriculture Organization of the United Nations, 2014). The food safety of pork products, especially freshness, has received increasing attention. Generally, there are two main ways to assess meat freshness: sensory evaluation and physical or chemical analysis (Gil et al., 2011). The former is subjective and tedious, while the latter is accurate and reliable with high repeatability. The physical and chemical indexes include flesh color (Chun et al., 2014), total viable count (Li et al., 2016; Tao & Peng, 2015), total volatile basic nitrogen content (Huang et al., 2014), K value (Qiu et al., 2016), pH (Liu et al., 2014) and biogenic amine content (Wang et al., 2014), etc. Among those, the K value has attracted wide attention as an index of meat freshness, and has been shown to be feasible for pork freshness detection in recent years (Cheng et al., 2016; Gil et al., 2011).
After slaughter, ATP is gradually decomposed according to the following sequence: ATP→adenosine diphosphate (ADP)→adenosine monophosphate (AMP)→inosine monophosphate (IMP)→inosine (HxR)→hypoxanthine (Hx) (Shahidi et al., 1994). Based on ATP-related breakdown compounds, the K value is obtained by Equation 1, indicating the degree of ATP decomposition:

$$K = \frac{\mathrm{HxR} + \mathrm{Hx}}{\mathrm{ATP} + \mathrm{ADP} + \mathrm{AMP} + \mathrm{IMP} + \mathrm{HxR} + \mathrm{Hx}} \times 100\% \qquad (1)$$

The K value is usually obtained by high performance liquid chromatography (HPLC) (Mora et al., 2010). This method is accurate and reliable, but it is time-consuming, destructive, and consumes large amounts of chemical reagents, so it cannot meet the requirements of rapid measurement in production. Therefore, the determination of K value in pork by non-destructive detection techniques is becoming an important research focus. In this paper, terahertz (THz) spectroscopy was used to determine the K value, enabling nondestructive detection of pork freshness.
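As a small illustration of Equation 1, the K value can be computed from the six measured concentrations as sketched below. The function name `k_value` and the example concentrations are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def k_value(atp, adp, amp, imp, hxr, hx):
    """Return the K value in percent from six ATP-related concentrations.

    All inputs must share the same unit (for example mmol/L); the unit cancels.
    """
    total = atp + adp + amp + imp + hxr + hx
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return 100.0 * (hxr + hx) / total


# Made-up concentrations (mmol/L), used only to show the call.
example = k_value(atp=0.01, adp=0.02, amp=0.02, imp=0.25, hxr=0.12, hx=0.08)
print(f"K = {example:.1f}%")
```

Because only the ratio matters, the result is unchanged if all six concentrations are rescaled by the same dilution factor.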
THz radiation lies between infrared light and microwave radiation, with both photonic and electronic properties (Ferguson & Zhang, 2002). It belongs to the far infrared band, with frequencies in the range of 0.1~10 THz. Many chemical molecules exhibit molecular motion characteristics in the THz band that differ from those at other wavelengths. Biological macromolecules, such as amino acids (Li et al., 2017), peptides (Zhang et al., 2016) and DNA, and biological small molecules, such as nucleotides (Shen et al., 2003), can absorb THz waves due to molecular rotation, molecular vibration and the overall vibration of molecular clusters. Therefore, polymeric biomaterials can be analyzed nondestructively using the characteristic peaks and data of their THz spectra. THz spectroscopy is widely used for the detection of food quality, such as deterioration of wheat grains (Ge et al., 2014), geographical origin discrimination of olive oils (Liu et al., 2018), tetracycline hydrochloride residues in milk (Qin et al., 2017) and detection of transgenic food. There are some studies on the identification of biological tissue by THz spectra. The moisture content of muscle differs from that of fat tissue, which causes different absorption of THz waves. This difference has been used to map the water content distribution of fresh pork, mutton, chicken and deli meats in the THz band (Hoshina et al., 2009; Singh et al., 2008; Wang et al., 2010). However, there is no report available on the detection of meat freshness by THz spectra.
Therefore, the objective of this study was to explore the THz spectroscopy technique for detecting pork freshness rapidly and nondestructively. In the meantime, different modeling algorithms were applied to analyze the THz spectra. The specific research work was carried out according to the following four steps: (1) THz spectra data acquisition, (2) spectra preprocessing, (3) principal component analysis (PCA), and (4) developing K value prediction models. In step (4), three different prediction algorithms, namely principal component regression (PCR), back propagation artificial neural network (BP-ANN), and BP net adaptive boosting (BP-AdaBoost), were used to develop the prediction models, and the optimal model was selected from the three.

Preparation of pork samples
Fresh pork longissimus muscles from 8 Landrace pigs (approximately 24 h postmortem) were purchased from Nanjing Metro Supermarket and taken to the Non-Destructive-Testing Laboratory of Nanjing Forestry University within 30 min in ice boxes with an inner temperature of 2~6 °C. Test samples were trimmed into 80 pieces of 2.5×2.5×0.5 cm (length×width×thickness) on a sterile surface and packed separately in commercial food grade polyethylene bags. The samples were placed in order in a lab refrigerator (Siemens Company, Chuzhou, China) and stored at 4 °C for 0~7 days. On each day of the experiment, 10 samples were withdrawn randomly for THz spectra and reference K value analysis. Day 0 samples were analyzed immediately, before storage.

Spectra acquisitions
The THz spectra were acquired in the attenuated total reflectance (ATR) mode using the TAS7500 spectrometer (Advantest Co., Kitakyushu, Japan). Each spectrum was the average of 2048 automatic scans to improve the spectral signal-to-noise ratio (SNR). The spectral range was 0.2~2 THz and the frequency resolution was 7.6 GHz; thus, each spectrum consisted of 250 spectral variables (i.e., data points). Each sample was placed on the surface of the ATR inspection window and measured three times on each side to reduce random error. The six spectra collected from the same meat sample were averaged for further analysis. The THz spectrometer is sensitive to changes in ambient temperature and humidity during spectra acquisition. Therefore, all measurements were carried out at 25±1 °C in a container purged with dry air, with the relative humidity below 5% (±0.1%).

ATP-related compounds extraction and HPLC analysis
After being scanned by the THz spectrometer system, the pork sample was analyzed immediately for ATP-related breakdown compounds (ATP, ADP, AMP, IMP, HxR, and Hx) according to the HPLC procedure of Özogul et al. (2010) with some modifications. Two grams of minced pork meat was homogenized for 1 min with 20 mL of 10% chilled perchloric acid at 4 °C. The obtained homogenate was centrifuged for 10 min at 4 °C at 8000 rpm and the supernatant was decanted. Then, the precipitate was re-extracted with another 20 mL of 5% chilled perchloric acid by repeating the above operations. The supernatants from the two extractions were merged and neutralized to a pH of around 6.5 with 1 mol/L sodium hydroxide solution. The precipitated sodium perchlorate was removed by filtration after centrifugation at 8000 rpm for 10 min at 4 °C. Finally, the filtrate was diluted to 50 mL with ultrapure water and stored at -20 °C until HPLC analysis.
The HPLC analysis was performed on an AQ-C18 column (4.60×250 mm) (Hypersil GOLD, Thermo-Fisher Co., MA, USA) with ultraviolet detection at 254 nm on a Finnigan Surveyor (Thermo-Fisher Co., MA, USA) HPLC apparatus. The injection volume was 1 μL, and the flow rate was set at 200 μL/min. The chromatographic separations were achieved using phosphate buffer solution (0.05 mol/L tripotassium phosphate dissolved in ultrapure water). The contents of the ATP-related compounds were determined from the standard curve using the peak area of each compound in the range of 0~0.5 mmol/L, and the K values were calculated. ATP-related compound standards were purchased from Sigma-Aldrich (St. Louis, MO, USA). All other reagents were of analytical grade, except for the HPLC reagents, which were of chromatographic grade.

Spectral preprocessing
Figure 1a presents the raw spectral profiles of the pork samples; the raw spectral data needed further preprocessing. A spectral preprocessing method, the first order derivative (FD), was applied in this study. It can remove slope variation and reduce background interference. This transformation was applied to each spectrum individually according to Equation 2, which relates the variable in the spectrum after FD preprocessing to the variable in the raw THz spectrum and the width of the differential window. The spectra after FD preprocessing are presented in Figure 1b.
In order to weaken the noise generated by the derivative calculation, Savitzky-Golay (SG) polynomial smoothing was applied after the derivative pretreatment. In the derivation process, the selection of the differential width is very important: if the width is too small, the noise will be large, degrading the quality of the model built; if the width is too large, the spectrum will be over-smoothed and much detailed information will be lost (Chen et al., 2011a; Chia et al., 2012). This paper studied the modeling performance of spectral data for differential widths within 25.
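The two preprocessing steps can be sketched as follows. Equation 2 is not reproduced in this text, so the gap difference d[i] = x[i + g] - x[i] used here is only one common form of a windowed first derivative and is an assumption, as are the function names, the SG window (half-width 3, order 2) and the synthetic spectrum.

```python
import numpy as np


def gap_first_derivative(x, g):
    """Gap first difference d[i] = x[i + g] - x[i] (assumed form of FD)."""
    x = np.asarray(x, dtype=float)
    if not 1 <= g < x.size:
        raise ValueError("g must satisfy 1 <= g < len(x)")
    return x[g:] - x[:-g]


def savgol_kernel(half_width, poly_order):
    """Savitzky-Golay smoothing weights for a centered window."""
    offsets = np.arange(-half_width, half_width + 1)
    # Columns are offsets**0, offsets**1, ..., offsets**poly_order.
    design = np.vander(offsets, poly_order + 1, increasing=True)
    # Row 0 of the pseudo-inverse maps window values to the fitted value at offset 0.
    return np.linalg.pinv(design)[0]


def savgol_smooth(x, half_width=3, poly_order=2):
    """Smooth x; only interior points with a full window are returned."""
    kernel = savgol_kernel(half_width, poly_order)
    # The smoothing kernel is symmetric, so convolution equals correlation.
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")


rng = np.random.default_rng(0)
freq = np.linspace(0.2, 2.0, 250)  # THz axis with 250 points, as in the text
raw = 0.5 * freq + 0.01 * rng.standard_normal(freq.size)  # synthetic spectrum
fd = gap_first_derivative(raw, g=13)  # differential width 13
smoothed = savgol_smooth(fd)
print(fd.shape, smoothed.shape)
```

A larger `g` averages out more noise but also blurs narrow spectral features, which is the trade-off described above.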

Data analysis method
The large amount of information provided by spectral data requires advanced data analysis approaches. This can be achieved through the integration of modern analytical platforms with computational and chemometric techniques (Miller & Miller, 2005). In this study, multivariate statistical analysis methods, including the linear regression method PCR and the non-linear methods BP-ANN and BP-AdaBoost, were used to develop the prediction models, considering that the growth of microorganisms in meat is a complex process.
BP-ANN simulates the cognitive function of the human brain through a feed-forward multilayer network. It is a powerful tool to explore and reveal complex relationships between inputs and outputs (Hong et al., 2015; Timsorn et al., 2016). The topological structure of neurons is usually designed with 3 layers (an input layer, a hidden layer and an output layer) of unidirectional connections from input to output. The weights of the connections are modified over several iterations so as to minimize the output error (Prevolnik et al., 2009).
The Adaptive Boosting (AdaBoost) algorithm was introduced by Freund & Schapire (1997) and Zhang et al. (2005), and it has been used to solve many practical problems over the past few decades. It is one of the most popular techniques for generating and boosting ensembles due to its adaptability and simplicity. The BP-AdaBoost algorithm was used to optimize the BP-ANN prediction model. Calibration set samples were trained in the BP-ANN model and several predictors with different prediction errors were obtained. Integration weights were calculated according to the prediction error. In short, the higher the prediction accuracy, the larger the weight assigned; on the contrary, the lower the prediction accuracy, the smaller the weight assigned. Although each weak predictor had poor performance, a stronger predictor could still be formed by integrating the weak predictors, with larger weights given to the better ones.
The BP-AdaBoost algorithm consists of the following 5 steps (Cao et al., 2012):
(1) Initialization (Equation 3), where $n$ indicates the size of the training dataset.
(2) Train weak predictor: use the training dataset to train the $t$-th BP-ANN weak predictor and obtain the predicted value $y_t(i)$ of each example, and then calculate the error $er(i)$ as in Equation 4, where $T$ is the number of weak predictors and $y(i)$ is the actual value.
(3) Calculate the sum error $\varepsilon_t$ and the weight $w_t$ of the weak predictor as in Equations 5 and 6.
(4) Set $D_{t+1}$ according to $D_t$ as in Equations 7 and 8.
(5) Output strong predictor: steps (2) to (4) are repeated $T$ times, yielding $T$ weak predictors $f_t$ ($t = 1, 2, \ldots, T$), which are combined into the strong predictor $F(x)$ as in Equation 9.
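The five steps can be sketched in code as below. Equations 3 to 9 are not reproduced in this text, so the update rules here follow the general idea of threshold-based boosting for regression (in the spirit of AdaBoost.RT) and may differ from the authors' exact formulas. The relative-error definition, the `power` exponent, the ridge-regression weak learner (a lightweight stand-in for BP-ANN) and all function names are assumptions made for illustration.

```python
import numpy as np


def fit_weak(X, y, idx, ridge=1e-3):
    """Stand-in weak predictor: ridge regression on the resampled rows idx."""
    A = np.column_stack([np.ones(idx.size), X[idx]])
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y[idx])


def predict_weak(coef, X):
    return coef[0] + X @ coef[1:]


def boost(X, y, T=10, phi=0.136, power=2, seed=0):
    """Return (weak models, normalized model weights, final sample weights)."""
    rng = np.random.default_rng(seed)
    n = y.size
    D = np.full(n, 1.0 / n)  # step 1: uniform sample weights
    models, scores = [], []
    for _ in range(T):
        idx = rng.choice(n, size=n, replace=True, p=D)  # step 2: sample by D_t
        coef = fit_weak(X, y, idx)
        er = np.abs((predict_weak(coef, X) - y) / y)  # relative error, y != 0
        eps = float(D[er > phi].sum())  # step 3: weighted error rate
        eps = min(max(eps, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        beta = eps ** power
        D = np.where(er <= phi, D * beta, D)  # step 4: down-weight easy samples
        D = D / D.sum()
        models.append(coef)
        scores.append(np.log(1.0 / beta))
    w = np.array(scores)
    return models, w / w.sum(), D


def predict_strong(models, weights, X):
    """Step 5: weighted combination of the weak predictions."""
    preds = np.array([predict_weak(c, X) for c in models])  # shape (T, n)
    return weights @ preds


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(53, 5))  # synthetic stand-in for PC scores
y_demo = 50 + X_demo @ np.array([5.0, -3.0, 2.0, 0.0, 1.0]) + rng.normal(size=53)
models, weights, _ = boost(X_demo, y_demo)
rmsec = np.sqrt(np.mean((predict_strong(models, weights, X_demo) - y_demo) ** 2))
print(f"RMSEC of the strong predictor on synthetic data: {rmsec:.2f}")
```

Samples whose error stays above the threshold keep their weight while the others shrink, so later weak predictors concentrate on the hard samples.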

Software
THz spectra of the pork meat samples were acquired and stored by software (Spectroscopy analysis system, Advantest Co., Japan). All algorithms were implemented in Matlab R2009b (Mathworks, MA, USA) under Windows 7.

Reference HPLC analysis
The statistics of the reference K values measured by the traditional HPLC method are shown in Table 1. The mean K value increased gradually as the storage time extended. The freshness loss was caused by microbial growth and activity in pork. As the K value covered a wide range (from 22.89% to 96.26%), the data in Table 1 should be suitable for building a robust model for K value prediction.
All 80 samples were divided randomly into two subsets. The ratio of samples in the calibration and prediction sets was 2:1 (Cai et al., 2011). The first subset, the calibration set, was used for building the model, while the other, the prediction set, was used for testing the robustness of the model. As shown in Table 2, the range of reference K values in the calibration set almost covered the range in the prediction set, and the standard deviations in the calibration and prediction sets exhibited no significant differences. Therefore, the distributions of the samples in the calibration and prediction sets were appropriate.
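A random 2:1 division of the 80 samples can be sketched as follows. The function name, the fixed seed and the rounding of 80 × 2/3 to 53 calibration samples are assumptions; the text does not state the exact set sizes or how the random division was implemented.

```python
import numpy as np


def split_calibration_prediction(n_samples, calibration_fraction=2 / 3, seed=0):
    """Return sorted index arrays for the calibration and prediction sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = int(round(n_samples * calibration_fraction))
    return np.sort(order[:n_cal]), np.sort(order[n_cal:])


cal_idx, pred_idx = split_calibration_prediction(80)
print(len(cal_idx), len(pred_idx))
```

After such a split, comparing the range and standard deviation of the reference values in both subsets, as done in Table 2, is a quick check that neither subset is unrepresentative.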

Prediction models of K value in pork meat
From the above discussion, the content of ATP-related compounds changes during pork spoilage, and because these biological molecules have a sensitive spectral response to THz waves, the THz spectra can reflect the change in molecular content. Therefore, there is an indirect correlation between THz spectral data and pork freshness. This work used the nonlinear BP-ANN algorithm to verify this relationship and used BP-AdaBoost to optimize the performance of BP-ANN; by comparison with the linear PCR algorithm, a more effective THz spectral model for predicting pork K value was constructed.
Each spectrum contained 250 variables (data points), which was much larger than the number of samples. If the variables were used directly for regression analysis, over-fitting would occur, which would reduce the prediction accuracy and stability of the model. At the same time, there was some redundant information, such as collinear variables, which would make it very difficult to build the regression model. This problem can be solved by principal component analysis (PCA), which compresses spectral information by data reconstruction and dimensionality reduction (Aït-Kaddour et al., 2018). After compression, several top principal components (PCs) were extracted from the original spectral data.
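The compression step can be sketched with a singular value decomposition. The synthetic spectra, the choice of 5 components and the function names are assumptions for illustration; the text does not state how many PCs were retained. The loadings are fitted on the calibration spectra only and then applied to the prediction spectra, which is a common precaution rather than a confirmed detail of this study.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return (mean spectrum, loadings, explained variance ratios)."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, Vt[:n_components], explained[:n_components]


def pca_transform(X, mean, components):
    """Project mean-centered spectra onto the loadings to get PC scores."""
    return (X - mean) @ components.T


# Synthetic spectra: 80 samples x 250 variables driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(80, 3))
loadings = rng.normal(size=(3, 250))
spectra = latent @ loadings + 0.01 * rng.normal(size=(80, 250))

cal, pred = spectra[:53], spectra[53:]
mean, components, explained = pca_fit(cal, n_components=5)
cal_scores = pca_transform(cal, mean, components)
pred_scores = pca_transform(pred, mean, components)
print(cal_scores.shape, pred_scores.shape)
```

The resulting score matrices have far fewer columns than samples, which removes the collinearity problem before regression.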
In this study, BP-ANN was used to construct a prediction model for the K value in pork meat. The PCs resulting from the above-mentioned PCA were fed to the BP-ANN model as the input layer, and the output layer contained one node for the prediction of K value. The number of nodes in the hidden layer was optimized based on the empirical Equation 10, where m is the number of nodes in the hidden layer, n is the number of nodes in the input layer, l is the number of nodes in the output layer, and a is a constant from 1 to 10 (Xu et al., 2013). The transfer function was 'logsig' for the hidden layer nodes and 'tansig' for the output layer nodes in constructing the BP-ANN models in this study. The learning rate and momentum factor were both set to 0.1, and the initial weight was set to 0.3.
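Equation 10 is not reproduced in this text. A commonly quoted empirical rule that is consistent with the variable definitions above is m = sqrt(n + l) + a, and the sketch below assumes that form, with the square root rounded to an integer (also an assumption). It also shows the two transfer functions in a single forward pass. The number of input PCs (8), the random weights and the function names are hypothetical.

```python
import math

import numpy as np


def hidden_node_candidates(n_inputs, n_outputs=1, a_values=range(1, 11)):
    """Candidate hidden sizes m = round(sqrt(n + l)) + a (assumed rule)."""
    base = int(round(math.sqrt(n_inputs + n_outputs)))
    return [base + a for a in a_values]


def logsig(x):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def tansig(x):
    """Hyperbolic tangent sigmoid, equal to 2 / (1 + exp(-2x)) - 1."""
    return np.tanh(x)


def forward(pcs, W1, b1, W2, b2):
    """One forward pass: logsig hidden layer, tansig output layer."""
    hidden = logsig(pcs @ W1 + b1)
    return tansig(hidden @ W2 + b2)


candidates = hidden_node_candidates(n_inputs=8)
rng = np.random.default_rng(0)
m = candidates[0]
out = forward(rng.normal(size=(4, 8)),
              rng.normal(size=(8, m)), np.zeros(m),
              rng.normal(size=(m, 1)), np.zeros(1))
print(candidates, out.shape)
```

Because the 'tansig' output is bounded to (-1, 1), the reference K values would need to be scaled into that interval before training and mapped back afterwards.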
The number of nodes in the hidden layer was optimized by cross-validation and determined by the lowest root mean square error of cross-validation (RMSECV). Figure 2 shows the RMSECV of the BP-ANN model for different numbers of nodes in the hidden layer and different differential widths.
As shown in Figure 2, the lowest RMSECV by cross-validation is 18.1%, obtained when the number of nodes in the hidden layer is 9 and the differential width is 13. The results of the optimal BP-ANN model are shown in Table 3.
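Selecting a setting by the lowest RMSECV can be sketched as below. To keep the example short and dependency-free, a linear least-squares model on PC scores stands in for BP-ANN, and the tuned setting is the number of PCs rather than the number of hidden nodes. The 5-fold scheme, the synthetic data and all names are assumptions; the text does not state which cross-validation scheme was used.

```python
import numpy as np


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for the real model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def rmsecv(X, y, n_folds=5, seed=0):
    """Root mean square error over held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_cv = np.empty(len(y), dtype=float)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        coef = fit_linear(X[train], y[train])
        y_cv[held_out] = predict_linear(coef, X[held_out])
    return rmse(y, y_cv)


# Synthetic scores and reference values, for illustration only.
rng = np.random.default_rng(0)
scores = rng.normal(size=(53, 8))
k_ref = 60 + scores[:, :3] @ np.array([8.0, -5.0, 3.0]) + rng.normal(scale=2.0, size=53)

cv_errors = {k: rmsecv(scores[:, :k], k_ref) for k in range(1, 9)}
best_k = min(cv_errors, key=cv_errors.get)
print(best_k, round(cv_errors[best_k], 2))
```

The same loop applies to a grid over hidden-node counts and differential widths: each combination gets one RMSECV, and the combination with the smallest value is kept.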
We used the BP-AdaBoost algorithm to improve the performance of the BP-ANN model for K value prediction. The prediction error threshold (Φ) had a significant influence on the accuracy of the BP-AdaBoost model; thus, it was determined by the minimum RMSECV during cross-validation. Firstly, the threshold (Φ) was optimized over a larger range (0.05~0.23) with a step of 0.01. The change of RMSECV is shown in Figure 3a. We found that when the parameter (Φ) was selected within 0.13~0.14, the model was ideal. Then, within this range, the performance of the model was examined with a smaller step size of 0.001, as shown in Figure 3b. We could see that the optimum BP-AdaBoost model was achieved with Φ = 0.136. The parameter was substituted into the BP-AdaBoost algorithm, and 10 weak BP-ANN predictors and 1 strong predictor were obtained. Their performance is shown in Table 4. It was clear that the performances of these weak predictors differed: the 2nd and 8th weak predictors had better prediction performance, with small RMSEC and sum error $\varepsilon_t$, so these two weak predictors had greater weights in the strong predictor after iteration. On the contrary, the 6th weak predictor performed worse, with large RMSEC and sum error $\varepsilon_t$, so it had the minimum weight and contributed little to the strong predictor after integration. Although the prediction performances of these 10 weak predictors were not ideal, the strong predictor obtained the best prediction performance after weighted integration. Figure 4 is the scatter plot between the reference HPLC measurements of K values and the BP-AdaBoost predicted results.
Principal component regression (PCR) can also estimate a calibration model between the THz spectra and reference K values using cross-validation. PCs from the above-mentioned PCA were used as the input data of the PCR model. The FD width of the spectral preprocessing was selected according to the model prediction performance, i.e., the maximum correlation coefficient and the minimum RMSEP. The performance of the model is shown in Table 3. As shown in Table 3, the appropriate differential width for spectral preprocessing is 13 (BP-ANN or BP-AdaBoost model) or 15 (PCR model); a differential width that is too small or too large reduces the prediction performance of the model. The nonlinear BP-ANN model combined with the AdaBoost algorithm (BP-AdaBoost) improved the K value prediction performance, which was better than that of the linear PCR model.

Discussion
The reasons why THz spectroscopy with the BP-AdaBoost algorithm obtained such good prediction results can be explained from the following three aspects.
Firstly, the K value is determined by a ratio of the contents of six ATP-related compounds. According to Shen et al. (2003), the correlation between these contents and the spectral data is nonlinear, so a nonlinear model fits better than a linear model. Secondly, pork deterioration is a complex chemical process. Under the action of several kinds of spoilage bacteria, the protein in muscle is hydrolyzed into polypeptides, then into amino acids, and further decomposed into various organic substances. THz spectra can reflect the content changes of biological molecules such as proteins, polypeptides and amino acids. However, there are so many kinds of biological molecules in pork that the characteristic spectra of the various molecules overlap at room temperature. The THz spectrum expresses not only the change of K value, but also the complex changes of all chemical components in pork. Therefore, there is a complex nonlinear relationship between K value and THz spectrum, which cannot be explained by a linear model. Thirdly, from the point of view of the principle and structure of the modeling algorithm, the nonlinear model is better than the linear model in self-learning and self-adjustment, and the network topology of BP-ANN is more suitable for the analysis of complex chemical components (Chen et al., 2011; Lin et al., 2009). Moreover, the BP-AdaBoost algorithm integrates the BP-ANN weak predictors gradually and finally produces the strong prediction model. Therefore, the BP-AdaBoost model exhibited better prediction performance than the BP-ANN model.
In addition, the nonlinear BP-AdaBoost model can be further optimized. Because of the complex composition of pork, there were no obvious absorption peaks or characteristic bands in the THz spectrum. Some characteristic bands in the THz spectrum are closely related to K value, and these could be identified by filtering out the influence of substances unrelated to freshness. A more suitable model that reflects the complex nonlinear relationship between K value and THz spectrum could then be developed to improve the accuracy of the prediction.

Conclusion
The overall results indicated that the THz spectroscopy technique coupled with a prediction model has high potential for detecting pork freshness. In this study, THz spectra of fresh pork in the range of 0.2~2 THz were obtained in the ATR mode. After FD preprocessing and SG smoothing, a K value prediction model was constructed to detect pork freshness rapidly and nondestructively. Three regression algorithms (i.e., PCR, BP-ANN, and BP-AdaBoost) were compared for developing the prediction model. Among them, BP-AdaBoost showed its superiority in solving the complicated regression problem. It can be concluded that the THz spectroscopy technique coupled with the BP-AdaBoost regression algorithm is capable of predicting the freshness K value of other meat or food nondestructively.

Chen, Q., Cai, J., Wan, X., & Zhao, J. (2011b). Application of linear/non-linear classification algorithms in discrimination of pork storage time using Fourier transform near infrared (FT-NIR) spectroscopy.