Predicting LDPE / HDPE blend composition by CARS-PLS regression and confocal Raman spectroscopy

Industries and the scientific community currently focus on creating new ways to recycle and reuse polymer waste, which poses serious socio-environmental risks. The quality of recycled polyethylenes depends strongly on their purity, yet a fast and reliable method to distinguish Low Density Polyethylene (LDPE) from High Density Polyethylene (HDPE) is still an unsolved issue for current recycling processes. In this study, confocal Raman spectroscopy and Competitive Adaptive Reweighted Sampling Partial Least Squares (CARS-PLS) linear regression have been successfully applied to quantify the composition of LDPE/HDPE blends. The effects of several regression parameters (pretreatment method, number of Monte Carlo sampling runs, K-fold value and maximal number of latent variables for cross-validation) on the training and prediction performance of the CARS-PLS models were analyzed. The CARS-PLS-based models show root-mean-squared prediction errors of 4.06 to 8.87 wt% of LDPE over the whole composition range of the HDPE/LDPE blend.


Introduction
Polyethylenes (PEs) are the thermoplastic polymers most consumed by modern society and, consequently, the largest polymer fraction found in urban solid waste. The great versatility of their mechanical properties results from the control of the degree of chain branching during low-cost ethene polymerization [1]. This same characteristic, however, makes it difficult to manufacture recycled PE products with attractive properties by mechanical recycling [2,3]. HDPE/LDPE blends have been widely used by the plastics industry to adjust the processability and mechanical properties of polyethylene resins [4]. However, the unknown and uncontrolled composition of these polymer blends and of recycled polyethylene waste hinders the processing and production of goods with satisfactory performance and quality.
In several countries, Low Density Polyethylene (LDPE) and High Density Polyethylene (HDPE) are the main representatives of the PE family, owing to their larger production volumes compared with other commercially available polyethylenes [5,6], such as Linear Low Density Polyethylene (LLDPE), Ultra High Molecular Weight Polyethylene (UHMWPE) and Ultra Low Density Polyethylene (ULDPE).
LDPE and HDPE are semi-crystalline thermoplastics, frequently distinguished by their densities (δ(LDPE) = 0.91-0.93 g/cm³ and δ(HDPE) = 0.95-0.97 g/cm³) [7], which are closely linked to their different degrees of chain branching [8][9][10]. LDPE is commonly processed by extrusion, blow molding and injection molding. Among the PEs, it combines high impact resistance and flexibility with electrical properties suited to electrical insulation. Consequently, LDPE has been applied to the production of flexible packaging and to wire and cable coating. HDPE is used in several segments: buckets, bowls, trays, toys and pots are obtained by injection molding; packaging for detergents and cosmetics is made by blow molding; and telephone wire insulation, decorative tapes, garbage bags and grocery bags are obtained by extrusion [7].
Determining the fractional composition of LDPE/HDPE blends is not a simple task, because the polymer chains of both materials consist only of carbon and hydrogen. Wide-Angle X-Ray Scattering (WAXS), Dynamic Mechanical Thermal Analysis (DMTA) and Differential Scanning Calorimetry (DSC) have shown limitations in estimating the composition of this blend owing to co-crystallization in blends with more than 10 wt% of LDPE [11]. In contrast, confocal Raman spectroscopy is a quick, nondestructive and inexpensive method, since it requires neither expensive inputs nor time-consuming sample preparation and analysis [12].
In this contribution, we evaluated the potential of Partial Least Squares linear regression modified by Competitive Adaptive Reweighted Sampling (CARS-PLS) to determine the compositional fraction of LDPE/HDPE blends through prediction models based on confocal Raman data.

Mathematical and computational fundamentals
PLS linear regression is a mathematical method that correlates an instrumental data set (X matrix) and a set of properties of interest (Y matrix) through the linear outer relations [13,14]:

$$\mathbf{X}_{n\times k} = \mathbf{T}_{n\times h}\,\mathbf{P}^{\mathsf{T}}_{h\times k} + \mathbf{E}_{n\times k}, \qquad \mathbf{Y}_{n\times m} = \mathbf{U}_{n\times h}\,\mathbf{Q}^{\mathsf{T}}_{h\times m} + \mathbf{F}_{n\times m}$$

where n is the number of observations (samples); k is the number of responses (spectral variables) measured for each sample; m is the number of properties to be predicted by PLS regression; h is the number of latent variables (LVs); T and U are the score matrices of X and Y, respectively; P and Q are the loading matrices of X and Y, in that order; and E and F contain the residual errors of the model.
To maximize the covariance between X and Y, the scores are obtained as linear combinations of the elements of the instrumental data set, using weight coefficients (w) and a given number of LVs [15]. In conventional PLS regression, the elements of the T matrix (the scores, t) are estimated by:

$$t_{ia} = \sum_{j=1}^{k} w_{ja}\, x_{ij}, \qquad a = 1, \dots, h$$

where $x_{ij}$ is an element of the X matrix (in NIPALS, of the X matrix deflated by the preceding components when a > 1).
Minimizing the norm of the residual matrix F and relating the score matrices internally by U = T (i.e., assuming that the X scores are the most appropriate predictors of Y) [13], the property set of interest is predicted by PLS linear regression using:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{G}$$

where G is a matrix of random errors and B is the regression matrix containing the linear coefficients. Computationally, the PLS linear model is implemented by the NIPALS algorithm detailed in Figure 1.
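The NIPALS steps can be illustrated with a minimal NumPy sketch for the single-property case (m = 1, often called PLS1), assuming mean-centred X and y. It computes the scores t = Xw, deflates X and y after each component, and assembles the regression vector as b = W(PᵀW)⁻¹q. The function names are hypothetical, and this is not the libPLS implementation used in this work.

```python
import numpy as np


def pls1_nipals(X, y, n_lv):
    """NIPALS PLS1 for one property; X is (n, k), y is (n,).

    Returns the regression vector b and the centring means, so that
    y_hat = (X - x_mean) @ b + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean        # centred copies, deflated in the loop
    k = X.shape[1]
    W = np.zeros((k, n_lv))                # weights w
    P = np.zeros((k, n_lv))                # X loadings p
    q = np.zeros(n_lv)                     # y loadings q
    for a in range(n_lv):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)             # unit-length weight vector
        t = Xa @ w                         # scores t = X w
        tt = t @ t
        P[:, a] = Xa.T @ t / tt
        q[a] = ya @ t / tt
        W[:, a] = w
        Xa = Xa - np.outer(t, P[:, a])     # deflation: residual E
        ya = ya - q[a] * t                 # deflation: residual F
    b = W @ np.linalg.solve(P.T @ W, q)    # B = W (P^T W)^-1 q
    return b, x_mean, y_mean


def pls1_predict(X, b, x_mean, y_mean):
    """Predict the property from new spectra with a fitted PLS1 model."""
    return (np.asarray(X, dtype=float) - x_mean) @ b + y_mean
```

With as many LVs as the rank of the centred X, PLS1 reproduces the ordinary least-squares fit; using fewer LVs gives the regularized models whose LV number is tuned later in this work.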
The CARS algorithm was designed to iteratively find the optimal variable subset, i.e., the points of the instrumental data set (the X matrix containing the spectral data) that yield the PLS regression with the lowest Root Mean Square Error of Cross Validation (RMSECV) [16]. At every sampling run, the CARS algorithm builds a PLS model from a randomly selected subset of the calibration samples (Monte Carlo sampling). The Exponentially Decreasing Function (EDF) and Adaptive Reweighted Sampling (ARS) are then applied as a two-step wavelength selection that removes the wavelengths (elements of the X matrix) with the poorest weights, simulating the "survival of the fittest" principle.
In CARS-PLS, the importance of each x element is evaluated by a normalized weight calculated from the absolute values of the PLS regression coefficients $b_j$ (the elements of B for a single property):

$$w_j = \frac{|b_j|}{\sum_{l=1}^{k} |b_l|}, \qquad j = 1, \dots, k$$

While the ARS method retains the x elements with the largest weights, the EDF method forcibly removes the x elements with small absolute regression coefficients, progressively reducing the number of elements used to build the PLS models. In the i-th sampling run, EDF retains a fraction $r_i$ of the k variables given by the exponential model:

$$r_i = a\,e^{-c\,i}, \qquad a = \left(\frac{k}{2}\right)^{1/(N-1)}, \qquad c = \frac{\ln(k/2)}{N-1}$$

where N is the total number of Monte Carlo sampling runs, so that all k variables are used in the first run ($r_1 = 1$) and only two remain in the last ($r_N = 2/k$).
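One CARS elimination step can be sketched as follows, assuming the regression coefficients b of the current PLS model are already available (the Monte Carlo sample subset and the PLS fit are omitted). The names `edf_ratio` and `cars_step` are hypothetical, and rounding r_i·k and never keeping fewer than two variables are illustrative choices that may differ from libPLS.

```python
import numpy as np


def edf_ratio(i, n_runs, n_vars):
    """Fraction r_i of the n_vars variables kept at run i (1-based) by the EDF."""
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    c = np.log(n_vars / 2) / (n_runs - 1)
    return a * np.exp(-c * i)


def cars_step(b, kept, i, n_runs, n_vars, rng):
    """One hypothetical CARS elimination step.

    b    : PLS regression coefficients of the variables in `kept` (same order)
    kept : indices (into the full spectrum) of the variables still in play
    Returns the sorted indices that survive run i.
    """
    b = np.asarray(b, dtype=float)
    kept = np.asarray(kept)
    w = np.abs(b) / np.sum(np.abs(b))                  # normalized weights w_j
    n_keep = int(round(edf_ratio(i, n_runs, n_vars) * n_vars))
    n_keep = min(max(n_keep, 2), len(kept))            # illustrative bounds
    # EDF: forcibly keep only the n_keep largest weights
    top = np.argsort(w)[::-1][:n_keep]
    p = w[top] / w[top].sum()
    # ARS: weighted sampling with replacement; duplicates collapse, so
    # variables with small weights tend to disappear
    picked = rng.choice(top, size=n_keep, replace=True, p=p)
    return np.sort(kept[np.unique(picked)])
```

In a full CARS run, this step is repeated for runs 1 to N, a PLS model is refit on a new random subset of calibration samples each time, and the variable subset with the lowest RMSECV is finally retained.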

Apparatus and software
Raman spectra of the extruded HDPE/LDPE pellets were obtained with an Alpha300 R confocal Raman microscope (WITEC; 532 nm laser, 45 mW) and collected from 210 to 3785 cm⁻¹ at room temperature with a spectral resolution of 3 cm⁻¹.
All Raman spectra were smoothed with the Savitzky-Golay method [17] (polynomial order of 5, window of 10 points) and normalized. CARS-PLS regression of the preprocessed spectra was carried out in MATLAB (version R2015a) using the libPLS 1.95 toolbox [18].
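The smoothing step can be sketched in NumPy as a local least-squares polynomial fit. This is an illustration, not the MATLAB routine used here: the sketch uses an 11-point window because a centred window needs an odd number of points, pads the spectrum edges by repetition, and assumes unit-vector (L2) normalization, since the normalization method is not specified. All function names are hypothetical.

```python
import numpy as np


def savgol_coeffs(window, order):
    """Smoothing coefficients of a centred Savitzky-Golay filter (odd window)."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, order + 1, increasing=True)   # columns z^0, z^1, ..., z^order
    return np.linalg.pinv(A)[0]                     # row giving the fitted value at z = 0


def savgol_smooth(y, window=11, order=5):
    """Slide the least-squares polynomial fit along the spectrum."""
    c = savgol_coeffs(window, order)
    half = window // 2
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")  # assumed edge handling
    return np.convolve(padded, c[::-1], mode="valid")  # same length as y


def normalize(spectrum):
    """Unit-vector (L2) normalization; an assumed choice of method."""
    s = np.asarray(spectrum, dtype=float)
    return s / np.linalg.norm(s)
```

Away from the edges, the filter leaves any polynomial of degree up to `order` unchanged, which is why it suppresses high-frequency noise while preserving band shapes.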

CARS-PLS regression analysis
Forty-two spectra were used as the calibration (cross-validation) set, while sixteen spectra were used as an independent prediction set. The root mean squared errors were calculated by [19]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where n is the number of spectra; $y_i$ are the reference concentrations of the samples; and $\hat{y}_i$ are the concentrations predicted in cross-validation of the calibration set (RMSECV) or for the independent prediction set (RMSEP).
The degree of fit between the predicted and reference values was assessed by the correlation coefficient (R) [20]:

$$R = \sqrt{1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}}$$

where $\bar{y}$ is the average polymer concentration of all samples in the cross-validation or external test set, respectively.
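Both figures of merit are straightforward to compute. The sketch below assumes R is defined as the square root of 1 - SS_res/SS_tot, with SS_tot taken about the mean reference concentration; the helper names and the example values are hypothetical.

```python
import numpy as np


def rmse(y_ref, y_pred):
    """RMSE; gives RMSECV with cross-validated predictions, RMSEP with test-set predictions."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))


def corr_coef(y_ref, y_pred):
    """R = sqrt(1 - SS_res / SS_tot), SS_tot taken about the mean reference value."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    # clipped at 0 so a model worse than the mean gives R = 0 instead of NaN
    return float(np.sqrt(max(0.0, 1.0 - ss_res / ss_tot)))


# Example with hypothetical LDPE contents (wt%), each prediction off by 2 wt%
y_ref = [0, 25, 50, 75, 100]
y_pred = [2, 23, 52, 73, 102]
print(rmse(y_ref, y_pred))       # 2.0
print(corr_coef(y_ref, y_pred))
```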

Results and Discussion
Figure 2 shows the Raman spectra of the processed samples, covering all compositions of the LDPE/HDPE blends (0-100 wt% of LDPE), and Table 1 lists the characteristic Raman shifts of the polyethylene chains. In summary, the Raman shifts at 1070, 1135 and 1300 cm⁻¹ arise from C-C stretching and CH₂ twisting. The medium-intensity band at 1445 cm⁻¹ is associated with three CH₂ vibrational modes of the PE crystal structure (one wagging and two scissoring modes) [21]. The strong bands at 2845 and 2883 cm⁻¹ arise from the asymmetric and symmetric stretching of the CH₂ units, respectively [22]. The weak band at 480 cm⁻¹ arises from molecular rotations of C-C branches with four to nine carbons in the gauche state [23].
In addition, bands arising from overtones, combinations and Fermi resonance were identified in the Raman spectra of the LDPE/HDPE blends: the bands at 2725 and 2430 cm⁻¹ are overtones and combinations of fundamentals in the 1400-1495 cm⁻¹ range (CH₂ groups) [24]; and the weak shoulder at 2935 cm⁻¹ has been attributed to Fermi resonance between the CH₂ symmetric stretching and the overtone of a CH₂ bending mode [25]. The Raman spectra of LDPE and HDPE are very similar, but the maximum intensity of the band at 1460 cm⁻¹ increases as the LDPE content of the blend decreases, while the opposite behavior is observed for the bands at 1370 and 1416 cm⁻¹. These spectral differences are the basis of the multivariate calibration used to quantify the composition of the LDPE/HDPE blends from confocal Raman data [20].
Table 2 presents the optimal predictive models built by the CARS-PLS algorithm with several statistical pretreatment methods for the Raman data (the maximal number of latent variables for cross-validation, the type of cross-validation and the number of runs were kept constant, as described in the table caption). The pretreatment step is essential to reduce the negative effects of Raman signal instability caused by sample fluorescence and laser instability. Here, we evaluated the Pareto, mean-centering and autoscaling methods; note that all Raman data had already been smoothed (Savitzky-Golay) and normalized before pretreatment. The results indicate that the predictive model based on mean centering (PLS-C) best fits the external test set (R pred = 0.979) and also shows the lowest prediction error (RMSEP = 4.062 wt% of LDPE). Regardless of the pretreatment method, CARS-PLS regression shows excellent calibration performance, since the calibration errors in Table 2 are very low, below 0.9 wt% of LDPE for all prediction models. Pareto scaling reduces the relative importance of large values but does not change the original spectral data significantly. Autoscaling aims to give equal importance to all spectral variables, while mean centering removes the offsets from the spectral data [26,27].
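The three pretreatments can be sketched as column-wise centring and scaling. In this sketch, the centring and scaling factors are estimated on the calibration spectra and then applied unchanged to the prediction set; the sample standard deviation (ddof = 1), the method labels and the function names are assumptions. Autoscaling and Pareto scaling would divide by zero for a constant spectral variable, which the sketch does not guard against.

```python
import numpy as np


def fit_pretreatment(X_cal, method):
    """Column-wise centre and scale factors estimated on the calibration spectra."""
    X_cal = np.asarray(X_cal, dtype=float)
    center = X_cal.mean(axis=0)
    sd = X_cal.std(axis=0, ddof=1)
    if method == "center":          # mean centering: remove offsets only
        scale = np.ones_like(center)
    elif method == "autoscaling":   # unit variance: equal weight for every variable
        scale = sd
    elif method == "pareto":        # square root of sd: milder damping of large values
        scale = np.sqrt(sd)
    else:
        raise ValueError(f"unknown method: {method}")
    return center, scale


def apply_pretreatment(X, center, scale):
    """Apply calibration-derived factors to any spectra (calibration or test)."""
    return (np.asarray(X, dtype=float) - center) / scale
```

Estimating the factors on the calibration set only keeps the external test set truly independent of model building.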
Table 3 summarizes the CARS-PLS predictive models with the lowest RMSECV and RMSEP obtained with several K-fold values for cross-validation, together with their correlation coefficients of calibration (R calib) and prediction (R pred). K-fold cross-validation randomly divides the calibration dataset into K mutually exclusive subsets of (approximately) equal size, i.e., with the same number of spectra. K-1 subsets are used to train the predictive model and the remaining subset is used to test it; the procedure is repeated K times, so that each subset is held out once, and the pooled errors give RMSECV and R calib. Leave-one-out cross-validation, which was used to analyze the effects of the pretreatment method on the CARS-PLS models, is the special case of K-fold cross-validation in which K equals the total number of spectra (N). This approach requires N model fits, which incurs a substantial computational cost when N is large.
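The K-fold procedure can be sketched in NumPy. The compact NIPALS PLS1 is included only to make the example self-contained and is not the libPLS code; the function names are hypothetical. Folds are drawn from a random permutation with `np.array_split`, so their sizes differ by at most one when N is not divisible by K, and setting K = N gives leave-one-out.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Compact NIPALS PLS1 on centred data; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xa.T @ ya
        w = w / np.linalg.norm(w)
        t = Xa @ w
        p, qa = Xa.T @ t / (t @ t), (ya @ t) / (t @ t)
        Xa, ya = Xa - np.outer(t, p), ya - qa * t
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def rmsecv_kfold(X, y, n_lv, k_fold, seed=0):
    """RMSECV by K-fold cross-validation; k_fold = len(y) gives leave-one-out."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    press = 0.0                                    # predicted residual sum of squares
    for test in np.array_split(order, k_fold):     # K mutually exclusive subsets
        train = np.setdiff1d(order, test)          # the other K-1 subsets
        b, x_mean, y_mean = pls1_fit(X[train], y[train], n_lv)
        y_hat = (X[test] - x_mean) @ b + y_mean
        press += np.sum((y[test] - y_hat) ** 2)
    return float(np.sqrt(press / n))
```

Evaluating `rmsecv_kfold` for n_lv from 1 up to a chosen maximum and taking the minimum of the resulting curve mirrors how the optimal number of LVs is selected within that maximum.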
As can be seen in Table 3, using more than 5 folds does not improve the prediction or calibration performance of the CARS-PLS models; with 5-fold cross-validation, RMSEP already equals the value obtained by leave-one-out cross-validation (RMSEP = 4.062 wt%). Under this condition, the correlation coefficients also remain stable for both the calibration (R calib = 0.999) and prediction (R pred = 0.979) datasets, and the optimal number of latent variables is 19.
According to Table 4, RMSECV decreased and R calib increased as the maximal number of latent variables for cross-validation increased, reaching a minimum RMSECV of 0.039 wt% for the PLS-40 model built with 34 LVs. The maximal number of LVs also affects the prediction performance of the CARS-PLS models: RMSEP falls from 8.017 to 5.521 wt% of LDPE when the maximal number of LVs is raised from 5 to 10.
In linear PLS regression, the spectral data are projected onto a vector subspace spanned by the latent variables, which are linear combinations of the original variables. For this reason, the optimal number of LVs must be identified to obtain the best calibration performance of the PLS model [28]: too few LVs leave relevant spectral variation unmodeled, whereas too many also model noise. The advantage of the CARS-PLS method is that it performs a sophisticated, automatic search to optimize this parameter, without the extensive manual searches required in conventional linear and nonlinear PLS regression [16].
To investigate the influence of the number of Monte Carlo sampling runs on CARS-PLS model performance, predictive models with 50 to 10000 runs were built; they are shown in Figure 3 (the other parameters, i.e., pretreatment method, cross-validation algorithm and maximal number of latent variables for cross-validation, were kept constant). All models show good correlation coefficients for the predicted LDPE relative concentration of the external validation set (R pred higher than 0.9) and excellent correlation coefficients for calibration (R calib = 0.999).
The highest calibration and prediction errors were found for the CARS-PLS model built with 50 runs, probably because so few sampling runs are insufficient to locate the most informative Raman shifts for a model with robust predictive performance.
The CARS-PLS models built with more than 100 sampling runs show that a reduction in RMSECV does not necessarily improve the predictive ability of the PLS model (i.e., reduce RMSEP). Moreover, all statistical errors of the CARS-PLS models remain constant above 5000 sampling runs (RMSECV = 0.281 and RMSEP = 4.806 wt% of LDPE). All CARS-PLS models achieved their best prediction performance with the spectral intervals containing the Raman shifts at 2883 cm⁻¹ (from both amorphous and crystalline polyethylene phases) and 1445 cm⁻¹ (only from the crystalline PE phase). In a previous work using Interval PLS linear regression [20], we found that the Raman signal at 2845 cm⁻¹, assigned to CH₂ asymmetric stretching in the amorphous and crystalline phases, yields the prediction models with the smallest RMSEP values (2.68-6.94 wt% of LDPE). The most plausible explanation lies in the intensity and width of the Raman bands at 1370, 1416 and 1460 cm⁻¹, which depend not only on the content of the polymer chemical groups but also on the macromolecular organization of the polymer chains. The difference in branching degree between LDPE and HDPE affects the conformations of the methylene sequences in the amorphous and crystalline regions, directly influencing their molecular rotations and vibrations, which are intimately connected to the Raman signal detected by this vibrational spectroscopy.

Conclusions
A modified PLS linear regression (CARS-PLS) was used to quantify the composition of LDPE/HDPE blends. The best predictive model showed a prediction error of 4.062 wt% of LDPE with a correlation coefficient of 0.979 over the whole composition range (0-100 wt% of HDPE).
The CARS-PLS parameters play a significant role in the RMSECV and RMSEP of the predictive models. Under the conditions evaluated, mean centering of the Raman data yields the best prediction performance, while autoscaling yields the lowest calibration errors. Increasing K and the maximal number of LVs for cross-validation reduced RMSECV, but RMSEP is not directly related to these regression parameters. The optimal number of sampling runs was 100; above this value, the ability of the CARS-PLS models to determine the relative amount of LDPE in the polymer blend decreases.

Table 1. Main Raman shifts of LDPE and HDPE.

Table 2. Optimal predictive models obtained by CARS-PLS regression with the confocal Raman spectra pretreated by several methods (constant parameters: maximal number of latent variables for cross-validation = 20; cross-validation = leave-one-out; Monte Carlo sampling runs = 100).

Table 3. Optimal predictive models obtained by CARS-PLS regression using several K-fold values for cross-validation (constant parameters: pretreatment = mean centering; maximal number of latent variables for cross-validation = 20; Monte Carlo sampling runs = 100).

Table 4. Optimal predictive models obtained by CARS-PLS regression using several maximal numbers of latent variables for cross-validation (constant parameters: pretreatment = mean centering; cross-validation = leave-one-out; Monte Carlo sampling runs = 100).