Exploratory and discriminative studies of commercial processed Brazilian coffees with different degrees of roasting and decaffeinated

The fingerprints of the volatile compounds of 21 commercial Brazilian coffee samples submitted to different industrial processing, i.e. decaffeination or different roasting degrees (traditional and dark), were studied. The volatiles were collected by headspace solid-phase microextraction (HS-SPME) and analyzed by GC-FID and GC-MS. The chromatographic data matrices (fingerprints) obtained were explored by principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). Initially, the chromatographic profiles were aligned using the correlation optimized warping (COW) algorithm. With both SPME fibres used, PCA discriminated the decaffeinated coffees from the others. This separation probably occurred because some volatile precursors, such as sucrose, are lost during the decaffeination process. For both fibres tested, PDMS/DVB and CX/PDMS, the PLS-DA models correctly classified 100 % of the samples according to their roasting degree (medium and dark), the main differences being the concentrations of some of the volatile compounds, such as 2-methylfuran, 2-methylbutanal, 2,3-pentanedione, pyrazine, 2-carboxyaldehyde pyrrole, furfural and 2-furanmethanol.

Besides roasting, decaffeination is another process commonly applied to green coffee beans. Most decaffeination methods use organic solvents for caffeine extraction, such as dichloromethane, chloroform and others, dichloromethane being the most commonly used in Brazil. Two other methods are also known for the extraction of this alkaloid: water decaffeination and supercritical fluid (CO2) decaffeination (CLARKE and VITZTHUM, 2001).
The influence of the decaffeination process on the bean composition and on the final product quality depends on the extraction method used, since other classes of compounds are also lost during the extraction process. In a study of decaffeination using dichloromethane, the results indicated a considerable decrease in sucrose content for the C. canephora and C. arabica varieties of green coffee (20 and 60 %, respectively). The decaffeination process also extracted 11 % of the chlorogenic acids from the C. canephora beans and 16 % from the C. arabica beans. The proteins, lipids and trigonelline were extracted only to a small extent during the decaffeination process (TOCI et al., 2006).
There are currently several methods that can be combined with gas chromatography in order to analyze the volatile fraction, and the use of solid-phase microextraction (SPME) appears to be an attractive approach. Since its introduction, SPME has been shown to be an excellent sampling method, allowing for the simultaneous extraction and concentration of analytes from sample matrices (PAWLISZYN, 1998).
The use of chemometric methods has become practically essential for processing instrumental data, since technological advances in instrumentation have greatly increased the capability of the instruments and the amount of data that can be generated and collected. In the past, most chemometric analyses were performed on reduced data sets, using only the areas of selected peaks detected on the chromatograms. Currently this limitation can be overcome by submitting the entire chromatographic profiles to chemometric analysis (RIBEIRO et al., 2009, 2010, 2012).
On the other hand, when the whole chromatogram is used, small unavoidable differences in the experimental conditions become more apparent (minor changes in retention time and drift in the chromatograms caused, for instance, by column ageing or flow-rate variations). To minimize these differences, the correlation optimized warping (COW) algorithm for chromatographic profiles, introduced by Nielsen et al. (1998), is used.
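The effect of a retention-time shift on a point-by-point comparison of profiles can be illustrated with a small sketch. Note that the code below is not COW: COW stretches and compresses segments of the profile piecewise, whereas this deliberately simpler stand-in only applies one rigid shift to the whole profile, chosen to maximize the correlation with a reference. The function names, the edge-padding rule and the toy profiles are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def shift(profile, lag):
    """Shift a profile by `lag` points, padding with the edge value (assumed rule)."""
    n = len(profile)
    if lag > 0:
        return [profile[0]] * lag + profile[: n - lag]
    if lag < 0:
        return profile[-lag:] + [profile[-1]] * (-lag)
    return list(profile)

def align_by_shift(reference, sample, max_lag):
    """Return the lag (and shifted profile) that best correlates with the reference."""
    best_lag = max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: pearson(reference, shift(sample, lag)),
    )
    return best_lag, shift(sample, best_lag)

# Toy profiles: the sample has the same two peaks as the reference, two points later.
reference = [0, 0, 1, 5, 1, 0, 0, 0, 2, 0, 0, 0]
sample = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0, 2, 0]
lag, aligned = align_by_shift(reference, sample, max_lag=3)
print(lag, pearson(reference, sample), pearson(reference, aligned))
```

Before the shift, the peaks of the two toy profiles fall on different variables, so the correlation is low even though the "composition" is identical; after the shift the profiles coincide. A rigid shift cannot correct drifts that vary along the run, which is the case a warping method such as COW is designed for.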
The aligned chromatographic profiles were analyzed by principal component analysis (PCA) to differentiate decaffeinated coffees from medium and dark roasted coffees. Partial least squares discriminant analysis (PLS-DA) (BARKER and RAYENS, 2003) was then used to discriminate coffee samples according to

Introduction
Nowadays, coffee consumers worldwide are looking for specific coffee tastes, and hence the food industry has had to invest in creating a greater variety of processed products from the same raw material. Amongst these processed coffee products originating from different coffee species (Coffea arabica and Coffea canephora), coffees with differentiated roasting degrees, decaffeinated coffees and freeze-dried coffees, amongst others, are already present on the market.
Due to the obvious importance of flavour in consumer acceptance and the quality perception of coffee, its chemical composition and key odorant compounds have been intensively studied (MAYER et al., 2000; MAYER and GROSCH, 2001; BUFFO and CARDELLI-FREIRE, 2004).
According to De Maria et al. (1999), the majority of the volatile compounds are formed during the roasting process, but by different mechanisms, such as the Maillard reactions, Strecker degradation and several other breakdown and degradation processes. Trigonelline, for example, forms pyridine and pyrrole derivatives by degradation. The pyrazines are formed by degradation of carbohydrates, and chlorogenic acids are responsible for the formation of phenolic derivatives. Furan derivatives are formed from glycides and sucrose, and the lipids are the main class forming aldehydes, ketones, aliphatic alcohols and aromatic compounds.
During the roasting process, the colour of the beans is directly correlated with the final roasting time and temperature: the higher the temperature and the longer the exposure time, the darker the coffee, so that colour can be used to define the end of the roasting process.
The degree of roasting is usually described as being "light", "medium" or "dark". A dark roasting process, for example, implies dark brown bitter beans and the lack of typical coffee aromas, whereas a light roasting process may be insufficient to complete all the pyrolytic reactions, resulting in a light brown coffee with underdeveloped organoleptic characteristics (BUFFO and CARDELLI-FREIRE, 2004).
When studying the roasting of Arabica coffee, Toci et al. (2006) found that the process degraded a considerable quantity of sucrose, and hence, depending on the degree of roasting, the concentration of sucrose can be lower than the detection limit of the analytical method used. Significant degradation of the chlorogenic acids also occurs during the roasting process (83 and 97 % for the C. arabica and C. canephora samples, respectively). The reduction in protein content is around 10 % in light roasted coffee and up to 20 % in dark roasted coffee, and the trigonelline concentration is also reduced with the degree of roasting (TOCI et al., 2006).
http://bjft.ital.sp.gov.br Braz. J. Food Technol
same conditions as the GC-FID. The GC-MS data were treated using the Automated Mass Spectral Deconvolution and Identification System (AMDIS) v. 2.61 software and the NIST Mass Spectral Search Program v. 1.6d (NIST, Washington, DC, USA), as well as by comparison with previous reports on the volatile compounds of roasted coffee (RIBEIRO et al., 2009).

General SPME procedure for sampling and injection
The volatile compounds were collected using different fibres but under the same conditions, in order to choose the most suitable fibre for sampling coffee volatiles. Ground coffee (250 mg) and 2 mL of saturated aqueous sodium chloride solution (salting-out effect) were transferred to a septum-sealed glass sample vial (5 mL). After 10 min of sample/headspace equilibration under agitation at 1200 rpm and 40 °C, the fibres were exposed to the sample headspace for 10 min. After sampling, the fibres were immediately exposed in the injection port of the GC, and the analytes were thermally desorbed for 10 min at 220 °C. All the analyses were carried out in triplicate.

Chemometric data treatment
The original chromatographic profiles were organized into an X matrix (I × J), where each replicate i was treated as a sample and the J variables were the electrical signals recorded by the FID detector every 0.05 seconds. The data were analysed with Matlab 6.5 (The MathWorks, Co., Natick, MA, USA) using the PLS_Toolbox computational package, version 3.02 (Eigenvector Research, Inc.) (WISE et al., 2004).
The chromatograms were aligned using a correlation optimized warping (COW) algorithm obtained from www.models.kvl.dk/source/. After alignment, the data matrix was smoothed by the Savitzky-Golay algorithm using a window size of five points (SAVITZKY and GOLAY, 1964) and then autoscaled column-wise.
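The smoothing and scaling steps can be sketched in a few lines. The window size of five points comes from the text; the polynomial order is not stated, so a quadratic fit is assumed here (for which the five-point coefficients are -3, 12, 17, 12, -3 divided by 35). Leaving the two points at each end unchanged, the function names and the toy matrix are likewise assumptions made for illustration, and the original work used Matlab with PLS_Toolbox rather than this code.

```python
import math

# Five-point Savitzky-Golay smoothing coefficients for a quadratic fit (assumed order).
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def savgol5(signal):
    """Smooth a 1-D signal with a five-point Savitzky-Golay filter.

    The two points at each end are left unchanged (an assumed edge rule).
    """
    out = list(signal)
    for i in range(2, len(signal) - 2):
        out[i] = sum(c * signal[i + k - 2] for k, c in enumerate(SG5))
    return out

def autoscale_columns(X):
    """Mean-centre each column and divide it by its sample standard deviation."""
    n_rows, n_cols = len(X), len(X[0])
    scaled = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [row[j] for row in X]
        mean = sum(col) / n_rows
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / (n_rows - 1))
        for i in range(n_rows):
            # A constant column carries no information; map it to zero.
            scaled[i][j] = (X[i][j] - mean) / std if std > 0 else 0.0
    return scaled

# Toy matrix: three "chromatograms" (rows) of seven points (columns) each.
X = [
    [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0],
    [0.0, 2.0, 5.0, 8.0, 5.0, 2.0, 0.0],
    [1.0, 1.0, 3.0, 7.0, 3.0, 1.0, 1.0],
]
X_smooth = [savgol5(row) for row in X]   # smoothing acts along each chromatogram
X_auto = autoscale_columns(X_smooth)     # scaling acts on each variable (column)
```

Smoothing is applied along the retention-time axis of each profile, whereas autoscaling is applied to each variable across samples, so that every retention time has zero mean and unit variance before PCA or PLS-DA.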
In chromatography, peak selection not only enhances the stability of the classification model but also helps in interpreting the relationship between the model and the sample compositions. In this work, a recently introduced method called ordered predictors selection (OPS) (TEÓFILO et al., 2009) was applied. This method uses an informative vector formed by a combination of vectors such as the regression vector, the correlation vector and others. With this informative vector, the independent variables are ordered according to their importance to the model. The ordered variables are then tested using increments over a previously defined window and the RMSECV and the
their roasting degree (medium and dark roasted) and to compare the use of the two SPME fibres.
The identification of volatile compounds and the differentiation of coffees according to their roasting degree have been carried out before (BICCHI et al., 1997, 2002; LOPEZ-GALILEA et al., 2006), but the data analysis proposed in this work, using the chromatographic fingerprint of roasted coffees instead of peak areas, is a significant innovation. In addition, the discrimination of decaffeinated coffees is new and offers an alternative view of the formation of coffee flavour, since the decaffeination process extracts other compounds besides caffeine.
The aim of this study was to differentiate commercial Brazilian coffee samples by their volatile compounds using modern chemometric methods.

Coffee samples
Twenty-one samples of commercial Brazilian coffees were obtained from local stores. Seven of them were decaffeinated and light roasted (DC), another seven were traditional or medium roasted (TR), and the last seven were dark roasted (DR). All samples were from different production batches.

GC/FID parameters
The analyses were carried out using a G-6850 GC-FID system (Agilent, Wilmington, DE) fitted with an HP-5 capillary column (30 m × 0.25 mm × 0.25 µm), using helium (1 mL min⁻¹) as the carrier gas. The oven temperature was programmed as follows: 40 °C → 7 °C/min → 230 °C → 30 °C/min → 280 °C. The injection port was equipped with a 0.75 mm i.d. liner and the injector was maintained at 220 °C in the splitless mode. Under these conditions, no sample carry-over was observed in the blank runs carried out between extractions.
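As a quick check of the oven programme, the duration of the two temperature ramps can be computed directly. The sketch assumes that there are no isothermal hold times, since none are stated in the text; any holds would add to the total.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time in minutes needed to ramp linearly from start_c to end_c."""
    return (end_c - start_c) / rate_c_per_min

# Oven programme from the text: 40 -> 230 °C at 7 °C/min, then 230 -> 280 °C at 30 °C/min.
# Assumption: no isothermal holds before, between or after the ramps.
segments = [(40.0, 230.0, 7.0), (230.0, 280.0, 30.0)]
ramp_times = [ramp_minutes(*segment) for segment in segments]
total_min = sum(ramp_times)
print(f"Ramp times: {ramp_times[0]:.1f} and {ramp_times[1]:.1f} min; total {total_min:.1f} min")
```

Under this assumption the first ramp alone takes about 27 minutes, so the retention-time regions used later in the chemometric treatment (up to 15 minutes) all fall within the first ramp.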

PDMS/DVB fibre
Principal component analysis (PCA) was applied to the pre-treated data set in order to gain insight into which peaks could be responsible for the discrimination of the decaffeinated samples from the others (medium or dark roasted). In the scores plot in Figure 1A, a distinct visual clustering distinguished the decaffeinated samples from the other two classes when the data were displayed with respect to the first three principal components (PC). The decaffeinated coffees are located on the left, with negative scores in PC1 (which describes 57.56 % of the original information), well separated from the medium and dark roasted samples on the right side with positive scores (Figure 1A). PC1 also showed a tendency to separate the medium from the dark roasted coffee samples.
correlation cross-validation coefficient (r_cv) values stored for each window analyzed. The best set of variables is indicated by the lowest RMSECV and the highest r_cv.

HS-SPME-GC-MS analysis
The volatile compounds obtained from the roasted coffee headspaces, extracted with the two SPME fibres, were detected by GC-MS analysis and identified by comparing their fragmentation patterns with the NIST data bank and some recently published papers (RIBEIRO et al., 2009, 2010). This procedure was capable of extracting and detecting a large number of volatile compounds. Pyrazine, furan and pyrrole derivatives were some of the organic classes found in the coffee headspace. Some of these compounds are described in Table 1 and used in the following discussion.
samples (thirty replicates, fifteen for each class), while the other four samples (twelve replicates, three for each class) were used as the external validation set.
To construct the training data set, the pre-treated chromatogram profiles were divided into five regions: region one ranging from retention times of 2.5 to 3.8 minutes, region two from 3.8 to 6.8 minutes, region three from 6.8 to 8.6 minutes, region four from 8.8 to 10.2 minutes and the last region from 10.2 to 15 minutes. The variables (peaks) in each region were selected by the OPS method (see materials and methods) and the original variables were reduced from 7500 to 150 (14 peaks).
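The logic of ordering the variables and then testing growing subsets can be sketched as follows. This is a simplified stand-in and not the published OPS algorithm: the informative vector is reduced to the absolute correlation with the class label alone, and the PLS model is replaced by a straight-line fit on a signed mean of the selected variables, evaluated by leave-one-out cross-validation. The class coding (+1/-1), the function names and the toy data are all assumptions made for illustration.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def score(row, cols, signs):
    """Stand-in for a latent variable: signed mean of the selected columns."""
    return sum(signs[j] * row[j] for j in cols) / len(cols)

def loo_rmsecv(X, y, cols, signs):
    """Leave-one-out RMSECV of a straight-line fit of y on the score."""
    sq_err = 0.0
    for i in range(len(X)):
        keep = [k for k in range(len(X)) if k != i]
        t = [score(X[k], cols, signs) for k in keep]
        yt = [y[k] for k in keep]
        mt, my = sum(t) / len(t), sum(yt) / len(yt)
        var_t = sum((v - mt) ** 2 for v in t)
        slope = sum((v - mt) * (w - my) for v, w in zip(t, yt)) / var_t if var_t > 0 else 0.0
        pred = my + slope * (score(X[i], cols, signs) - mt)
        sq_err += (y[i] - pred) ** 2
    return math.sqrt(sq_err / len(X))

def ops_like_selection(X, y, window=1, increment=1):
    """Order variables by |correlation with y|, then pick the subset size with lowest RMSECV."""
    n_vars = len(X[0])
    corr = [pearson([row[j] for row in X], y) for j in range(n_vars)]
    signs = [1.0 if r >= 0 else -1.0 for r in corr]  # computed once on all data (simplification)
    order = sorted(range(n_vars), key=lambda j: -abs(corr[j]))
    results = [(loo_rmsecv(X, y, order[:size], signs), size)
               for size in range(window, n_vars + 1, increment)]
    best_rmsecv, best_size = min(results)
    return order, order[:best_size], best_rmsecv

# Toy data: columns 0 and 1 differ between the classes, columns 2 and 3 do not.
y = [1, 1, 1, 1, -1, -1, -1, -1]
X = [
    [2.0, 0.5, 1.0, 3.0], [2.2, 0.4, 2.0, 3.0], [1.9, 0.6, 1.0, 1.0], [2.1, 0.5, 2.0, 1.0],
    [1.0, 1.5, 1.0, 1.0], [0.9, 1.6, 2.0, 3.0], [1.1, 1.4, 1.0, 3.0], [1.2, 1.5, 2.0, 1.0],
]
order, selected, best_rmsecv = ops_like_selection(X, y)
print(order, selected, round(best_rmsecv, 3))
```

In the toy data the two class-dependent columns are ranked first, and adding the uninformative columns to the subset increases the cross-validated error, which is the behaviour the incremental test is meant to detect.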
Figure 2C shows the main peaks selected for the construction of the PLS-DA. The peaks were identified as being: 2-methylfuran (4), 2-methylbutanal (5),
The decaffeination process causes a significant loss of sucrose (TOCI et al., 2006), and the principal products of sucrose pyrolysis are furan, pyrrole and pyrazine derivatives. The chromatographic results indicated that the concentrations of pyrazines, pyrroles and certain furan derivatives were reduced in the decaffeinated samples.
Partial least squares discriminant analysis (PLS-DA) was applied to classify the samples according to their roasting degree. For this, the decaffeinated coffees were excluded from the original data matrix and the data set was randomly split into a training set consisting of ten
and revealed basically the same information as with the PDMS/DVB fibre. PC1 described 77.94 % of the total variance and indicated a good separation between the decaffeinated coffees and the other classes. The plot of the scores is illustrated in Figure 3A.
Using the same procedure as for the PDMS/DVB fibre, a PLS-DA model was also built to classify the coffee samples according to their roasting degree. The decaffeinated coffee samples were again removed in this case and the remaining data set was randomly split into a training set (ten samples) and an external validation set (four samples). The important variables (peaks) for the training
PLS-DA model. LV1 contains 43.72 % of the total variance and is important for a good separation between medium and dark roasted coffees. The training and external validation samples are included in Figure 2A.
The different concentrations of the selected peaks determine the sample position inside the classes. The compound 4-methylthiazole (12) appears in higher concentrations in dark roasted coffees than in medium roasted coffees (negative loadings). For the other selected peaks, the concentrations are higher in medium roasted coffees (positive loadings). A loading plot of latent variable 1 is shown in Figure 2B. These results differ from those of Franca et al. (2006) in the case of furfural.
The high concentrations of almost all the compounds in traditional coffee can be explained by the walls of the beans. All the pyrolytic reactions occur inside the beans, and their walls act as autoclaves under high pressure. When the roasting process continues in order to produce dark roasted beans, the walls no longer withstand the pressure and "explode", and many volatile compounds are lost to the air.
This analysis shows that it is possible to discriminate medium and dark roasted coffees by applying variable selection directly to the chromatographic profiles, instead of peak areas. The discrimination was masked (see Figure 1A) when all useful and non-useful retention times were taken into account.

CX/PDMS fibre
A similar data analysis was performed on the data matrix from the CX/PDMS fibre. PCA was applied to the eight peaks visually selected from the aligned chromatograms

Comparison between the results of the two SPME fibres
The results obtained using PCA demonstrated that both fibres tested in this work were useful for the identification of decaffeinated coffee samples. Table 1 shows the main compounds detected and identified by GC-MS that are important for the PCA.
The loadings in Figures 1B and 3B indicated that compounds 9, 10, 14 and 20 contributed to the clustering in the PCA for both the PDMS/DVB and CX/PDMS fibres. The other volatile compounds important for the PCA differed from one fibre to the other. This occurs because the coatings of the SPME fibres are made from different materials, and so the adsorption equilibrium of certain compounds differs between the two fibres.
Another perceptible difference occurred in the relative concentrations of the compounds adsorbed by the two fibres. The CX/PDMS fibre was more effective for the highly volatile compounds (non-retained analytes). This means that the CX/PDMS fingerprint shows higher concentrations of these compounds than the PDMS/DVB fingerprint, which can be seen mainly with the light compounds. However, the PDMS/DVB fibre showed greater applicability for a larger range of compounds (higher overall sampling capability).
Visually, comparing Figures 1C and 3C, the CX/PDMS fibre demonstrated significant differences between the relative concentrations of some peaks used
set were selected by the OPS method and the original data variables were reduced from 5100 to 284 (17 peaks).
By applying PLS-DA, a model describing the discrimination of the predefined classes was obtained. Using three latent variables, the statistical parameters indicated a low root mean square error of cross-validation (RMSECV = 0.22) and a high correlation coefficient (r_cv = 0.97). The results obtained from the PLS-DA indicated that all the samples from the external validation set were correctly classified.
The scores for LV1 plotted against LV2 and LV3 in Figure 4A show how the samples were clustered in the subspace defined by the first three components of the PLS-DA model. LV1 accounts for 53.94 % of the total variance and is important for a good separation of the medium from the dark roasted coffees. As with the PDMS/DVB fibre, after the variable selection, the volatiles not useful for discrimination amongst the samples were excluded.
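The two statistics quoted above can be written out explicitly. In the sketch below, RMSECV is the root mean square difference between the reference class values and the values predicted for each sample while it was left out of the model, and r_cv is the Pearson correlation between the two. The +1/-1 class coding, the threshold of zero and the example predictions are hypothetical; they are not values from this study.

```python
import math

def rmsecv(y_true, y_cv):
    """Root mean square error of cross-validation."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_cv)) / n)

def r_cv(y_true, y_cv):
    """Pearson correlation between reference and cross-validated predictions."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_cv) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_cv))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_cv)
    return cov / math.sqrt(vt * vp)

def assign_class(y_cv, threshold=0.0):
    """Assign each prediction to class +1 or -1 using a midpoint threshold (assumed rule)."""
    return [1 if p > threshold else -1 for p in y_cv]

# Hypothetical example: +1 = medium roasted, -1 = dark roasted.
y_true = [1, 1, 1, -1, -1, -1]
y_pred_cv = [0.8, 1.1, 0.7, -0.9, -1.2, -0.6]
print(rmsecv(y_true, y_pred_cv), r_cv(y_true, y_pred_cv), assign_class(y_pred_cv))
```

A low RMSECV together with an r_cv close to one indicates that the cross-validated predictions lie close to the class values, so the samples fall clearly on one side of the decision threshold.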
It can be seen in Figure 4C, which shows the chromatograms from each class that, except for 2-pentanone, the concentrations of all the selected compounds were higher in medium roasted than in dark
for discrimination in the PCA. This difference was not an important parameter in the chemometric analyses because the extraction systems always work under equilibrium. However, the PDMS/DVB fibre showed excellent reproducibility and an extended lifetime as compared to the CX/PDMS fibre.
When PLS-DA was applied to predict the roasting degree of the samples, the models constructed for both fibres from the chromatographic profiles, instead of peak areas, and using three latent variables, correctly classified all the samples in the training and external validation sets.
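The division into training and external validation sets described for both fibres can be sketched as a split at the sample level, so that the three replicates of a sample always stay in the same set. The sample labels, the fixed random seed and the choice to draw two validation samples from each roasting class are assumptions made for illustration; the text states only that the split was random, with ten training and four validation samples.

```python
import random

def split_by_sample(samples_by_class, n_validation_per_class, seed=0):
    """Randomly pick validation samples within each class; the rest form the training set."""
    rng = random.Random(seed)
    training, validation = [], []
    for class_samples in samples_by_class.values():
        shuffled = list(class_samples)
        rng.shuffle(shuffled)
        validation.extend(shuffled[:n_validation_per_class])
        training.extend(shuffled[n_validation_per_class:])
    return training, validation

# Hypothetical labels: seven medium roasted (TR) and seven dark roasted (DR) samples.
samples_by_class = {
    "TR": [f"TR{i}" for i in range(1, 8)],
    "DR": [f"DR{i}" for i in range(1, 8)],
}
train_samples, valid_samples = split_by_sample(samples_by_class, n_validation_per_class=2)

# Each sample was analysed in triplicate; replicates follow their sample.
replicates = [(s, r) for group in samples_by_class.values() for s in group for r in (1, 2, 3)]
train_rows = [rep for rep in replicates if rep[0] in train_samples]
valid_rows = [rep for rep in replicates if rep[0] in valid_samples]
print(len(train_rows), len(valid_rows))
```

Splitting by sample rather than by replicate keeps the external validation honest: if replicates of the same coffee appeared in both sets, the validation samples would not be independent of the training data.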

Conclusions
The results reported in this work demonstrated that analysing the chromatographic fingerprints of volatile compounds from roasted coffee as spectroscopic profiles, rather than as the traditional integrated areas of the detected peaks, offers a new approach to the treatment of chromatographic data. This is because relevant additional information could be obtained and interpreted by chemometrics. The chromatographic data alignment and pretreatment algorithms have opened new ways to treat chromatographic data in general, and are gaining many adherents in practice.
The PCA analyses for both fibres indicated that the compounds N-methylpyrrole, pyrazine, furfural and 5-methyl-2-furancarboxyaldehyde appeared in higher concentrations in traditional coffees as compared to decaffeinated ones.

Figure 1C shows a typical overlap of two mean chromatograms, one for traditional and dark roasted coffees and the other for decaffeinated coffees, for comparison. The compounds mentioned above are enumerated and the differences between the samples can be identified. It can be seen that N-methylpyrrole (9), pyrazine (10), 2-furanmethanol acetate (22), 2-ethyl-5-methylpyrazine (23), 2-furanmethanethiol (24) and 3-ethyl-2,5-dimethylpyrazine (26) tended to be in higher relative concentrations in the traditional and dark roasted samples (positive loadings). The other compounds appeared in greater relative concentrations in the decaffeinated samples (negative loadings).

Figure 2. (A) LV1 × LV2 × LV3 scores plot for the PDMS/DVB fibre: dark roasted and medium roasted samples. (B) LV1 loadings plot of the main peaks selected by OPS for the PLS-DA model. (C) Peaks selected by OPS and used in the PLS-DA: chromatograms of medium roasted and dark roasted coffees obtained using the PDMS/DVB fibre.

Figure 3. (A) PC1 × PC2 × PC3 scores plot for the CX/PDMS fibre: decaffeinated, dark roasted and medium roasted samples. (B) PC1 loadings for the eight peaks selected for PCA. (C) Mean chromatograms of medium and dark roasted coffees and of decaffeinated coffees obtained using the CX/PDMS fibre.

Figure 4. (A) LV1 × LV2 × LV3 scores plot for the CX/PDMS fibre: dark roasted and medium roasted samples. (B) LV1 loadings plot of the main peaks selected by OPS for the PLS-DA model. (C) Peaks selected by OPS and used in the PLS-DA: chromatograms of medium and dark roasted coffees obtained using the CX/PDMS fibre.

Table 1. Main compounds identified from the chromatographic peaks in Figure 1 by comparison of their MS spectra with those of the NIST MS database and the literature.