Tomato quality based on colorimetric characteristics of digital images

ABSTRACT Results of optical evaluation methods may be correlated with tomato quality and maturation. In this context, the objective of this study was to evaluate the correlation between tomato colorimetric and physico-chemical variables and to cluster the fruits by maturation stage using multivariate analysis. The experiment used 150 fruits at three maturation stages (immature, light red, and mature). The physico-chemical variables were evaluated by traditional methods. The colorimetric variables were assessed on RGB images taken with a digital camera. The correlation between colorimetric and physico-chemical variables was analyzed using Pearson's coefficient. Principal component analysis and k-means clustering were applied to three data sets: isolated RGB variables; colorimetric variables calculated from relations between the RGB bands (colorimetric indexes); and physico-chemical variables. The colorimetric variables explained the variation due to maturation better than the physico-chemical variables. The colorimetric indexes performed best in clustering tomatoes by maturation stage (accuracy of 0.98).


Introduction
Tomato (Lycopersicon esculentum) is an internationally important crop species and the second most produced vegetable in the world (Peixoto et al., 2017). Studies have developed methods to assist in tomato quality control, which is required by industry and consumers (Jorge et al., 2011; Huang et al., 2018).
Laboratory tests are usually used to evaluate fruit quality through physico-chemical attributes such as pulp firmness, total soluble solids, and acidity (Magwaza & Opara, 2015; Teka, 2013). These procedures are generally destructive and require chemical reagents and expensive equipment, which makes rigorous quality control difficult. Moreover, fruit selection and classification by industry and markets are based on criteria established by quality control agencies, predominantly through visual inspection (CEAGESP, 2003).
Artificial vision systems have efficiently assessed and quantified fruit quality and control characteristics (Zhang et al., 2014; Wang et al., 2015). Color is the most important quality attribute of tomato fruits and is related to their appearance, sugar content, acidity, texture, flavor, and succulence (Ferreira et al., 2010; Governici et al., 2017).
Colorimetric characteristics from images in the RGB color model have enabled artificial vision systems with high classification capacity using low-cost equipment (Wan et al., 2018). These systems can assist in selecting tomatoes and controlling fruit quality with non-destructive methods (Bicanic et al., 2015).
Therefore, the objective of this study was to evaluate the correlation between tomato colorimetric and physico-chemical variables and to cluster the fruits by maturation stage using multivariate analysis.

Material and Methods
This study used 150 tomatoes of the Caqui variety, obtained from marketplaces and selected by visual inspection of color. The tomatoes were classified as immature (green), colorful (orange-reddish), or mature (predominantly red) according to the criteria defined by CEAGESP (2003). Fifty tomatoes of each maturation stage were evaluated.
Images of each fruit were taken with a digital camera (Canon S110) at a resolution of 2592 × 1944 pixels, capturing light in the visible spectrum.
Each tomato was placed in the center of a small-object photographic studio. Illumination was provided by a 100-W LED lamp with a color temperature of 5000 K. The camera was mounted on a tripod at a height of 0.17 m and 0.23 m from the fruit, set to its maximum zoom (3x), with automatic white balance and ISO. This configuration was kept the same for all shots to standardize the images. The images were acquired in the RGB additive color system and stored as JPEG files.
The images were then pre-processed and segmented in ImageJ: a Gaussian filter was applied to reduce noise and improve image quality, and the images were segmented to remove the background and isolate the object of interest (the tomato). The threshold separating the background from the tomato was determined by Otsu's method (Otsu, 1975), which assigns each pixel a binary value, 0 (black) or 1 (white), following Costa et al. (2018).
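The thresholding step can be illustrated with a minimal, standalone sketch of Otsu's criterion, which picks the gray level that maximizes the between-class variance of the image histogram. The study used ImageJ's implementation; `otsu_threshold` and `binarize` below are hypothetical illustrations that assume 8-bit gray levels and treat pixels above the threshold as foreground (1), which may need inverting depending on the backdrop.

```python
def otsu_threshold(gray_values):
    """Return the Otsu threshold t for 8-bit gray levels (0-255).

    t maximizes the between-class variance of {v <= t} and {v > t}.
    """
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray_values, t):
    """Map gray levels to 1 (foreground, > t) or 0 (background, <= t)."""
    return [1 if v > t else 0 for v in gray_values]


# Usage: a dark background (level 10) and a bright fruit (level 200).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = binarize(pixels, t)
```

In practice the Gaussian smoothing is applied first, so the histogram has fewer isolated noisy levels and the two modes are easier to separate.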
The final image resulted from a logical operation combining the filtered and binary images. The mean intensities of the red (R), green (G), and blue (B) bands were then measured over the fruit area at two locations in each image, corresponding to opposite sides of the fruit.
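The masking and band-averaging step might look like the sketch below. It assumes that the logical operation is a pixel-wise AND that keeps only pixels where the binary mask is 1, that the readings from the two opposite sides are averaged, and that Mean-RGB is the average of the three band means; the functions `masked_band_means` and `fruit_rgb` and the rectangular region format are hypothetical.

```python
def masked_band_means(image, mask, box):
    """Mean (R, G, B) over pixels in box = (row0, row1, col0, col1) whose mask is 1.

    image[i][j] is an (R, G, B) tuple; mask[i][j] is 0 (background) or 1 (fruit).
    """
    row0, row1, col0, col1 = box
    sums = [0, 0, 0]
    count = 0
    for i in range(row0, row1):
        for j in range(col0, col1):
            if mask[i][j] == 1:  # logical AND: keep only segmented fruit pixels
                for band in range(3):
                    sums[band] += image[i][j][band]
                count += 1
    if count == 0:
        raise ValueError("region contains no fruit pixels")
    return tuple(s / count for s in sums)


def fruit_rgb(image, mask, box_a, box_b):
    """Average the band means of two regions on opposite sides of the fruit (assumed)."""
    side_a = masked_band_means(image, mask, box_a)
    side_b = masked_band_means(image, mask, box_b)
    r, g, b = ((x + y) / 2 for x, y in zip(side_a, side_b))
    return r, g, b, (r + g + b) / 3  # last value: Mean-RGB, assumed band average


# Tiny 2 x 4 example: left side redder than right side; bg pixels are masked out.
red, orange, bg = (200, 50, 40), (180, 70, 60), (0, 0, 0)
image = [[red, red, orange, orange],
         [red, bg, orange, bg]]
mask = [[1, 1, 1, 1],
        [1, 0, 1, 0]]
print(fruit_rgb(image, mask, (0, 2, 0, 2), (0, 2, 2, 4)))  # (190.0, 60.0, 50.0, 100.0)
```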
To highlight the spectral variation of the fruits, sharpen the distinction between maturation stages, and minimize the effect of ambient light, four spectral indexes were calculated from the RGB images (Eq. 1, 2, 3, and 4), where: R, G, and B are the mean color intensities of the red, green, and blue bands in pixel units (0 ≤ R, G, B ≤ 255); GR is the green-red relation, as used by Baesso et al. (2012); PR is the pigment relation, as used by Metternicht (2003); NGRDI is the normalized green-red difference index, as used by Gilabert et al. (2002); and NEG is the normalized excess green, as used by Baesso et al. (2012).
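Equations 1 to 4 are not reproduced in this text. The sketch below uses commonly published forms of these indexes purely as an assumption: GR as the ratio G/R, PR as the plant pigment ratio (G - B)/(G + B), NGRDI as (G - R)/(G + R), and NEG as (2G - R - B)/(R + G + B). The GR and PR forms in particular may differ from those used by the authors. The example also shows why ratio-type indexes help with lighting: scaling all three bands by the same factor leaves every index unchanged.

```python
def colorimetric_indexes(r, g, b):
    """Colorimetric indexes from mean band intensities (0-255).

    ASSUMED forms for illustration; the paper's Eq. 1-4 are not shown here.
    """
    return {
        "GR": g / r,                           # assumed: green-red ratio G/R
        "PR": (g - b) / (g + b),               # assumed: plant pigment ratio form
        "NGRDI": (g - r) / (g + r),            # normalized green-red difference
        "NEG": (2 * g - r - b) / (r + g + b),  # 2g - r - b on chromatic coordinates
    }


# A reddish (mature-looking) fruit and the same fruit under half the light.
bright = colorimetric_indexes(200, 50, 40)
dim = colorimetric_indexes(100, 25, 20)
# Scaling R, G and B by the same factor leaves every ratio-type index unchanged.
```

This invariance holds only for a uniform change in brightness; a shift in the color temperature of the light changes the bands unequally and is not cancelled out.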
The physico-chemical variables total soluble solids (TSS), pH, titratable acidity, and water content were determined according to AOAC (2010). Pulp firmness was determined by the flattening (applanation) method (Calbo & Nery, 1995), and fruit weight was measured on a precision digital balance.
The colorimetric and physico-chemical variables were subjected to exploratory analysis by calculating the arithmetic mean, maximum and minimum values, standard deviation, and coefficient of variation. Boxplot dispersion analysis was also used to check for outliers beyond the lower and upper limits.
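The exploratory statistics and the boxplot outlier check can be sketched with Python's standard `statistics` module. The 1.5 × IQR whisker limits (Tukey's fences) and the quartile method are assumptions, since the text does not state which limits were used; `describe` and `boxplot_outliers` are hypothetical helper names.

```python
import statistics


def describe(values):
    """Mean, extremes, sample standard deviation and coefficient of variation (%)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"mean": mean, "min": min(values), "max": max(values),
            "sd": sd, "cv_percent": 100 * sd / mean}


def boxplot_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences; k = 1.5 assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


sample = [10, 11, 12, 13, 14, 100]  # illustrative numbers, not study data
print(describe(sample))
print(boxplot_outliers(sample))     # the value 100 falls above the upper fence
```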
The colorimetric and physico-chemical variables were correlated using the Pearson's coefficient of correlation.
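For reference, Pearson's coefficient and the t statistic commonly used to test it against zero (with n - 2 degrees of freedom) are sketched below. The helper names are hypothetical, and the lookup of p-values against the t distribution is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for H0: rho = 0; requires |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data: a band intensity against firmness (illustrative values only).
g_band = [1, 2, 3, 4]
firmness = [1, 3, 2, 4]
r = pearson_r(g_band, firmness)  # 0.8
t = t_statistic(r, len(g_band))
```

The |t| value is then compared with the critical t for n - 2 degrees of freedom at the chosen significance level.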
Principal component analysis and k-means clustering (Makky & Soni, 2013; Li et al., 2014) were carried out in the Past 3.1 program to assess the potential of the variables measured in the digital images to distinguish tomatoes by maturation stage. Because the variables had different units, they were normalized before the analysis using their means and standard deviations in each maturation stage. The variables were divided into three groups: colorimetric variables 1 (CV1), consisting of R, G, B, and Mean-RGB; colorimetric variables 2 (CV2), consisting of the calculated colorimetric indexes green-red relation (GR), pigment relation (PR), normalized green-red difference index (NGRDI), and normalized excess green (NEG); and physico-chemical variables (PCV), consisting of pulp firmness, weight, total soluble solids, pH, titratable acidity, and water content.
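Standardization and principal component analysis can be sketched without external libraries, as below. This is not the Past 3.1 implementation: it assumes z-scores computed over the whole data set (the per-stage wording above is ambiguous), builds the correlation matrix, and diagonalizes it with cyclic Jacobi rotations. The explained-variance ratios it returns are the quantities compared against the 70% criterion; all function names are hypothetical.

```python
import math


def standardize(columns):
    """z-score each variable (one list per variable): (x - mean) / sample SD."""
    out = []
    for col in columns:
        n = len(col)
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        out.append([(v - m) / sd for v in col])
    return out


def correlation_matrix(z):
    """Correlation matrix of standardized variables."""
    p, n = len(z), len(z[0])
    return [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def jacobi_eigen(a, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    p = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(p):  # columns i and j: A <- A J
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rows i and j: A <- J^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
                for k in range(p):  # accumulate eigenvectors: V <- V J
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
    return [a[i][i] for i in range(p)], v


def pca(columns):
    """Explained-variance ratios, PC weight vectors and scores (largest PC first)."""
    z = standardize(columns)
    vals, vecs = jacobi_eigen(correlation_matrix(z))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    explained = [vals[i] / sum(vals) for i in order]
    weights = [[vecs[k][i] for k in range(len(vals))] for i in order]
    scores = [[sum(w[k] * z[k][obs] for k in range(len(w))) for w in weights]
              for obs in range(len(z[0]))]
    return explained, weights, scores


# Toy data: x and y are perfectly correlated, w is uncorrelated with both.
x, y, w = [1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]
explained, weights, scores = pca([x, y, w])
keep_group = explained[0] + explained[1] > 0.70  # the 70% criterion
```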
Groups in which PC1 and PC2 together explained more than 70% of the variance (Ferreira, 2008) were considered able to discriminate maturation stages and were subjected to k-means clustering analysis.
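A minimal Lloyd-style k-means over the PC1 and PC2 scores is sketched below with k = 3, one cluster per expected maturation stage. Initial centroids are passed in explicitly for reproducibility; how Past 3.1 initializes its centroids is not stated in the text, so this choice, like the helper names, is an assumption.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: assign points to the nearest centroid, then recompute means."""
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical (PC1, PC2) scores for six fruits forming three tight groups.
scores = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (-5.0, 5.0), (-5.0, 5.1)]
labels, centers = kmeans(scores, centroids=[scores[0], scores[2], scores[4]])
```

Because k-means only finds a local optimum, results can depend on the starting centroids; running several initializations and keeping the lowest within-cluster sum of squares is a common safeguard.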
The quality of the clustering was evaluated by the clustering purity index (CPI) (Zhao & Karypis, 2002), according to Eq. 5.
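Eq. 5 is not reproduced in this text. The clustering purity measure of Zhao & Karypis (2002) is commonly written as CPI = (1/n) Σ_r max_i n_r^i, where n_r^i is the number of fruits of maturation stage i assigned to cluster r; the sketch below implements that form, assuming it matches Eq. 5. The function name is hypothetical.

```python
from collections import Counter


def clustering_purity(cluster_labels, true_labels):
    """Purity: share of items that belong to the majority true class of their cluster."""
    per_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


clusters = [0, 0, 0, 1, 1, 1]
stages = ["immature", "immature", "mature", "mature", "mature", "mature"]
print(clustering_purity(clusters, stages))  # (2 + 3) / 6
```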

Results and Discussion
The colorimetric variables of the RGB color model and the calculated variables GR and NEG presented high variation in maximum and minimum values, standard deviation, and coefficient of variation, indicating that these variables are the most affected by the maturation stage (Table 1). No outliers were found in the boxplot dispersion analysis for any variable.
The variation of the physico-chemical variables (Table 1) showed that pulp firmness was the variable most affected by the maturation stage. Firmness is a physical attribute inversely correlated with fruit maturation (Costa et al., 2017). According to Ferreira et al. (2010), the closer the fruit is to physiological maturity, the lower its pulp firmness, owing to the solubilization of cell-wall substances by enzymes whose activity increases as the fruit develops.
Although chemical attributes such as TSS and pH are used as fruit maturation indicators (Pu et al., 2016), environmental, genetic, and management factors also affect tomato quality (Teka, 2013). Therefore, the use of fruits from marketplaces, without information on their production cycle, may have contributed to the lower variation of these variables across maturation stages.
Table 1. Exploratory analysis of colorimetric and physico-chemical variables of tomatoes (n = 150)
R - mean color intensity of the red band; G - mean color intensity of the green band; B - mean color intensity of the blue band; Mean-RGB - mean intensity of RGB; GR - green-red relation; PR - pigment relation; NGRDI - normalized green-red difference index; NEG - normalized excess green; TSS - total soluble solids
Pulp firmness was the physico-chemical attribute most strongly correlated with the colorimetric variables (Table 2), confirming that, under the experimental conditions used, it is the attribute most affected by fruit maturation.
The G intensity of the RGB model correlated with more physico-chemical variables than any other colorimetric variable, indicating that this band reflects variations in tomato quality as a function of maturation.
The calculated colorimetric indexes showed even higher correlations with the physico-chemical attributes, especially pulp firmness. Thus, colorimetric characteristics can be used to evaluate tomato quality and acceptability with non-destructive, automatic methods. Baesso et al. (2012) used the colorimetric indexes NEG and GR to condense colorimetric information and improve classification when identifying nutrient deficiency in common bean crops.
Table 2. Pearson's correlation matrix for colorimetric and physico-chemical variables of tomato fruits
R - mean color intensity of the red band; G - mean color intensity of the green band; B - mean color intensity of the blue band; Mean-RGB - mean intensity of RGB; GR - green-red relation; PR - pigment relation; NGRDI - normalized green-red difference index; NEG - normalized excess green; **, *, and ns - significant at p ≤ 0.01, significant at p ≤ 0.05, and not significant, respectively
Although its correlations were significant, fruit weight might correlate more strongly with colorimetric variables if maturation were monitored on the same fruit over time.
The correlation between RGB variables and fruit quality characteristics has been reported in studies that use these variables as quality indicators (Wang et al., 2015; Governici et al., 2017; Costa et al., 2018) to support decision-making during selection and harvest.
However, RGB intensity can be affected by variations in ambient light, distorting the quantification of color intensity. Using an illumination apparatus with a single light source for all shots minimized the effect of light variation on the RGB measurements. In environments with uncontrolled illumination, colorimetric indexes are needed to improve the perception of color intensity differences, since relations between the R, G, and B bands minimize the effect of illumination on color quantification (Sena Júnior et al., 2003).
Table 3 shows the percentage of variance explained by the principal components in each group. Although it is a heuristic criterion, according to Ferreira (2008) a minimum explained variance between 70% and 90% is commonly adopted to determine the number of principal components needed to explain the variability of the data. Groups CV1 and CV2, composed of colorimetric characteristics treated as independent variables, presented accumulated percentages above 70% and were selected for clustering analysis.
These results indicate that tomato colorimetric characteristics can explain the variation in fruit characteristics across maturation stages, even with little control over factors associated with tomato crop management and production. Physico-chemical characteristics are affected by these factors (Amarante et al., 2015; Watanabe et al., 2015), which hinders classification based on them when the fruits do not come from the same area.
In addition, colorimetric characteristics are assessed by optical, non-destructive methods, making it possible to automate selection, classification, and harvest processes (Zhang et al., 2014; Wan et al., 2018). The evaluation of physico-chemical attributes such as TSS, titratable acidity, pH, and water content requires laboratory procedures (AOAC, 2010) whose results are not instantaneous and that require chemical reagents and equipment, increasing analysis costs.
Fruit color variation during maturation is related to the production of pigments such as chlorophyll, carotenoids, and lycopene (Bicanic et al., 2015; Park et al., 2018). Therefore, including these pigments in the physico-chemical variable group could increase the explanatory capacity of its principal components, approaching the results obtained for the colorimetric variables.
Table 4 shows the correlation between the principal components PC1 and PC2 and the variables of groups CV1 and CV2. The R intensity, a color characteristic of mature tomatoes, was highly correlated with PC1, the component with the highest explanatory level (68.99%); thus, it is the most important colorimetric variable for explaining maturation variation. The G intensity showed the highest correlation with PC2 and was the only variable driving this component, whose explanatory level was 28.02%.
The variables of group CV2 were strongly correlated with PC1, confirming that the GR, PR, NGRDI, and NEG indexes improve both the distinction of fruits from different maturation stages and the explanatory capacity of the principal components.
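For standardized variables, the correlation between a principal component and a variable, as in Table 4, is usually derived from the eigen-decomposition of the correlation matrix. Assuming Past 3.1 reports this standard quantity, with unit-norm weight vectors w_k and eigenvalues λ_k over p variables:

```latex
\mathrm{PC}_k = \sum_{j=1}^{p} w_{kj}\, z_j, \qquad
\operatorname{Var}(\mathrm{PC}_k) = \lambda_k, \qquad
r(\mathrm{PC}_k, z_j) = w_{kj}\sqrt{\lambda_k}, \qquad
\text{share}(\mathrm{PC}_k) = \frac{\lambda_k}{\sum_{j=1}^{p}\lambda_j} = \frac{\lambda_k}{p}
```

For group CV1 (p = 4), the 68.99% explained by PC1 thus corresponds to an eigenvalue of about 0.6899 × 4 ≈ 2.76.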
Makky & Soni (2013) used the coefficients of colorimetric variables in a canonical discriminant analysis and developed an equation to automatically classify palm fruit bunches by maturation. Similarly, given the high explanatory level of PC1 for the variance of group CV2 (97.73%), an equation built from the coefficients (weights) of each variable in this component could serve as a tomato maturation index and support decision-making in automatic artificial vision systems.
Figure 1A shows the dispersion of the scores of each fruit on the first two principal components of group CV1. In general, fruits of each maturation stage occupy a distinct region of the two-dimensional plane. The clustering of immature tomatoes was strongly influenced by the G attribute, whereas mature tomatoes were influenced more by the R and B attributes. The scores of group CV2 (Figure 1B) were more tightly grouped by maturation stage than those of group CV1. Immature tomatoes were associated with the colorimetric variables GR, NGRDI, and NEG, light red tomatoes with PR, and mature tomatoes were less influenced by all indexes.
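The PC1-based maturation index suggested above could take the form MI = Σ_j w_1j (x_j - mean_j)/sd_j, using the PC1 weights of group CV2 and the means and standard deviations of the calibration data. The function below is a hypothetical sketch; no weights are given in this text, so the usage example uses placeholder numbers.

```python
def maturation_index(sample, means, sds, pc1_weights):
    """Hypothetical PC1-based index: sum_j w_j * (x_j - mean_j) / sd_j.

    sample, means, sds and pc1_weights share one variable order,
    e.g. (GR, PR, NGRDI, NEG). The sign of PCA weights is arbitrary, so they
    may need to be negated so that the index grows with maturation.
    """
    return sum(w * (x - m) / s
               for x, m, s, w in zip(sample, means, sds, pc1_weights))


# Placeholder numbers only; the study's PC1 weights are not reported here.
mi = maturation_index(sample=[2.0, 4.0], means=[1.0, 2.0], sds=[1.0, 2.0],
                      pc1_weights=[0.6, 0.8])
```

A threshold on such an index, calibrated on fruits of known stage, would then be one simple way for an automatic system to assign a maturation class.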
Makky & Soni (2013) used a procedure similar to k-means clustering, with colorimetric variables from the RGB and HSI color models, to group palm fruits into eight maturation classes, and obtained a clustering accuracy of 88.7%.
Other studies using the k-means algorithm on colorimetric characteristics to evaluate fruit maturation have also reported high accuracy, such as Li et al. (2014), when evaluating the maturation of grapes on the plant, and Yamamoto et al. (2014), when classifying undamaged tomato fruits.
Table 3. Percentage of variance explained by the first and second principal components (PC1 and PC2) and accumulated percentage (PCac) for the groups of colorimetric variables 1 (CV1), colorimetric variables 2 (CV2), and physico-chemical variables (PCV)
Table 4. Correlation between the principal components (PC) and the variables of groups CV1 and CV2
R - mean color intensity of the red band; G - mean color intensity of the green band; B - mean color intensity of the blue band; Mean-RGB - mean intensity of RGB; GR - green-red relation; PR - pigment relation; NGRDI - normalized green-red difference index; NEG - normalized excess green
The k-means clustering of fruits into three classes using group CV1 (Table 5) showed that class 1 was formed predominantly by mature tomatoes, class 2 by colorful fruits, and class 3 by immature fruits (CPI = 0.72). Clustering based on group CV2 increased the clustering quality (CPI = 0.98), showing that highlighting the colorimetric differences between fruits through calculated colorimetric indexes facilitated their classification by maturation stage.
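Assuming the CPI is the purity measure of Zhao & Karypis (2002), the reported values translate directly into counts of fruits for n = 150:

```latex
\mathrm{CPI} = \frac{1}{n}\sum_{r=1}^{k}\max_i n_r^i
\;\Rightarrow\;
\sum_{r=1}^{k}\max_i n_r^i = \mathrm{CPI}\times n:
\qquad 0.72 \times 150 = 108 \ (\mathrm{CV1}),
\qquad 0.98 \times 150 = 147 \ (\mathrm{CV2})
```

That is, clustering on the colorimetric indexes left 3 fruits outside the majority stage of their cluster, against 42 with the isolated RGB variables.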
Although the isolated RGB variables were efficient in distinguishing tomato maturation stages, combining them into calculated colorimetric indexes (CV2) provided the most suitable input data for automated artificial vision systems aimed at selecting and classifying tomatoes.
Colorimetric indexes such as GR, PR, NGRDI, and NEG reduce distortions caused by ambient light and highlight colorimetric differences related to fruit maturation, increasing the number of correct classifications in the clustering.
The implementation of low-cost artificial vision systems in selection and harvest mechanisms is a potential application to be explored in further research focused on farms and food industries. The classification of tomatoes at stages intermediate to those evaluated in the present study, and the correlation between colorimetric variables and sensory characteristics (for example, appearance, flavor, aroma, and consistency), could also be investigated to develop a more robust classification model.

Conclusions
1. Tomato pulp firmness was the physico-chemical attribute most strongly correlated with the colorimetric variables and the attribute most affected by fruit maturation.
2. The principal components of the colorimetric variables were efficient in measuring variation in tomato maturation, with high accumulated percentages of explained variance: 97.01% for group CV1 and 99.87% for group CV2.
3. The calculated colorimetric indexes green-red relation, pigment relation, normalized green-red difference index, and normalized excess green increased the quality of k-means clustering of tomatoes by maturation stage (accuracy of 0.98).
Table 5. Clustering of variables of groups CV1 and CV2 by the k-means algorithm in each maturation stage