CLASSIFICATION OF ROBUSTA COFFEE FRUITS AT DIFFERENT MATURATION STAGES USING COLORIMETRIC CHARACTERISTICS

Coffee growers who produce the robusta species (Conilon) have sought to increase productivity and beverage quality by improving production techniques. Artificial vision systems can help increase the efficiency of operations associated with crop management. This study aimed to obtain colorimetric characteristics of robusta coffee fruits at different stages of maturity and use them to classify fruits from digital images. A digital camera operating in the visible spectral range was used to acquire images from 60 samples of coffee fruits at the green, cherry, and over-ripe stages of maturity. Colorimetric variables were extracted from the RGB, HSI, and L*a*b* color models and correlated with the physicochemical attributes of the fruits. Principal component analysis combined with the k-means technique was applied to the colorimetric variables that showed a significant correlation with the physicochemical attributes. The colorimetric variables were reduced to a single principal component, which explained 82.33% of the variance. The clustering obtained with the k-means technique showed the feasibility of using images for the automatic classification of robusta coffee fruits, with an overall accuracy of 100%.


INTRODUCTION
Robusta coffee (Coffea canephora) is among the varieties that have gained market share in recent years, with an estimated production in the 2019 growing season between 14.36 and 16.33 million bags, which represents an increase of 1.3% and 15.2%, respectively, relative to the 2018 growing season (CONAB, 2019). However, this variety has lower quality in terms of aroma and flavor, so it is used mainly in the production of blends and instant coffee (Lima Filho et al., 2015).
The commercial value of coffee is directly related to the beverage quality (Boaventura et al., 2018; Guimarães et al., 2019). Fruit ripening at harvest influences the physicochemical attributes of grains and, consequently, their quality (Borges et al., 2002; Fagan et al., 2011). Postharvest processes such as drying and storing also harm coffee quality when conducted improperly, causing losses to producers (Favarin et al., 2004; Saath et al., 2012). Therefore, quality control and classification during the harvest and post-harvest stages are fundamental to obtaining better quality coffee and increasing its market value. In this sense, studies that allow the development of technologies specific to robusta coffee are required, given its particular characteristics.
Artificial vision systems have become an important tool in agriculture, assisting in the selection and classification of agricultural products based on their colorimetric characteristics (Cubero et al., 2011; Rajkumar et al., 2012; Wan et al., 2018). Studies based on artificial vision for coffee aim to obtain information that can assist in the harvest (Silva et al., 2014; Ramos et al., 2017) and postharvest processes (Oliveira et al., 2016). These studies predominantly use colorimetric information of the species Coffea arabica, with little information on the robusta species in this context. Systems based on optical techniques can measure parameters without destroying the fruit and can be applied to automated equipment (Ramos et al., 2017; Castro et al., 2019). They also provide accurate results, which can be instantaneous, compared with methods based on human perception, such as visual and sensory analysis, and with invasive laboratory procedures, such as physicochemical analysis, which hinder classification and increase the costs of analyzing the quality of these products.
Engenharia Agrícola, Jaboticabal, v.40, n.4, p.518-525, jul./aug. 2020
This study aimed to obtain colorimetric characteristics of robusta coffee fruits at different stages of maturity and use them to classify the fruits from digital images, with a view to application in automated coffee harvesting and separation systems.

MATERIAL AND METHODS
The coffee fruits used as samples in the experiment belonged to the species robusta (Coffea canephora), cultivar ENCAPA 8121. The plot had plants aged up to 25 years, with a cultivated area of 3000 m² located on the Agroecological Farm of the Federal Rural University of Rio de Janeiro, Seropédica campus (22°44′38″ S and 43°42′27″ W). Harvest was carried out manually on 22 plants, totaling 148 L of coffee. The harvested fruits were dried on a natural terrace.
After drying, 60 coffee samples at different stages of maturity (20 samples each of green, cherry, and over-ripe coffee) were selected. Each sample consisted of approximately 200 g of fruits grouped by visual color inspection.

Image acquisition and processing
The images were acquired using a Nikon Coolpix L820 digital camera, which acquires images in the visible spectral region. Two 40-W halogen lamps were used to control ambient lighting. The camera was placed on a tripod at a height of 0.83 m to capture the images of the coffee fruits, which were positioned centrally at 0.50 m from the camera. Both lamps were kept on to provide the indoor lighting.
The fine adjustments in the camera settings (focus distance, ISO, zoom, and white balance) were defined from preliminary experiments and maintained constant for all images, seeking to standardize the characteristics for image acquisition. The images were obtained in the RGB color model, where information on the intensity of the red (R), green (G), and blue (B) colors is captured.
Subsequently, the image segmentation process was performed to remove the background, highlighting the coffee sample. The threshold for separating the background of the coffee image was obtained by the Otsu method, which binarized the image by assigning a value of 255 (total white) for the region of interest (fruits) and a value of zero (total black) for the image background. The original image was merged with the binarized image using a logical union operator, resulting in the final image, in which only the fruit region was evidenced (Figure 1).
FIGURE 1. Images of fruits at the three stages of maturity after the binarization process. Green coffee fruits (A), cherry coffee fruits (B), and over-ripe coffee fruits (C).
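The thresholding and masking steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names `otsu_threshold` and `apply_mask`, the grayscale conversion by channel averaging, and the premise that fruit pixels are brighter than the background are hypothetical choices made for the example. The mask is applied here by keeping the original pixel wherever the binary image is 255 and setting it to black elsewhere.

```python
def otsu_threshold(gray_values):
    """Return the gray level (0-255) that maximizes the between-class variance."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def apply_mask(rgb_pixels, threshold):
    """Binarize (255 = fruit, 0 = background) and blank out background pixels.

    Assumption: gray level is the channel average and fruits are brighter
    than the background; invert the comparison for a bright background.
    """
    gray = [(r + g + b) // 3 for r, g, b in rgb_pixels]
    binary = [255 if v > threshold else 0 for v in gray]
    masked = [px if m == 255 else (0, 0, 0) for px, m in zip(rgb_pixels, binary)]
    return binary, masked


# Hypothetical flattened image: dark background plus brighter fruit pixels
pixels = [(10, 10, 10)] * 50 + [(200, 60, 40)] * 50
threshold = otsu_threshold([(r + g + b) // 3 for r, g, b in pixels])
binary, masked = apply_mask(pixels, threshold)
```

In practice an image library would supply the pixel arrays; plain lists are used here only to keep the sketch free of dependencies.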
For the analysis in the L*a*b* color model, the RGB images of the fruits were converted according to the standards and equations proposed by the Commission Internationale de l'Éclairage (CIE). The RGB images were first transformed into the XYZ primary color system. The L*a*b* model allowed evaluating the lightness (L*), the chromatic coordinates a* and b*, which range from green to red and from blue to yellow, respectively, and the total chromaticity (c*), according to eqs (5) to (9) (Pedrini & Schwartz, 2007).
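A conversion of this kind can be sketched as below. The sketch uses the widely published CIE L*a*b* formulas and assumes that the camera output is sRGB with a D65 reference white; the text does not state which RGB primaries or white point were used, so the matrix, the white point, and the function name `rgb_to_lab` are assumptions and may differ from eqs (5) to (9) of the study.

```python
import math

# Assumption: sRGB primaries and D65 reference white
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
WHITE_D65 = (0.95047, 1.00000, 1.08883)


def _linearize(channel_8bit):
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t):
    """Piecewise cube-root function of the CIE L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    """Return (L*, a*, b*, c*) for one 8-bit RGB pixel."""
    linear = [_linearize(v) for v in (r, g, b)]
    xyz = [sum(m * c for m, c in zip(row, linear)) for row in SRGB_TO_XYZ]
    fx, fy, fz = (_f(v / w) for v, w in zip(xyz, WHITE_D65))
    lightness = 116 * fy - 16
    a_star = 500 * (fx - fy)       # green (negative) to red (positive)
    b_star = 200 * (fy - fz)       # blue (negative) to yellow (positive)
    c_star = math.hypot(a_star, b_star)  # total chromaticity
    return lightness, a_star, b_star, c_star
```

Under these assumptions a reddish pixel yields a positive a* and a greenish pixel a negative a*, which is the axis the text associates with the green-to-red range.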

Physicochemical parameters of coffee beans
The physicochemical parameters of coffee beans were obtained through analyses of electrical conductivity (EC), total soluble solids (TSS), and total titratable acidity (TTA). These analyses were carried out to relate physicochemical variables at different stages of maturity to the colorimetric variables obtained from digital images. The analyses used the same samples from which the colorimetric information was extracted.
The electrical conductivity test was carried out by the bulk system, according to the methodology recommended by Krzyzanowski et al. (1991) and using an MS Tecnopon Mca 150 digital conductivity meter, set for temperature compensation and electrode with a 1-cm cell constant. The physicochemical variables TSS and TTA were determined according to the Association of Official Analytical Chemists (AOAC, 2010).

Analysis of results
Colorimetric variables and physicochemical attributes were submitted to exploratory analysis with the arithmetic mean, maximum and minimum values, standard deviation, and coefficients of variation (CV) calculated for each variable.
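The descriptors listed above can be computed per variable as in the following sketch. The function name `describe` and the use of the sample standard deviation (n - 1 denominator) are assumptions; the text does not say which estimator was used.

```python
import statistics


def describe(values):
    """Mean, extremes, standard deviation and coefficient of variation (%)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }


# Hypothetical values for one variable, for illustration only
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```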
The correlation between colorimetric variables and physicochemical attributes as a function of fruit maturity was evaluated using Pearson's correlation coefficient at a 0.05 significance level.
The colorimetric variables with a correlation coefficient higher than 0.50 were selected for multivariate analysis, in which principal component analysis (PCA) was combined with k-means clustering to develop a protocol for classifying robusta coffee fruits from digital images according to colorimetric characteristics that vary with maturity.
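As an illustration, Pearson's coefficient and the t statistic commonly used to test it against zero can be sketched as follows. The function names and the sample values are hypothetical, and the text does not state which significance test was applied, so the t test is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def is_significant(r, n, t_critical):
    """Two-sided test; t_critical comes from a t table for n - 2 d.f."""
    return abs(t_statistic(r, n)) > t_critical


# Hypothetical paired observations (e.g. a color variable and an attribute)
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For 60 samples (58 degrees of freedom) the two-sided 5% critical value of t is approximately 2.00, so a coefficient of 0.50 would be judged significant under this test.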
The PCA was applied after normalizing the values of the physicochemical attributes aiming at the standardization of the variables from the zero mean and unit variance, according to [eq. (10)]:
$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{S(X_j)} \qquad (10)$$
Where: $X_{ij}$ is the original value of the variable; $\bar{X}_j$ is the mean of the j variable, and $S(X_j)$ is the standard deviation of the j variable.
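The standardization to zero mean and unit variance can be sketched as follows. The function name `standardize`, the dictionary layout, and the sample values are hypothetical, and the sample standard deviation is assumed.

```python
import statistics


def standardize(columns):
    """Map each variable (name -> list of values) to its z-scores."""
    z_scores = {}
    for name, values in columns.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # assumption: sample standard deviation
        z_scores[name] = [(v - mean) / sd for v in values]
    return z_scores


# Hypothetical hue values for three samples
z = standardize({"H": [10.0, 20.0, 30.0]})
```

After this step every variable contributes on the same scale, so a variable measured in large units does not dominate the principal components.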
The results showed the total proportion of the variance explained by each principal component, information necessary to reduce variables that can describe the variance of colorimetric attributes as a function of maturation, the correlation between colorimetric variables and principal components of highest relevance, and the scattering of scores of each fruit in a two-dimensional plane formed by the two principal components of highest relevance.
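The quantities described above (proportion of variance per component, eigenvectors, and sample scores) can be obtained as in the sketch below, which assumes NumPy is available. The function name `pca` and the synthetic data are hypothetical; the data only mimic several variables driven by one underlying factor and are not the measurements of the study.

```python
import numpy as np


def pca(X):
    """PCA on standardized columns of X (rows = samples, columns = variables).

    Returns the proportion of variance per component (descending), the
    eigenvectors as columns, and the scores of each sample.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)           # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = Z @ eigvecs
    return explained, eigvecs, scores


# Synthetic illustration: three variables that follow one latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=60)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=60),
    -latent + 0.1 * rng.normal(size=60),
    2 * latent + 0.1 * rng.normal(size=60),
])
explained, eigvecs, scores = pca(X)
```

The first two columns of `scores` give the coordinates of each sample in the plane formed by PC1 and PC2.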
The clustering was performed using the k-means technique, seeking to group fruits at a similar stage of maturity (green, cherry, and over-ripe). Clustering quality was evaluated through the ratio between the number of fruits of the predominant stage of maturity in each group and the total number of evaluated fruits (overall accuracy).

RESULTS AND DISCUSSION
Table 1 shows the main statistical descriptors for the colorimetric variables of the set of evaluated fruits. Variability in the colorimetric characteristics is observed, especially for the variables of the HSI and L*a*b* models. This variation is associated with the difference in the stages of maturity of the coffee fruits used in the experiment, showing that classification methods based on data variance, such as PCA, can be used to distinguish the maturity of fruits from the characteristics obtained through digital images.
The means of the evaluated chemical attributes were compared as a function of the three stages of maturity to characterize them according to the fruit quality (Table 2). Among the analyzed parameters, the TSS and TTA of coffee fruits could be differentiated at the three analyzed stages.
[Table 2, surviving row: 123.50b 204.17a 106.50c]
*Means followed by the same letter in the row are not different regarding the stages of maturity.
The analyzed physicochemical characteristics affect the beverage quality of robusta coffee and are influenced by maturity and genotype (Lima Filho et al., 2013; Mori et al., 2018). The TTA is influenced by the degree of fruit fermentation: green fruits present low TTA values, which increase during ripening, making harvest at the ideal time essential to prevent fruit fermentation from harming the beverage quality (Nobre et al., 2011). Regarding TSS, sugar concentrations are directly associated with the stage of maturity of coffee fruits: the lower the degree of maturity, the lower the content of total sugars (Scholz et al., 2011).
The electrical conductivity is related to the amount of solute ions inside cells. High EC values in cherry fruits (Table 2) may be associated with the occurrence of fermentation due to the degradation of the cell membranes of fruits caused by excessive exposure to the drying process or presence of fruits with mechanical damage in the evaluated samples (Resende et al., 2011).
Changes in physicochemical attributes as a function of ripeness lead to variations in the colorimetric characteristics of fruits during their development. The linear correlation between TSS levels and the spectral response in the red band allowed Silva et al. (2014) to develop maps of spatial variability in a coffee field, highlighting areas with better quality coffee. Costa et al. (2018) observed a correlation between oil accumulation in macaúba palm fruits and the colorimetric characteristics, especially the hue, allowing the detection of the appropriate time for harvest.
The correlations for robusta coffee fruits under the conditions of this research (Table 3) showed that, overall, the colorimetric characteristics were significantly correlated with the chemical attributes. This demonstrates that the spectral variables are indicators of the quality of coffee fruits and can be used as parameters to classify fruits automatically and non-invasively, that is, without destroying the fruits. In this context, the variables hue (H) and chroma b stood out, as they showed significant correlations, with magnitude above 0.65, with the three chemical attributes evaluated. The use of color models to distinguish the stage of maturity has been applied in the selection and quality control processes of different fruits. In general, colorimetric characteristics from the RGB model show significant correlations with fruit quality parameters such as TSS (Silva et al., 2014) and TTA (Noh & Lu, 2007).
However, the RGB model is susceptible to variations in lighting that can impair this type of analysis. Parameters obtained through color models less susceptible to the effect of lightness, such as the hue H (HSI model), which determines the predominant color, tend to improve the correlations with physicochemical attributes and facilitate the distinction of the stage of maturity. The significant and strong correlation between the hue H and physicochemical attributes observed in Table 3 was also found by Hashimoto et al. (2012) and Iqbal et al. (2016) when using this colorimetric characteristic to distinguish the stage of maturity.
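The lower sensitivity of hue to lighting can be illustrated with the commonly used geometric RGB-to-HSI conversion sketched below. The function name `rgb_to_hsi` and the pixel values are hypothetical, and the formulas are the usual textbook ones, assumed here because the conversion equations of the study are not reproduced in this text.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (H in degrees, S in [0, 1], I in [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3
    saturation = 0.0 if intensity == 0 else 1 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # achromatic pixel: hue is undefined, 0 by convention
    else:
        ratio = max(-1.0, min(1.0, num / den))  # guard against rounding
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360 - hue
    return hue, saturation, intensity


# Same hypothetical surface color under full and half illumination
bright = rgb_to_hsi(200, 120, 40)
dim = rgb_to_hsi(100, 60, 20)
```

In this example the three RGB values halve when the illumination halves, and so does the intensity, while hue and saturation stay the same. This only holds for a uniform scaling of the channels; real lighting changes can also shift color balance.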
The L*a*b* model, in which all colorimetric variables (L*, a*, b*, and c*) showed a significant correlation with physicochemical attributes (Table 2), has been used with great success in food quality assessment because it presents a uniformly ordered color distribution regardless of the device used to acquire the image (Pedrini & Schwartz, 2007). Oliveira et al. (2016) also classified green coffees using the colorimetric parameters obtained by the L*a*b* model.
The colorimetric variables that showed a significant correlation with TSS, TTA, and EC were used to perform the PCA, as they were successful in distinguishing the three stages of maturity (Table 2). Thus, the variables green intensity (G), hue (H), saturation (S), and chromaticity (a*, b*, and c*) were selected for the multivariate analysis and later classification according to fruit ripeness.
Table 4 shows the variance and percentage of variance explained by each principal component. The number of principal components necessary to explain the data variability must correspond to an accumulated percentage higher than 70%. Thus, the colorimetric variables obtained at maturity can be reduced to a single variable, which facilitates the implementation of algorithms to classify robusta coffee fruits from images, as PC1 presented an explanatory percentage of 82.33%.
Table 5 shows the correlation between the colorimetric variables and the principal components PC1 and PC2, which were the most representative and together accounted for 97.76% of the explanatory power of the data variation. The colorimetric variables were predominantly correlated with PC1, except for the green intensity (G), which showed a strong direct correlation with PC2. Due to the high explanatory power of the variance associated with PC1, Equation (11) allows characterizing the maturity of robusta coffee fruits based on the coefficients (eigenvectors) associated with colorimetric variables.
Positive values calculated from this index indicated that the maturity level can be characterized by the variables hue (H), saturation (S), chroma a (a*), and chroma c (c*). On the other hand, negative values indicated that the stage of maturity can be characterized by chroma b (b*) and green intensity (G). The scattering of scores (Figure 2) in the direction of PC1 shows that negative values were associated with coffee fruits at the green stage, while positive values were found at the cherry stage. Values close to zero were associated with the over-ripe stage. In practical terms, a sum of hue (H), saturation (S), chroma a (a*), and chroma c (c*) higher than the sum of chroma b (b*) and green intensity (G) indicates fruits at the cherry stage. Otherwise, the fruits are associated with the green stage.
Figure 2 also shows that the classification generated by the k-means algorithm had an overall accuracy of 100%, that is, fruits of the same stage of maturity were classified in the same group. This result showed that the colorimetric characteristics of robusta fruits are strong indicators that can be used as input parameters in automated selection and harvesting systems. Reducing the number of variables through PCA allows the generation of classification models of a lower degree, facilitating the computational implementation and reducing the processing and data storage required from the hardware, which allows the development of systems with simpler components and lower costs.
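The clustering step and the overall accuracy measure can be sketched as follows, assuming NumPy is available. This is a plain Lloyd-style k-means written for illustration; the function names, the random initialization, and the toy points are assumptions, not the implementation used in the study.

```python
from collections import Counter

import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means; returns the cluster index of each row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every center, then nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def overall_accuracy(labels, stages):
    """Fruits of the predominant stage in each cluster over all fruits."""
    hits = 0
    for cluster in set(labels):
        members = [s for lab, s in zip(labels, stages) if lab == cluster]
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(stages)


# Toy scores on (PC1, PC2) for two well-separated hypothetical groups
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                   [100.0, 0.0], [101.0, 0.0], [102.0, 0.0]])
stages = ["green"] * 3 + ["cherry"] * 3
accuracy = overall_accuracy(kmeans(points, k=2), stages)
```

Because k-means depends on its initial centers, it can settle on a poor grouping when there are three or more clusters, so running it from several seeds and keeping the most compact solution is a common precaution.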
The use of classification systems based on machine learning and artificial intelligence algorithms has made a great contribution to technological development for analyzing the quality of agricultural products. In line with the clustering accuracy obtained in this research, recent studies have also reported protocols with a high overall accuracy for fruit classification based on artificial vision. Mohammad et al. (2015) used linear discriminant analysis, Castro et al. (2019) used the support-vector machine technique, and Toon et al. (2019) used artificial neural networks, obtaining classification accuracy rates above 90.00% for persimmon, currant, and tomato fruits, respectively.
The genetic improvement of robusta coffee has led to increased productivity and beverage quality (Ferrão et al., 2008; Ramalho et al., 2016). However, technological advances are fundamental for the crop to reach higher production levels. Research such as that presented here, although potentially applicable to other coffee species, such as Coffea arabica, contributes to the understanding and development of equipment applicable in operations involving the management, harvesting, and post-harvesting of this species.

CONCLUSIONS
The colorimetric characteristics obtained from the color models varied according to the evaluated stages of maturity, which allowed the use of classification techniques through digital images.
The chemical attributes of total soluble solids (TSS) and titratable acidity (TTA) characterized the difference of fruits at the green, cherry, and over-ripe stages. In general, the colorimetric characteristics showed a significant correlation with chemical attributes, most notably hue (H) and chroma b (b*).
The principal component analysis showed that the variation in the colorimetric characteristics as a function of the maturity of robusta coffee fruits could be reduced to one variable, as PC1 explained 82.33% of the variance.
Robusta coffee fruits were grouped into three classes according to their maturity (green, cherry, and over-ripe) when dispersed in a two-dimensional plane formed by PC1 and PC2 using the k-means algorithm, with an overall accuracy index of 100%.