DETERMINATION OF QUALITY AND RIPENING STAGES OF 'PACOVAN' BANANAS USING VIS-NIR SPECTROSCOPY AND MACHINE LEARNING

This paper aimed to develop predictive models to determine the total soluble solids, firmness, and ripening stages of 'Pacovan' bananas using Vis-NIR spectroscopy and machine learning algorithms. A total of 384 bananas were evaluated after 0, 3, 6, 9, 12, 15, 18, and 21 days of storage at two temperatures (25°C and 20°C). The bananas were subjected to spectral analysis with a spectrometer operating in the 350 to 2500 nm range, and the physicochemical quality parameters, total soluble solids and firmness, were determined by reference analyses. Different machine learning algorithms were used to develop regression and supervised classification models. The best model for total soluble solids was Random Forest with variable selection, with an R²cv of 0.90 and an RMSECV of 2.31%. The best model for firmness was Support Vector Machine with variable selection, with an R²cv of 0.84 and an RMSECV of 7.98 N. The best classification model for the ripening stages was the Multilayer Perceptron with variable selection, which achieved a precision of 74.22%. Therefore, Vis-NIR spectroscopy combined with machine learning algorithms is a promising tool for monitoring the quality and ripening stages of 'Pacovan' bananas.


INTRODUCTION
Brazil is the third largest fruit producer in the world, with approximately three million hectares of planted area, and bananas (Musa spp.) are one of the most cultivated fruits in the country. Bananas are a source of fiber, vitamin C, carbohydrates, and important mineral nutrients such as calcium, potassium, phosphorus, and magnesium, which contribute to human nutrition and, consequently, stimulate the commodity chain (Castilho et al., 2014; Neris et al., 2018; ABRAFRUTAS, 2019; Santos et al., 2019).
After harvest, bananas undergo climacteric ripening, characterized by biochemical and physical transformations that directly affect the nutritional properties and acceptability of the fruit. For that reason, monitoring and controlling these quality characteristics during this stage is important for commercialization (Hossain & Iqbal, 2016; Xie et al., 2018; Cho & Koseki, 2021).
Traditional methods for monitoring the post-harvest quality of bananas are usually wet chemical methods, which are destructive, invasive, and time-consuming. Although these methods properly represent the characteristics of the fruit, they cause food loss and waste and reduce the efficiency of harvest decision-making. Thus, there is increasing interest in non-destructive methods that provide rapid monitoring without losses (Sanaeifar et al., 2016).
Visible and near-infrared (Vis-NIR) spectroscopy is a rapid, non-destructive technique that has been incorporated into processes for controlling and monitoring the quality of food products. It provides information on numerous physicochemical attributes through the optical interaction of light with the product (Hu et al., 2017).
In the fruit sector, studies have demonstrated the effectiveness of this technique for determining the quality of a wide variety of crops, such as passion fruit (Oliveira-Folador et al., 2018), grape (Costa et al., 2019), tomato (Li et al., 2021), strawberry (Xie et al., 2021), pear (Yu et al., 2021), and banana (Cho & Koseki, 2021). Different machine learning methods can be used to process and analyze Vis-NIR data, enabling the quantification and classification of numerous physicochemical and colorimetric parameters.
Prediction and classification models can be built, for example, through: inductive learning with decision trees that split the data set according to its attributes (Shafiee & Minaei, 2018); parameterized function learning, which separates the data with planes or hyperplanes in a vector space (Vapnik, 2000; Mireei et al., 2017); instance-based learning by proximity between data points (Sabanci & Akkaya, 2016); and learning with networks that reproduce intelligent mechanisms, such as artificial neural networks (Nunes et al., 2010).
Machine learning methods such as J48 (Quinlan, 1993), Random Forest, Support Vector Machine (Keerthi et al., 2001), K-Nearest Neighbours, and Artificial Neural Networks are commonly used to develop prediction and classification models. However, their application to spectroscopic data from agricultural and food products remains little explored. Hence, the objective of this paper is to develop predictive models to determine the total soluble solids, firmness, and ripening stages of 'Pacovan' bananas using Vis-NIR spectroscopy combined with machine learning algorithms.

MATERIAL AND METHODS
A total of 384 'Pacovan' cultivar bananas at the green ripening stage were obtained from the Central de Abastecimento Agrícola in the municipality of Juazeiro (Bahia, Brazil; latitude 09º 24' 42" S, longitude 40º 29' 55" W). The fruits were surface-sterilized and stored at temperatures ranging from 20 to 25°C and relative humidity between 50 and 58%. The bananas were evaluated on days 0, 3, 6, 9, 12, 15, 18, and 21 of storage. On each evaluation day, 48 bananas were individually subjected to spectral acquisition, followed by reference analyses for total soluble solids (TSS) and firmness. The spectral acquisition system (Figure 1) consisted of: (1) a FieldSpec 3 spectrometer (Analytical Spectral Devices, Boulder, Colorado, USA) with an 8° field of view, a wavelength range from 350 to 2500 nm, a resolution of 3 to 10 nm, and a precision of ± 1 nm; (2) a 50-W quartz-tungsten-halogen light source; (3) a dark chamber measuring 100 x 50 x 50 cm; and (4) a computer running RS3 software (Analytical Spectral Devices, Boulder, Colorado, USA). A white Spectralon ceramic plaque (Labsphere Inc., North Sutton, NH, USA) with approximately 100% reflectance was used as the calibration standard, and each spectrum was the average of sixty scans performed on two sides of the banana.

FIGURE 1. Vis-NIR reflectance spectra acquisition system, consisting of a spectrometer, light source, dark chamber, and computer.
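The calibration step above (a white reference of about 100% reflectance, sixty averaged scans per fruit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RS3 software's procedure: it assumes a dark-current reading is subtracted from both the sample and the white reference, and that the absorption spectra discussed later are apparent absorbance computed as log10(1/R). The function names and numbers are hypothetical.

```python
import math
from statistics import fmean


def average_scans(scans):
    """Average repeated scans point by point (one list of readings per scan)."""
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must cover the same wavelengths")
    return [fmean(readings) for readings in zip(*scans)]


def relative_reflectance(sample, white, dark):
    """Relative reflectance R = (S - D) / (W - D); the dark term D is an assumption."""
    return [(s - d) / (w - d) for s, w, d in zip(sample, white, dark)]


def absorbance(reflectance):
    """Apparent absorbance A = log10(1 / R); R must be positive."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance must be positive to take log10(1/R)")
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical raw readings at three wavelengths (not data from this study).
scans = [[1000, 800, 600], [1200, 800, 400]]
sample = average_scans(scans)                            # [1100.0, 800.0, 500.0]
R = relative_reflectance(sample, [2100] * 3, [100] * 3)  # [0.5, 0.35, 0.2]
A = absorbance(R)                                        # first value is about 0.301
```

In practice the same averaging would run over all sixty scans and every wavelength from 350 to 2500 nm; only the list lengths change.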
On each evaluation day, the fruits were visually classified into four ripening stages (green, nearly ripe, ripe, and rotting) using a scale adapted from Von Loesecke (1950). Total soluble solids (TSS) were determined in the fruit juice with a digital refractometer (HI 96804, Hanna Instruments, USA) with a measuring range of 0 to 85% and a precision of ±0.2%, and the results were expressed as percentages (%). Firmness was measured with a digital penetrometer (PTR-300, Instrutherm, Brazil) fitted with a 5 mm cylindrical probe, and the results were expressed in newtons (N). Refractometer and penetrometer readings were taken only once on each fruit.
Predictive models were developed with the J48, Random Forest (RF), Support Vector Machine trained with Sequential Minimal Optimization (SVM-SMO), K-Nearest Neighbours (KNN-IBk), and Multilayer Perceptron Artificial Neural Network (MLP-ANN) algorithms in the Weka 3.8.4 software environment (University of Waikato, New Zealand). Models were built with the default algorithm settings in the software and evaluated by 10-fold and leave-one-out cross-validation.
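To make the two validation schemes concrete, the sketch below produces one out-of-fold prediction per fruit: k-fold cross-validation holds out each fold once, and leave-one-out is the special case where the number of folds equals the number of samples. It is an illustration in plain Python, not Weka's implementation. The learner is a hypothetical nearest-neighbour regressor standing in for KNN-IBk (k=1 here, which we understand to be the IBk default), and unlike Weka this sketch does not stratify folds for nominal classes.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=1):
    """Mean response of the k training spectra closest to x (Euclidean distance)."""
    ranked = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kfold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cross_val_predict(X, y, n_folds, k=1, seed=0):
    """One out-of-fold prediction per sample; n_folds=len(X) gives leave-one-out."""
    preds = [None] * len(X)
    for test_idx in kfold_indices(len(X), n_folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test_idx:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return preds


# Hypothetical one-band "spectra" and responses, for illustration only.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
loo = cross_val_predict(X, y, n_folds=len(X))     # leave-one-out
kfold = cross_val_predict(X, y, n_folds=3)        # 3 folds: the toy set has only 6 samples
```

The out-of-fold predictions are what cross-validated statistics such as R²cv, RMSECV, and MAECV are computed from.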
To mitigate possible collinearity within the spectral data set, variable selection was performed with two filters: Correlation-based Feature Selection (CFS) (Hall, 1999), using the CfsSubsetEval evaluator with the Best-first search method; and Wrapper (Kohavi & John, 1997), using the WrapperSubsetEval evaluator with the GreedyStepwise search method.
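As we understand Hall's CFS, a subset of k variables is scored by the merit k·mean|r_cf| / sqrt(k + k(k−1)·mean|r_ff|), where r_cf is the correlation of each variable with the response and r_ff the correlation between pairs of selected variables; collinear wavelengths raise r_ff and so lower the merit. The sketch below assumes absolute Pearson correlations for numeric data and replaces Weka's Best-first search with a simpler greedy forward search without backtracking. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equally long lists (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(columns, target, subset):
    """CFS merit: k*mean|r_cf| / sqrt(k + k(k-1)*mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[j], target)) for j in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, target):
    """Greedy forward search: add the variable that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        cand = max(remaining, key=lambda j: cfs_merit(columns, target, selected + [j]))
        merit = cfs_merit(columns, target, selected + [cand])
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


# Hypothetical data: each inner list holds one wavelength's values across 5 fruits.
target = [1, 2, 3, 4, 5]
columns = [[1, 2, 3, 4, 5],     # tracks the response
           [2, 4, 6, 8, 10],    # collinear copy of column 0
           [5, 1, 4, 2, 3]]     # weakly related
selected, merit = cfs_forward_select(columns, target)
```

In this toy case the collinear copy is rejected: adding it leaves the mean relevance unchanged while the inter-variable correlation term grows, so the merit does not improve.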
With the CFS filter, variables were selected by evaluating correlations among the variables and with the response, whereas the Wrapper filter used the modeling algorithm itself to test candidate variable sets (Witten et al., 2011). The models were then rebuilt with the selected variables. Predictor variables were either the full-spectrum data or the selected variables, and response variables were TSS, firmness, and ripening stages.
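The Wrapper idea can be sketched as a stepwise forward search in which every candidate subset is scored by the cross-validated error of a learning algorithm trained on just those variables. In this minimal sketch, the scorer (leave-one-out RMSE of a 1-nearest-neighbour regressor) is a hypothetical stand-in for whatever classifier and internal cross-validation WrapperSubsetEval was configured with, and the search only adds variables, which is how we read GreedyStepwise's forward mode.

```python
import math


def loo_rmse_1nn(X, y, features):
    """Leave-one-out RMSE of a 1-nearest-neighbour regressor restricted to `features`."""
    sq_err = 0.0
    for i, xi in enumerate(X):
        best_d, best_y = math.inf, None
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = math.dist([xi[f] for f in features], [xj[f] for f in features])
            if d < best_d:
                best_d, best_y = d, y[j]
        sq_err += (y[i] - best_y) ** 2
    return math.sqrt(sq_err / len(X))


def wrapper_forward_select(X, y, score=loo_rmse_1nn):
    """Greedy stepwise (forward) search scored by the learner's cross-validated error."""
    selected, best_err = [], math.inf
    remaining = list(range(len(X[0])))
    while remaining:
        err, cand = min((score(X, y, selected + [f]), f) for f in remaining)
        if err >= best_err:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_err = err
    return selected, best_err


# Hypothetical spectra: column 0 separates the two groups, column 1 is noise.
X = [[0, 7], [1, 3], [2, 9], [10, 8], [11, 2], [12, 5]]
y = [1, 1, 1, 5, 5, 5]
selected, err = wrapper_forward_select(X, y)
```

The design trade-off relative to CFS is cost: each step retrains the learner once per remaining variable, but the selected subset is tuned to that learner.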
The performance of the regression models was evaluated through the coefficient of determination of cross-validation (R²cv) (Equation 1).

The Area Under the Curve (AUC), obtained from Receiver Operating Characteristic (ROC) graphs, a technique for visualizing, organizing, and selecting classifiers (Prati et al., 2008), was used to indicate the discriminative capacity of the models. An AUC close to 1 indicates high discriminative capacity, whereas a value close to 0.5 indicates little discriminative power (Luo et al., 2012).
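One way to see why an AUC of 0.5 means little discriminative power is through the rank interpretation: the AUC equals the probability that a randomly chosen fruit of a given stage receives a higher score than a randomly chosen fruit of another stage (the Mann-Whitney statistic). The sketch below computes that one-versus-rest value for a single ripening stage. It assumes the model outputs a score per fruit for that stage; how the paper combined the per-stage values into one AUC is not stated here. The function name and data are hypothetical.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one ripening stage versus the rest, via the Mann-Whitney pair count.

    scores[i] is the model's score (e.g. predicted probability) that fruit i belongs
    to `positive`; a tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative fruit")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = ["green", "green", "ripe", "ripe"]
perfect = auc_one_vs_rest([0.9, 0.8, 0.3, 0.2], labels, "green")  # 1.0
partial = auc_one_vs_rest([0.9, 0.4, 0.6, 0.1], labels, "green")  # 0.75
```

Scores that carry no information about the stage produce as many wins as losses on average, which is where the 0.5 reference value comes from.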

RESULTS AND DISCUSSION
Total soluble solids content increased from 1.8% to 35.5%, while firmness decreased from 86.6 N to 3.3 N over the storage period at both temperatures (Figure 2). These changes reflect biochemical and cellular alterations that occur during fruit ripening, such as the conversion of starch into sugars and the softening of cell wall structures through the enzymatic breakdown of pectins (Yang et al., 2019; Cho & Koseki, 2021).

Figure 3 shows the absorption spectra in the region from 400 to 2400 nm over the days of storage. In the visible region, absorption peaks between 560 and 720 nm are observed for days 0 and 3 of storage. During this period the fruit was at the initial ripening stage, and wavelengths between 600 and 750 nm account for the presence of chlorophyll, which absorbs red light (Liu et al., 2008). Subsequently, colorimetric changes occur, such as chlorophyll degradation and a progressive increase in pigment compounds (carotenoids and anthocyanins) in the epidermis, together with biochemical changes such as the hydrolysis of starch and cell wall components (pectins and hemicelluloses). Consequently, the banana color changed from green to yellow, and red light absorption decreased (Liu et al., 2008; Quevedo et al., 2008; Carvalho et al., 2011; Liew & Lau, 2012; Hailu et al., 2013; Adebayo et al., 2016). Absorption peaks at 970, 1180, and 1440 nm were observed on all days of storage. These near-infrared wavelengths are related to the absorption bands of sugars (980 nm, second OH overtone) and water (973, 1324, and 1581 nm, third OH overtone) (Shafiee & Minaei, 2018; Costa et al., 2019).
Engenharia Agrícola, Jaboticabal, v.42, special issue, e20210160, 2022

With the Wrapper selection filter, the RF model for total soluble solids and the SVM model for firmness achieved R²cv values of 0.90 and 0.84, RMSECV values of 2.31% and 7.98 N, and MAECV values of 1.71% and 6.12 N, respectively (Table 1). These results are similar to those of Liew & Lau (2012), who developed models for total soluble solids and firmness of Cavendish bananas with R² of 0.96 and 0.86, respectively, and of Jaiswal et al. (2012), who developed models for total soluble solids of Grand Naine bananas with R² of 0.88.

Table 1 legend: RF, Random Forest; SVM, Support Vector Machine; KNN, K-Nearest Neighbours; MLP, Multilayer Perceptron; R²cv, coefficient of determination of cross-validation; RMSECV, root mean square error of cross-validation; MAECV, mean absolute error of cross-validation.
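The three regression statistics in Table 1 can be computed from the out-of-fold predictions produced during cross-validation. The sketch below uses the common definitions: RMSE and MAE share the units of the response (% for TSS, N for firmness), and R² is written as 1 − SS_res/SS_tot. That R² form is an assumption: the exact expression of Equation 1 is not reproduced in this text, and Weka itself reports a correlation coefficient, whose square some studies report as R² instead. Function names and numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of cross-validated predictions (RMSECV)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error of cross-validated predictions (MAECV)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R^2 as 1 - SS_res / SS_tot, computed on out-of-fold predictions (R^2cv)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured TSS (%) and out-of-fold predictions, for illustration only.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
stats = (r2(measured, predicted), rmse(measured, predicted), mae(measured, predicted))
# R^2 about 0.9, RMSE about 0.707 (%), MAE 0.5 (%)
```

Because squared errors weight large misses more heavily, RMSECV is always at least as large as MAECV, which matches the ordering of the reported values (2.31 versus 1.71, and 7.98 versus 6.12).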
The performance of the supervised classification models is shown in Table 2. The MLP model with the Wrapper filter discriminated the ripening stages with a precision of 74.22%, a Kappa index of 0.62, a sensitivity of 78.26%, a selectivity of 89.31%, and a false-positive rate of 10.69%. This precision is close to that reported by Mustafa et al. (2009), who developed models with a precision of 81% to discriminate ripening stages of bananas. The Wrapper filter selected the wavelengths of 357, 364, 432, and 934 nm for the MLP classifier. Wavelength selection improved the discrimination capacity of the model by reducing errors caused by temporal variability and by removing redundant information that compromises classifier performance (Ramos, 2003).
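The classification statistics above can all be derived from a confusion matrix. The sketch below is a minimal illustration under two assumptions. First, "selectivity" is treated as specificity, which is consistent with the reported values, since 89.31% + 10.69% = 100% matches a false-positive rate of 1 − specificity. Second, because the text does not say whether its "precision" means overall accuracy or positive predictive value, the sketch computes both. How per-stage values were averaged across the four stages is also not stated. The function name and the two-stage example matrix are hypothetical.

```python
def confusion_metrics(matrix):
    """Summary metrics from a confusion matrix; matrix[i][j] counts true stage i predicted as j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    accuracy = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement p_e from the row and column marginals (Cohen's kappa).
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    per_class = []
    for c in range(n):
        tp = matrix[c][c]
        fn = row_tot[c] - tp
        fp = col_tot[c] - tp
        tn = total - tp - fn - fp
        per_class.append({
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,   # true-positive rate
            "specificity": tn / (tn + fp) if tn + fp else 0.0,   # true-negative rate
            "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
            "ppv": tp / (tp + fp) if tp + fp else 0.0,           # positive predictive value
        })
    return accuracy, kappa, per_class


# Hypothetical two-stage example (not the paper's Table 3).
acc, kappa, per_stage = confusion_metrics([[40, 10], [5, 45]])
```

The same function accepts a 4 x 4 matrix for the green, nearly ripe, ripe, and rotting stages; kappa discounts the agreement expected by chance from the class totals, which is why it is lower than the raw proportion correct.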
The classifier discriminated the green ripening stage with 100% accuracy and performed satisfactorily for the other stages (Table 3). Adebayo et al. (2016) report a negative correlation between ripening stage and the absorption coefficients of the photons in the light incident on the banana. As the fruit progresses through the ripening stages, the absorption of these photons, which is related to chemical composition such as sugar content, soluble solids content, and photosynthetic compounds, decreases (Adebayo et al., 2016).

CONCLUSIONS
Regression models based on the Random Forest and Support Vector Machine algorithms with variable selection predicted total soluble solids and firmness with R²cv values of 0.90 and 0.84, respectively. The supervised classification model based on the Multilayer Perceptron algorithm with variable selection discriminated the ripening stages of the bananas with a precision greater than 70%. Selecting wavelengths in the visible and near-infrared spectral regions improved the capacity of the models to determine total soluble solids, firmness, and ripening stages. Therefore, Vis-NIR spectroscopy combined with machine learning algorithms is a promising tool for monitoring the quality attributes and ripening stages of 'Pacovan' bananas.