MACHINE LEARNING FOR SOYBEAN SEEDS LOTS CLASSIFICATION

The evaluation of seed germination and vigor is essential for the sowing sector to measure the performance of different seed lots and improve the efficiency of storage and sowing processes. However, the various tests used to determine seed quality generate a large amount of information, making it almost impossible for humans to perform a quick and effective quality control analysis. Therefore, the objective of this study was to evaluate the differences in the physiological quality of soybean seeds of different cultivars using machine learning techniques to rank the lots based on their quality. Three cultivars were used, and germination, accelerated aging, tetrazolium, seedling emergence, and 1000 seed weight were measured in 65 lots. The lots were evaluated in two phases, one immediately after harvest and the other after six months of storage. Random forest, multi-layer perceptron, J48, and classification via regression classifiers were used, aided by the feature resampler technique. Random forest and


INTRODUCTION
Analyzing seed quality tests generates such a massive amount of information that it becomes almost impossible for humans to analyze it rapidly and effectively in a quality control laboratory (Pinheiro et al., 2021). Consequently, erroneous results may cause economic losses for seed companies.
Seed quality standards set minimum legal requirements, and companies must also perform internal control tests that generate important information. Depending on the company's size, these tests can produce copious data during a single agricultural harvest.
Based on this demand, seed technology research has focused on identifying aspects associated with the ranking of lots based on the physiological potential of the seeds. One tool that has attracted the attention of researchers is the use of machine learning and artificial intelligence to rank lots.
Data mining techniques consist of methods and classifications that generate more accurate information by automatically extracting patterns from the dataset (Reddy, 2021; Cardoso & Machado, 2008). Thus, data mining has emerged as an important tool for predicting the physiological quality of seeds.
Evaluating the data generated during quality control tests of lots with machine learning techniques can reduce the time and resources spent on repetitive laboratory tests. Therefore, there is a need to streamline the analysis of the large amount of data generated during the characterization of seed quality.
The objective of the present study was to evaluate the differences in the physiological quality of soybean seeds of different cultivars using machine learning techniques to rank the lots based on their quality, which was evaluated immediately following harvest and after six months of storage.

MATERIAL AND METHODS
The study was conducted at the Internal Laboratory of Seed Analysis of a company located in Sinop, Mato Grosso do Sul state, Brazil. The seed cultivars were provided by the company from its own genetic material for cultivation in the state of Mato Grosso and were produced during the 2018/19 harvest. The cultivars were classified as shown in Table 1. The seeds were evaluated in two stages: the first immediately following harvest and the second after six months of storage. The storage conditions were those practiced by the company: a refrigerated environment maintained at 13 °C and 60% relative humidity. Germination, accelerated aging, tetrazolium, seedling emergence, and 1000 seed weight tests were performed.
To determine the 1000 seed weight, eight replicates of 100 seeds each were weighed on an analytical balance, and the seed weight was calculated according to the Rules for Seed Analysis (RAS) (Brasil, 2009). The moisture content was determined using the oven method at 105 ± 3 °C for 24 h, using two subsamples of 5 g of seeds from each lot.
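The two calculations in this paragraph can be illustrated with a minimal sketch. It assumes that the 1000 seed weight is the mean weight of the eight 100-seed replicates multiplied by 10 and that moisture is expressed on a wet basis; the function names and the example weights are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def thousand_seed_weight(replicate_weights_g):
    """Return (1000 seed weight in g, coefficient of variation in %).

    Assumption: each replicate holds 100 seeds, so the mean replicate
    weight is scaled by 10 to express the weight of 1000 seeds.
    """
    avg = mean(replicate_weights_g)
    cv_percent = 100 * stdev(replicate_weights_g) / avg
    return 10 * avg, cv_percent


def moisture_content(wet_weight_g, dry_weight_g):
    """Moisture content in %, wet basis (assumed), after oven drying."""
    return 100 * (wet_weight_g - dry_weight_g) / wet_weight_g


# Hypothetical replicate weights (g) for one lot, for illustration only.
weights = [17.8, 18.2, 18.0, 18.1, 17.9, 18.0, 18.2, 17.8]
tsw, cv = thousand_seed_weight(weights)
mc = moisture_content(5.0, 4.4)
```

The coefficient of variation is returned so that the analyst can check the agreement among replicates before accepting the result; any acceptance limit would have to come from the RAS itself.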
The germination test was conducted using four subsamples of 50 seeds per treatment sown in a Germitest paper roll moistened with water at a ratio of 2.5 × the mass of the paper. The rolls were maintained in a germinator set at 25 °C, and the evaluations followed the criteria established by the RAS (Brasil, 2009). The following germination aspects were considered: normal vigorous seedlings, normal weak seedlings, abnormal seedlings, dead seeds, and hard seeds. The distinction between normal vigorous and normal weak seedlings followed the company's protocol.
The accelerated aging test was performed with four subsamples of 50 seeds placed on an aluminum screen distributed in a single layer in plastic boxes containing 40 mL of distilled water. They were then placed in an incubator at a constant temperature of 41 °C for 48 h (Marcos Filho, 1999). After the aging period, the seeds were subjected to the germination test, according to the RAS (Brasil, 2009), with a single evaluation performed at five days and the percentage of normal seedlings calculated.
For the tetrazolium test, the procedure described by França Neto et al. (1988) was followed using two subsamples of 50 seeds that were preconditioned in moistened paper rolls and maintained under these conditions for 16 h at 25 °C. These samples were subsequently placed in plastic cups, submerged in tetrazolium solution (0.075%), and kept at a temperature of 35 °C for 180 min in the dark. After reaching adequate staining, the seeds were washed, sectioned longitudinally through the center of the embryo axis, and classified according to vigor, viability, and moisture, mechanical, and stink bug damage.
For the analysis of seedling emergence in sand, four subsamples of 100 seeds per lot were sown at a depth of 5 cm with a spacing of 40 cm between rows. The seedlings that had emerged 14 days after sowing were counted (Brasil, 2009).
The data generated by these vigor tests were used for the machine learning step: the supervised training database comprised 93 rows with 42 attributes (Table 2), of which 54 lots were accepted for commercialization, 27 were rejected, and 12 were termed intermediate. Before the lots could be processed and predicted, the data had to be preprocessed so that the tool could read and analyze them correctly. The data were obtained in .xls format, with all attribute names in a single header row and each value in the column below its respective attribute. The file was then converted to .csv format and edited in Microsoft Windows Notepad: the decimal commas in numeric values were replaced with points, and the semicolons that separated the attribute columns were replaced with commas. Rows with missing values or data considered erroneous were excluded during this process.
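The manual preprocessing described above could also be scripted. The following is a minimal sketch, not the procedure used in the study: it assumes a semicolon-delimited export with decimal commas, and the column names and values in the sample are hypothetical.

```python
import csv
import io


def convert_rows(text):
    """Convert semicolon-delimited text with decimal commas into
    comma-delimited text with decimal points, dropping incomplete rows."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    kept = []
    for row in reader:
        # Assumption: no text attribute contains a comma, so every comma
        # found in a cell is a decimal separator.
        cells = [cell.strip().replace(",", ".") for cell in row]
        if len(cells) != len(header) or any(cell == "" for cell in cells):
            continue  # rows with missing values are excluded
        kept.append(cells)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(kept)
    return out.getvalue()


# Hypothetical sample: lot 2 lacks a germination value and is dropped.
sample = (
    "lot;germination;tsw;class\n"
    "1;92,5;180,2;accepted\n"
    "2;;175,0;rejected\n"
    "3;78,0;169,4;intermediate\n"
)
converted = convert_rows(sample)
```

Scripting this step avoids the risk of a global find-and-replace touching the wrong separator, because the semicolons are consumed by the parser before the decimal commas are replaced.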
Four classifiers were used for data mining: J48, random forest, classification via regression (CVR), and multi-layer perceptron (MLP). Performance was estimated by 10-fold cross-validation, in which the dataset was divided into 10 subsets that served in turn as the test set while the remaining subsets were used for training. This technique reduced the likelihood that chance underestimated or overestimated the performance of a given configuration. During the cross-validation, the "Resample" filter was used without data duplication to randomly produce a subsample of the dataset and bias the distribution of classes toward a uniform distribution (Witten et al., 2011). The software reported the ideal number of repetitions for training so that the classifier could reach its maximum performance on the dataset.
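The fold construction behind 10-fold cross-validation can be sketched as follows. This is an illustration under the assumption that the folds are stratified by class; the function names and the random seed are hypothetical, and only the class counts (54, 27, and 12) come from the text.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Deal row indices into k folds so each fold keeps the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            # Round-robin dealing keeps fold sizes within one row of each other.
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k=10, seed=1):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["accepted"] * 54 + ["rejected"] * 27 + ["intermediate"] * 12
splits = list(train_test_splits(labels))
```

Every lot is used for testing exactly once, so the reported metrics are computed over all 93 rows even though no model is evaluated on data it was trained on.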
Weka software, version 3.8.5, developed by the University of Waikato, was used for the data mining (Eibe et al., 2020). To choose the most accurate algorithms, the following evaluation metrics were used: accuracy, precision, recall, F-measure, and area under a receiver operating characteristic (ROC) curve, according to the methodology described by Lever et al. (2016). The values of true positives, false positives, true negatives, and false negatives extracted from the confusion matrix were used to calculate the recall and precision metrics using eqs (1) and (2) proposed by Medeiros et al. (2020). Finally, the best learning technique was determined based on the results obtained.
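As an illustration of how these metrics follow from a confusion matrix, the sketch below uses the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), with the F-measure as their harmonic mean. The matrix values are hypothetical and are not the results of the study.

```python
def per_class_metrics(matrix, classes):
    """Compute overall accuracy and per-class precision, recall, F-measure.

    matrix[i][j] is the number of rows of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    results = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # actual class i, predicted otherwise
        fp = sum(row[i] for row in matrix) - tp   # predicted class i, actually otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        results[name] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return correct / total, results


# Hypothetical confusion matrix (rows: actual, columns: predicted).
classes = ["accepted", "rejected", "intermediate"]
matrix = [
    [9, 0, 1],
    [1, 4, 0],
    [2, 1, 2],
]
accuracy, metrics = per_class_metrics(matrix, classes)
```

Reporting the metrics per class, rather than accuracy alone, matters here because a classifier can reach high overall accuracy while rarely identifying a small class correctly.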
The process adopted is summarized in Figure 1, which shows the steps applied to the data generated in the quality control laboratory: a dataset was assembled, passed through information treatment, and proceeded to training. Then, the data mining was performed and, after testing with the best algorithms, the values for decision-making were calculated. The data were also subjected to a comparison of means using analysis of variance (ANOVA); when a significant difference was found, the means were compared with Tukey's test at 5% significance.
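The F statistic of a one-way ANOVA can be computed as in the sketch below, which is an illustration only: it assumes a completely randomized design with the lots as the single factor, the germination values are hypothetical, and the subsequent Tukey comparison is not implemented.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, (df_between, df_within)) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    grand_mean = mean([value for group in groups for value in group])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical germination percentages, four replicates per lot.
lot_a = [92, 94, 94, 96]
lot_b = [88, 90, 90, 92]
lot_c = [80, 82, 82, 84]
f_value, degrees_of_freedom = one_way_anova_f([lot_a, lot_b, lot_c])
```

The F value would then be compared with the critical value of the F distribution for the returned degrees of freedom at the 5% level before any comparison of means is attempted.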

RESULTS AND DISCUSSION
The evaluation of seed germination and vigor is essential for the sowing sector to measure the performance of different seed lots and improve the efficiency of storage and sowing processes, ensuring the crop's success. The selection of a high-yield seed lot results in germination close to 100% and vigor with a value near germination (Moraes, 2020). Traditional statistical methods in agricultural experimentation are mainly applied through a comparison of means (ANOVA), followed by a complementary test (e.g., Tukey's test) when significant results are obtained. The seed sector's quality control, however, relies on various analyses or attributes. For example, when ANOVA and Tukey's test are used to select lots, the letters of several lots overlap or the lots do not differ from each other, as shown in Table 3; therefore, it is challenging to decide which lots have the best quality.
Thus, traditional statistical analyses make it difficult to classify the vigor levels of seeds from several lots, a common situation given the high demand for evaluating seed lots in the seed industry. As shown in Table 3, the proposed statistical analysis does not allow the determination of lots with different vigor levels; therefore, the specialist needs to empirically establish the criteria for allocating lots, which may be for the disposal or commercialization of seeds. This increases the pressure on the analyst. A single lot of 30,000 kg corresponds to approximately 750 bags worth US$ 70.00 each (about US$ 52,500 in total), and this value might directly influence the analyst's decision.
TABLE 3. Comparison of means of physiological performance tests for seeds of different cultivars and four sieve sizes, tested immediately following harvest and after six months of storage.
Means followed by the same letter (uppercase in the column and lowercase in the row) do not differ by Tukey's test at 5% probability. *TSW (thousand seed weight), MC (moisture content), TZ VIGOR (tetrazolium: vigor test), TZ VIAB. (tetrazolium: viability test), AA (accelerated aging), AA NS (accelerated aging: vigorous normal seedlings), AA NW (accelerated aging: weak normal seedlings), AA AN (accelerated aging: abnormal seedlings), AA D (accelerated aging: dead seeds), AA H (accelerated aging: hard seeds), G (germination pattern), G NS (germination pattern: vigorous normal seedlings), G NW (germination pattern: weak normal seedlings), G AN (germination pattern: abnormal seedlings), G D (germination pattern: dead seeds), G H (germination pattern: hard seeds), and E (emergence in sand).
The machine learning models applied to the dataset in the present study detected vigor levels based on germination performance. The suggested models achieved high mean accuracy values, above 60%, suggesting strong predictive power. The proportion of correctly classified instances was 81.7% for random forest and CVR, 79.6% for J48, and 74.2% for MLP, the last being the lowest-performing method. In a study of germination prediction, Genze et al. (2020) found predictive power values > 90%, thus obtaining a more accurate germination index using machine learning.

The evaluation metrics for the stratification of the soybean seed lots showed high values for detecting the physiological aspects of the seeds: for the accepted class, the random forest algorithm achieved 92.6% recall and 90.9% precision on the actual test dataset. For the reject class, this algorithm showed 92.6% recall and 73.59% precision, whereas the intermediate class exhibited 8.3% recall and 25% precision (Table 4). Medeiros et al. (2020) studied an approach based on interactive and traditional machine learning methods to classify soybean seeds and seedlings based on their morphological characteristics and physiological potential. They obtained 93% precision, highlighting the good performance of the approach in classifying seeds based on their morphology (size, color, and damage). In studies of synthetic datasets for seed phenotyping, Toda et al. (2020) analyzed neural networks and obtained 96% recall and 95% average accuracy for the test dataset of the actual data.
The F-measure combines recall and precision as their harmonic mean, so a single metric (91%) can be interpreted instead of two. The classes with the highest values were the accepted and reject classes. The area under a ROC curve shows the relationship between the sensitivity and specificity of the classifier; the higher the value, the better the classifier discriminates between classes. However, when observing the data in Table 4 for classifiers J48 and MLP, the area under a ROC curve was higher for the reject class (0.94 and 0.89). Therefore, the ROC curve was better defined in the reject class than in the accepted class. Hussain & Ajaz (2015) conducted a study on seed classification using Weka software and found 93.8% recall, 93.8% F-measure, and 98.9% ROC area; with 10-fold cross-validation and MLP as the classifier, they obtained 95.2% recall, 95.2% F-measure, and 99.6% ROC area, values in line with the good performance of the results found in Table 4.
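The area under a ROC curve for one class against the others can be read as the probability that a randomly chosen lot of that class receives a higher score than a randomly chosen lot of another class. The sketch below computes it by direct pairwise comparison; the scores and labels are hypothetical, not outputs of the classifiers in the study.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve by pairwise comparison of scores.

    scores: classifier score for the class of interest, one per lot.
    is_positive: True where the lot actually belongs to that class.
    """
    positives = [s for s, flag in zip(scores, is_positive) if flag]
    negatives = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for the reject class against the other classes.
scores = [0.90, 0.80, 0.70, 0.40, 0.35, 0.20]
is_reject = [True, True, False, True, False, False]
auc = roc_auc(scores, is_reject)
```

A value of 1.0 means every lot of the class outranks every other lot, and 0.5 corresponds to a ranking no better than chance.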
The decision trees generated by the CVR made it practical to understand which attributes drive the separation of the lots. The best-performing lots showed high vigor values in the quality control laboratory analyses and in the segregation of seed sizes by sieve. Figure 2 shows the decision-making following a sequence defined by the best numerically expressed attribute obtained in the vigor tests. Germination is a standard test and is required by the Ministry of Agriculture, Livestock, and Supply (MAPA) as a mandatory item for seed commercialization in Brazil. The MAPA Normative Instruction 45 of 2013 established the commercialization standard for soybean seeds, requiring at least 80% germination. Thus, the germination test is used to compare the quality of different lots, characterize the physiological quality, establish parameters for commercialization, and determine the sowing rate from the maximum germination potential (Marcos Filho, 2015).
From the decision tree of the first stage (Figure 2-A), the seed lots met the criteria pre-established by the supervisory body, with 84.5% accepted. Therefore, the initial quality of soybean seeds is fundamentally important for storage (Vergara et al., 2019b). From the vigor analysis, variability in the physiological quality of seeds was observed, even in the production field; therefore, there was unevenness in vigor within the same lots (Gazolla-Neto et al., 2015). The results generated by the decision tree met the criteria of the present study; however, other decision-making standards would need to be established to account for the parameters of each species or the way the data should be treated.
The proposed methodology is adequate because the precision in discriminating seeds of the accepted class was high, ranging from 0.85 to 0.90 among the machine learning models. However, human effort is still essential because the manual tests must meet the standards (Boelt et al., 2018).
Another prominent attribute was the tetrazolium test, the first criterion used in decision tree 2 (Figure 2-B). Compared to the germination test, tetrazolium analysis allows for the quick determination of seed viability, even for the most dormant seeds. This is crucial in the current global seed market, where the industry needs reliable information on the viability of seed lots quickly to make decisions on seed commercialization and sowing (Soares et al., 2016).
The tetrazolium test also reveals other information, such as moisture damage. Vergara et al. (2019a) stated that moisture-related damage significantly affected the physiological quality of soybean seeds suffering from a delayed harvest. In a study on soybean and precision agriculture, Vergara et al. (2019b) showed that fields with bug and moisture damage had low physiological quality and reduced protein levels.
In the study by Moraes (2020), the decision tree was based on vigor, although only one vigor test was used, and a greater number of vigor tests was suggested for future studies, as performed in the present study. Our study considered several vigor tests, and the same tests were performed over two periods (initial and after six months of storage). According to Tillmann et al. (2019), vigor tests are of fundamental importance for a more efficient classification of lots. There was also an increase in efficiency in the present study, with greater accuracy in the "rejected" and "intermediate" classes, because more attributes were obtained based on vigor.
The data from seed analyses are unbalanced, especially for companies with high lot quality (Table 3). Unbalanced learning is a classification problem in which the number of observations of one class far exceeds that of another class. A resample filter was used to address this problem and avoid biasing the algorithm. The subsampling technique is the best among conventional approaches to managing this problem (Sarada & Devi, 2019). Here, the feature resampler technique was used to resample the data. Feature selection is also vital because it decreases the dimensionality of the data and helps the classifier run faster, thus improving its accuracy (Sarada & Devi, 2019). Oliveira et al. (2021), working with fermented cocoa beans, also had unbalanced data and stated that the classes with more data had higher accuracy and precision values than the other classes. The classes with fewer data are more often classified incorrectly by classifiers sensitive to unbalanced data, and because of their small number, each misclassification strongly influences the performance metrics of the respective class.
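One simple way to subsample toward a uniform class distribution without duplicating rows is sketched below. It is a simplified illustration, not the Weka "Resample" filter, whose settings are not reported here: every class is reduced to the size of the smallest class, and the function name, seed, and row layout are hypothetical.

```python
import random
from collections import defaultdict


def undersample_uniform(rows, label_of, seed=1):
    """Randomly keep the same number of rows per class, without replacement."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    # The smallest class sets the number of rows kept from every class.
    target = min(len(members) for members in by_class.values())
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], target))
    rng.shuffle(sample)
    return sample


# Rows are (lot id, class); the class counts follow the training database.
rows = (
    [(i, "accepted") for i in range(54)]
    + [(i, "rejected") for i in range(54, 81)]
    + [(i, "intermediate") for i in range(81, 93)]
)
balanced = undersample_uniform(rows, label_of=lambda row: row[1])
```

The cost of full balancing is visible in the example: only 36 of the 93 rows remain, which is why a partial bias toward uniformity can be preferable when the smallest class is very small.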
A detailed look at the classification results showed that the CVR had a lower rate of false positives in the three classes and greater accuracy in the rejected and intermediate classes. These latter classes are the most difficult for decision-making. Thus, the higher the accuracy, the better, and the 79% accuracy obtained in the rejected class was promising.
CVR is a classification method that transforms classification problems into regression functions (Yu-Xun et al., 2014). This method combines the principles of the decision tree algorithm and linear regression in several constructed subtrees (leaves) and involves two main steps. First, an ordinary decision tree is built, maximizing the separation of criteria/parameters/attributes and their variations based on the target/output values; this is achieved by calculating the deviation reduction. Then, this tree is subdivided into several possible subtrees, and a regression function (linear model) is fitted to each, usually at the leaves (Arora & Dhir, 2017). The data in the present study were quantitative; therefore, regression combined with the decision tree was a more suitable solution.
According to Pinheiro et al. (2021), the datasets used in machine learning training are usually enormous; therefore, manual analysis generates time-consuming responses. Furthermore, when information on cultivar, treatments, purity, germination, and other quality attributes is generated, the work becomes slow and decision-making inefficient. Therefore, testing classifier models is essential to match the algorithm to the dataset provided (Pinheiro et al., 2021).
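The idea of classifying through regression can be illustrated with a deliberately small sketch. It assumes that one regression model is fitted per class on a 0/1 indicator of that class and that the class with the highest predicted value is chosen. As a simplification, each model is a one-split regression tree on a single attribute, with the split selected by standard deviation reduction, instead of a full tree with linear models at the leaves. The germination values and function names are hypothetical.

```python
from statistics import mean, pstdev


def deviation_reduction(values, left, right):
    """Standard deviation reduction obtained by splitting values in two."""
    n = len(values)
    weighted = (len(left) / n) * pstdev(left) + (len(right) / n) * pstdev(right)
    return pstdev(values) - weighted


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one numeric attribute."""
    best = None
    points = sorted(set(xs))
    for low, high in zip(points, points[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        gain = deviation_reduction(ys, left, right)
        if best is None or gain > best[0]:
            best = (gain, threshold, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_cvr(xs, labels):
    """Fit one regression model per class and classify by the largest output."""
    classes = sorted(set(labels))
    models = {
        c: fit_stump(xs, [1.0 if label == c else 0.0 for label in labels])
        for c in classes
    }
    return lambda x: max(classes, key=lambda c: models[c](x))


# Hypothetical germination percentages and lot classes.
germination = [60, 65, 70, 78, 80, 82, 88, 90, 93, 95]
lot_class = ["rejected"] * 3 + ["intermediate"] * 3 + ["accepted"] * 4
predict = fit_cvr(germination, lot_class)
```

In this toy dataset, `predict(62)` returns "rejected", `predict(80)` returns "intermediate", and `predict(92)` returns "accepted". Because every class has its own regression model, the thresholds that drive each decision can be inspected directly, which is the interpretability advantage noted for the decision trees.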
According to Jha et al. (2019), the only purpose of machine learning is to feed a system with data from previous experiences and statistical values to perform its assigned task and thus solve a specific problem. Therefore, machine learning is a mathematical approach to building intelligent systems.

CONCLUSIONS
It was possible to classify numerous soybean seed lots with great accuracy and precision using artificial intelligence and machine learning techniques. Random forest and CVR were the best-performing algorithms. In addition, the feature resampler technique was necessary for solving the data imbalance problem.