Use of near infrared spectroscopy in cotton seeds physiological quality evaluation

ABSTRACT: This study aimed to evaluate the potential of near-infrared spectroscopy for analyzing cottonseed lots of different physiological quality levels, given the need for faster techniques and tools to aid decision making. Eight samples of cottonseed with and without lint, of different physiological quality, were used. The “high” (lots 1, 4, 5, 6 and 7) and “low” (lots 2, 3 and 8) vigor levels were defined from the vigor tests carried out and from Normative Instruction 45/2013. The near-infrared spectra were obtained from four types of sample preparation: whole seeds, seeds cut in half, seeds without tegument, and ground seeds. Using the spectra and the grouping of lots into high and low vigor, cross-validation models were optimized and built with the PLS-DA method, making it possible to predict seed classes. Ground seeds were the best type of sample preparation, with 95% correct predictions for high-vigor seeds and 100% for low-vigor seeds (both for seeds with lint), and with 100% correct predictions for high-vigor seeds and 91.7% for low-vigor seeds (without lint).


INTRODUCTION
The cotton crop is an important source of income and employment on a national and global scale. It is grown mainly for fiber production; for the 2019/2020 harvest, production was estimated at about 2,853.7 thousand tons of cotton lint. The total area planted in Brazil was 1,670.80 thousand hectares, the largest of the last five harvest years (CONAB, 2020).
Considering all the technology used in cotton production, it is important to use high-quality seeds in order to obtain satisfactory results in the field. Mattioni et al. (2012) observed that seeds of medium and low quality give rise to plants that cannot match, in development and productivity, those grown from high-quality seeds.
In the post-harvest stages of cottonseed, the evaluation of seed lots makes it possible to estimate their aggregate value for reception, delineation, and commercialization purposes, as well as to predict the number of seeds required for planting. Vigor tests that differentiate the physiological potential of the materials are therefore carried out to obtain complementary information for the internal quality control of cottonseed-producing companies, but these evaluations are generally more time-consuming.
For faster responses whose results are consistent with the actual quality of cottonseed, near-infrared spectroscopy combined with chemometric methods can be a promising alternative for obtaining such information.
The advantages of this equipment are that analyses can be carried out successively and in a short period of time. It generates a large quantity of information quickly, with less labor and lower cost; it is non-polluting, does not use chemicals or reagents, and can be non-destructive (Amorim, 1996).
This research aimed to evaluate the potential of near-infrared spectroscopy, using different seed sample preparation procedures, to determine the physiological quality of cottonseed with and without lint.

MATERIAL AND METHODS
The experiment was conducted at the Central Laboratory of Seed Analysis and in the Laboratory of Seed Pathology of the Universidade Federal de Lavras (UFLA), Lavras, Minas Gerais. The work was conducted in two parts. In the first, physiological tests were carried out to determine the physiological quality levels of the cottonseed samples. In the second, the samples were analyzed in the near-infrared spectroscopy equipment to detect these different levels of cottonseed quality.
The seeds were supplied by the company Bayer® and came from eight lots of different physiological quality. Each sample was divided into seeds with and without lint, and the samples were numbered from 1 to 8 for identification.
Seedling emergence: four replications of 50 seeds per treatment were used under uncontrolled environmental conditions. The substrate of the bed was composed of sand and soil in a ratio of 2:1. Emerged seedlings were counted on the 12th day after sowing. The results were expressed as a percentage.
Seedling stand: conducted together with the seedling emergence test; counts were recorded on the 7th and 21st days after sowing, with the results expressed as the percentage of seedlings emerged on the respective counting days.
Speed of seedling emergence: also conducted together with the seedling emergence test; the number of emerged seedlings was counted daily until the 14th day, and the index was calculated according to Maguire (1962).
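The emergence speed index of Maguire (1962) is commonly computed as the sum, over the counting days, of the number of seedlings newly emerged on a day divided by the number of days after sowing. The following is a minimal sketch of that calculation; the function name and the daily counts are hypothetical and are not data from this study.

```python
def emergence_speed_index(new_seedlings_per_day):
    """Maguire-type index: sum of (newly emerged seedlings / days after sowing).

    new_seedlings_per_day[0] is the count on day 1 after sowing, and so on.
    """
    return sum(
        count / day
        for day, count in enumerate(new_seedlings_per_day, start=1)
    )


# Hypothetical daily counts for one replication of 50 seeds, days 1 to 14.
example_counts = [0, 0, 0, 5, 12, 15, 8, 3, 1, 0, 0, 0, 0, 0]
esi = emergence_speed_index(example_counts)
print(f"Emergence speed index: {esi:.2f}")
```

Lots whose seedlings emerge earlier receive a higher index, because early counts are divided by a smaller number of days.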
Tetrazolium: four replications of 50 seeds were used for each treatment. The seeds were soaked between sheets of germitest paper moistened with distilled water at 25 °C for 16 hours. The tegument was then removed and the seeds were stained in 0.075% tetrazolium salt for 4 hours at 30 °C, in the absence of light. The results were expressed as percentages of vigorous and viable seeds, according to Vieira and Von Pinho (1999).
Seed health: eight replications of 25 seeds were arranged on two sheets of filter paper in Petri dishes; 2,4-D, distilled water, and agar were then added to soak the filter paper. The Petri dishes were incubated seven days in a chamber for 12 hours at 20 °C. Each seed was analyzed individually with the aid of a microscope to identify the incidence of fungi. The results were expressed as a percentage. The filter paper method was used, as described in Brasil (2009b).
The data were submitted to analysis of variance in the SISVAR software, and the means were compared by the Tukey test (p < 0.05) (Ferreira, 2011), except for the seed health test, because of the large variation in the data.
Based on the results of the physiological tests, the quality of the cotton samples was classified using the marketing standard of the Ministério da Agricultura, Pecuária e Abastecimento (MAPA), Normative Instruction (IN) 45/2013, dated 09/17/2013 (Brasil, 2013), under which cottonseed to be commercialized must be without lint and have at least 75% germination. Thus, two levels of physiological quality (high and low) were considered: samples with germination above 75% were considered of high quality, and samples below this value, of low quality. This classification was performed for later comparison with the data obtained in the near-infrared spectroscopy analysis.
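The classification rule above can be sketched as a simple threshold on the germination percentage. This is an illustrative sketch only: the function name is hypothetical, the germination values are invented (they are not those of Table 1), and treating a lot exactly at 75% as "high" is an assumption based on the "at least 75%" wording of the standard.

```python
GERMINATION_THRESHOLD = 75.0  # percent, from IN 45/2013 as cited in the text


def quality_class(germination_percent, threshold=GERMINATION_THRESHOLD):
    """Return 'high' when germination reaches the threshold, else 'low'.

    Assumption: a lot exactly at the threshold counts as 'high'.
    """
    return "high" if germination_percent >= threshold else "low"


# Hypothetical germination percentages (not the values of Table 1).
lots = {"A": 88.0, "B": 62.5, "C": 75.0}
classes = {lot: quality_class(g) for lot, g in lots.items()}
print(classes)
```

The resulting labels play the role of the class variable that the spectral models are later asked to predict.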

Near-Infrared Spectroscopy
For each sample of cottonseed with and without lint, four types of seed preparation were tested:
Whole seed: the seed received no pre-treatment and was exposed directly to the output of the infrared light. The number of seeds used was 100, each representing one replication.
Seed cut in half: each seed was sectioned longitudinally with a scalpel and exposed to the infrared light with the cut surface facing the light, right after cutting, to avoid oxidation. The number of seeds used was 100, each representing one replication.
Seed without tegument: the tegument was removed from the seed with a scalpel, and the seed was exposed directly to the infrared light right after the procedure to avoid oxidation of the sample. The number of seeds used was 100, each representing one replication.
Ground seed: 40 seeds per replication were ground in a mill with the addition of liquid nitrogen and polyvinylpyrrolidone, to avoid oxidation of the sample, and the powder was then placed in cuvettes for exposure to the infrared light. In total, 20 replications were used for each treatment.
Near-infrared spectroscopy: the spectra were obtained by placing the samples, either directly or in cuvettes (ground seeds), at the output of the infrared source of a Bruker® Tensor 27 device, generating spectra with a Fourier-transform infrared (FT-IR) detector, with the help of the OPUS Spectroscopy software, version 6, from the same equipment manufacturer. To compile the reading database, the spectrometer collected 48 scans for each absorbance measurement, with a resolution of 8 cm⁻¹, in the range from 10000 to 4000 cm⁻¹, per replication.
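Because the spectral range is reported in wavenumbers, it can help to see the equivalent wavelengths. The sketch below uses the standard relation λ (nm) = 10⁷ / ν (cm⁻¹); the function name is illustrative and not part of any instrument software.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


for nu in (10000, 8000, 4000):
    print(f"{nu} cm^-1 -> {wavenumber_to_wavelength_nm(nu):.0f} nm")
```

Under this relation, the 10000 to 4000 cm⁻¹ range corresponds to wavelengths of 1000 to 2500 nm.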
Multivariate analysis: a cross-validation model was optimized from the grouping of the cottonseed samples into high and low quality, using 3/4 of the samples for calibration and 1/4 for testing. The model was built by multivariate classification with partial least squares regression with discriminant analysis (PLS-DA), with multiplicative scatter correction as data preprocessing. The wavenumbers between 8000 cm⁻¹ and 10000 cm⁻¹ were removed from the analysis because of noise. The Pirouette® statistical software was used, with the Y classes as the dependent variables and the spectra as the independent variables X (Abdi, 2003).
Journal of Seed Science, v.42, e202042016, 2020
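The preprocessing steps described above (multiplicative scatter correction, removal of the 8000 to 10000 cm⁻¹ band, and a 3/4 to 1/4 calibration/test split) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the Pirouette® procedure: the function names, the synthetic spectra, the 8 cm⁻¹ point spacing, the use of the mean spectrum as the MSC reference, and the random split are all assumptions.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum by
    default); the fitted offset is subtracted and the result divided by the slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected, reference


def calibration_test_split(n_samples, calibration_fraction=0.75, seed=0):
    """Randomly assign sample indices to calibration and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_cal = int(round(calibration_fraction * n_samples))
    return order[:n_cal], order[n_cal:]


# Synthetic spectra: one base curve with a random offset and scale per sample.
wavenumbers = np.arange(10000, 3999, -8)  # cm^-1, hypothetical 8 cm^-1 grid
base = np.exp(-((wavenumbers - 5200) / 400.0) ** 2)
rng = np.random.default_rng(1)
offsets = rng.uniform(0.0, 0.2, size=(20, 1))
scales = rng.uniform(0.8, 1.2, size=(20, 1))
raw = offsets + scales * base

keep = wavenumbers < 8000  # drop the noisy 8000-10000 cm^-1 band
corrected, reference = msc(raw[:, keep])
cal_idx, test_idx = calibration_test_split(len(corrected))
print(corrected.shape, len(cal_idx), len(test_idx))
```

In this synthetic case the spectra differ only by offset and scale, so MSC collapses them onto the reference. When a test set is held out, the reference spectrum is usually taken from the calibration samples only and passed to `msc` for the test spectra.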
The sensitivity and specificity of the optimized models were obtained in the validation stage. Sensitivity was calculated by dividing the number of samples of the class correctly predicted as belonging to it by the total number of samples of the class; specificity was calculated by dividing the number of samples correctly predicted as not being of the class by the total number of samples that are not of the class (Szymanska et al., 2011).
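These two measures can be illustrated with the usual confusion-matrix definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below is a minimal example; the function name and the labels are hypothetical and do not reproduce the results of Table 4.

```python
def sensitivity_specificity(true_labels, predicted_labels, target_class):
    """Sensitivity and specificity of one class from paired label lists.

    Assumes both the target class and at least one other class are present,
    otherwise a denominator would be zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target_class and p == target_class for t, p in pairs)
    fn = sum(t == target_class and p != target_class for t, p in pairs)
    tn = sum(t != target_class and p != target_class for t, p in pairs)
    fp = sum(t != target_class and p == target_class for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical validation results, not taken from Table 4.
true_labels = ["high", "high", "high", "high", "low", "low", "low", "low"]
predicted = ["high", "high", "high", "low", "low", "low", "low", "high"]
sens, spec = sensitivity_specificity(true_labels, predicted, "high")
print(sens, spec)
```

A model with both values equal to 1.00 makes no false negatives and no false positives for the class in question.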

RESULTS AND DISCUSSION
The analysis of variance of the physiological tests showed significant differences among the cottonseed samples studied, which made it possible to determine the two levels of seed quality desired for this work: high and low physiological quality.
The germination results (Table 1) for cottonseed with lint indicated better quality for samples one, two, four, and six, whereas samples three, five, seven, and eight showed lower quality. Cottonseed without lint had a higher percentage of germination for samples one, four, five, six, and seven than for samples two, three, and eight. The presence or absence of lint on the seed can affect physiological quality in different ways. Seeds with lint can have low quality when the lint provides a favorable medium for fungal proliferation (samples four, five and seven); on the other hand, the lint can also act as a barrier that helps protect the seed from fungi (samples one, two and six).
Seeds without lint can show decreased germination if the sulfuric acid treatment passes through the tegument and damages the embryo, which can happen if the seed has physical damage or if the sulfuric acid damages the tegument (samples two and three). If the treatment succeeds in removing only the lint, germination percentages can increase (samples four, five, six and seven), and sowing in the field also becomes easier. Comparing the germination of cottonseed with and without lint, Queiroga et al. (2009) found that seeds without lint had better physiological quality.
For commercialization, cottonseed must be without lint and present germination above 75%, as indicated in Normative Instruction 45 of Brazil's Ministry of Agriculture (Brasil, 2013), which is routinely used by cottonseed-producing companies. On this premise, and based on the germination test (Table 1) of cottonseed without lint, samples one, four, five, six and seven were defined as eligible for commercialization, and samples two, three and eight as not eligible; this was used as the parameter to determine high- and low-quality seed samples.
In the evaluation of seedling emergence (Table 2) from seeds with lint, sample one was superior and samples four, seven and eight were inferior, with the others presenting intermediate vigor. In seeds without lint, samples three and eight had low vigor compared with the others. When the vigor of seeds with and without lint was compared, only sample two showed lower vigor after delinting.
The speed of seedling emergence (Table 2) of seeds with lint was similar for most samples, with sample one showing the highest emergence speed and sample eight the lowest. For seeds without lint, there was greater variation among samples, but sample one again had the highest value, samples three and eight were inferior, and the others presented intermediate values for this variable.
The stand results (7 and 21 days) for cotton seeds without lint were similar to those observed for germination, with samples three and eight having the lowest stand, differing statistically from the others; for seeds with lint, sample eight showed the lowest stand percentage (7 and 21 days).
Considering the seedling emergence results for the stand at 7 and 21 days (Table 2), no relevant variation was seen from the seventh to the twenty-first counting day. It can be inferred that neither physiological quality nor the presence or absence of lint influenced the establishment of the seedling stand.
Comparing the results of germination, seedling emergence, speed of seedling emergence, and stand at 7 and 21 days, sample eight was inferior in all the tests, with or without lint. These tests also distinguished sample three as being of inferior physiological quality, except for seeds with lint (Table 2).
However, sample two showed no differentiation from the samples with higher vigor. That might have occurred because this sample has intermediate vigor between high and low-quality seeds, with no differences detected in these tests (Steiner et al., 2011).
The last evaluation made to identify the quality of the eight cottonseed samples was the tetrazolium test; for most samples, cotton seeds with lint presented higher or similar percentages of vigorous and viable seeds compared with seeds without lint (Table 3).
In the seed production system, cotton seeds undergo a long period of processing. The physical processes that these seeds undergo (harvest, fiber removal, delinting) cause irreversible damage, reducing physiological quality in ways that can be detected in germination and vigor tests. The tetrazolium test can be used at different stages of processing, which makes it possible to trace the origin of the damage, its severity, and its extent, representing an important tool for the evaluation of cottonseed quality (Mattioni et al., 2012; Zorzal et al., 2015). For vigorous seeds without lint, the separation of samples matched the germination test and resembled the other tests: samples two, three and eight were statistically inferior to the others. Mechanical damage was the main cause of the low vigor of the evaluated seeds, with and without lint. The viable seed results were also similar to those of the other tests, with samples three and eight having the lowest means.
Because of the wide variation in the seed health test results, the data are not presented; no relationship that could have directly influenced physiological quality was found.
The fungi found in the seeds with and without lint were Colletotrichum gossypii, Fusarium oxysporum, Penicillium sp., and Aspergillus sp. The presence of pathogens in the seeds implies lower seed health quality and may impair storage and the performance of the plants in the field (Gama et al., 2012; Pedroso et al., 2010).
According to the results of the physiological tests, and based on NI 45 for cottonseed commercialization, the samples were classified as high quality (samples one, four, five, six and seven) or low quality (samples two, three and eight), to be used as the parameter for the near-infrared spectroscopy test.
Mean NIR spectra of samples of high and low physiological quality, with and without lint, are shown in Figures 1 and 2. The data were compared after multiplicative scatter correction (MSC), using the Chemoface® software (Nunes et al., 2012).
Molecular absorptions in NIR spectroscopy are not intense, which may result in overlapping bands, making them wide, lowering sensitivity, and making calibration models necessary. Statistical tools can then be associated with the chemical data. The PLS-DA method used here seeks a linear relationship between the dependent variables "Y" (the estimate of the variables of interest) and the independent variables "X" (the spectra) through partial least squares (PLS) regression, which relates structural similarities and differences between the compounds and allows the interpretation of complex data sets with a large number of variables.
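The idea behind PLS-DA can be illustrated with a small NumPy sketch: the two classes are coded as 0 and 1, a PLS regression of this code on the spectra is fitted, and a sample is assigned to a class by thresholding the predicted response. This is a minimal sketch using the NIPALS PLS1 algorithm on synthetic data; it is not the Pirouette® implementation used in this work, and the function names, the 0.5 cutoff, the number of components, and the simulated spectra are all assumptions.

```python
import numpy as np


def fit_plsda(X, y, n_components=2):
    """Fit a two-class PLS-DA model with the NIPALS PLS1 algorithm.

    X: (n_samples, n_variables) spectra; y: 0/1 class codes.
    Returns the means and the regression vector needed for prediction.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # residual (deflated) blocks
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # weight vector
        t = Xr @ w                   # scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        qa = yr @ t / tt             # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - qa * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return x_mean, y_mean, coef


def predict_plsda(model, X, cutoff=0.5):
    """Predict 0/1 classes by thresholding the PLS response at `cutoff`."""
    x_mean, y_mean, coef = model
    y_hat = (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean
    return (y_hat > cutoff).astype(int)


# Synthetic two-class "spectra": class 1 has one slightly stronger band.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 60)
band = np.exp(-((axis - 0.4) / 0.05) ** 2)
y = np.repeat([0, 1], 20)
X = np.sin(3 * axis) + 0.5 * y[:, None] * band + rng.normal(0, 0.01, (40, 60))

cal = np.arange(40) % 4 != 0  # 3/4 for calibration, 1/4 held out
model = fit_plsda(X[cal], y[cal])
accuracy = (predict_plsda(model, X[~cal]) == y[~cal]).mean()
print(f"external accuracy: {accuracy:.2f}")
```

With real spectra the class differences are far subtler than in this simulation, so the number of components is normally chosen by cross-validation and the model is checked on samples left out of calibration.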
Although only minor visual differences between the mean spectra of high and low physiological quality were observed (Figures 1 and 2), such differences were detected by the models. Furthermore, mean spectra were similar for cottonseed samples with and without lint within the same preparation type, as well as among the different preparation types (whole seeds, seeds cut in half, seeds without tegument, and ground seeds). As a supervised pattern recognition method, PLS-DA requires class information in advance to build the model; thus, the "high quality" and "low quality" classes were used as the dependent variables (y) for the construction of the calibration model.
Table 4 presents the multivariate analyses of the near-infrared spectroscopy test. The best results were obtained for the ground seeds, with and without lint, which separated the samples with a high percentage of correct classification. This type of preparation exposes a larger contact surface to the radiation, which reaches the compounds of the tegument and of the embryo at the same time, providing more sources of discrimination and therefore better predictions among seed quality levels. It also homogenizes the analytical material, since even a good seed sample may contain lower-quality seeds that could influence the result.
The models obtained for the ground seeds with lint (Table 4) reached 95% and 100% correct predictions in the external validations for high and low quality, respectively. The external validation results are important because they indicate the predictive capacity of the model for seeds that did not take part in its construction, showing that it is robust and capable of evaluating different samples.
For the high-quality ground seeds without lint, the models obtained 92.3% correct predictions, 4.6% errors, and 3.1% unrecognized seeds in the cross-validation, whereas in the external validation there was 100% correctness. For the low-quality samples, accuracy was 97.4% in the cross-validation and 91.7% in the external validation.
In the evaluation of the performance of the optimized models, sensitivity and specificity both reached 1.00 for one model (Table 4); according to Forina et al. (1991), this corresponds to a perfect model.
In the sampling method that used whole seeds for the modeling, a high percentage of seeds was classified correctly, similar to the ground-seed sampling method. The separation of seed samples with lint had lower accuracy than that of samples without lint, for both high- and low-quality samples (Table 4). This difference may have occurred because the lint influenced the scattering of the infrared light into the sample, acting as a thin physical barrier to the penetration of the radiation. Using near-infrared spectroscopy allied to chemometrics, Soares et al. (2016) classified cottonseed of four cultivars of high genetic quality (without lint) by the PLS-DA method, with 96.91% of cultivars correctly classified in the first validation and 88.66% in the second. Thus, whole cottonseed without lint shows potential for classification according to different parameters by the PLS-DA method allied to NIR spectroscopy.

Table 4. Mean percentages of hits and errors in the cross-validations and external validations, and sensitivity and specificity values of the optimized models for each seed preparation (whole, cut in half, without tegument, and ground) of cotton seeds, with and without lint, according to the high (samples 1, 4, 5, 6 and 7) and low (samples 2, 3, and 8) physiological quality classification, using near-infrared spectroscopy with the aid of PLS-DA.
The sensitivity and specificity values of the models optimized for whole cottonseed differed more between seeds with and without lint than those of the ground-seed models.
From the cottonseed cut in half, less efficient models were obtained than those previously presented. An overfitted model occurred for the low-vigor seeds with lint cut in half: it gave good results for the samples in the calibration set but failed to predict external samples (Table 4). The sensitivity and specificity values of these optimized models also differed more between seeds with and without lint. Oxidation of the seed after cutting could have been the main issue in this sampling method.
The models generated for cottonseed without tegument did not reach a high percentage of correct predictions. The classifications of the high- and low-quality samples disagreed, with overfitted models and the lowest sensitivity and specificity values among the preparation types. Damage caused when the tegument was removed could explain such a large variation in the validations.

CONCLUSIONS
The chemometric analysis (PLS-DA), allied to near-infrared spectroscopy, made it possible to predict the physiological quality of cottonseed with good accuracy, representing a promising technique.
The models optimized for ground cottonseed, with and without lint, provided the highest percentages of correct classification and the best performance values among the models.