ABSTRACT:
The use of high-quality seeds is essential in production systems. Near-infrared (NIR) spectroscopy, combined with chemometric methods, is a promising, rapid, nondestructive, and easy-to-use tool for assessing seed physiological potential. This study aimed to analyze the feasibility of NIR spectroscopy coupled with chemometric methods to detect differences in the physiological quality of canola seeds. Seeds from eight lots of the Nuola 300, Hyola 575, and Diamond cultivars underwent initial characterization tests, seedling length tests, and X-ray analysis. For spectral acquisition, random seed samples were selected, with 12 readings taken for each lot. The lots were categorized as high (C1) or low (C2) vigor. The spectra were preprocessed using Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), and the first and second Savitzky-Golay derivatives. These data were subsequently used to build classification models using Partial Least Squares Discriminant Analysis (PLS-DA). The model demonstrated high accuracy and kappa coefficients, with SNV being the preprocessing method best suited to the dataset. The wavelength regions between 1,004-1,064 nm and 1,698-1,907 nm were the most relevant for distinguishing seed quality levels. Analysis of NIR spectra, subjected to preprocessing based on derivative methods and scatter correction, demonstrated high efficiency in detecting variations in the physiological quality levels of canola seeds.
Index terms:
Brassica napus L. var. oleifera; chemometrics; oilseed; physiological potential
INTRODUCTION
Canola (Brassica napus L. var. oleifera) is an oilseed crop of global relevance, ranking as the third most important in the world (USDA, 2023). Its grains have a protein content of 21% to 33% and an oil content of 42% to 48% (Yuldasheva, 2023). This crop is an important source of vegetable oil for human consumption, biofuel production, and various industrial applications (Raboanatahiry et al., 2021). In this context, high-quality seeds are indispensable for canola production, and in Brazil more information on this subject is still needed (Gularte et al., 2020). The physiological quality of seeds is primarily assessed by the germination test; however, it is recommended that this evaluation be complemented with vigor tests, since germination alone reflects the maximum potential under optimal conditions and does not provide information on seed performance under stress or field conditions. Vigor tests allow a more complete characterization of seed quality, providing greater reliability in predicting field emergence and crop establishment (Marcos-Filho, 2015). Changes in the physiological potential of seeds can also be diagnosed using near-infrared (NIR) spectroscopy, which is based on the absorption of electromagnetic radiation between 780 nm and 2,500 nm. This technique provides information related to hydrogen-containing groups, such as C-H, N-H, and O-H, allowing the analysis of the composition of organic matter (Ozaki et al., 2016; Hills, 2017). Absorption depends on the interaction of the radiation with compounds such as water, carbohydrates, lipids, and proteins (Larios et al., 2020).
When associated with chemometric methods, the spectral data obtained from the FT-NIR instrument allow detection of variations in the biochemical profile of the analyzed sample. In seed technology, it is assumed that this compositional difference among lots is a characteristic that can be used for the classification of vigor (Fan et al., 2020) and viability (Kusumaningrum et al., 2018). NIR spectroscopy has already demonstrated efficiency in classifying the seed quality of several species, such as cabbage and radish (Shetty et al., 2011) and spinach (Shetty et al., 2012). Validating the technique to discriminate levels of physiological quality in canola seeds can therefore represent a significant advance in evaluating seed quality with greater precision and efficiency. For instance, the germination test in canola requires about eight days to be completed (ISTA, 2023), which often delays decision-making in seed production chains. In addition, conventional tests depend on the subjective interpretation of analysts, which may lead to variability in results. By contrast, NIR spectroscopy combined with chemometric methods can provide faster, more objective, and potentially more accurate assessments, reducing human error and operational costs. This approach does not aim to replace traditional methods but to complement them, offering reliable support to seed quality control programs. Thus, the objective of this study was to analyze the feasibility of NIR spectroscopy, together with chemometric methods, for detecting differences in the physiological potential of canola seeds.
MATERIAL AND METHODS
The experiments were conducted at the Department of Agronomy of the Universidade Federal de Viçosa (UFV), in Viçosa, Minas Gerais state (20°45’14” S, 42°52’54” W, altitude of 648 m), Brazil. Three lots of the Nuola 300 cultivar were used: Lot 1 (Lavras - MG, 2020), Lot 2 (Diamantina - MG, 2021), and Lot 3 (Lavras - MG, 2019). Two lots of the Hyola 575 CL® cultivar were included: Lot 4 (Diamantina - MG, 2021) and Lot 5 (Lavras - MG, 2020). In addition, three lots of the Diamond cultivar were evaluated: Lot 6 (Cascavel - PR, 2018), Lot 7 (Lavras - MG, 2021), and Lot 8 (Lavras - MG, 2020).
Physiological and physical analysis
Moisture content (MC): obtained via the oven method at 105 °C for 24 hours (Brasil, 2009). Two replications were used for each lot, with a sample weight of 1 g of seeds per replication.
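The moisture determination above reduces to simple arithmetic. The Python sketch below assumes the wet-basis expression usually applied with the oven method; the function name and the seed masses are hypothetical and are not taken from the study.

```python
def moisture_content_wet_basis(fresh_mass_g: float, dry_mass_g: float) -> float:
    """Moisture content (%) on a wet basis from seed mass before and after drying."""
    if fresh_mass_g <= 0 or not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("masses must satisfy 0 <= dry <= fresh and fresh > 0")
    return 100.0 * (fresh_mass_g - dry_mass_g) / fresh_mass_g


# Two hypothetical replications of about 1 g each: (fresh mass, dry mass) in grams.
replications = [(1.000, 0.905), (1.002, 0.909)]

# The lot value is taken here as the mean of its replications.
lot_mc = sum(moisture_content_wet_basis(f, d) for f, d in replications) / len(replications)
print(f"Moisture content: {lot_mc:.1f}%")
```

With only 1 g of seeds per replication, the balance resolution has a visible effect on the result, so the masses are written here to three decimal places.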
Germination test: The seeds were sown in gerbox containers with a paper substrate moistened with water at a ratio of 2.5 times the mass of dry paper, and placed in a germinator at 20 °C under constant light (Brasil, 2009). Assessments were performed on the fifth day (first germination count - FC) and terminated on the seventh day (final germination count - G), counting normal seedlings. Daily counts allowed for the calculation of the Germination Speed Index (GSI), as proposed by Maguire (1962).
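The Germination Speed Index of Maguire (1962) sums, over the daily counts, the number of seedlings that germinated on each day divided by the number of days after sowing. A minimal Python sketch follows; the function name and the daily counts are hypothetical and only illustrate the calculation.

```python
from typing import Mapping


def germination_speed_index(new_seedlings_per_day: Mapping[int, int]) -> float:
    """Maguire (1962): sum of (seedlings newly counted on a day) / (days after sowing)."""
    return sum(count / day for day, count in new_seedlings_per_day.items() if day > 0)


# Hypothetical daily counts of newly germinated normal seedlings (day -> count).
daily_counts = {3: 10, 4: 20, 5: 12, 6: 4, 7: 2}

gsi = germination_speed_index(daily_counts)
print(f"GSI = {gsi:.2f}")
```

The counts must be the seedlings that appeared on each day, not cumulative totals; otherwise early germination is weighted more than once.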
Electrical conductivity (EC): performed via the methodology described by Kaefer et al. (2014) for canola seeds, in which 50 seeds were soaked in 75 mL of water for 4 hours, and the results were expressed in μS cm⁻¹ g⁻¹.
Seedling length: At the end of the germination test, normal seedlings from the final count were transferred from the germination paper to a photographic base, along with an object of known size for scale calibration. The images were stored and later loaded into the ImageJ® software to measure the shoot and root parts. The measurements were processed in R with the SeedCalc package (Silva et al., 2019) to obtain the shoot length (SL), average root length (RL), total length (TL), root-to-shoot length ratio (RSL), uniformity index (Unif), growth index (Growth), and vigor index (Vigor).
Physical analysis (X-ray test): 100 seeds from each lot were fixed on adhesive paper to allow the analysis of each seed along with the respective generated seedling. The images were acquired using an MX-20 device (Faxitron X-ray Corp., Wheeling, IL, USA). The seeds were exposed to radiation for 10 seconds at a voltage of 23 kV, with a focal distance of 41.6 cm. The image contrast was calibrated at 970 (width) x 2300 (center). The images were processed using the ImageJ® software. The variables obtained in this analysis were area (mm²), relative density (RelDen), and integrated density (IntDen) (pixel-1.mm2 value).
Acquisition of NIR spectra
The spectra were obtained from randomly selected samples of whole seeds from each lot, arranged in overlapping layers over the reading window of the equipment. The procedure consisted of four replications, each with three readings, totaling 12 spectra for each lot. The readings were taken with a Thermo Scientific Antares II FT-NIR analyzer (Figure 1), in the wavelength range of 1,000 to 2,500 nm, in reflectance (R) mode. The TQ Analysis software, associated with the spectrometer, was used to record the spectra.
Design of experiments and statistical evaluation
For the initial characterization, seedling length, and X-ray tests, a completely randomized design (CRD) with four replications was used. The data were checked for normality of errors using the Shapiro-Wilk test and for homogeneity of variances using the Bartlett test. Subsequently, an analysis of variance was performed, and the means for each cultivar were compared using the Scott-Knott test at the 5% significance level (Table 1).
The lots were classified according to the physiological performance obtained in the initial characterization, seedling length, and X-ray tests. The variables were submitted to principal component analysis (PCA), followed by cluster analysis with the K-means method. The lots were assigned to classes C1 - High vigor (Lots L4, L6, L7 and L8) and C2 - Low vigor (L1, L2, L3 and L5). Subsequently, the spectral data were subjected to preprocessing methods, including Multiplicative Scatter Correction (MSC), Standard Normal Variate (SNV), and the first and second Savitzky-Golay derivatives. For each pretreatment, a classification model was fitted using Partial Least Squares-Discriminant Analysis (PLS-DA) (Barker and Rayens, 2003), with 70% of the data allocated for training and 30% for testing with samples not used in training.
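The modeling step described above can be sketched in Python as a two-class PLS-DA: a PLS1 regression on a 0/1 class code, thresholded at 0.5, and evaluated on a 70/30 split. This is an illustrative sketch with NumPy, not the R workflow used in the study. The spectra are synthetic, and the function names, the number of latent components, the number of bands, and the random seed are all assumptions.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """Fit PLS1 by NIPALS; returns the centering terms and the regression vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # weights: covariance of each band with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                         # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef


def plsda_predict(model, X, threshold=0.5):
    """Return 1 (C1) when the predicted class code exceeds the threshold, else 0 (C2)."""
    x_mean, y_mean, coef = model
    return ((X - x_mean) @ coef + y_mean > threshold).astype(int)


rng = np.random.default_rng(0)
n_spectra, n_bands = 96, 50                # 8 lots x 12 spectra; 50 bands is arbitrary
y = np.repeat([1, 0], n_spectra // 2)      # 1 = C1, 0 = C2 (dummy coding)
X = rng.normal(0.0, 0.02, (n_spectra, n_bands))
X += rng.normal(0.0, 0.05, (n_spectra, 1)) # synthetic baseline offset per spectrum
X[y == 1, 10:20] += 0.5                    # synthetic class-related band

order = rng.permutation(n_spectra)
n_train = int(round(0.7 * n_spectra))      # 70% for training, remainder for testing
train, test = order[:n_train], order[n_train:]

model = pls1_fit(X[train], y[train].astype(float), n_components=2)
test_accuracy = float(np.mean(plsda_predict(model, X[test]) == y[test]))
print(f"Test accuracy: {test_accuracy:.2f}")
```

Here the split is made at the spectrum level. With repeated readings per lot, spectra from the same lot can fall on both sides of such a split, which is worth keeping in mind when interpreting test metrics.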
The efficiency of the classification models was assessed using the confusion matrix, from which the accuracy and Cohen’s kappa coefficient were calculated for both the training and test datasets, following the approach proposed by Cohen (1960). This procedure allowed us to evaluate the reliability and robustness of the classification. Additionally, the wavelength ranges that were most important for separating the vigor classes in the classification model were identified. Data analysis was performed using the R 4.0.2 software (R Core Team, 2022).
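As a minimal sketch of how both metrics follow from a confusion matrix, the Python function below computes the accuracy and Cohen's kappa. The function name and the 2 x 2 counts are hypothetical; they are not the matrices obtained in the study.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of samples of actual class i predicted as class j."""
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n          # overall accuracy
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical test set: rows = actual (C1, C2), columns = predicted (C1, C2).
confusion = [[13, 1],
             [1, 13]]

acc, kappa = accuracy_and_kappa(confusion)
print(f"accuracy = {acc:.4f}, kappa = {kappa:.4f}")
```

Kappa discounts the agreement expected by chance, so with two balanced classes an accuracy of 50% corresponds to a kappa of zero. The function is undefined when the expected agreement equals 1, which happens only if every sample belongs to, and is predicted as, a single class.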
RESULTS AND DISCUSSION
Moisture content varied between 8.8% and 10.7% (Table 1), a spread of 1.9 percentage points, which remains within the maximum tolerable difference of 2.0 percentage points for standardizing analyses and reliably assessing the physiological potential of seed lots (Marcos-Filho, 2015). Determining seed moisture content is essential for official seed lot quality testing, as it directly influences various aspects of seed physiological quality (Sarmento et al., 2015).
Table 1. Average results for moisture content (MC), first germination count (FC), germination (G), germination speed index (GSI), electrical conductivity (EC), shoot length (SL), average root length (RL), total length (TL), root-to-shoot length ratio (RSL), uniformity index (Unif), growth index (Growth), vigor index (Vigor), area, relative density (RelDen), and integrated density (IntDen), obtained for the assessment of the physiological quality of canola seed lots.
The moisture values obtained for the canola seed lots (ranging from 8.8% to 10.7%) show a moderate dispersion that can have significant repercussions on spectral interpretation: different moisture levels alter the intensity and baseline of the NIR spectra because of the water absorption bands (O-H), which can mask or bias real differences associated with the physiological quality of the seeds. According to Szulc et al. (2020), NIR spectroscopy is commonly used for non-destructive moisture determination in seeds and requires rigorous calibrations to compensate for the effects of water on the spectra.
Figure 2 presents the principal component analysis (PCA) used to assess variability among canola seed lots and distinguish them based on different physiological quality levels. The first two principal components explained 70.8% of the cumulative variance (PC1 47.9% and PC2 22.9%) (Figures 2A and 2B).
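The share of variance explained by each component, and the subsequent split into two clusters, can be illustrated with the Python sketch below. It uses NumPy on synthetic data (8 lots by 14 variables, with an artificial shift between two groups of lots); the function names, the random seed, and the seeding of the two-cluster K-means are assumptions and do not reproduce the study's analysis in R.

```python
import numpy as np


def pca_explained_variance(data):
    """Standardize the columns, then return (scores, explained-variance ratios)."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    ratios = s ** 2 / np.sum(s ** 2)       # share of total variance per component
    return u * s, ratios


def two_means(scores, n_iter=20):
    """Small K-means with k = 2, seeded with the lots at the extremes of PC1."""
    centers = scores[[scores[:, 0].argmin(), scores[:, 0].argmax()]]
    for _ in range(n_iter):
        dist = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)       # assign each lot to the nearest center
        centers = np.array([scores[labels == k].mean(axis=0) for k in (0, 1)])
    return labels


rng = np.random.default_rng(1)
lot_data = rng.normal(size=(8, 14))        # synthetic: 8 lots x 14 quality variables
lot_data[:4] += 4.0                        # artificial difference between two groups

scores, ratios = pca_explained_variance(lot_data)
labels = two_means(scores[:, :2])          # cluster on the first two components

print("Cumulative variance of PC1 + PC2:", round(float(ratios[:2].sum()), 3))
print("Cluster labels per lot:", labels.tolist())
```

Standardizing the columns first keeps variables measured in different units, such as conductivity and seedling length, from dominating the components.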
Figure 2. Biplot of the principal component analysis (PCA) (A) and cluster analysis (B) with class separation. First germination count (FC); germination (G); germination speed index (GSI); electrical conductivity (EC); shoot length (SL); average root length (RL); total length (TL); ratio of average root length to shoot length (RSL); uniformity index (Unif); growth index (Growth); vigor index (Vigor); area; relative density (RelDen); integrated density (IntDen).
The PCA (Figure 2A) presents the vectors of the initial characterization of the lots. Average root length (RL), growth index (Growth), vigor index (Vigor), total length (TL), and uniformity (Unif) were positioned to the left of the ordination diagram. Lots with negative PC1 scores therefore have considerably higher values for these variables. The variables area, integrated density (IntDen), and relative density (RelDen) are located in the positive scores of PC2 and contributed significantly to the classification of the lots into different levels of physiological quality. The physiological performance tests (initial characterization, seedling length, and X-rays) were used to form two classes (clusters) among the canola lots (Figure 2B): class 1, representing the highest physiological quality of the canola seed lots, was composed of lots L1, L2, L3, and L5, while class 2, composed of lots L4, L6, L7, and L8, was characterized by lower quality.
Analysis of the NIR spectra in the raw lot data revealed distinct levels of physiological quality in the canola seeds, highlighting classes 1 (high vigor) and 2 (low vigor) (Figure 3). It is important to note that the raw spectra correspond to the spectral data as acquired, without the application of preprocessing methods such as baseline correction or noise smoothing (Sohn et al., 2022). These spectra reflect the original characteristics of the samples, before any adjustments to eliminate interference.
Figure 3. Average of spectral data per class (A) and original spectra (B). In green, high-vigor canola seeds - C1, and in purple, low-vigor seeds - C2.
In the obtained spectra, a variation in the baseline is observed across the entire spectral range, possibly associated with the physical heterogeneity of the samples. This variation can be attributed to the use of whole seeds instead of ground seeds, which influences light scattering and, consequently, the uniformity of the spectral data (Zhang et al., 2022). Using raw data to detect viable seeds of Hinoki cypress, Mukasa et al. (2019) observed that the spectra were uniform, widely dispersed, and noisy, making it impossible to distinguish viable from non-viable seeds. Similar spectral behavior was observed in corn seed hybrids (Andriazzi et al., 2023). This highlights the importance of spectral preprocessing. The spectral readings may have been affected by differences in seed size and in seed coat thickness among the canola seed lots, since sample morphology influences light scattering in NIR spectroscopy. Therefore, preprocessing methods are necessary to reduce the physical effects of sample morphology on the spectra (Souza et al., 2023).
The analytical information contained in NIR spectra is complex and difficult to interpret due to their multivariate nature. This makes it challenging to distinguish small spectral differences between samples. Thus, to extract valuable information from extensive spectral data sets quickly and effectively, the use of advanced chemometric algorithms is essential. These algorithms preprocess the data (Xia et al., 2019).
Selecting an optimal preprocessing method is difficult, as many mathematical transformations are available and different preprocessing methods yield different prediction results (Sohn et al., 2021). Preprocessing the spectral data is nevertheless necessary to remove irrelevant information and improve the performance of the calibration model (Xia et al., 2019). The spectra were subjected to different types of preprocessing, and using the PLS-DA method, classification models for two levels of seed physiological quality were obtained: class 1 (high vigor) and class 2 (low vigor) (Table 2).
Preprocessing methods are applied to reduce variability caused by light scattering differences between samples. These methods include scatter corrections, such as Multiplicative Scatter Correction (MSC) and Standard Normal Variate (SNV), and derivative methods such as Savitzky-Golay, which produces smoothed first and second derivatives of the spectra and thereby reduces the overlap and oscillation of absorbance peaks (Ambrose et al., 2016a).
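To make these transformations concrete, the Python sketch below implements SNV, MSC, and Savitzky-Golay derivatives using only the standard library. It is an illustration under stated assumptions: the function names are hypothetical, the Savitzky-Golay weights correspond to a 5-point window with a quadratic fit (the window used in the study is not stated), and the spectra are synthetic.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum of the set."""
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for spec in spectra:
        spec_mean = mean(spec)
        # Least-squares fit: spec = intercept + slope * ref
        slope = sum((r - ref_mean) * (v - spec_mean) for r, v in zip(ref, spec)) / sxx
        intercept = spec_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in spec])
    return corrected


# Savitzky-Golay weights for a 5-point window and quadratic fit (unit spacing).
SG_FIRST = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]
SG_SECOND = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]


def sg_derivative(spectrum, weights):
    """Apply the weights to every full window; two edge points per side are dropped."""
    half = len(weights) // 2
    return [
        sum(w * spectrum[i + k - half] for k, w in enumerate(weights))
        for i in range(half, len(spectrum) - half)
    ]


# Synthetic spectra: the same shape with different additive and multiplicative scatter.
base = [0.10, 0.12, 0.20, 0.35, 0.30, 0.22, 0.15, 0.11]
spectra = [[a + b * v for v in base] for a, b in [(0.0, 1.0), (0.05, 1.2), (-0.02, 0.8)]]

snv_out = [snv(s) for s in spectra]
msc_out = msc(spectra)
first_deriv = [sg_derivative(s, SG_FIRST) for s in spectra]
```

In this synthetic case the three spectra differ only by an offset and a scale factor, so both SNV and MSC map them onto a single curve. The first derivative removes the additive offset but keeps the multiplicative factor.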
The most suitable preprocessing for building the calibration model was chosen according to how well each model predicted the physiological quality classes of the seed lots, as measured by the accuracy and kappa coefficient obtained for each type of preprocessing (Table 2).
The classification models obtained with the original spectra, without any pretreatment, and with the preprocessed data reached an accuracy of 100% and a kappa coefficient of 1.0 in training. In the test set, the original data presented lower accuracy (92%) and kappa (0.8571). After preprocessing, the SNV, MSC, and 1st derivative of SG methods presented similar test accuracy (96%) and kappa (0.92). The 2nd derivative of SG gave lower test metrics, with an accuracy of 57% and a kappa of 0.1429 (Table 2). The kappa coefficient is used as a model evaluation index; the higher the kappa value, the better the classification result and the more stable the model (Zhang et al., 2023). The kappa index obtained with the SNV, MSC, and 1st derivative of SG methods can be classified as almost perfect (0.81-1.00) according to the classification proposed by Landis and Koch (1977). The evaluation of the mathematical model indicates the accuracy, precision, and robustness of the model’s predictions (Kohn et al., 2002). Accuracy and the kappa index were successfully used to evaluate the efficiency of the best-fitting models in classifying seeds of cassava (Sousa et al., 2023) and Urochloa decumbens (Souza et al., 2023) according to physiological quality. With the SNV, MSC, and first-derivative SG preprocessing, a better separation of classes C1 and C2 was observed. Among these, the SNV model was chosen due to its simplicity and its correction of scatter effects, in addition to the clear visualization it provides in the PLS-DA score plot (Figure 4). The results obtained for the PLS-DA classification models, based on NIR spectra, are in line with studies that applied spectroscopy techniques for seed classification (Venkatesan et al., 2020).
PLS-DA is effective in modeling highly correlated data (Zhang et al., 2022) and enables the identification and discriminative selection of variables (Brereton and Lloyd, 2014).
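The Landis and Koch (1977) scale cited above can be written as a small lookup. The Python sketch below applies it to the test kappa values reported in the text; the function name is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Strength-of-agreement label for a kappa value (Landis and Koch, 1977)."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper_limit, label in bands:
        if kappa <= upper_limit:
            return label
    return "almost perfect"


# Test-set kappa values reported in the text for each spectral treatment.
reported = {
    "original spectra": 0.8571,
    "SNV, MSC, 1st derivative of SG": 0.92,
    "2nd derivative of SG": 0.1429,
}
for treatment, kappa in reported.items():
    print(f"{treatment}: kappa = {kappa} ({landis_koch_label(kappa)})")
```

On this scale, the model built on the original spectra also falls in the highest band, while the second-derivative model falls in the "slight" band.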
Figure 4. Score plot of partial least squares discriminant analysis (PLS-DA) with the first two discriminant components for high-vigor canola seeds - C1, and low-vigor seeds - C2, based on FT-NIR spectra with SNV preprocessing.
The use of a combined approach of a Savitzky-Golay smoothing algorithm and SNV corrections is relevant for optimizing NIRS data for canola, corn, and sorghum stems (Goñi et al., 2024). For the quantification of oil, protein, and glucosinolate content in three rapeseed cultivars, the best results were obtained with Savitzky-Golay preprocessing (Petisco et al., 2010). The SNV method performed best in discriminating between soybean seed lots (Larios et al., 2020) and quantifying oil content in canola seeds (Santiago et al., 2023).
In a set of nine varieties of B. juncea (27 samples), nine varieties of B. rapa (27 samples), and one of B. napus (3 samples), a PLS-DA model built from NIR spectrometer data estimated seed oil content with 100% performance. Santiago et al. (2023) found similar results in canola seeds, with an R2 of 0.86 for predicting oil content. PLS-DA was also used to determine the concentration of water-soluble carbohydrates (Goñi et al., 2024) and the tocopherol content in canola (Xu et al., 2019).
In addition to accuracy and kappa, the confusion matrix is a widely used tool for evaluating the performance of machine learning classification models. The matrix displays the distribution of seeds according to their actual and predicted classes and indicates the quality and efficiency of the model (Figure 5). The confusion matrix of the data after preprocessing using the SNV method indicates that errors were primarily associated with C1 - high vigor.
Figure 5. Confusion matrix of the classification of canola seed lots into vigor levels C1 - high vigor and C2 - low vigor, using the PLS-DA model with original spectra and after preprocessing with SNV, MSC, and 1st derivative of SG.
The wavelengths that contributed most to the models fitted to classify the seeds into different quality levels are shown in Figure 6. The peaks of importance in the wavelength bands correspond to functional groups that contributed to the classification of the lots in terms of quality (Zhang et al., 2020).
It is noted that the 20 most important wavelengths, with importance values ranging from 68% to 100%, are located in the spectral bands from 1,004 nm to 1,064 nm and from 1,698 nm to 1,907 nm (Figure 6). Significant spectral differences were observed between viable and non-viable seeds in the bands between 1,000 nm and 1,900 nm (Xia et al., 2019). According to Wang et al. (2021), wavelengths between 500 nm and 1,100 nm or 1,000 nm and 1,850 nm have biological significance for detecting seed vigor.
Figure 6. Importance of wavelength variables for classification via PLS-DA of vigor levels of canola seeds.
Absorption bands in the NIR region are associated with functional groups related to water, proteins, oil, and carbohydrates; they are generally broad and overlap in various parts of the spectrum (Soares et al., 2024). The most important wavelengths with peaks in the 1,060 nm to 1,195 nm range are related to carbohydrates and those from 1,100 nm to 1,185 nm to protein structure, while the spectral range from 930 nm to 1,390 nm refers to oil content (Shenk et al., 2007; Al-Amery et al., 2018; Medeiros et al., 2022), as do the peaks between 1,700 nm and 1,800 nm (Guinda et al., 2015). NIR spectra contain bands at longer wavelengths, resulting from overlapping absorptions that correspond to vibrational combinations of chemical bonds, such as C-H, O-H, and N-H (Xia et al., 2019). In Brassica napus seeds, Xu et al. (2019) correlated the absorbance at 1,722 nm with the C-H bond, while the 1,908 nm band was attributed to the O-H stretching associated with lipids. In the same species, the 1,725 nm peak is related to C-H stretching vibrations, and the 1,940 nm band corresponds to the stretching vibration or the combination of O-H deformation, attributable to the main components present in the seeds, such as water, oil, and fiber (Medeiros et al., 2022).
In high-quality seeds, greater spectral stability is observed, with well-defined bands and consistent intensity, suggesting intact reserves and preserved cell integrity. In contrast, low-quality seeds show biochemical and structural alterations that reflect physiological deterioration. Lipid degradation, noticeable at 1,722-1,725 nm (C-H), evidences oxidative processes that undermine vigor, while changes in carbohydrate bands (1,060-1,195 nm) indicate the consumption of energy reserves, and variations in water-associated wavelengths (1,908-1,940 nm, O-H) suggest cell disorganization and instability in water status (Silva et al., 2023). These spectral changes are closely related to modifications in seed chemical composition, particularly in lipids and proteins, which are critical for membrane integrity and metabolic activity (Venkatesan et al., 2020). Proteins, essential for the growth and development of the embryonic axis and the establishment of seedlings in the field (Erbaş et al., 2016), undergo denaturation, reduced synthesis, and degradation during seed deterioration, processes that contribute significantly to the loss of seed viability (Pinheiro et al., 2023). Thus, the bands attributed to lipids, proteins, carbohydrates, and water are strongly associated with the identification of low-vigor seeds.
An absorption peak at 1,002 nm is associated with amorphous sucrose (Ozaki et al., 2016). The region between 1,880 nm and 1,930 nm is associated with combined O-H vibration bands (Hourant et al., 2000). There is also a combination of O-H stretching vibration and the third C-O overtone associated with cellulose, with absorption at 1,820 nm (Workman and Weyer, 2007). The absorption peak at 1,860 nm is attributed to the combination of asymmetric N-H stretching with amide II, related to protein molecules (Kusumaningrum et al., 2018). The broad wavelength range of 350-2,500 nm is associated with oleic acid (Guinda et al., 2015).
Non-viable Hinoki cypress seeds exhibited absorption in the 1,000-1,500 nm and 2,300-2,500 nm ranges, while viable seeds showed higher absorption values in the 1,700-1,910 nm wavelength region. The discrepancies in the observed average spectral patterns may originate from differences in the chemical components of the seeds (Mukasa et al., 2019).
Although identifying specific chemical compounds in seeds is quite difficult due to the overlap of spectral bands that may be associated with different compounds (Kusumaningrum et al., 2018), the regions of the electromagnetic spectrum that are most important for distinguishing vigor levels in canola seeds were identified, as demonstrated by the high accuracy and kappa coefficients obtained with suitable preprocessing (Table 2).
Near-infrared (NIR) spectroscopy is a valuable tool for seed lot classification, enabling rapid quality assessment and fast decision-making regarding seed lot disposal, which saves time and resources in the seed industry. Furthermore, it plays a fundamental role in genetic improvement programs and germplasm preservation, providing a non-destructive approach to predicting seed quality (Venkatesan et al., 2020; Medeiros et al., 2020).
CONCLUSIONS
The analysis of near-infrared (NIR) spectra, subjected to preprocessing based on derivative methods and scatter correction, demonstrated high efficiency in detecting variations in the physiological quality levels of canola seeds.
ACKNOWLEDGMENTS
The authors are grateful to the Universidade Federal dos Vales do Jequitinhonha e Mucuri (UFVJM) and the Universidade Federal de Viçosa (UFV), and to Celena Alimentos S/A, the Universidade Federal de Lavras (UFLA), and EMBRAPA (Agroenergia) for providing the seeds. This study was funded in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - BRAZIL (CAPES) - (Funding Code: 001), the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG), and the Financiadora de Estudos e Projetos (FINEP).
REFERENCES
- AL-AMERY, M.; GENEVE, R.L.; SANCHES, M.F.; ARMSTRONG, P.R.; MAGHIRANG, E.B.; LEE, C.; HILDEBRAND, D.F. Near-infrared spectroscopy used to predict soybean seed germination and vigour. Seed Science Research, v.28, n.3, p.245-252, 2018. http://dx.doi.org/10.1017/S0960258518000119
- AMBROSE, A.; LOHUMI, S.; LEE, W.H.; CHO, B.K. Comparative nondestructive measurement of corn seed viability using Fourier transform near-infrared (FT-NIR) and Raman spectroscopy. Sensors and Actuators B: Chemical, v.224, p.500-506, 2016a. http://dx.doi.org/10.1016/j.snb.2015.10.082
- ANDRIAZZI, C.V.G.; ROCHA, D.K.; CUSTÓDIO, C.C. Determination of the physiological quality of corn seeds by infrared equipment. Journal of Seed Science, v.45, e202345002, 2023. http://dx.doi.org/10.1590/2317-1545v45265346
- BARKER, M.; RAYENS, W. Partial least squares for discrimination. Journal of Chemometrics, v.17, n.3, p.166-173, 2003. http://dx.doi.org/10.1002/cem.785
- BRASIL. Ministério da Agricultura, Pecuária e Abastecimento (MAPA). Regras para Análise de Sementes. Ministério da Agricultura, Pecuária e Abastecimento. Secretaria de Defesa Agropecuária. Brasília: MAPA/ACS, 2009. 399p. https://www.gov.br/agricultura/pt-br/assuntos/insumos-agropecuarios/arquivos-publicacoes-insumos/2946_regras_analise__sementes.pdf
- BRERETON, R.G.; LLOYD, G.R. Partial least squares discriminant analysis: taking the magic away. Journal of Chemometrics, v.28, n.4, p.213-225, 2014. http://dx.doi.org/10.1002/cem.2609
- COHEN, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, v.20, n.1, p.37-46, 1960. http://dx.doi.org/10.1177/001316446002000104
- ERBAŞ, S.; TONGUÇ, M.; ŞANLI, A. Mobilization of seed reserves during germination and early seedling growth of two sunflower cultivars. Journal of Applied Botany and Food Quality, v.89, 2016. http://dx.doi.org/10.5073/JABFQ.2016.089.028
- FAN, Y.; MA, S.; WU, T. Individual wheat kernels vigor assessment based on NIR spectroscopy coupled with machine learning methodologies. Infrared Physics and Technology, v.105, p.1-7, 2020. http://dx.doi.org/10.1016/j.infrared.2020.103213
- GOÑI, A.M.; FERNÁNDEZ, J.A.; DEMARCO, P.A.; SECCHI, M.A.; CARCEDO, A.J.; CIAMPITTI, I.A. Determination of water-soluble carbohydrates by near-infrared spectroscopy for canola, maize, and sorghum stem fractions. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, v.304, p.123320, 2024. http://dx.doi.org/10.1016/j.saa.2023.123320
- GUINDA, Á.; RADA, M.; BENAISSA, M.; OURRACH, I.; CAYUELA, J.A. Controlling argan seed quality by NIR. Journal of the American Oil Chemists’ Society, v.92, n.8, p.1143-1151, 2015. http://dx.doi.org/10.1007/s11746-015-2674-8
- GULARTE, J.A.; MACEDO, V.G.K.; PANOZZO, L.E. Canola seed production and market in Brazil. Applied Research & Agrotechnology, v.13, p.5834-1-9, 2020. https://revistas.unicentro.br/index.php/repaa/article/viewFile/5834/4613
- HILLS, A.E. Spectroscopy in biotechnology research and development. Encyclopedia of Spectroscopy and Spectrometry, p.198-202, 2017.
- HOURANT, P.; BAETEN, V.; MORALES, M.T.; MEURENS, M.; APARICIO, R. Oil and fat classification by selected bands of near-infrared spectroscopy. Applied Spectroscopy, v.54, n.8, p.1168-1174, 2000.
- INTERNATIONAL SEED TESTING ASSOCIATION - ISTA. International Rules for Seed Testing. Bassersdorf, Switzerland: ISTA, 2023.
- KAEFER, J.E.; TAVIN, A.; NOZAKI, M.H.; RICHART, A.; KUHN, V. Avaliação e parametros para a realização do teste de condutividade elétrica em sementes de canola (Brassica napus L. var oleífera). Journal of Agronomic Sciences, v.3, n.1, p.158-167, 2014.
- KOHN, R.A.; KALSCHEUR, K.F.; RUSSEK-COHEN, E. Evaluation of models to estimate urinary nitrogen and expected milk urea nitrogen. Journal of Dairy Science, v.85, n.1, p.227-233, 2002. http://dx.doi.org/10.3168/jds.S0022-0302(02)74071-X
» https://doi.org/http://dx.doi.org/10.3168/jds.S0022-0302(02)74071-X -
KUSUMANINGRUM, D.; LEE, H.; LOHUMI, S.; MO, C.; KIM, M.S.; CHO, B.K. Non-destructive technique for determining the viability of soybean (Glycine max) seeds using FT-NIR spectroscopy.Journal of the Science of Food and Agriculture, v. 98, n. 5, p. 1734-1742, 2018. http://dx.doi.org/10.1002/jsfa.8646
» https://doi.org/http://dx.doi.org/10.1002/jsfa.8646 - LANDIS, J.R.; KOCH, G.G. The measurement of observer agreement for categorical data.Biometrics, p. 159-174, 1977.
-
LARIOS, G.; NICOLODELLI, G.; RIBEIRO, M.; CANASSA, T.; REIS, A.R., OLIVEIRA, S.L.; CENA, C. Soybean seed vigor discrimination by using infrared spectroscopy and machine learning algorithms.Analytical Methods, v. 12, n. 35, p. 4303-4309, 2020. http://dx.doi.org/10.1039/D0AY01238F
» https://doi.org/http://dx.doi.org/10.1039/D0AY01238F - MAGUIRE. J.D. Speed of germination-Aid in selection and evaluation for seedling emergence and vigor. Crop Science, v. 2, p.176-177, 1962.
- MARCOS-FILHO, J.Fisiologia de sementes de plantas cultivadas Londrina: ABRATES, 2015. 660p.
-
MEDEIROS, A.D.D.; SILVA, L.J.D.; RIBEIRO, J.P.O.; FERREIRA, K.C., ROSAS, J. T. F., SANTOS, A. A.; SILVA, C. B. D. Machine learning for seed quality classification: An advanced approach using merger data from FT-NIR spectroscopy and X-ray imaging.Sensors, v.20, n.15, p.4319, 2020. http://dx.doi.org/10.3390/s20154319
» https://doi.org/http://dx.doi.org/10.3390/s20154319 -
MEDEIROS, M.L.S.; CRUZ-TIRADO, J.P.; LIMA, A.F.; DE SOUZA NETTO, J.M.; RIBEIRO, A.P.B.; BASSEGIO, D.; BARBIN, D.F. Assessment oil composition and species discrimination of Brassicas seeds based on hyperspectral imaging and portable near infrared (NIR) spectroscopy tools and chemometrics.Journal of Food Composition and Analysis, v.107, p.104403, 2022. http://dx.doi.org/10.1016/j.jfca.2022.104403
» https://doi.org/http://dx.doi.org/10.1016/j.jfca.2022.104403 -
MUKASA, P.; WAKHOLI, C.; MO, C.; OH, M.; JOO, H.J.; SUH, H.K.; CHO, B.K. Determination of viability of Retinispora (Hinoki cypress) seeds using FT-NIR spectroscopy.Infrared Physics & Technology, v.98, p.62-68, 2019. http://dx.doi.org/10.1016/j.infrared.2019.02.008
» https://doi.org/http://dx.doi.org/10.1016/j.infrared.2019.02.008 - OZAKI, Y.; MCCLURE, W.F.; CHRISTY, A.A. Near-infrared spectroscopy in food science and technology New Jersey: Wiley-Interscience, p.406, 2006.
-
PETISCO, C.; GARCÍA-CRIADO, B.; VÁZQUEZ-DE-ALDANA, B.R.; DE HARO, A.; GARCÍA-CIUDAD, A. Measurement of quality parameters in intact seeds of Brassica species using visible and near-infrared spectroscopy.Industrial Crops and Products, v.32, n.2, p. 139-146, 2010. http://dx.doi.org/10.1016/j.indcrop.2010.04.003
» https://doi.org/http://dx.doi.org/10.1016/j.indcrop.2010.04.003 - QIU, G.; LÜ, E.; WANG, N.; LU, H.; WANG, F.; ZENG, F. Cultivar classification of single sweet corn seed using fourier transform near-infrared spectroscopy combined with discriminant analysis. Applied Sciences, v.9, n.8, p.1530, 2019.
- R CORE TEAM. R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing. 2022.
-
RABOANATAHIRY, N.; LI, H.; YU, L.; LI, M. Rapeseed (Brassica napus): processing. Utilization. And genetic improvement. Agronomy, v.11, n.9, p.1776, 2021. https://doi.org/10.3390/agronomy11091776
» https://doi.org/https://doi.org/10.3390/agronomy11091776 -
SARMENTO, H.G.S.; DE SOUZA DAVID, A.M.S.; BARBOSA, M.G.; NOBRE, D.A.C.; AMARO, H.T.R. Determinação do teor de água em sementes de milho, feijão e pinhão-manso por métodos alternativos.Energia na Agricultura, v.30, n.3, p.250-256, 2015. http://dx.doi.org/10.17224/EnergAgric.2015v30n3p250-256
» https://doi.org/http://dx.doi.org/10.17224/EnergAgric.2015v30n3p250-256 -
SANTIAGO, A.C.; PIMENTEL, G.V.; BRUZI, A.T.; MARTINS, I.A.; HEIN, P.R.G.; LIMA, M.D. R.; PEREIRA, D.R. Path analysis and near-infrared spectroscopy in canola crop.Ciência Rural, v.53, e20220071, 2023. http://dx.doi.org/10.1590/0103-8478cr20220071
» https://doi.org/http://dx.doi.org/10.1590/0103-8478cr20220071 -
SILVA, L.J.D.; MEDEIROS, A.D.D.; OLIVEIRA, A.M.S. SeedCalc, a new automated R software tool for germination and seedling length data processing.Journal of Seed Science , v.41, p.250-257, 2019. http://dx.doi.org/10.1590/2317-1545v42n2217267
» https://doi.org/http://dx.doi.org/10.1590/2317-1545v42n2217267 -
SILVA, M.F.D.; ROQUE, J.V.; SOARES, J.M.; MOURA, L.D.O.; MEDEIROS, A.D.D.; SILVA, F.L.D.; SILVA, L.J.D. Near infrared spectroscopy for the classification of vigor in soybean seeds. Revista Ciência Agronômica, v. 54, p. n.n., 2023. http://dx.doi.org/10.5935/1806-6690.20240005
» https://doi.org/http://dx.doi.org/10.5935/1806-6690.20240005 - SHENK, J.S.; WORKMAN JR, J.J.; WESTERHAUS, M.O. Application of NIR spectroscopy to agricultural products. InHandbook of near-infrared analysis, p. 365-404, 2007.
-
SHETTY, N.; MIN, T G.; GISLUM, R.; OLESEN, M.H.; BOELT, B. Optimal sample size for predicting viability of cabbage and radish seeds based on near infrared spectra of single seeds.Journal of Near Infrared Spectroscopy, v. 19, n.6, p.451-461, 2011. http://dx.doi.org/10.1255/JNIRS.966
» https://doi.org/http://dx.doi.org/10.1255/JNIRS.966 -
SHETTY, N.; OLESEN, M.H.; GISLUM, R.; DELEURAN, L.C.; BOELT, B. Use of partial least squares discriminant analysis on visible-near infrared multispectral image data to examine germination ability and germ length in spinach seeds.Journal of Chemometrics , v.26, n.8-9, p.462-466, 2012. http://dx.doi.org/10.1002/cem.1415
» https://doi.org/http://dx.doi.org/10.1002/cem.1415 -
SOARES, J.M.; NORONHA, B.G.; SILVA, M.F.; PINHEIRO, D.T.; DIAS, D.C.F.; SILVA, L.J. Near-infrared spectral evaluation of physiological potential, biochemical composition and enzymatic activity of soybean seeds. Journal of Seed Science , v.46, e202446037. 2024. http://dx.doi.org/10.1590/2317-1545v46291222
» https://doi.org/http://dx.doi.org/10.1590/2317-1545v46291222 -
SOHN, S.I.; PANDIAN, S.; OH, Y.J.; ZAUKUU, J.L.Z.; KANG, H.J.; RYU, T.H.; CHO, B.K. An overview of near infrared spectroscopy and its applications in the detection of genetically modified organisms.International Journal of Molecular Sciences, v.22, n.18, 9940, 2022. http://dx.doi.org/10.3390/ijms22189940
» https://doi.org/http://dx.doi.org/10.3390/ijms22189940 -
SOHN, S.I.; PANDIAN, S.; ZAUKUU, J.L.Z.; OH, Y.J.; PARK, S.Y.; NA, C.S.; CHO, Y.S. Discrimination of transgenic canola (Brassica napus L.) and their hybrids with B. rapa using Vis-NIR spectroscopy and machine learning methods.International Journal of Molecular Sciences , v.23, n.1, p.220, 2021. http://dx.doi.org/10.3390/ijms23010220
» https://doi.org/http://dx.doi.org/10.3390/ijms23010220 -
SOUZA, L.R.D.; LIMÃO, M.A.R.; PINHEIRO, D.T.; PERIS, G.C.D.O.; DIAS, D.C.F.D.S.; DIAS, L.A.D.S. Near infrared spectroscopy and seedling image analysis to evaluate the physiological potential of Urochloa decumbens (Stapf) RD Webster seeds.Journal of Seed Science , v.45, p. e202345032, 2023. http://dx.doi.org/10.1590/2317-1545v45277021
» https://doi.org/http://dx.doi.org/10.1590/2317-1545v45277021 -
SOUSA, M.B.E.; FILHO, J.S.S.; DE ANDRADE, L.R.B.; DE OLIVEIRA, E.J. Near-infrared spectroscopy for early selection of waxy cassava clones via seed analysis.Frontiers in Plant Science, v.14, p.1089759, 2023. http://dx.doi.org/10.3389/fpls.2023.1089759
» https://doi.org/http://dx.doi.org/10.3389/fpls.2023.1089759 -
SZULC, J.; GOZDECKA, G.; POĆWIARDOWSKI, W. The application of NIR spectroscopy in moisture determining of vegetable seeds. Czech Journal of Food Sciences, v. 38, n. 2, p. 131-136, 2020. http://dx.doi.org/10.17221/57/2019-CJFS
» https://doi.org/http://dx.doi.org/10.17221/57/2019-CJFS -
PINHEIRO, D.T.; DIAS, D.C.F.D.S.; SILVA, L.J.D.; MARTINS, M.S.; FINGER, F.L. Oxidative stress, protein metabolism, and physiological potential of soybean seeds under weathering deterioration in the pre-harvest phase.Acta Scientiarum. Agronomy, v.45, e56910, 2023. http://dx.doi.org/10.4025/actasciagron.v45i1.56910
» https://doi.org/http://dx.doi.org/10.4025/actasciagron.v45i1.56910 - USDA. United States Department of Agriculture. Oil Crops Outlook: June 2023 OCS-23f. U.S. Department of Agriculture. Economic Research Service, 2023.
-
VENKATESAN, S.; MASILAMANI, P.; JANAKI, P.; EEVERA, T.; SUNDARESWARAN, S.; RAJKUMAR, P. Role of near-infrared spectroscopy in seed quality evaluation: A review.Agricultural Reviews, v.41, n.2, p.106-115, 2020. http://dx.doi.org/10.18805/ag.R-1960
» https://doi.org/http://dx.doi.org/10.18805/ag.R-1960 -
WANG, Y.; PENG, Y.; QIAO, X.; ZHUANG, Q. Discriminant analysis and comparison of corn seed vigor based on multiband spectrum.Computers and Electronics in Agriculture, v.190, p.106444, 2021. http://dx.doi.org/10.1016/j.compag.2021.106444
» https://doi.org/http://dx.doi.org/10.1016/j.compag.2021.106444 -
WORKMAN J.R.J.; WEYER, L. Practical guide to interpretive Near-infrared spectroscopy 2007, http://dx.doi.org/10.1201/9781420018318
» https://doi.org/http://dx.doi.org/10.1201/9781420018318 -
XIA, Y.; XU, Y.; LI, J.; ZHANG, C.; FAN, S. Recent advances in emerging techniques for non-destructive detection of seed viability: A review.Artificial Intelligence in Agriculture, v.1, p.35-47, 2019. http://dx.doi.org/10.1016/j.aiia.2019.05.001
» https://doi.org/http://dx.doi.org/10.1016/j.aiia.2019.05.001 -
XU, J.; NWAFOR, C.C.; SHAH, N.; ZHOU, Y.; ZHANG, C. Identification of genetic variation in Brassica napus seeds for tocopherol content and composition using near-infrared spectroscopy technique.Plant breeding, v.138, n.5, p.624-634, 2019. http://dx.doi.org/10.1111/pbr.12708.
» https://doi.org/http://dx.doi.org/10.1111/pbr.12708 -
YULDASHEVA. Z. Growth phases of autumn rapeseed effect of seedling thickness. In: E3S Web of Conferences. EDP Sciences, p.03083, 2023. http://dx.doi.org/10.1051/e3sconf/202338903083
» https://doi.org/http://dx.doi.org/10.1051/e3sconf/202338903083 -
ZHANG, B.; HU, S.; LI, M. Comparative study of multiple machine learning algorithms for risk level prediction in goaf. Heliyon, v.9, e19092, 2023. http://doi.org/10.1016/j.heliyon.2023.e19092
» https://doi.org/http://doi.org/10.1016/j.heliyon.2023.e19092 -
ZHANG, D.; WANG, Q.; LIN, F.; WENG, S.; LEI, Y.; CHEN, G.; ZHENG, L. New spectral classification index for rapid identification of Fusarium infection in wheat kernel.Food Analytical Methods, v.13, p.2165-2175, 2020. http://dx.doi.org/10.1007/s12161-020-01829-w
» https://doi.org/http://dx.doi.org/10.1007/s12161-020-01829-w -
ZHANG, S.; LIU, S.; SHEN, L.; CHEN, S.; HE, L.; LIU, A. Application of near-infrared spectroscopy for the nondestructive analysis of wheat flour: A review.Current Research in Food Science, v.5, p.1305-1312, 2022. http://dx.doi.org/10.1016/j.crfs.2022.08.006
» https://doi.org/http://dx.doi.org/10.1016/j.crfs.2022.08.006
Data availability
Additional data will be made available by the authors upon reasonable request.
Publication Dates
Publication in this collection: 10 Nov 2025
Date of issue: 2025
History
Received: 14 Aug 2025
Accepted: 26 Sept 2025












