Abstract
Peanut storage time affects the sowing and germination quality of peanut seeds as well as the taste of edible peanuts. As storage time increases, the total amounts of water and amino acids decrease, and peanuts become moldy. Manual judgment of peanut storage time relies mostly on visual assessment of color, which leads to large differences in color classification between observers. This research was conducted to determine the freshness of peanuts during storage using hyperspectral imaging (HSI) technology, and to identify the storage time of peanuts from hyperspectral images (387~1035 nm). Three models, two preprocessing methods, and two feature band extraction methods were combined. The experimental results show that the DT-MF-Catboost model was the best method for detecting peanut storage time, with an accuracy of 97.53% in identifying storage time. These results show that HSI has great potential for classifying peanut freshness and identifying storage time, and they provide a basis for the nondestructive classification and grading of peanuts during storage.
Keywords:
hyperspectral; freshness; nondestructive testing techniques; feature selection; regression model
1 Introduction
Peanuts are one of the most important economic crops in the world. They are grown worldwide, and China accounts for 40% of the total. Peanuts also have important economic and nutritional value: they can be used to produce peanut butter, peanut oil, and desserts, or be eaten directly. Over the past 20 years, China's peanut exports have ranked among the highest in the world, accounting for more than 25% of the international market, which gives China a competitive advantage in that market. Scientific research shows that peanuts have anticancer, antioxidant (Yu et al., 2021a), anti-inflammatory (Ravikanth et al., 2017), and other biological properties. Storage time (Zhang et al., 2022) has a significant impact on the sensory quality, vitality, aflatoxin infiltration, and toxin production of peanuts. As storage time increases, quality deterioration such as mildew and rancidity is prone to occur, reducing sensory quality and commodity value. The freshness (Şengör et al., 2019) and safety of peanuts are closely related to storage time, so identifying peanut storage time is very important.
Hyperspectral imaging technology has been widely used in agricultural inspection (Kucha et al., 2021). Zhang et al. (2020) used hyperspectral data to nondestructively identify slightly sprouted wheat kernels, and analyzed the modeling performance of characteristic wavelengths extracted from reflectance spectra in mixed spectral data sets containing different proportions of reflectance data. Hyperspectral technology relies on the spectral features produced by transitions of organic functional groups (O-H, C-H, N-H, S-H) in the sample. A hyperspectral device acquires a data cube rather than a single image, so both the spectrum of every point in the image and the image at any band can be obtained. It is therefore widely used in many fields, especially the nondestructive testing of agricultural products and food. At present, researchers in China and elsewhere often use near-infrared spectroscopy to build qualitative and quantitative analysis models for agricultural products and food. Quantitative analysis determines the composition and content of a sample, whereas species identification, fruit ripeness identification, origin traceability, and storage time identification are all qualitative analyses (Selci, 2019). Eshkabilov et al. (2021) rapidly detected nutrient levels in lettuce using hyperspectral imaging combined with partial least squares regression and principal component analysis. Sun et al. (2020) detected the fat content of peanut kernels based on chemometrics and hyperspectral imaging, and the Baseline-SPA-MLR model achieved high accuracy. Jin et al. (2013) used hyperspectral data to identify the severity of cotton verticillium wilt, building four recognition models with a back propagation (BP) neural network, a genetic algorithm back propagation (GA-BP) neural network, and a support vector machine (SVM). Mesa & Chiang (2021) combined hyperspectral imaging with RGB imaging to grade bananas. Zou et al. (2022) used hyperspectral nondestructive testing to predict peanut seed vigor with high accuracy. Chen et al. (2022) combined hyperspectral technology with an LS-SVM inversion model to achieve rapid online monitoring of the soybean breakage rate in combine harvesters. Wang et al. (2021) used multispectral technology to model the physicochemical attributes of greengage, applying several machine learning models to predict Brix and pH values.
In this study, peanuts were used as experimental materials. The spectra of single peanut seeds at three different storage times were collected by hyperspectral imaging, and three models (SVM, LDA, and DT) were established in combination with median filter, Savitzky-Golay, and multivariate scattering correction preprocessing. Catboost and Lgboost were used to select feature bands, reducing the influence of low-weight bands on the detection results, and to identify the best processing method. The effects of different preprocessing methods, models, and feature band extraction methods on the accuracy of peanut freshness identification are discussed.
2 Materials and methods
2.1 Sample preparation
Peanuts are important oil crops, and the average oil content of peanut varieties is about 50%. High oil content is the main breeding target of Huayu peanuts and determines their practical value. The oil content of high-oil Huayu peanuts is usually more than 55%, and their average yield is 3500-4000 kg ha⁻¹, which is higher than the average for peanuts (Wang et al., 2020). As a high-oleic variety, Huayu also has high storage stability. Its summer growing period is about 114 days, and it can be planted over large areas in most of central China. Therefore, Huayu, the most common variety in China, was selected for the experiment, and a total of 600 peanuts were used. Before the experiment, the 600 peanuts were divided into 3 groups; moldy and damaged kernels were removed from each group, and 180 seeds of uniform size and fullness were selected per group. The first group of 180 peanuts was imaged immediately, and the other 360 peanuts were placed in storage boxes, sealed, and stored in a cool, dry place at about 22 °C. Of these, 180 kernels were taken out for hyperspectral imaging after 90 days, and the remaining 180 after 180 days. Images of peanuts from the three storage periods taken by the hyperspectral instrument are shown in Figure 1. As Figure 1 shows, it is difficult to distinguish fresh peanuts from peanuts stored for a long time with the naked eye.
Photographs of peanuts from the three storage periods taken by the hyperspectral instrument: (a) Fresh peanuts that have not been stored; (b) Peanuts stored for 90 days; (c) Peanuts stored for 180 days.
2.2 Hyperspectral imaging system
Hyperspectral images were acquired with an Imageλ “spectral image” series hyperspectrometer and SpecView software from Zhuoli Hanguang Company. The effective spectral range is 387-1035 nm, with a spectral resolution of 2.8 nm, 256 bands, and an image size of 1344 × 1024 pixels. The measurement time for each sample was set to 10 s, the object distance from the camera objective lens to the sample platform to 170 mm, the speed of the electrically driven moving platform to 4.6 mm/s, the camera exposure time to 9 ms, and the scanning width to 120 mm. All data were recorded by the computer software as a three-dimensional data cube containing image and spectral information. The hyperspectral instrument is located in a 15 m² room. Before the experiment, the curtains were drawn to block all external light, the halogen light source inside the instrument was turned on and its brightness adjusted, and formal data collection began once the hyperspectral images were neither saturated nor distorted.
The core components of the system are a hyperspectral camera, a uniform light source, an electrically controlled moving platform, and a computer with control software. The system is powered by a 220 V AC supply, and the lighting unit uses tungsten-bromine lamps to ensure uniform illumination. The computer and control software are the core of the hyperspectral imaging system, and the sample transfer unit is driven by the electric moving platform. In operation, the tungsten-bromine lamps illuminate the sample on the moving platform, and the light reflected from the sample passes through the lens to the spectral camera, which acquires two-dimensional spatial images together with real-time spectral information. SpecView software on a desktop computer records all spectral data and finally assembles a three-dimensional data cube containing both spatial and spectral information. The structure of the hyperspectral imaging system is shown in Figure 2.
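To make the data-cube structure concrete, the sketch below indexes a toy reflectance cube stored as a NumPy array of shape (rows, columns, bands): fixing a pixel gives its spectrum, and fixing a band gives a grey-level image. The (rows, columns, bands) layout, the evenly spaced wavelength grid, and the helper names are assumptions for illustration; actual SpecView exports may use a different layout or band list.

```python
import numpy as np

# Assumption: 256 band centres evenly spaced over 387-1035 nm (hypothetical grid).
WAVELENGTHS = np.linspace(387.0, 1035.0, 256)

def pixel_spectrum(cube: np.ndarray, row: int, col: int) -> np.ndarray:
    """Spectrum of one pixel: all bands at a fixed spatial position."""
    return cube[row, col, :]

def band_image(cube: np.ndarray, wavelength_nm: float) -> np.ndarray:
    """Grey-level image at the band whose centre is closest to wavelength_nm."""
    band = int(np.argmin(np.abs(WAVELENGTHS - wavelength_nm)))
    return cube[:, :, band]

# Toy cube (4 x 5 pixels x 256 bands) instead of a full 1344 x 1024 frame.
cube = np.random.default_rng(0).random((4, 5, 256))
spectrum = pixel_spectrum(cube, 1, 2)   # shape (256,)
image_550 = band_image(cube, 550.0)     # shape (4, 5)
```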
To obtain a reflectance hyperspectral image, black-and-white correction must be applied to the originally acquired image. Under the same conditions used to acquire the original image (R0), the light source is turned on and a white reference panel with reflectance above 90% is imaged to obtain the white reference image (W). The dark reference image (D1) is obtained with the light source turned off and the camera lens completely covered by an opaque cap; it is used to eliminate the influence of the dark current of the camera sensor. The sample dark reference image (D2) is obtained with the same exposure time as the original peanut image. The reflectance hyperspectral image (R) is calculated by the following formula (Equation 1):
Where R is the reflectance hyperspectral image, R0 is the original image, W is the white reference image, D1 is the dark reference image, and D2 is the sample dark reference image. The corrected images are used to extract spectral information for subsequent analysis.
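A widely used form of black-and-white correction is R = (R0 - D)/(W - D). The sketch below assumes one plausible pairing of the two dark references defined above, subtracting the sample dark reference D2 from R0 and the dark reference D1 from W; the exact combination used in Equation 1 may differ. A small epsilon guards against division by zero.

```python
import numpy as np

def reflectance(raw, white, dark_white, dark_sample, eps=1e-12):
    """Black-and-white correction of a raw hyperspectral image.

    Assumed form (a hypothetical pairing, not confirmed by the source):
        R = (R0 - D2) / (W - D1)
    raw = R0, white = W, dark_white = D1, dark_sample = D2; all inputs must
    broadcast to a common shape such as (rows, cols, bands).
    """
    numerator = np.asarray(raw, dtype=float) - np.asarray(dark_sample, dtype=float)
    denominator = np.asarray(white, dtype=float) - np.asarray(dark_white, dtype=float)
    # Replace near-zero denominators to avoid division by zero.
    safe = np.where(np.abs(denominator) < eps, eps, denominator)
    return numerator / safe

R = reflectance(raw=[[600.0]], white=[[1100.0]], dark_white=[[100.0]], dark_sample=[[100.0]])
print(R)  # [[0.5]]
```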
3 Spectral collection
Because the peanut surface is uneven, it is difficult to image it under uniform light intensity. To reduce the influence of illumination, a region of interest (ROI) is used to obtain the brightness values of the reflectance hyperspectral image, pseudo-color images are drawn according to these brightness values, and the average spectrum is then calculated over regions of the same color. At the same time, the distribution of peanut constituents is visualized. Figure 3 shows the sequence of image processing and visualization steps used to obtain the average spectrum of the reflectance hyperspectral image. Figure 3(a) and (b) show that peanuts with almost no difference in appearance can differ markedly in internal quality (moisture, starch, and fat content). (a) is the original hyperspectral image of the peanuts; (b) visualizes the distribution of substances with the same reflectance, showing intuitively how constituents are distributed inside the peanuts; (c) is the region of interest (ROI) extraction; and (d) is the average spectral reflectance of the peanuts. The flow chart of the procedure for obtaining the average spectrum of the image is shown in Figure 3.
Flow chart of the procedure for obtaining the average spectrum of an image: (a) Raw hyperspectral acquisition map; (b) Hyperspectral material content distribution in specific bands; (c) Region of interest (ROI) extraction; (d) Average spectral reflectance curve.
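The averaging step in Figure 3 can be sketched as follows: a boolean ROI mask selects the kernel pixels, and the average spectrum is the mean over those pixels at each band. Segmenting the kernel by thresholding a single band is an assumption made here for illustration; the band index and threshold are hypothetical, not values from this study.

```python
import numpy as np

def roi_mask(cube, band, threshold):
    """Hypothetical ROI: pixels whose reflectance at `band` exceeds `threshold`."""
    return cube[:, :, band] > threshold

def mean_roi_spectrum(cube, mask):
    """Average spectrum of the pixels selected by a boolean (rows, cols) mask."""
    if not mask.any():
        raise ValueError("ROI mask selects no pixels")
    # cube[mask] has shape (n_pixels, bands); average over the pixels.
    return cube[mask].mean(axis=0)

# Toy example: a bright 2 x 2 "kernel" on a dark background, 3 bands.
cube = np.zeros((4, 4, 3))
cube[1:3, 1:3, :] = [0.2, 0.4, 0.6]
mask = roi_mask(cube, band=1, threshold=0.1)
avg = mean_roi_spectrum(cube, mask)
```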
4 Spectral preprocessing
Complex sample spectra are often affected by stray light, noise, baseline drift, and other factors, which degrade the final qualitative and quantitative analysis results. Therefore, the original spectra usually need to be preprocessed before modeling. By purpose, spectral preprocessing methods can be divided into four categories: baseline correction, scatter correction, smoothing, and scaling. This research mainly uses multivariate scattering correction (MSC), median filter (MF), and Savitzky-Golay (SG) smoothing. The original data contain considerable noise and drift; these preprocessing methods reduce random noise and scattering effects in the spectral signal and improve its signal-to-noise ratio (Li et al., 2021).
As shown in Figure 4, median filtering makes the difference between stored and unstored peanuts more obvious; multivariate scattering correction removes much of the noise and sharp spikes and provides a background correction; and the Savitzky-Golay algorithm produces smoother curves.
Full-band spectra of peanuts: (a) Raw data; (b) Data processed by the median filter algorithm; (c) Data processed by multivariate scattering correction; (d) Data processed by the Savitzky-Golay algorithm.
The wavelength-reflectance curves of the three groups of peanut samples (unstored, stored for 90 days, and stored for 180 days) are shown in Figure 4. In Figure 4(a), the stored peanuts show a peak near 390 nm, whereas the fresh, unstored peanuts show a trough at this position together with small peaks near 450 nm and 550 nm. For peanuts stored for 90 days, there is clearly no trough at 390 nm, but very small peaks remain near 450 nm and 550 nm. For peanuts stored for 180 days, part of the waveform is deformed: there is a peak near 390 nm, and the rest of the curve is almost smooth, with no distinct peaks.
4.1 Preprocessing algorithm
Median Filter (MF)
The median filter is a nonlinear smoothing technique that replaces the value of a point in a digital sequence or digital image with the median of all values in its neighborhood (Li et al., 2017). In this study, the median filter was applied to the original hyperspectral data to flatten peaks in the spectra, so that the baseline can be fitted better where a peak transitions into a smooth band. The baseline then changes more smoothly at these positions, which effectively reduces underfitting. The median filter is given by (Equation 2):
In the formula, Med[·] denotes the median of the input data points in the window. In addition, the fewer the windows used in the median filter, the smoother the fitted baseline, but too few windows will reduce the accuracy of the fitted baseline.
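The usual definition of the median filter is y(n) = Med[x(n - k), ..., x(n), ..., x(n + k)] for a window of 2k + 1 points. A minimal NumPy sketch of this sliding-window median along the wavelength axis follows; the edge handling (repeating the first and last values) is an assumption, and `scipy.signal.medfilt` provides a comparable filter that pads with zeros instead.

```python
import numpy as np

def median_filter_1d(x, window=5):
    """Sliding-window median of a 1-D spectrum.

    y[n] = Med[x[n-k], ..., x[n+k]] with window = 2k + 1 (must be odd).
    Assumption: edges are padded by repeating the first/last value.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    k = window // 2
    padded = np.pad(np.asarray(x, dtype=float), k, mode="edge")
    # Each row of `windows` holds the 2k + 1 neighbours of one point.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

spectrum = np.array([0.30, 0.31, 0.95, 0.32, 0.33])  # one spike at index 2
smoothed = median_filter_1d(spectrum, window=3)        # the spike is removed
```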
Multivariate Scattering Correction (MSC)
Multivariate scattering correction is a data processing method commonly used in multi-wavelength calibration modeling (Zhang et al., 2019). Scatter-corrected spectral data effectively eliminate the effect of scattering and enhance the spectral absorption information related to component content. The method first requires establishing an “ideal spectrum” of the samples to be tested, taken as the average of all spectra; changes in the spectrum are assumed to be directly and linearly related to component content, and this ideal spectrum is used as the standard for correcting the near-infrared spectra of all other samples, including baseline shift and offset correction. The algorithm is as follows (Equations 3, 4, 5):
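The standard MSC steps, written with the symbols defined in the paragraph that follows and presumably corresponding to Equations 3 to 5, are the mean (ideal) spectrum, a per-sample linear regression on it, and the correction itself:

$$\overline{\text{A}} = \frac{1}{n}\sum_{i=1}^{n} {\text{A}}_{i} \tag{3}$$

$${\text{A}}_{i} = {\text{m}}_{i}\,\overline{\text{A}} + {\text{b}}_{i} \tag{4}$$

$${\text{A}}_{i(\mathrm{MSC})} = \frac{{\text{A}}_{i} - {\text{b}}_{i}}{{\text{m}}_{i}} \tag{5}$$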
In the above formulas, A is the n × p calibration spectrum data matrix, where n is the number of calibration samples and p is the number of wavelength points used for spectrum collection; $\overline{\text{A}}$ is the average spectral vector obtained by averaging the original hyperspectra of all samples at each wavelength point; ${\text{A}}_{i}$ is a 1 × p vector representing the spectrum of a single sample; and ${\text{m}}_{i}$ and ${\text{b}}_{i}$ are the relative offset coefficient (slope) and translation (intercept) obtained from the univariate linear regression of each sample spectrum ${\text{A}}_{i}$ on the average spectrum $\overline{\text{A}}$.
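A minimal NumPy sketch of these MSC steps follows (the method is also widely known as multiplicative scatter correction). Each spectrum is regressed on the mean spectrum with a first-order least-squares fit, and the fitted intercept and slope are then removed. The function name and the optional `reference` argument are illustrative choices.

```python
import numpy as np

def msc(spectra, reference=None):
    """Scatter correction (MSC) of an (n, p) matrix, one spectrum per row.

    reference: ideal spectrum; defaults to the mean spectrum of all rows.
    """
    A = np.asarray(spectra, dtype=float)
    ref = A.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(A)
    for i, a_i in enumerate(A):
        # Least-squares fit a_i = m_i * ref + b_i; polyfit returns [slope, intercept].
        m_i, b_i = np.polyfit(ref, a_i, deg=1)
        corrected[i] = (a_i - b_i) / m_i
    return corrected

# Spectra that differ only by a multiplicative and additive scatter effect.
base = np.linspace(0.2, 0.8, 50) + 0.05 * np.sin(np.linspace(0.0, 6.0, 50))
raw = np.vstack([1.2 * base + 0.05, 0.9 * base - 0.02, base])
corrected = msc(raw)
```

Because the three toy spectra differ only by a slope and an offset, every corrected row coincides with the mean spectrum; real spectra also carry chemical differences, which MSC is intended to preserve.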
Savitzky-Golay (SG)
Smoothing eliminates random noise in the spectral signal and improves the signal-to-noise ratio of the sample signal (Ruffin et al., 2008). The Savitzky-Golay (SG) smoothing method fits a polynomial by least squares to the data in a moving window of the original spectrum; in essence, it is a weighted average algorithm.
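The weighted-average view of SG smoothing can be checked numerically: the weights applied to a window of 2M + 1 points are the first row of the pseudo-inverse of the Vandermonde matrix built from n = -M, ..., M. The sketch below derives those weights with NumPy and applies them along a spectrum; the function names and the choice to leave the first and last M points unsmoothed are illustrative assumptions, and `scipy.signal.savgol_filter` offers a complete implementation with edge handling.

```python
import numpy as np

def sg_weights(M, order):
    """Weights h[-M..M] that return the constant term a0 of a least-squares polynomial fit."""
    n = np.arange(-M, M + 1)
    V = np.vander(n, order + 1, increasing=True)  # V[j, k] = n_j ** k
    # Coefficients a = pinv(V) @ x, so a0 (the smoothed centre value) uses row 0.
    return np.linalg.pinv(V)[0]

def sg_smooth(x, M=2, order=2):
    """Smooth interior points of x; assumption: the first and last M points are kept as-is."""
    if M < 1 or order >= 2 * M + 1:
        raise ValueError("need M >= 1 and order < 2M + 1")
    x = np.asarray(x, dtype=float)
    h = sg_weights(M, order)
    y = x.copy()
    # Reversing h turns np.convolve into a sliding weighted average of each window.
    y[M:-M] = np.convolve(x, h[::-1], mode="valid")
    return y

weights = sg_weights(2, 2)  # 5-point quadratic: [-3, 12, 17, 12, -3] / 35
```

A quick sanity check of the least-squares property is that any polynomial of degree up to `order` passes through the interior of `sg_smooth` unchanged.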
Consider a set of 2M + 1 data points x[n] centered on n = 0, fitted with a polynomial (Equation 6):
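In the standard formulation, with polynomial order N ≤ 2M and coefficients a_k (notation assumed here), the fitted polynomial is:

$$p(n) = \sum_{k=0}^{N} a_{k}\, n^{k}, \qquad -M \le n \le M \tag{6}$$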
The least-squares fitting residual is (Equation 7):
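With the same assumed notation, the total squared error between the polynomial p(n) and the window samples x[n] is:

$$\varepsilon_{N} = \sum_{n=-M}^{M}\left(p(n) - x[n]\right)^{2} = \sum_{n=-M}^{M}\left(\sum_{k=0}^{N} a_{k}\, n^{k} - x[n]\right)^{2} \tag{7}$$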
The constant term of the fitted polynomial is obtained by a convolution operation, that is, a weighted average of the input data (Equation 8):
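Because the smoothed value at the centre of each window is p(0) = a_0, and a_0 is a fixed linear combination of the window samples, the smoother can be written as a convolution with an impulse response h[m] that depends only on M and N (standard result; h is assumed notation):

$$y[n] = \sum_{m=-M}^{M} h[m]\, x[n-m] \tag{8}$$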
Setting the partial derivatives to zero gives (Equations 9-10):
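Differentiating the squared error with respect to each coefficient a_i and setting the result to zero gives the normal equations of the fit (standard derivation, same assumed notation):

$$\frac{\partial \varepsilon_{N}}{\partial a_{i}} = \sum_{n=-M}^{M} 2\, n^{i}\left(\sum_{k=0}^{N} a_{k}\, n^{k} - x[n]\right) = 0, \qquad i = 0, 1, \dots, N \tag{9}$$

$$\sum_{k=0}^{N}\left(\sum_{n=-M}^{M} n^{i+k}\right) a_{k} = \sum_{n=-M}^{M} n^{i}\, x[n], \qquad i = 0, 1, \dots, N \tag{10}$$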
5 Modeling method
5.1 Support Vector Machine (SVM)
The support vector machine is an efficient supervised machine learning model. In SVM, the optimal classification function maximizes the minimum geometric distance from the samples to the separating hyperplane. When samples are not linearly separable in the original feature space, they can be projected into another (usually higher-dimensional) space in which they become separable. Compared with BP neural networks and time-series algorithms, SVM has few tuning parameters, high prediction accuracy, and fast speed. Yu et al. (2021b) used support vector machines for the spectral detection of multiple components and established a prediction model suitable for water environment systems. Yang et al. (2017) established partial least squares (PLS) and least squares support vector machine (LS-SVM) classification models using different spectral variables to detect the freshness state of cooked beef during storage, providing a basis for its classification and grading.
Finding the SVM classification hyperplane is equivalent to solving the following problem, where C is the penalty parameter, ${\epsilon}_{i}$ are the slack variables, $\theta$ is the nonlinear mapping function, and ${y}_{i}\in \left\{-1,1\right\}$ (Equations 11-12):
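The standard soft-margin SVM primal problem, written with the symbols above and with l training samples, weight vector w, and bias b (notation assumed here), is:

$$\min_{w,\,b,\,\epsilon}\ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{l} \epsilon_{i} \tag{11}$$

$$\text{s.t.}\quad y_{i}\left(w^{\mathsf{T}}\theta(x_{i}) + b\right) \ge 1 - \epsilon_{i}, \qquad \epsilon_{i} \ge 0, \qquad i = 1, \dots, l \tag{12}$$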
Solving for the saddle point of the Lagrange function gives (Equations 13-14):
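In the standard derivation, introducing Lagrange multipliers α_i and eliminating w and b at the saddle point yields the dual problem, where K(x_i, x_j) = θ(x_i)ᵀθ(x_j) (notation assumed here):

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_{i} - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_{i}\alpha_{j} y_{i} y_{j} K(x_{i}, x_{j}) \tag{13}$$

$$\text{s.t.}\quad \sum_{i=1}^{l} \alpha_{i} y_{i} = 0, \qquad 0 \le \alpha_{i} \le C \tag{14}$$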
Here, K(·) is a kernel function satisfying the Mercer condition, and the corresponding SVM discriminant function is (Equation 15):
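In the standard formulation, with α_i the optimal Lagrange multipliers of the dual problem and b the bias (notation assumed here), a new sample x is classified by:

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_{i} y_{i} K(x_{i}, x) + b\right) \tag{15}$$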
The structure of the support vector machine is shown in Figure 5.
5.2 Linear Discriminant Analysis (LDA)
LDA is a supervised linear discriminant analysis algorithm based on the Fisher criterion. The idea of LDA is as follows: given a training sample set, project the samples onto a line so that the projections of samples from the same class are as close as possible and the projections of samples from different classes are as far apart as possible; to classify a new sample, project it onto the same line and determine its class from the position of its projection. Zheng et al. (2021) combined principal component analysis (PCA) with linear discriminant analysis (LDA) to distinguish echinococcosis patients from healthy volunteers. The core formula of LDA is (Equation 16):
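A common multi-class form of the Fisher criterion, with between-class scatter S_b, within-class scatter S_w, d input features, class means μ_c, and overall mean μ (notation assumed here), is:

$$\max_{W}\ J(W) = \frac{\left|W^{\mathsf{T}} S_{b} W\right|}{\left|W^{\mathsf{T}} S_{w} W\right|}, \qquad W \in \mathbb{R}^{d \times (N-1)} \tag{16}$$

$$S_{b} = \sum_{c=1}^{N} n_{c}\,(\mu_{c} - \mu)(\mu_{c} - \mu)^{\mathsf{T}}, \qquad S_{w} = \sum_{c=1}^{N}\ \sum_{x \in c} (x - \mu_{c})(x - \mu_{c})^{\mathsf{T}}$$

Its columns are the generalized eigenvectors of S_b w = λ S_w w with the N - 1 largest eigenvalues.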
Where W is a matrix composed of the basis vectors of the low-dimensional space, and N is the number of sample categories.
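A minimal NumPy sketch of multi-class LDA follows. It assumes a non-singular within-class scatter matrix; with 256 bands and relatively few kernels per class, regularization or prior band selection may be needed in practice. The sketch whitens S_w, takes the eigenvectors of the whitened S_b with the N - 1 largest eigenvalues (which solves S_b w = λ S_w w), and assigns samples to the nearest projected class mean. Function names and the toy data are illustrative.

```python
import numpy as np

def fit_lda(X, y):
    """Return projection W (d x (N-1)) and projected class means."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    mu, d = X.mean(axis=0), X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)               # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    # Whiten Sw so the generalized problem Sb w = lam Sw w becomes symmetric.
    evals, evecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    lam, V = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    top = np.argsort(lam)[::-1][: len(classes) - 1]
    W = Sw_inv_sqrt @ V[:, top]
    means = {c: X[y == c].mean(axis=0) @ W for c in classes}
    return W, means

def predict_lda(X, W, means):
    """Assign each sample to the class whose projected mean is nearest."""
    Z = np.asarray(X, dtype=float) @ W
    labels = list(means)
    dists = np.stack([np.linalg.norm(Z - means[c], axis=1) for c in labels], axis=1)
    return np.array(labels)[dists.argmin(axis=1)]

# Toy data: three well-separated classes in 4 dimensions, labelled by storage days.
rng = np.random.default_rng(1)
centres = np.array([[0, 0, 0, 0], [5, 0, 1, 0], [0, 5, 0, 1]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((30, 4)) for c in centres])
y = np.repeat([0, 90, 180], 30)
W, means = fit_lda(X, y)
```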
5.3 Decision Tree (DT)
A decision tree is an algorithm that classifies and predicts new data by learning from historical data. Simply put, the algorithm analyzes historical data with known outcomes to find characteristic features, and uses them as the basis for predicting outcomes for new data (Sivakumar et al., 2018). The decision tree used in this study is a random decision tree: when a tree is grown from the training samples, the feature and threshold at each node are chosen at random. That is, the parent data set L of each node is divided into two complementary, disjoint subsets according to a randomly selected feature and threshold.
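The random node split described above can be sketched as follows: a feature index is drawn at random, a threshold is drawn between that feature's minimum and maximum within the node, and the node's data set L is partitioned into two complementary, disjoint subsets. Drawing the threshold uniformly is an assumption, since the text does not state how thresholds are sampled; the function name is illustrative.

```python
import numpy as np

def random_split(X, rng):
    """Split the rows of X (the node's data set L) on a random feature and threshold.

    Returns (feature, threshold, left_idx, right_idx); left and right are
    disjoint and together cover every row of X.
    """
    X = np.asarray(X, dtype=float)
    feature = int(rng.integers(X.shape[1]))
    lo, hi = X[:, feature].min(), X[:, feature].max()
    threshold = rng.uniform(lo, hi)  # assumption: uniform threshold in [min, max)
    goes_left = X[:, feature] <= threshold
    return feature, threshold, np.flatnonzero(goes_left), np.flatnonzero(~goes_left)

rng = np.random.default_rng(42)
X = rng.random((10, 256))  # 10 kernels x 256 bands (toy values)
f, t, left, right = random_split(X, rng)
```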
6 Full-band model analysis
Support vector machine (SVM), linear discriminant analysis (LDA), and decision tree (DT) models were combined with the median filter (MF), multivariate scattering correction (MSC), and Savitzky-Golay (SG) smoothing algorithms to discriminate the three storage periods of peanuts. As Table 1 shows, the discrimination accuracy of these combinations is moderate, and none exceeds 90%. The SVM-MF model has the highest discrimination accuracy, 85.80%, with a log loss of 1618.58, a Hamming loss of 0.1419, and a fitting time of 0.068 s. This result preliminarily shows that the models and methods are feasible, but the accuracy is not yet sufficient and there is room for further improvement.
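For reference, the three metrics reported in Table 1 can be computed as in the pure-Python sketch below. For single-label, multi-class problems the Hamming loss is simply the fraction of misclassified samples, so it equals 1 - accuracy, and the reported accuracy and Hamming loss are expected to sum to about 1. Log loss is shown as the mean negative log-probability of the true class with clipping; a summed or unclipped variant gives a different scale, and which variant underlies Table 1 is not stated.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    # Single-label case: fraction of samples whose predicted label is wrong.
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, proba, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    proba[i][j] is the predicted probability of classes[j] for sample i;
    probabilities are clipped to [eps, 1] to avoid log(0).
    """
    idx = {c: j for j, c in enumerate(classes)}
    total = 0.0
    for t, p in zip(y_true, proba):
        total -= math.log(min(max(p[idx[t]], eps), 1.0))
    return total / len(y_true)

y_true = [0, 90, 180, 180]   # storage time in days
y_pred = [0, 90, 90, 180]
print(accuracy(y_true, y_pred), hamming_loss(y_true, y_pred))  # 0.75 0.25
```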
7 Feature band extraction
Hyperspectral imaging offers high resolution, many bands, and rich information. At the same time, its spectral information overlaps heavily and contains much invalid information and many redundant bands, which reduce the efficiency and accuracy of subsequent modeling. Screening the characteristic bands of the hyperspectral data is therefore an important prerequisite for building an inversion model that identifies peanut freshness (Gao et al., 2020). This study adopts two feature band extraction algorithms, Catboost and Lgboost.
7.1 Catboost
CatBoost is a machine learning method based on gradient boosting decision trees that supports categorical features. CatBoost selects the tree structure greedily: it enumerates all possible splits, computes the penalty function of each, selects the split with the smallest penalty, and assigns the result to the leaf nodes, repeating this process for the subsequent leaf nodes. Before each new tree is constructed, the samples are randomly permuted, and the new tree is built along the gradient descent direction; CatBoost uses different permutations at different gradient boosting steps. CatBoost calculates the importance of the feature variables with the following formula (Bentéjac et al., 2021) (Equation 17).
Here, $c_1$ and $c_2$ are the numbers of samples in the left and right leaf nodes produced by a split, and $v_1$ and $v_2$ are the values of the formula in those leaf nodes.
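The symbols defined above match the PredictionValuesChange importance described in the CatBoost documentation, where each split on a feature contributes (v1 − avr)²·c1 + (v2 − avr)²·c2 with avr = (v1·c1 + v2·c2)/(c1 + c2), summed over all splits on that feature in all trees. Assuming Equation 17 is this formula, a minimal sketch follows; the function names and leaf statistics are hypothetical, and CatBoost additionally rescales the importances of all features so that they sum to 100.

```python
from typing import Iterable, Tuple


def split_importance(c1: float, v1: float, c2: float, v2: float) -> float:
    """Contribution of one split: (v1 - avr)^2 * c1 + (v2 - avr)^2 * c2."""
    avr = (v1 * c1 + v2 * c2) / (c1 + c2)  # sample-weighted mean of the two leaf values
    return (v1 - avr) ** 2 * c1 + (v2 - avr) ** 2 * c2


def feature_importance(splits: Iterable[Tuple[float, float, float, float]]) -> float:
    """Sum the contributions of every split (across all trees) that uses one feature.

    Each split is given as (c1, v1, c2, v2).
    """
    return sum(split_importance(c1, v1, c2, v2) for c1, v1, c2, v2 in splits)


# Usage with made-up leaf statistics for one band used in two splits.
print(feature_importance([(10, 1.0, 10, -1.0), (5, 2.0, 5, 2.0)]))  # → 20.0
```

A split whose two leaves predict the same value (the second split above) contributes nothing, so a band gains importance only where splitting on it changes the prediction.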
7.2 Lgboost
The Lgboost algorithm is a GBDT-based model, that is, an ensemble learning algorithm that combines weak learners into a strong learner. It uses regression trees as weak learners: the residual between the current prediction and the target value becomes the learning target of the next regression tree, so each tree learns from the conclusions and residuals of all previous trees, and the outputs of all trees are summed to give the final prediction. By using a histogram algorithm to bin the feature values and a node expansion strategy to grow each tree, it achieves efficient, high-precision, high-performance classification (Akbilgic et al., 2021).
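The residual-fitting scheme described above can be sketched as follows, assuming squared-error loss, one-split regression trees (stumps) as weak learners, and a fixed learning rate. It deliberately omits histogram binning and the other optimizations of the actual Lgboost implementation; all function names are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature, threshold, left value, right value)


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a one-split regression tree to residuals r by minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= thr]
            right = [ri for x, ri in zip(X, r) if x[j] > thr]
            if not right:  # a threshold at the maximum value does not split
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if best is None or sse < best[0]:
                best = (sse, (j, thr, lv, rv))
    if best is None:  # no feature can split the data: predict the mean residual
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1]


def stump_predict(s: Stump, x: List[float]) -> float:
    j, thr, lv, rv = s
    return lv if x[j] <= thr else rv


def boost(X: List[List[float]], y: List[float], n_rounds: int = 50, learning_rate: float = 0.1):
    """Gradient boosting for squared loss: each new tree is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # next learning target
        s = fit_stump(X, residuals)
        stumps.append(s)
        pred = [pi + learning_rate * stump_predict(s, x) for pi, x in zip(pred, X)]
    return base, learning_rate, stumps


def boost_predict(model, x: List[float]) -> float:
    """Final output = base value + shrunken sum of all tree outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


# Usage on a toy one-band data set (made-up values).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 3.0]
model = boost(X, y)
print([round(boost_predict(model, x), 3) for x in X])
```

For classification, implementations of this kind fit the trees to gradients of a classification loss (for example the log loss) rather than to plain residuals; the squared-loss case is shown because it makes the "learn the residual" step explicit.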
7.3 Feature band extraction results
The original spectra of the three classes of peanut samples contain 256 wavelength variables, which brings problems such as many bands, a large amount of information, weak correlation between variables, and heavy redundancy. Therefore, the Catboost and Lgboost algorithms were used to extract feature bands, and the top 30 feature bands were retained. The optimized model parameters were n_estimators = 50, random_state = 2019, max_depth = 5, subsample = 0.8, and learning_rate = 0.1. The characteristic band weights obtained by the two extraction methods are shown in Figure 6. The 30 characteristic bands retained by the Catboost algorithm are 387.15, 389.54, 394.31, 396.7, 399.09, 401.49, 403.88, 406.28, 411.07, 413.47, 437.52, 456.85, 490.87, 495.74, 507.96, 517.76, 544.8, 574.46, 591.85, 614.28, 631.81, 864.98, 960, 962.66, 1016.15, 1024.22, 1026.91, 1029.6, 1032.29, and 1034.98 nm. Among them, the wavelength with the largest weight is 1034.99 nm. Lactose reflects strongly in the 1021–1037 nm range (Kou et al., 2022), which indicates that the lactose content of peanuts changes greatly during storage. Starch reflects strongly at 455–505 nm. Amino acids reflect strongly at 700–900 nm (Zhou et al., 2020). Three of the bands extracted by the Catboost algorithm lie between 455 and 505 nm and one lies between 700 and 900 nm, indicating that the starch and amino acid contents also change considerably during storage. The 30 characteristic bands retained by the Lgboost algorithm are 387.15, 389.54, 391.92, 394.31, 396.7, 399.09, 401.49, 403.88, 406.28, 408.67, 394.31, 512.86, 532.49, 552.2, 634.31, 636.82, 962.66, 970.65, 986.67, 994.7, 1010.78, 1013.47, 1018.84, 1024.22, 1026.91, 1034.29, and 1034.99 nm. Among them, the wavelength with the largest weight is 396.7 nm, which lies close to the reflectance region of starch. The prediction accuracy results of the two feature band extraction methods are shown in Figure 6.
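Retaining the top 30 bands amounts to ranking wavelengths by their importance weights and keeping the k largest. A minimal sketch, assuming the weights are available as a wavelength-to-weight mapping; the function names and the example weights are hypothetical, not values from Figure 6.

```python
from typing import Dict, List


def top_k_bands(weights: Dict[float, float], k: int = 30) -> List[float]:
    """Return the k wavelengths (nm) with the largest importance weights, highest first."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [wavelength for wavelength, _ in ranked[:k]]


def select_columns(spectrum: List[float], wavelengths: List[float], chosen: List[float]) -> List[float]:
    """Keep only the reflectance values at the chosen wavelengths, in original band order."""
    keep = set(chosen)
    return [value for wl, value in zip(wavelengths, spectrum) if wl in keep]


# Usage with made-up weights for four bands, keeping the top two.
weights = {387.15: 0.2, 396.7: 0.5, 864.98: 0.1, 1034.98: 0.9}
chosen = top_k_bands(weights, k=2)
print(chosen)
print(select_columns([0.11, 0.12, 0.35, 0.42], [387.15, 396.7, 864.98, 1034.98], chosen))
```

The reduced spectra produced by select_columns are what the downstream classifiers would be trained on in place of the full 256-band spectra.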
Figure 6. Feature band extraction maps: (a) Catboost feature band extraction; (b) Lgboost feature band extraction.
8 Model evaluation after feature band extraction
Because the full-band data contain a large amount of information with weak correlation and heavy redundancy, the top 30 weighted characteristic bands were extracted for modeling, and the spectral data were preprocessed with the median filter (MF) or multiplicative scatter correction (MSC), which significantly improved model accuracy. Table 2 shows that the DT-MF-Catboost model performs best, with a discrimination accuracy of 97.53%, a log loss of 962.35, a Hamming loss of 0.0247, and a fitting time of 0.2530 s. After feature band extraction and spectral preprocessing, model performance improves markedly: discrimination accuracy increases by 10%–20%, log loss and Hamming loss decrease significantly, and fitting time is also significantly reduced.
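The two preprocessing steps named above can be sketched as follows. Assumptions: the median filter uses an odd sliding window that shrinks at the ends of the spectrum (other implementations pad instead), and MSC regresses each spectrum on the mean spectrum by ordinary least squares. The window size and function names are hypothetical; the paper does not give its settings.

```python
from statistics import median
from typing import List


def median_filter(spectrum: List[float], window: int = 3) -> List[float]:
    """Sliding-window median along the wavelength axis; the window shrinks at the ends."""
    half = window // 2
    n = len(spectrum)
    return [median(spectrum[max(0, i - half): min(n, i + half + 1)]) for i in range(n)]


def msc(spectra: List[List[float]]) -> List[List[float]]:
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x ≈ a + b * m (m = mean spectrum) by least
    squares and replaced by (x - a) / b. Assumes the mean spectrum is not flat.
    """
    n_bands = len(spectra[0])
    m = [sum(s[i] for s in spectra) / len(spectra) for i in range(n_bands)]
    m_bar = sum(m) / n_bands
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    corrected = []
    for x in spectra:
        x_bar = sum(x) / n_bands
        b = sum((mi - m_bar) * (xi - x_bar) for mi, xi in zip(m, x)) / sxx  # slope
        a = x_bar - b * m_bar  # offset
        corrected.append([(xi - a) / b for xi in x])
    return corrected


# Usage: a spike at band 4 is removed by the median filter (made-up values).
print(median_filter([1, 9, 2, 3, 100, 4]))
```

MSC removes additive and multiplicative scattering differences between samples, while the median filter suppresses isolated spikes; the two address different kinds of noise, which is one reason to compare them separately.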
Both feature band extraction methods improved the efficiency of model identification. In terms of log loss, Hamming loss, accuracy, and fitting time, the Catboost feature band extraction algorithm performs better. A confusion matrix shows clearly where a model misclassifies samples; the confusion matrices of the SVM-MF-Catboost, LDA-MF-Catboost, and DT-MF-Catboost models are plotted in Figure 7.
Figure 7. Confusion matrices: (a) SVM-MF-Catboost algorithm; (b) LDA-MF-Catboost algorithm; (c) DT-MF-Catboost algorithm.
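A confusion matrix of the kind plotted in Figure 7 counts, for each true storage period, how many samples were assigned to each predicted period. A minimal sketch, assuming rows are true classes and columns are predicted classes (the orientation used in Figure 7 is not stated) and that the three storage periods are encoded as 0, 1, and 2; function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy_from_cm(cm: List[List[int]]) -> float:
    """Correct predictions lie on the diagonal; everything off it is a misjudgment."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Usage with made-up labels for six samples.
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
for row in cm:
    print(row)
```

Reading along a row shows which storage periods a given period is confused with, which a single accuracy figure cannot reveal.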
9 Conclusion
This study explored the feasibility of using hyperspectral technology to detect peanut freshness. The results show that hyperspectral imaging is a fast, nondestructive, and efficient detection method that can replace traditional, time-consuming methods of assessing peanut freshness. The hyperspectral system was used to extract spectral data from peanuts with different storage times, and several preprocessing methods were combined with multiple prediction models on the acquired data. However, the acquired data were large and highly redundant, which would lead to heavy computation and inconvenient data processing in practical online detection. Therefore, feature band extraction algorithms were applied to the peanut seed spectra to reduce the data dimension and improve classification accuracy; the extracted feature bands reduced redundant information by discarding bands with relatively small feature weights. Detection performance differed across the preprocessing methods, feature band extraction methods, and models. Among them, the DT-MF-Catboost model performed best in detecting peanut storage time, with a log loss of 962.35, a Hamming loss of 0.0247, a fitting time of 0.2530 s, and an accuracy of 97.53% in identifying peanut storage time.
In summary, fusing the feature spectra with texture data and applying the DT-MF-Catboost algorithm could quickly and nondestructively identify the freshness of peanuts, providing ideas and references for other studies.

Practical Application: Hyperspectral identification of peanut seed storage time.
References
Akbilgic, O., Butler, L., Karabayir, I., Chang, P., Kitzman, D., Alonso, A., Chen, L., & Soliman, E. (2021). Artificial intelligence applied to ECG improves heart failure prediction accuracy. Journal of the American College of Cardiology, 77(18), 3045. http://dx.doi.org/10.1016/S0735-1097(21)04400-4
Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54(3), 1937-1967. http://dx.doi.org/10.1007/s10462-020-09896-5
Chen, M., Ni, Y. L., Jin, C. Q., Liu, Z., & Xu, J. S. (2022). Spectral inversion model of the crushing rate of soybean under mechanized harvesting. Food Science and Technology, 42, e123221. http://dx.doi.org/10.1590/fst.123221
Eshkabilov, S., Lee, A., Sun, X., Lee, C. W., & Simsek, H. (2021). Hyperspectral imaging techniques for rapid detection of nutrient content of hydroponically grown lettuce cultivars. Computers and Electronics in Agriculture, 181, 105968. http://dx.doi.org/10.1016/j.compag.2020.105968
Gao, J., Ni, J., Wang, D., Deng, L., Li, J., & Han, Z. (2020). Pixel-level aflatoxin detecting in maize based on feature selection and hyperspectral imaging. Spectrochimica Acta. Part A: Molecular and Biomolecular Spectroscopy, 234, 118269. http://dx.doi.org/10.1016/j.saa.2020.118269 PMid:32217452.
Jin, N., Huang, W., Ren, Y., Luo, J., Wu, Y., Jing, Y., & Wang, D. (2013). Hyperspectral identification of cotton verticillium disease severity. Optik, 124(16), 2569-2573. http://dx.doi.org/10.1016/j.ijleo.2012.07.026
Kou, X., Zhao, Y., Xu, L., Kang, Z., Wang, Y., Zou, Z., Huang, P., Wang, Q., Su, G., Yang, Y., & Sun, Y. (2022). Controlled fabrication of core-shell gamma-Fe2O3@C-reduced graphene oxide composites with tunable interfacial structure for highly efficient microwave absorption. Journal of Colloid and Interface Science, 615, 685-696. http://dx.doi.org/10.1016/j.jcis.2022.02.023 PMid:35168017.
Kucha, C. T., Liu, L., Ngadi, M., & Claude, G. (2021). Hyperspectral imaging and chemometrics as a noninvasive tool to discriminate and analyze iodine value of pork fat. Food Control, 127, 108145. http://dx.doi.org/10.1016/j.foodcont.2021.108145
Li, L., Ge, H., & Gao, J. (2017). A spectral-spatial kernel-based method for hyperspectral imagery classification. Advances in Space Research, 59(4), 954-967. http://dx.doi.org/10.1016/j.asr.2016.11.006
Li, Y.H., Tan, X., Zhang, W., Jiao, Q.B., Xu, Y.X., Li, H., Zou, Y.B., Yang, L., & Fang, Y.P. (2021). Research and application of several key techniques in hyperspectral image preprocessing. Frontiers in Plant Science, 12, 627865. http://dx.doi.org/10.3389/fpls.2021.627865 PMid:33679841.
Mesa, A. R., & Chiang, J. Y. (2021). Multi-input deep learning model with RGB and hyperspectral imaging for banana grading. Agriculture, 11(8), 687. http://dx.doi.org/10.3390/agriculture11080687
Ravikanth, L., Jayas, D. S., White, N. D. G., Fields, P. G., & Sun, D.-W. (2017). Extraction of spectral information from hyperspectral data and application of hyperspectral imaging for food and agricultural products. Food and Bioprocess Technology, 10(1), 1-33. http://dx.doi.org/10.1007/s11947-016-1817-8
Ruffin, C., King, R. L., & Younan, N. H. (2008). A combined derivative spectroscopy and Savitzky-Golay filtering method for the analysis of hyperspectral data. GIScience & Remote Sensing, 45(1), 1-15. http://dx.doi.org/10.2747/1548-1603.45.1.1
Selci, S. (2019). The future of hyperspectral imaging. Journal of Imaging, 5(11), 84. http://dx.doi.org/10.3390/jimaging5110084 PMid:34460507.
Şengör, G. F. Ü., Balaban, M. O., Topaloğlu, B., Ayvaz, Z., Ceylan, Z., & Doğruyol, H. (2019). Color assessment by different techniques of gilthead seabream (Sparus aurata) during cold storage. Food Science and Technology, 39(3), 696-703. http://dx.doi.org/10.1590/fst.02018
Sivakumar, S., Nayak, S. R., Vidyanandini, S., Kumar, J. A., & Palai, G. (2018). An empirical study of supervised learning methods for breast cancer diseases. Optik, 175, 105-114. http://dx.doi.org/10.1016/j.ijleo.2018.08.112
Sun, J., Wang, G., Zhang, H., Xia, L., Zhao, W., Guo, Y., & Sun, X. (2020). Detection of fat content in peanut kernels based on chemometrics and hyperspectral imaging technology. Infrared Physics & Technology, 105, 103226. http://dx.doi.org/10.1016/j.infrared.2020.103226
Wang, J., Shi, L., Liu, Y., Zhao, M., Wang, X., Qiao, L., Sui, J., Li, G., Zhu, H., & Yu, S. (2020). Development of peanut varieties with high oil content by in vitro mutagenesis and screening. Journal of Integrative Agriculture, 19(12), 2974-2982. http://dx.doi.org/10.1016/S2095-3119(20)63182-3
Wang, X. W., Xing, X. Y., Zhao, M. C., & Yang, J. R. (2021). Comparison of multispectral modeling of physiochemical attributes of greengage: Brix and pH values. Food Science and Technology, 41(Suppl. 2), 611-618. http://dx.doi.org/10.1590/fst.21320
Yang, D., He, D., Lu, A., Ren, D., & Wang, J. (2017). Detection of the freshness state of cooked beef during storage using hyperspectral imaging. Applied Spectroscopy, 71(10), 2286-2301. http://dx.doi.org/10.1177/0003702817718807 PMid:28627234.
Yu, H.D., Qing, L.W., Yan, D.T., Xia, G., Zhang, C., Yun, Y.H., & Zhang, W. (2021a). Hyperspectral imaging in combination with data fusion for rapid evaluation of tilapia fillet freshness. Food Chemistry, 348, 129129. http://dx.doi.org/10.1016/j.foodchem.2021.129129 PMid:33515952.
Yu, Y., Shao, M., Jiang, L., Ke, Y., Wei, D., Zhang, D., Jiang, M., & Yang, Y. (2021b). Quantitative analysis of multiple components based on support vector machine (SVM). Optik, 237, 166759. http://dx.doi.org/10.1016/j.ijleo.2021.166759
Zhang, L., Rao, Z., & Ji, H. (2019). NIR hyperspectral imaging technology combined with multivariate methods to study the residues of different concentrations of omethoate on wheat grain surface. Sensors, 19(14), 3147. http://dx.doi.org/10.3390/s19143147 PMid:31319577.
Zhang, L., Sun, H., Rao, Z., & Ji, H. (2020). Nondestructive identification of slightly sprouted wheat kernels using hyperspectral data on both sides of wheat kernels. Biosystems Engineering, 200, 188-199. http://dx.doi.org/10.1016/j.biosystemseng.2020.10.004
Zhang, L., Zeng, J., Gao, H., Zhang, K., & Wang, M. (2022). Effects of different frozen storage conditions on the functional properties of wheat gluten protein in non-fermented dough. Food Science and Technology, 42, e97821. http://dx.doi.org/10.1590/fst.97821
Zheng, X., Yin, L., Lv, G., Lv, X., Chen, C., & Wu, G. (2021). Serum log-transformed Raman spectroscopy combined with multivariate analysis for the detection of echinococcosis. Optik, 226(Pt 2), 165687. http://dx.doi.org/10.1016/j.ijleo.2020.165687
Zhou, S., Sun, L., Xing, W., Feng, G., Ji, Y., Yang, J., & Liu, S. (2020). Hyperspectral imaging of beet seed germination prediction. Infrared Physics & Technology, 108, 103363. http://dx.doi.org/10.1016/j.infrared.2020.103363
Zou, Z., Chen, J., Zhou, M., Zhao, Y., Long, T., Wu, Q., & Xu, L. (2022). Prediction of peanut seed vigor based on hyperspectral images. Food Science and Technology, 42, e32822. http://dx.doi.org/10.1590/fst.32822
Publication Dates

Publication in this collection
02 Sept 2022 
Date of issue
2022
History

Received
05 May 2022 
Accepted
09 July 2022