Estimation of leaf nutrient concentration from hyperspectral reflectance in Eucalyptus using partial least squares regression

Leaf hyperspectral reflectance has been used to estimate nutrient concentrations in plants in narrow bands of the electromagnetic spectrum. The aim of this study was to estimate leaf nutrient concentrations using leaf hyperspectral reflectance and verify the variable selection methods using the partial least squares regression (PLSR). Two studies were carried out using stands with Eucalyptus clones. Study I was established in Eucalyptus stands with three clones, classifying leaves into five colour patterns using the Munsell chart for plant tissues. Immediately after leaf collection, leaf reflectance was read and the chemical analysis was performed. Study II was carried out in commercial clonal stands of Eucalyptus performing the same leaf sampling and chemical analysis as used in Study I. All leaf reflectance spectra were smoothed and three more pre-processing procedures were applied. In addition, three methods of PLSR were tested. The first derivative was more accurate for predicting nitrogen (Rcv 2 = 0.95), phosphorous (Rcv 2 = 0.93), and sulphur concentration (Rcv 2 = 0.85). The estimates for concentrations of calcium (Rcv 2 = 0.81), magnesium (Rcv 2 = 0.22), and potassium (Rcv 2 = 0.76) were more accurate using the logarithm transformation. Only the estimates for iron concentrations were performed with higher accuracy (Rcv 2 = 0.35) using the smoothed reflectance. The copper concentrations were more accurate (Rcv 2 = 0.78) using the logarithm transformation. Concentrations of boron (Rcv 2 = 0.68) and manganese (Rcv 2 = 0.79) were more accurate using the first derivative, while zinc (Rcv 2 = 0.31) concentration was most accurate using the second derivative.


Introduction
Non-destructive methods for measuring nutrient concentration, involving different sensors, have been used in forest and agricultural nutritional management (Lu et al., 2018;Oliveira et al., 2017;Pandey et al., 2017). Among these sensors, hyperspectral reflectance stands out, as it allows obtaining information on the relationships between plants and electromagnetic energy in narrow bands of the electromagnetic spectrum. Didactically, the electromagnetic spectrum is divided into regions, according to wavelengths. The visible (VIS, 400 -700 nm), near infrared (NIR, 700 -1300 nm) and shortwave infrared (1300 -2500 nm) regions of the spectrum are the regions that mostly interact with plants in vivo, since the leaf pigments in plants, especially chlorophylls, interact with the electromagnetic energy in the VIS region and the leaf structure interacts in the NIR region (Gates et al., 1965).
Not only do leaf pigments and structures relate to electromagnetic energy in the VIS-NIR regions, but leaf nutrients also do (Abdel-Rahman et al., 2017;Mahajan et al., 2014;Pimstein et al., 2011. Nutrients have several functions in plant metabolism in the areas of leaf structure and pigment synthesis, plant energy, and metabolism as well as the electron transport chain, among others (Marschner, 1995). Moreover, under deficiency of some nutrients, leaf chlorophyll concentrations are reduced and leaf reflectance is changed (Adams et al., 2000a, b;Mariotti et al., 1996). For this reason and due to other metabolic alterations, it is possible to detect when plants experience nutrient deficiencies through leaf and tissue chloroses and necrosis (Dell et al., 1996).
Because of the complexity of nutrient functions and the number of plant chemical compounds, estimating leaf nutrients by means of hyperspectral reflectance is not a simple procedure with difficulties with autocorrelation and collinearity (Blackburn, 2007;Wold et al., 2001). To overcome these problems, some modelling techniques have been used to predict leaf nutrient concentrations through hyperspectral reflectance, such as the partial least squares regression (PLSR) linked to variable selection methods (Mehmood et al., 2012). Therefore, this study aimed to estimate leaf nutrient concentrations using leaf hyperspectral reflectance and verify the variable selection methods using the PLSR.

Material and methods
Two studies (Study I and Study II) were carried out using Eucalyptus stands in the municipalities of Lassance and Três Marias, Minas Gerais, Brazil, about 800 m a.s.l. (Figure 1).
Study I was established in Eucalyptus stands which were 25 months old, established with 7.0 × 1.3 m tree spacing, with three clones (E. urophylla x E. grandis: GG680, E. urophylla x E. grandis: GG682 and hybrid of E. urophylla ST Blake: I144). In these stands, nine plots of 10 ha were allocated, three per clone. Leaves from the lower part of tree crowns were visually classified into five

Soils and Plant Nutrition
Research Article using partial least squares regression colour patterns using the Munsell chart for plant tissues (Gretag-Macbeth, New Winsor, NY, USA). The leaf colour patterns were defined by the clear expression of the biochemical cycling of nutrients (Saur et al., 2000) (Table 1).
For each colour pattern, 30 leaves were collected in each of the nine plots. A leaf was collected from each tree in a random zigzag walk. The set of 30 leaves constituted a composite sample, totalling 45 composite samples (3 clones × 3 plots × 5 leaf colour patterns). This sampling was performed to obtain a wide variation in leaf nutrient concentrations.
Immediately collection, leaf reflectance (400-900 nm) was read in the abaxial part of each leaf, 10 mm from the lower border, on the left side of the leaves using a CI-710 mini-spectrometer (CID Bio-Science -Camas, Washington, USA). Leaf reflectance was analysed using SpectraSnap! (Software version 1.1.3.150, CID Bio-Science) with 300 milliseconds of integration time, a boxcar with 10 points and two scans for averaging. From the set of 30 smoothed spectra in each sample, the average value for leaf reflectance was obtained for each of the 45 composite samples.
Subsequently, the 30 leaves of each composite sample were placed in paper bags and oven-dried with forced air circulation at 65 ºC. After drying, the 45-composite samples were digested in nitro-perchloric solution and concentrations of calcium (Ca), magnesium (Mg), sulphur (S), zinc (Zn), iron (Fe) and manganese (Mn) were determined by spectrophotometry. The phosphorus (P) concentration was determined by colorimetry, the potassium (K) concentration by flame photometry and the total nitrogen (N) concentration by the Kjeldahl method following sulphuric digestion.
Study II was carried out in commercial stands of the hybrid of Eucalyptus urophylla ST Blake (I144 clone), which were 9, 12, 15, and 25 months old. Sixteen plots of 10 ha were allocated, four plots per age (Figure 1). Leaves from 25 trees in these plots were sampled. They were taken from tall trees in the upper canopy of the stands (80 % percentile). For each tree, four completely expanded leaves, without physical damages, were col-lected at the four cardinal points near in the middle of the Eucalyptus crowns. The 100-leaf set comprised a composite sample to determine nutrients and measure leaf reflectance. The procedures for leaf collecting and chemical analysis were the same as in Study I.
The results obtained in Studies I and II were grouped into a single database with 45-leaf nutrient concentrations and reflectance samples from Study I and 16-leaf nutrient concentrations and reflectance samples from Study II. All leaf reflectance was smoothed by the Savitzky-Golay algorithm using polynomial 2 (SR, Savitzky and Golay, 1964). In addition, three more pre-processing procedures were used comparatively: leaf reflectance logarithmic transformation (LT, Eq. 1), first (FD) and second derivative (SD). (1) where: rl j -leaf reflectance in the j wavelength. The pre-processed reflectance values for each composite sample (n = 61) were set in PLSR as a set of predictors variables and a single wavelength represented a variable (l = 2884). The leaf nutrient concentrations, obtained by the chemical analysis, were set as dependent variables (n = 61) in PLSR. For each nutrient, a PLSR was adjusted through the SIMPLS algorithm on all predictors and dependent variables after pre-processing (Eq. 2 and 3).
where: y i -nutrient concentration in the i sample; T ik -latent variable k in the i sample; r ij -leaf reflectance of sample i in wavelength j; m -total number of wavelengths, 2884; n -total number of samples; e i -error; b k -regression coefficients of the k latent variable; r -number of latent variables; c kj -coefficients of the k latent variable in wavelength j.  For each nutrient, three methods of PLSR were tested for each reflectance pre-processing procedure. The method called PLS1 was a fit for the model using the hyperspectral reflectance in all wavelengths. Moreover, the methods wrapper and filter of variable selection were used to select the wavelengths (inputs) for the PLSR models. The iterative predictor weighting PLS (IPW) was performed using 100 iterations and selecting the wavelengths by means of the regression coefficients. In each iteration, the predictor importance was computed after PLS modelling and this importance was used to re-scale the reflectance in each single wavelength and eliminate the least important wavelengths with values below 0.1 before subsequent model re-fitting (Forina et al., 1999).
Another variable selection method used was the variable importance in PLSR projection (VIP) (Wold et al., 1993). This method accumulated the reflectance importance in each wavelength j that was reflected by the weight (w) from each component and selected the wavelength with VIP > 1 (Eq. 4).
where: SS a -sum of squares explained by the ath component; W W aj a / ( ) 2 -importance of the jth wavelength. The number of latent variables (LV) was based on the minimum predicted residual error sum of squares (PRESS) to avoid over fitting of the model. During the process of training and modelling of validation, several models were created to predict concentration of each nutrient. The model that presented less root mean squares error (RMSE train and RMSE cv , Eq. 5), as well as the highest coefficient of determination in the training (R train 2 ) and validation (R cv 2 ), was selected. The leave-one-out crossvalidation (LOOCV) was used, as it provides lower bias and fewer errors when it is used with a limited number of samples (Cawley and Talbot, 2003).

RMSE Yi yi n
where: Yi is the nutrient concentration of sample i estimated by the equation; yi is the nutrient observed in the laboratory of sample i; n is the total number of samples. RMSE: g kg -1 for N, P, K, Ca, Mg and S; mg kg -1 for B, Zn, Mn, Fe and Cu. All statistical procedures were carried out using the software R Core Team (2017) version 3.4.0, platform support R Studio version 1.0.143. A method process simplification can be visualised in the workflow in Figure 2.

Results
The SR presented higher standard deviation in the VIS region, especially at wavelengths between 550 and 700 nm ( Figure 3A). Similarly, the LT increased the standard deviation mainly at the wavelength near the blue region (400 -500 nm) (Figure. 3B). In the FD pre-pro-cessing, the point of maximum inflection of the curve in the red edge (IPP, ~700 nm) was the region with the highest standard deviations ( Figure 3C). As observed in FD, the wavelengths close to 700 and 730 nm represented the region with the highest standard deviation in SD ( Figure 3D). Table 2 shows the descriptive statistics for nutrient concentrations of Eucalyptus leaves. Large standard deviations are observed in practically all nutrient concentrations, as well as the large range between the nutrient concentration minimum and maximum. On average, nutrient concentrations were classified as optimal for the crop, except for N, P and Fe, which were below optimal.
None of the models used to predict macronutrients showed better accuracy when using the SR and SD pre-procedures (Table 3). The FD was more accurate to predict N and P concentrations using six LV and PLS1. The same pre-processing was used to predict S concentration using 16 LV and select the wavelengths with the VIP method. The estimates of Ca, Mg, and K concentrations were more accurate using the LT. The Ca concentration estimate was more accurate using the 12 LV and the IPW methods. On the other hand, estimates of Mg and K concentration were more accurate using 11 LV while selecting wavelengths using the VIP method.
Among the models developed for micronutrient concentrations, only the estimates for Fe concentrations were performed with high accuracy using the SR preprocess and the VIP method (Table 4). Estimates of Cu concentrations were more accurate using the LT pre-process, with PLS1 and 12 LV. Estimates of B and Mn concentrations were more accurate using the FD with seven LV and selecting the wavelengths using the VIP method. Estimates of Zn concentration were more accurate using the SD with just two LV and the same variable selection method, similar to estimates for B and Mn concentrations. Estimates of N concentration were more accurate than all estimates for nutrient concentrations (Table 3), resulting in an accurate model prediction ( Figure 4). Moreover, concentrations of P, Ca, S, Mn, and Cu were estimated with high accuracy (Table 3) and resulted in an excellent model prediction (Figure 4). Although accuracy for K and B estimates were not as high as for the nutrient concentrations described previously, estimates for K and B were still fairly accurate (Table 3) with a few errors (Figure 4). Estimates for Mg, Zn, and Fe concentrations were the least accurate and included a large number of errors than for nutrients mentioned previously.
Regarding PLSR coefficients, the absolute values indicate the contribution of each wavelength in the predictive models. In other words, the higher the absolute coefficient value, the greater the influence of wavelength on the model. In the N concentration model, the coefficients for wavelengths at 740 and 780 nm presented more absolute values ( Figure 5). Moreover, the pattern of coefficient values observed in the N modelling is similar to that found for the P concentration modelling.
However, the high absolute coefficient value was around 680 nm for the P concentration modelling.
For the K concentration modelling, the VIP method selected wavelengths in the regions for blue, green, red, and a small part of NIR. In this model, the coefficient assigned to 400 nm had a greater absolute value ( Figure 5). Since the Ca concentration modelling used the IPW variable selection method, wavelengths in all hyperspectral reflectance regions were used in the modelling. Nevertheless, higher absolute coefficient values were assigned to wavelengths in the red edge and NIR regions.
The Mg concentration was estimated using wavelengths in the blue, red, red edge and NIR regions, with higher absolute values in the blue region and around 820 nm ( Figure 5). The S concentration was predicted using almost all regions included in this study, with a gap between 706 and 870 nm and higher absolute coefficient values at 705 and 880 nm. Similarly, the B concentration was estimated using all regions of the electromagnetic spectrum with a higher absolute coefficient value around 790 nm.    In the Zn concentration estimates, the higher absolute values were assigned to wavelengths for the blue and NIR regions, while the wavelengths for the blue, red, red edge and NIR regions were used to predict Mn concentration with higher absolute coefficient values at 680 nm and around the NIR region. The wavelengths in the blue region also had higher absolute coefficient values in the estimates for Fe and Cu concentrations. More specifically, in Cu concentration estimates, 520 nm was the wavelength with the highest absolute coefficient value.

Discussion
The pre-processing caused changes in different regions of the spectrum (Figure 3), which is important for modelling leaf nutrient concentration with large standard deviations (Table 2). A higher standard deviation for leaf reflectance indicates that these regions may be more favourable for extracting differences in independent samples and undertaking robust modelling. In the SR and LT pre-processes, the highest standard deviations at 550 and 700 nm and below 550 nm are comparable to those in several plant species (Asner et al., 2011;Dechant et al., 2017). Likewise, regions of leaf reflectance related to physiological traits, like the wavelengths at around 700 nm (Horler et al., 1983), presented higher standard deviations in FD and SD (Figures 3 C, D).
In this study, the number of LV used to estimate nutrient concentrations ranged between 2 and 16 ( Table  3, 4). Pandey et al. (2017) found similar results when predicting nutrient concentrations using hyperspectral images in maize and soybean. In the PLSR modelling for hyperspectral data and biochemical traits in plants, it is usual to use more than 10 LV (Nguyen et al., 2006;Ramoelo et al., 2013). However, high LV values may undergo overfitting, which could compromise the model predictive power (Wold et al., 2001). On the other hand, in some studies, biochemical traits were predicted with less than 10 LV, for example in Cirtrus sinensis (L.) Osbeck cv Tarocco, Picea rubens Sarg., Abies balsamea (L.) Mill. and Beta vulgaris L. (Swiss chard) (Abdel-Rahman   Menesatti et al., 2010). Therefore, the LOOCV was performed to evaluate the model predictive capacity (Tables 3, 4). In general, the variable selection methods increased the PLSR models (Tables 3, 4). In estimates for K, Mg, S, B, Zn, Mn, and Fe concentrations, greater accuracy was obtained with the VIP method. The IPW method was only used with the highest accuracy for estimates of Ca concentration. In general, variable selection methods increase accuracy of hyperspectral modelling and the use of whole wavelengths has a negative influence on the predictive PLSR capacity (Abdel-Rahman et al., 2017;Filzmoser et al., 2012). However, estimates of N, P, and Cu concentrations were more accurate using the wavelengths for the entire reflectance spectrum (400 -900 nm).
It is known that the relationship between N, P, and Cu concentrations and the reflectance in the studied region of reflectance spectrum is very strong (Pimstein et al., 2011;Mahajan et al., 2014;Oliveira et al., 2017). For this reason, even using the whole reflectance spectrum, the model accuracy for these nutrients was high ( Figure   4). Nevertheless, the variable selection methods increase accuracy for estimates of N and P concentrations in oilseed rape (Brassica napus L.) (Zhang et al., 2013).
Estimates of N concentration had the greatest accuracy, followed by estimates of P concentration ( Figure  4). In general, the concentrations of these nutrients are the most accurate in spectral data estimates (Asner et al., 2011;Pandey et al., 2017;Ramoelo et al., 2013). Both nutrients are highly related to photosynthetic traits in plants (Dechant et al., 2017). However, the same result may not be observed in all plant species (Abdel-Rahman et al., 2017;Menesatti et al., 2010). In addition, Asner et al. (2011), using spectroscopy to determine nutrients in humid tropical forest canopies, were only successful in estimating N and P concentrations with less accuracy.
The higher absolute coefficient values related to wavelengths in the red and red edge regions explained the strong relationship between these wavelengths and N and P concentrations ( Figure 5). The red and red edge regions are widely used for estimates of leaf nutrient for several forest and agricultural crops (Mutanga et al., 2005;Oliveira et al., 2017;Schlemmer et al., 2013;Yu et al., 2014).  Regarding estimates of K concentration, several wavelengths in the VIS and NIR regions were selected ( Figure 5). These regions are related to changes in K concentration in leaves Zhai et al., 2013). In the VIS region, the blue, green, red and red edge regions were selected to estimate K concentration using the VIP method ( Figure 5). These same regions were used to classify canola (Brassica napus L.), plants with K deficiency using a hyperspectral camera (Severtson et al., 2016).
The red edge and NIR presented higher absolute coefficient values for estimates of Ca concentration (Figure 5). This is probably due to the relationship between the structural components and the energy in these regions. One of the functions of Ca in higher plants is cell wall synthesis and cell membrane integrity; thus, Ca has a structural function (Marschner, 1995). Moreover, leaf reflectance in NIR regions demonstrates a strong relationship with the structural components of plants (Gates et al., 1965).
Higher absolute coefficient values in the blue region and near 820 nm were obtained for estimates of Mg concentration ( Figure 5). Mg is the central atom in the chlorophyll molecule ring and indirectly, Mg concentration influences leaf reflectance in the VIS region chang- ing chlorophyll concentration in leaves (Marschner, 1995). Leaf chlorophyll has absorption peaks in the electromagnetic spectrum in the blue and red regions (Lichtenthaler, 1987). Most likely, higher absolute coefficient values are in wavelengths in the blue region.
Concentration of S was estimated using almost all regions in the electromagnetic spectrum with higher absolute coefficient values closer to 700 and 880 nm (Figure 5). Wavelengths in these regions have been used in the management of S in wheat (Triticum aestivum L.) and rice (Oryza sativa L.) (Mahajan et al., 2014;. Using the VIP method, wavelengths in all regions of the spectrum were selected for estimates of B concentration with higher absolute coefficient values at approximately 790 nm ( Figure 5). Similar to Ca, B plays a structural role in plants (Marschner, 1995) and has a strong relationship with the NIR region, as explained previously. In general, there have been problems with hyperspectral sensors to estimate B concentrations, namely in soybean (Glicine max L., variedade Thorne) and maize (Zea mays, B73 inbred) crops . In this study, although accuracy of B concentration estimates was not the highest, leaf concentration was estimated with R 2 cv = 0.68 and RMSE cv = 22.66 mg kg -1 (Table 4 and Figure 4). Estimating leaf nutrient concentration Sci. Agric. v.77, n.6, e20180409, 2020 Estimates of Zn concentration had the lowest accuracy with higher absolute coefficient values related to wavelengths in the blue and NIR region (Table 4 and Figure 5). In the plant, Zn is as metal ion; thus, higher precision in its estimate using leaf hyperspectral reflectance was expected . However, estimates of Zn concentration using leaf reflectance are not frequently reported and the weak relationship may be a reason for this. For instance, Zn deficiency was not detected by reflectance in soybean plants under Zn omission in the growth solution (Adams et al., 2000a, b).
Higher absolute coefficient values were attributed to wavelengths in the blue, red, red edge and NIR regions to estimate Mn concentration ( Figure 5), while higher absolute coefficient values were related to wavelengths in the blue region for estimates of Fe concentration. Estimates of Cu concentration had higher absolute coefficient values in wavelengths at around 520 nm (Figure 5). Zhang et al. (2017) also selected a wavelength in the green region to estimate Cu concentration in vegetation stress. Nevertheless, as described for Zn, the relationship between leaf reflectance and micronutrients is little explained in the literature (Adams et al., 2000a;Pandey et al., 2017;Zhang et al., 2017).
Only in estimates of B and K concentration, wavelengths around 425 -435 nm and 460 -468 nm were not used, respectively ( Figure 5). Since 91 % of macroand micronutrients models used leaf reflectance in these wavelengths, it can be an indicative of the importance of these wavelengths in leaf nutrient modelling of concentrations. On the other hand, only estimates of N, P, and Cu concentrations used leaf reflectance in the wavelengths 517 -532 nm, 763 -778 nm, 790 -798 nm, 805 -819 nm, 832 -841 nm, and 866 -890 nm. As previously discussed, these wavelengths are avoided to estimate most nutrient concentrations in Eucalyptus clones.
In this study, we used more than 2,000 predictors in 61 samples of leaf reflectance and nutrient concentrations. Although there are more predictors than dependent variables, PLSR is a robust method than can overcome this problem and deal with small sets of samples and large numbers of estimated parameters associated to models for hyperspectral data (Abdel-Rahman et al., 2017;Ramoelo et al., 2013). Moreover, more advanced analytical and sensing methods were previously used to estimate macro and micronutrient concentrations in several plant species (Pullanagari et al., 2016;Abdel-Rahman et al., 2017;Pandey et al., 2017;Wang et al., v.77, n.6, e20180409, 2020 2017). However, the use of PLSR in leaf hyperspectral reflectance presented a rapid and accurate method to estimate leaf macro-and micronutrient concentrations in Eucalyptus clones.

Conclusion
Leaf hyperspectral reflectance was capable to estimate leaf nutrient concentration in Eucalyptus clones using partial least squares regression and variable selection methods. In general, the variable selection methods increased accuracy of nutrient concentration estimates. Although the number of studies on relationships between nutrient concentrations and leaf reflectance is increasing, the biochemical meaning behind their models is still poorly described.
The NIR region evaluated in this work was up to 900 nm, the complete NIR and SWIR regions (until 2500 nm) may have wavelengths that could be useful to estimate accurately leaf nutrient concentrations in Eucalyptus clones. The authors recommend that the results be up-scaled to another time and space points to evaluate the operational use of this approach.