Digital Soil Mapping of Soil Properties in the “Mar de Morros” Environment Using Spectral Data

Quantification of soil properties is essential for a better understanding of the environment and for better soil management. Conventional laboratory analyses are sometimes costly and detrimental to the environment. Thus, new techniques for soil analysis that do not generate residues, such as spectroscopy, are increasingly necessary as a viable way to estimate a wide range of soil properties. The objective of this study was to predict the contents of organic carbon (OC), clay, and extractable phosphorus (P) from the spectral responses of soil samples in the visible and near-infrared (Vis-NIR), medium-infrared (MIR), and combined Vis-NIR-MIR ranges, using different preprocessing methods combined with five prediction models. Soil samples were collected in the Ribeirão Inhaúma basin, in Iconha, Espírito Santo State, Brazil. A total of 184 samples were collected from 92 sites at two depths (0.00-0.10 and 0.10-0.30 m). Physical, chemical, and spectral analyses were performed according to routine soil laboratory methods. Of the total samples, 70 % were randomly selected for training and 30 % for validation of the models. Model performance was assessed by the coefficient of determination (R²) and the root mean square error (RMSE), as well as by the standardized prediction-error indices RPD and RPIQ. The best R² for clay was 0.69, in the Vis-NIR range; for OC, 0.65, in the MIR range; and for P, 0.57, in the Vis-NIR range. The Multiplicative Scatter Correction (MSC), Continuum Removal (CR), and Standard Normal Variate (SNV) preprocessing methods were the most efficient for predicting clay, OC, and P, respectively, while Partial Least Squares Regression (PLSR) (OC and P) and Support Vector Machine (SVM) (clay) gave the best predictions and are therefore recommended for modeling these properties in the study area. The models identified in this study can be used to discriminate soils according to a critical test value for clay, OC, and P.


INTRODUCTION
Pedometry uses modern data-mining and data-analysis techniques to quantify soil properties, and the prediction of soil properties from their relationship with spectral responses in different reflectance ranges is one of its most promising fields (Adeline et al., 2017; Nouri et al., 2017). Conventional laboratory analyses used for quantification of soil properties are costly (Bogrekci and Lee, 2005; Viscarra Rossel et al., 2006; Bashagaluke et al., 2015) and likely to have environmental effects because of the chemical reagents they use (Nanni and Demattê, 2006). Furthermore, the time from sampling to acquisition of results is long (Bashagaluke et al., 2015).
In recent years, reflectance spectroscopy in the near-infrared (NIR) and medium-infrared (MIR) ranges, in addition to the visible bands, has drawn attention for its potential use in non-destructive, fast, and efficient methods for quantifying soil properties (Bashagaluke et al., 2015). In the study conducted by Mohamed et al. (2018), NIR was the most efficient systematic strategy for characterizing and identifying soil properties, whereas Abdi et al. (2016) found visible and near-infrared spectroscopy (Vis-NIR) to be the most efficient strategy.
Quantitative analysis based on NIR or MIR spectra requires the development of calibrations that relate spectral information to known analyte contents (Reeves III and Smith, 2009). Precise quantification of different soil properties relies on large spectral libraries with many samples (Brown et al., 2006; Fernández-Pierna and Dardenne, 2008; Vasques et al., 2008; Genot et al., 2011; Viscarra Rossel and Webster, 2012), and one way to deal with the large number of covariates, such as the bands of infrared spectra, is to select those with the highest predictive power (Minasny and McBratney, 2008).
Thus, preprocessing is incorporated as a necessary step prior to calibration, to improve the extraction of useful information from reflectance spectra on which both additive and multiplicative effects are superimposed (Peng et al., 2014). This increases the efficiency of the mathematical models used to link the spectra (predictor variables) to the soil properties (response variables) (Nouri et al., 2017). These models include the Support Vector Machine (SVM) (Stevens et al., 2010), Partial Least Squares Regression (PLSR) (McCarty et al., 2002), Random Forest (RF), Artificial Neural Networks (ANN), and Gaussian Process Regression (GPR).
PLSR is probably the most commonly used multivariate statistical technique for spectral calibration and prediction of soil properties (Nouri et al., 2017). It reduces the predictors to a smaller set of mutually uncorrelated components and then uses these components to perform least squares regression. The SVM, in turn, is a computational technique based on pattern recognition: it determines the decision boundaries that give optimal separation between classes while minimizing error (Nascimento et al., 2009).
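To make the two PLSR steps concrete (extracting mutually uncorrelated components, then regressing on them), here is a minimal single-response PLS sketch using the NIPALS algorithm in pure Python. It is an illustration only, not the pls or Alrad Spectra code used by the authors; the function names `pls1_fit` and `pls1_predict` and the toy data are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    X: list of spectra (rows = samples, columns = wavelengths).
    y: reference values of one soil property, one per spectrum.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    E = [[row[k] - x_mean[k] for k in range(p)] for row in X]  # centred spectra
    f = [yi - y_mean for yi in y]  # centred response
    weights, x_loadings, y_loadings = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between spectra and y
        w = [sum(E[i][k] * f[i] for i in range(n)) for k in range(p)]
        norm = math.sqrt(sum(wk * wk for wk in w))
        w = [wk / norm for wk in w]
        # Scores: projection of each (deflated) spectrum onto w
        t = [sum(E[i][k] * w[k] for k in range(p)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        p_load = [sum(E[i][k] * t[i] for i in range(n)) / tt for k in range(p)]
        q_load = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation: subtract what this component explains, so that the
        # next component's scores are uncorrelated with these ones
        E = [[E[i][k] - t[i] * p_load[k] for k in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q_load for i in range(n)]
        weights.append(w)
        x_loadings.append(p_load)
        y_loadings.append(q_load)
    return x_mean, y_mean, weights, x_loadings, y_loadings


def pls1_predict(model, x):
    """Predict the soil property for one new spectrum x."""
    x_mean, y_mean, weights, x_loadings, y_loadings = model
    e = [xk - mk for xk, mk in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q_load in zip(weights, x_loadings, y_loadings):
        t = sum(ek * wk for ek, wk in zip(e, w))
        y_hat += t * q_load  # least squares regression of y on the score
        e = [ek - t * pk for ek, pk in zip(e, p_load)]
    return y_hat


# Toy data (hypothetical): y = 2 * x1 - x2 + 1 exactly
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1, 4, 2, 6, 7]
model = pls1_fit(X, y, n_components=2)
print(round(pls1_predict(model, [6, 2]), 6))  # 11.0
```

With as many components as linearly independent predictors, PLS reproduces the ordinary least squares fit, as in this toy case; with spectra, where wavelengths far outnumber samples, only a few components are kept, usually chosen by cross-validation.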
The RF method was developed by Breiman (2001) for classification and regression. In this technique, decision trees are grown from an initial random set; that is, each tree is generated from the values of a random vector. The method is robust to noise. ANNs are usually organized in layers of interconnected nodes that contain an activation function. Patterns are presented to the network through the input layer, which communicates with one or more hidden layers, where the actual processing is done through a system of weighted connections; most ANNs contain some form of learning rule that modifies the connection weights according to the input patterns (Mohamed et al., 2018). Gaussian Process Regression is the equivalent of kriging, an interpolation method widely known and used in pedometric research; however, rather than geographic coordinates, it uses spectral data as input (Ramirez-Lopez et al., 2013).
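The kriging analogy for GPR can be sketched as a Gaussian process posterior mean in which the covariance between two samples is computed from the distance between their spectra rather than between their geographic coordinates. This is a minimal pure-Python sketch under stated assumptions: a squared-exponential (radial) kernel, fixed hyperparameters instead of fitted ones, and hypothetical names (`gpr_fit_predict`, `length_scale`, `noise`); it is not the GPR implementation used in the study.

```python
import math


def rbf_kernel(a, b, length_scale):
    """Squared-exponential covariance between two spectra a and b."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gpr_fit_predict(train_spectra, y, new_spectra, length_scale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP fitted to the centred response.

    As in simple kriging, the weights come from the covariance matrix of
    the training samples, but covariance is a function of spectral
    (not geographic) distance; `noise` plays the role of a nugget.
    """
    y_mean = sum(y) / len(y)
    K = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(train_spectra)]
         for i, a in enumerate(train_spectra)]
    alpha = solve(K, [yi - y_mean for yi in y])
    return [y_mean + sum(rbf_kernel(s, t, length_scale) * a_j
                         for t, a_j in zip(train_spectra, alpha))
            for s in new_spectra]


train = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]  # hypothetical 2-band spectra
print(gpr_fit_predict(train, [1.0, 2.0, 3.0], [[0.2, 0.15]], length_scale=0.1))
```

Far from every training spectrum the kernel values vanish and the prediction falls back to the training mean, which mirrors how kriging reverts to the mean beyond the range of spatial correlation.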
Rev Bras Cienc Solo 2018;42:e0170413
Among preprocessing strategies, smoothing is a simple moving average of the spectral data computed with a convolution function (Stevens et al., 2013). Normalization refers to the creation of shifted and scaled versions of the spectral data, in which the normalized values eliminate scattering effects (Rinnan et al., 2009). The Savitzky-Golay derivative algorithm (Savitzky and Golay, 1964) requires the selection of the number of smoothing points (filter width), the polynomial order, and the derivative order. The CR technique proposed by Clark and Roush (1984) removes the continuum from the spectrum and is often used to isolate specific absorption features and minimize noise. The continuum is represented by a mathematical function used to separate and highlight specific absorption bands of the reflectance spectrum (Mutanga et al., 2005).
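The moving-average smoothing described above amounts to convolving each spectrum with a box filter. A minimal sketch, assuming an odd window width and that the edge bands are dropped rather than padded (the function name `moving_average` is hypothetical):

```python
def moving_average(spectrum, window):
    """Centred moving mean of odd width `window` (box-filter convolution).

    The (window - 1) / 2 bands at each end are dropped, so the output is
    shorter than the input by window - 1 values.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [sum(spectrum[j - half:j + half + 1]) / window
            for j in range(half, len(spectrum) - half)]


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```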
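Based on the hypothesis that it is possible to predict soil properties from their spectra, this study aimed to predict the organic carbon (OC), clay, and extractable P contents from the MIR, Vis-NIR, and Vis-NIR-MIR spectra using different preprocessing methods (Continuum Removal, CR; Absorbance, ABS; Savitzky-Golay Derivative, SGD; Standard Normal Variate, SNV; and Multiplicative Scatter Correction, MSC) combined with different prediction models.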

MATERIALS AND METHODS
Soil samples were collected in the Ribeirão Inhaúma basin, in Iconha, ES, Brazil, near coordinates 21° 10' 58.82" S and 41° 00' 08.87" W, in an area of 2,403.9 ha. According to the Köppen classification system, the climate of the study area is Aw, and its relief is mountainous, with steep areas. We used 184 samples, obtained at 92 sites at two depths (0.00-0.10 and 0.10-0.30 m). The sampling sites were determined using the conditioned Latin hypercube method, taking access into account because of the difficulty of moving through the area's steep topography; for this purpose, a 200-m buffer along the roads was defined. At each point, coordinates were recorded using a dual-frequency GNSS receiver (L1, L2), and the data were processed in the Leica Geo Office 8.0 program. The base was adjusted from the IBGE fixed station in Vitória, ES, Brazil.
The samples were air-dried, crushed, and sieved through a 2-mm mesh to quantify the clay, extractable P, and OC contents. Clay was quantified by the pipette method, extractable P by the Mehlich-1 method, and organic carbon by the Walkley-Black method (Donagema et al., 2011).
For spectral analysis, about 5 g of soil sieved to less than 2 mm was used for bidirectional reflectance measurements (350 to 2,500 nm); for diffuse reflectance measurements (medium infrared, 2,500 to 25,000 nm), another 5 g of soil was milled, homogenized, and sieved to 0.149 mm (100 mesh). To obtain the bidirectional data in the Vis-NIR bands, the samples were packed in Petri dishes and leveled to reduce surface roughness. For each sample, the sensor automatically performed 300 readings, 100 readings every 90° of rotation, and the final value was the mean of these three measurements. The sensor was calibrated using a Spectralon (barium sulfate) plate with 100 % reflectance, and calibration was repeated every 20 minutes.
The FieldSpec Pro sensor (Analytical Spectral Devices, Boulder, Colorado) (Hatchell, 1999) was used; its resolution is 2 nm for the bands from 1,100 to 2,500 nm and 1 nm for the other wavelengths. It was positioned vertically 8 cm from the sample, with an 18° field of view. Two 50 W halogen lamps, positioned 35 cm from the platform at a zenith angle of 30°, were used as the illumination source. Finally, the bidirectional reflectance factor was calculated as the ratio between the spectral radiance reflected by the soil sample and the radiance reflected by the Spectralon plate.
The Alpha Sample Compartment RT-DLaTGS ZnSe (Bruker Optik GmbH), equipped with a diffuse reflectance accessory (DRIFT), was used to obtain data in the medium-infrared range (diffuse reflectance). The device uses an internally positioned He-Ne laser as the light source and calibration standard for each wavelength. The sensor has a KBr beam splitter that allows a wide range of radiation incident on the sample (from 2,500 to 25,000 nm). About 1 cm³ of soil sample was placed in the equipment container for reading. Sixty-four readings per second were made for each spectrum, acquired at a resolution of 2 cm⁻¹. Before each reading, the sensor was calibrated with a diffuse gold plate to remove the background radiation from the sample spectrum.
Of the samples, 70 % were randomly selected for training and 30 % for validation (external set of samples) of the models. The preprocessing methods were Continuum Removal (CR), Absorbance (ABS), Savitzky-Golay Derivative (SGD), Standard Normal Variate (SNV), and Multiplicative Scatter Correction (MSC). The analyses were performed in R software, version 3.4.
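A random 70/30 partition like the one described can be sketched as follows; the seed and the function name `train_validation_split` are hypothetical and only make the example reproducible. With the 184 samples of this study, the rounding used here yields 129 training and 55 validation samples.

```python
import random


def train_validation_split(n_samples, train_fraction=0.7, seed=2018):
    """Randomly assign sample indices to training and validation sets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train, validation = train_validation_split(184)
print(len(train), len(validation))  # 129 55
```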
CR was obtained using the prospectr package, according to the following mathematical description:
$$\varphi_i = \frac{x_i}{c_i} \qquad \text{(Eq. 1)}$$
in which x_i is the original reflectance value; c_i is the reflectance value of the continuum at the i-th wavelength of a set of p wavelengths; and φ_i is the final value of the reflectance after removal of the continuum.
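Continuum removal, as described above, divides each reflectance value by the continuum value at the same wavelength. The sketch below assumes, as is common, that the continuum is the upper convex hull of the spectrum joined by straight-line segments; it is a pure-Python illustration with hypothetical function names, not the prospectr implementation.

```python
def upper_hull(wavelengths, values):
    """Indices of the points on the upper convex hull of a spectrum
    (monotone-chain algorithm; wavelengths must be strictly increasing)."""
    hull = []
    for i in range(len(wavelengths)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            # Cross product >= 0: point i2 lies on or below the segment
            # i1 -> i, so it cannot belong to the upper hull
            cross = ((wavelengths[i2] - wavelengths[i1]) * (values[i] - values[i1])
                     - (values[i2] - values[i1]) * (wavelengths[i] - wavelengths[i1]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def continuum_removal(wavelengths, reflectance):
    """Divide each reflectance value x_i by the continuum c_i, obtained by
    linear interpolation between consecutive upper-hull points."""
    hull = upper_hull(wavelengths, reflectance)
    result, seg = [], 0
    for i, w in enumerate(wavelengths):
        # Move to the hull segment that contains wavelength w
        while seg < len(hull) - 2 and w > wavelengths[hull[seg + 1]]:
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (w - wavelengths[a]) / (wavelengths[b] - wavelengths[a])
        continuum = reflectance[a] + t * (reflectance[b] - reflectance[a])
        result.append(reflectance[i] / continuum)
    return result


wl = [400, 500, 600, 700, 800]  # hypothetical band centres (nm)
print([round(v, 3) for v in continuum_removal(wl, [0.5, 0.35, 0.3, 0.4, 0.6])])
# [1.0, 0.667, 0.545, 0.696, 1.0]
```

Points on the hull keep a value of 1, and absorption features become dips below 1, which is what makes them comparable between samples with different overall brightness.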
Absorbance was calculated in R by applying equation 2 to the reflectance values (R).
$$A = \log_{10}\left(\frac{1}{R}\right) \qquad \text{(Eq. 2)}$$
in which A is the absorbance; log10 is the base-10 logarithm; and R is the reflectance.
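Equation 2 applied band by band, as a minimal sketch (hypothetical function name; reflectance is assumed to be expressed as a fraction between 0 and 1):

```python
import math


def reflectance_to_absorbance(reflectance):
    """Eq. 2, A = log10(1 / R), applied to each band of one spectrum."""
    if any(r <= 0 for r in reflectance):
        raise ValueError("reflectance values must be strictly positive")
    return [math.log10(1.0 / r) for r in reflectance]


print(reflectance_to_absorbance([1.0, 0.1, 0.01]))
```

Because absorbance is a logarithm of reflectance, a tenfold drop in reflectance adds one unit of absorbance, which linearizes the relation with concentration assumed by the Beer-Lambert law.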
The SGD is implemented by the savitzkyGolay function in the prospectr package.The mathematical description is given by equation 3.
$$x_j^{*} = \frac{1}{N}\sum_{h=-m}^{m} c_h\, x_{j+h} \qquad \text{(Eq. 3)}$$
in which x*_j is the new value; N is a normalization coefficient; m is the number of neighboring values on each side of j; and c_h are precalculated coefficients, which depend on the chosen polynomial and derivative orders.
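A minimal sketch of equation 3 for one common setting, assuming a 5-point window (m = 2), a second-order polynomial, and the first derivative, for which the tabulated Savitzky-Golay coefficients are c_h = (-2, -1, 0, 1, 2) with N = 10. The window and orders actually used in the study are not stated here, and the function name is hypothetical.

```python
def savitzky_golay_first_derivative(spectrum):
    """Eq. 3 with m = 2 (5-point window), a quadratic polynomial and the
    first derivative: c_h = (-2, -1, 0, 1, 2) and N = 10.

    Values are per band step (divide by the band spacing for units per nm);
    the m bands at each end are dropped.
    """
    coefficients = (-2, -1, 0, 1, 2)
    m, norm = 2, 10
    return [sum(c * spectrum[j + h] for c, h in zip(coefficients, range(-m, m + 1))) / norm
            for j in range(m, len(spectrum) - m)]


# The derivative of j**2 is 2j, recovered exactly for j = 2..5
print(savitzky_golay_first_derivative([j ** 2 for j in range(8)]))  # [4.0, 6.0, 8.0, 10.0]
```

Because the filter fits a local quadratic, it returns the exact derivative of any polynomial up to second order while averaging out random noise over the five points.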
SNV was implemented with the standardNormalVariate function of the prospectr package, according to:
$$x_i^{\mathrm{SNV}} = \frac{x_i - \bar{x}_i}{s_i} \qquad \text{(Eq. 4)}$$
in which x_i is the original reflectance; x̄_i is the mean of the original reflectance; and s_i is the standard deviation of the original reflectance.
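The SNV transformation scales each spectrum by its own statistics, which removes offset and scaling differences between samples. A minimal pure-Python sketch (hypothetical function name; the sample standard deviation, with n − 1 in the denominator, is assumed):

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: centre a spectrum on its own mean and
    divide by its own standard deviation."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD, n - 1 in the denominator
    return [(x - mean) / sd for x in spectrum]


# Two spectra that differ only by an offset and a scale factor
a = [0.20, 0.35, 0.30, 0.50]
b = [2 * x + 0.1 for x in a]
print(all(abs(u - v) < 1e-9 for u, v in zip(snv(a), snv(b))))  # True
```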
The pls package includes the msc function for MSC preprocessing in R. The mathematical description of the MSC is given by equation 5.

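$$\mathrm{MSC} = \frac{x_i - a_i}{b_i} \qquad \text{(Eq. 5)}$$
in which x_i is the original reflectance value; and a_i and b_i are the regression coefficients (intercept and slope) for sample i, obtained by regressing its spectrum on a reference spectrum, usually the mean spectrum.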
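A minimal sketch of MSC, assuming the mean spectrum of the input set as the reference and ordinary least squares for the coefficients a_i and b_i; the function name is hypothetical, and this is not the pls package's `msc` code.

```python
def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum.

    For each spectrum x_i, fit x_i = a_i + b_i * reference by ordinary
    least squares and return (x_i - a_i) / b_i (Eq. 5).
    """
    n_bands = len(spectra[0])
    reference = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bands)]
    ref_mean = sum(reference) / n_bands
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected


base = [0.10, 0.30, 0.20, 0.50]
spectra = [base, [2 * x + 0.1 for x in base]]  # same sample, different scatter
out = msc(spectra)
print(all(abs(u - v) < 1e-9 for u, v in zip(out[0], out[1])))  # True
```

Unlike SNV, which uses only each spectrum's own statistics, MSC depends on the reference, so in a calibration and validation workflow the reference computed from the training set would also be applied to new samples.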
These preprocessing methods were combined with the following multivariate regression algorithms: Partial Least Squares Regression (PLSR), Artificial Neural Network (ANN), Random Forest (RF), Gaussian Process Regression (linear and radial GPR), and Support Vector Machine (SVM), for a total of 30 tests (five preprocessing methods × six algorithms). The data were processed in R 3.4 with the Alrad Spectra package, which is not available on CRAN but is hosted in a GitHub repository; instructions for installing and loading Alrad Spectra in R are given by Dotto et al. (2017).
For the neural network model, the parameters were as follows: resampling method = 10-fold cross-validation; activation function = purelin (linear); and hidden units = 10. The elmtrain function of the elmNN package uses the best-fitted parameters to run the final ANN model. The caret package tunes the SVM model, and the best parameters are used to generate the final model with the svm function of the e1071 package. The tuning parameters for SVM were as follows: resampling method = 10-fold cross-validation; and kernel = linear (Support Vector Machine with Linear Kernel).
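The 10-fold cross-validation used for tuning can be sketched as follows: the training indices are shuffled, dealt into ten folds, and each fold is held out once while the model is fitted on the other nine. This is a plain random partition with a hypothetical seed and function name; caret's own fold construction may differ (for example, it can stratify by the response).

```python
import random


def cross_validation_splits(n_samples, k=10, seed=1):
    """Yield (training indices, held-out indices) for each of k folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(held_out)


# E.g. tuning on the 129 training samples of a 70/30 split of 184 samples
print([len(test) for _, test in cross_validation_splits(129)])
# [13, 13, 13, 13, 13, 13, 13, 13, 13, 12]
```

The candidate parameter set with the lowest mean error over the ten held-out folds would then be used to refit the final model on all training samples.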
To assess model accuracy, the coefficient of determination (R²) was calculated according to equation 6, in which P_i and o_i are the predicted and observed values at location i, respectively, and n is the number of samples.
The RMSE and the RPD (Residual Prediction Deviation) and RPIQ (Ratio of Performance to Interquartile Distance) indices were also calculated. The RMSE was obtained using equation 7:
$$\mathrm{RMSE} = \sqrt{\frac{1}{\iota}\sum_{j=1}^{\iota}\left[\hat{z}(S_j) - z^{*}(S_j)\right]^{2}} \qquad \text{(Eq. 7)}$$
in which RMSE is the root mean square error; ẑ(S_j) are the estimated values; z*(S_j) are the validation data; and ι is the number of validation points.
The Residual Prediction Deviation (RPD) (Williams, 1987) and RPIQ (Bellon-Maurel et al., 2010) were calculated by equations 8 and 9, respectively:
$$\mathrm{RPD} = \frac{SD}{\mathrm{RMSE}} \qquad \text{(Eq. 8)}$$
$$\mathrm{RPIQ} = \frac{IQ}{\mathrm{RMSE}} \qquad \text{(Eq. 9)}$$
in which SD is the standard deviation; IQ is the difference between the 3rd and 1st quartiles of the data distribution (Q3 − Q1); and RMSE is the root mean square error.
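Equations 7 to 9 can be computed together for a validation set as follows. Because equation 6 is not reproduced in the text, the sketch assumes R² = 1 − SS_res/SS_tot; some studies instead use the squared Pearson correlation between observed and predicted values. Quartiles are computed by linear interpolation (the default in R), SD is the sample standard deviation of the observed values, and the function name and toy data are hypothetical.

```python
import math
import statistics


def validation_metrics(observed, predicted):
    """R², RMSE (Eq. 7), RPD (Eq. 8) and RPIQ (Eq. 9) for a validation set."""
    n = len(observed)
    residuals = [p - o for p, o in zip(predicted, observed)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_obs = statistics.fmean(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # assumed form of Eq. 6
    sd = statistics.stdev(observed)  # sample SD of the observed values
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {"R2": r2, "RMSE": rmse, "RPD": sd / rmse, "RPIQ": (q3 - q1) / rmse}


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print({k: round(v, 3) for k, v in validation_metrics(obs, pred).items()})
# {'R2': 0.99, 'RMSE': 0.141, 'RPD': 11.18, 'RPIQ': 14.142}
```

RPD and RPIQ share the same denominator; they differ only in how the spread of the observed data is measured, and the interquartile range is less affected by skewed distributions than the standard deviation.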
The RPD values were interpreted in terms of prediction quality/reliability according to the criteria proposed by Chang et al. (2001) and Chang and Laird (2002): values greater than 2.00 indicate excellent models for accurate prediction of properties; values between 1.40 and 2.00 indicate reasonable models; and values below 1.40 indicate unreliable models. The same criteria were applied to RPIQ (Terra, 2011).
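These thresholds can be applied as a small helper (hypothetical name), used here with the same cut-offs for RPD and RPIQ; treating exactly 1.40 and 2.00 as "reasonable" is an assumption about the boundaries. Applied to the clay indices reported in the Results, it reproduces the classes given there.

```python
def reliability_class(index_value):
    """Chang et al. (2001) classes, applied here to RPD or RPIQ values."""
    if index_value > 2.0:
        return "excellent"  # greater than 2.00
    if index_value >= 1.4:
        return "reasonable"  # between 1.40 and 2.00 (inclusive, assumed)
    return "unreliable"  # below 1.40


print([reliability_class(v) for v in (2.15, 1.87, 1.62, 1.38, 1.31)])
# ['excellent', 'reasonable', 'reasonable', 'unreliable', 'unreliable']
```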

RESULTS AND DISCUSSION
Descriptive statistics are presented in table 1; to meet the assumption of normality, skewness and kurtosis should be close to 0 and 3, respectively (Groeneveld and Meeden, 1984). Owing to the nature of their distribution in soils, these properties often do not follow a normal distribution (Bellon-Maurel et al., 2010). Soil properties generally have high spatial variability, even in homogeneous agricultural fields; therefore, many samples should normally be collected and analyzed to capture this spatial variability and adequately estimate the properties (Bilgili et al., 2010). Good results are usually obtained with more heterogeneous sample sets, as in the effective calibrations achieved by Fidêncio et al. (2002), Kemper et al. (2005), and Brown et al. (2006).
Increased variability in the training phase of a statistical model improves model robustness and the ability to characterize a diverse variety of samples. Nevertheless, this variability may also lower prediction accuracy (McCarty et al., 2002). As indicated by Hartmann and Appel (2006), a widely applicable model for predicting OC should be based on a wide range of OC values and on soils with different mineralogical contexts. Soil mineralogy is one of the main factors causing differences in reflectance. Preprocessing proved to be effective in this study, although Kooistra et al. (2001) reported better predictions for clay and organic matter (OM) without spectral preprocessing.
The results for clay are shown in table 2. The highest coefficients of determination (R²) were found in the Vis-NIR (R² = 0.69, RMSE = 4.38) and Vis-NIR-MIR (R² = 0.54, RMSE = 5.88) spectra. Viscarra Rossel et al. (2006) found small improvements in clay prediction using the Vis-NIR-MIR combination. Mohamed et al. (2018) reported that Vis-NIR spectra contain valuable information for predicting soil texture, in agreement with the results obtained in our study. According to Hunt (1980), many clay minerals have distinctive spectral reflectance features at visible and NIR-SWIR (near-infrared to short-wave infrared) wavelengths.
For the Vis-NIR set, the best R² value (0.69) was obtained by combining MSC with the linear SVM algorithm; for Vis-NIR-MIR, the best R² (0.54) was obtained with the SGD-PLSR combination.
In the MIR range, the highest RPIQ value in the validation set was obtained with SGD preprocessing and the PLSR predictor (1.82), and the highest RPD value with MSC combined with the PLSR and RF predictors (1.38). By the RPD index, this model is therefore classified as unreliable (RPD less than 1.40), whereas the RPIQ places it among the reasonable models. In the other spectra, maximum values of 2.15 for RPIQ and 1.62 for RPD were found in Vis-NIR, and 1.87 for RPIQ and 1.31 for RPD in Vis-NIR-MIR. By RPIQ, these models were classified as excellent (2.15) and reasonable (1.87), whereas by RPD they were classified as reasonable (1.62) and unreliable (1.31).
Regarding prediction capacity within each preprocessing method, the RPIQ and RPD values showed that, for clay, PLSR was superior in all spectral ranges, except for RPD in Vis-NIR, where SVM was superior. It is important to note that, although PLSR reached the highest index values most often, the overall maximum values were obtained with the linear SVM, linear GPR, and RF algorithms. Among the clay preprocessing methods, SGD stood out in the MIR range, with most RPIQ values higher than 1.4; MSC and SNV stood out in the Vis-NIR spectrum, and MSC in Vis-NIR-MIR. RPIQ values higher than 2.0 were found in the Vis-NIR range. Regarding RPD, MSC was superior in the MIR range, while SNV was more efficient in the Vis-NIR range; in contrast, Absorbance (ABS), Continuum Removal (CR), and Savitzky-Golay Derivative (SGD) were more efficient in Vis-NIR-MIR. RPD values greater than 1.4 were found only for MSC in the Vis-NIR range. As in the MIR range, the models classified by the RPIQ index exhibited higher quality than those classified by RPD.
Some satisfactory results for clay prediction were reported by Chang et al. (2001) and Morón and Cozzolino (2003). Ben-Dor and Banin (1995) and Islam et al. (2003) modeled clay from the NIR spectrum and found R² values of 0.56 and 0.75, respectively, while Chang et al. (2001) and Shepherd and Walsh (2002), working with the Vis-NIR spectrum, obtained slightly better values of 0.67 and 0.78. According to Silva et al. (2016), clay can be predicted in the NIR region using PLSR. Viscarra Rossel et al. (2006) modeled clay and found R² values of 0.43 (Vis), 0.60 (NIR), 0.67 (MIR), and 0.67 (Vis-NIR-MIR), while Kania and Gruba (2016) tested clay prediction from NIR spectra and found R² values of 0.57 and 0.21. These values are lower than those found in this study, whereas Ben-Dor and Banin (1995) obtained a higher value, R² = 0.86.
According to Viscarra Rossel and McBratney (1998), the significant wavelengths for estimating clay content in the NIR range are 1,600, 1,800, 2,000, and 2,100 nm, whereas Nawara et al. (2016) reported that the 2,206 nm wavelength would be the ideal band for quantifying this property. In this study, the most important variables for prediction (Figure 1) were the wavelengths around 1,500, 1,800, and 2,100 nm, partly coinciding with those reported by Viscarra Rossel and McBratney (1998).
The results for organic carbon (OC) are shown in table 3. In the MIR spectrum, the best R² value was 0.65 (RMSE = 8.31), obtained with the CR-PLSR combination.
The highest RPIQ and RPD values were 1.58 and 1.32, the former obtained with SGD preprocessing and the PLSR predictor, and the latter with the SNV-PLSR combination.
In the Vis-NIR and Vis-NIR-MIR bands, the R² values were slightly lower than in the MIR, while the RPIQ and RPD values were slightly higher. According to Vohland et al. (2014), the physical mechanisms differ fundamentally between the Vis-NIR and MIR domains: the fundamental molecular vibrations of soil components can be measured only in the MIR, whereas in the NIR range the overtones and combinations of these fundamental vibrations are detected. Adeline et al. (2017) emphasize that PLSR is the most widely used multivariate statistical method for chemometrics in Soil Science. This method stands out for calibrating soil reflectance to estimate soil properties (Viscarra Rossel et al., 2006) and handles multicollinearity in high-dimensional data better than traditional methods (Bilgili et al., 2010). Bilgili et al. (2010) worked with 512 samples from different soils and were able to predict OM with an R² of 0.73 from Vis-NIR spectra preprocessed with SGD.
Several studies demonstrate that OC can be predicted from MIR spectra (Zimmermann et al., 2007; Bornemann et al., 2008; Yang et al., 2012). Studies that compare MIR and NIR in the same samples show that MIR consistently outperforms NIR in soil analysis, especially for C and N fractions (McCarty et al., 2002; Reeves III et al., 2002; Madari et al., 2005). In soil carbon studies, MIR spectroscopy produced better models (by 10 to 40 %) than models developed from NIR spectra (Bellon-Maurel and McBratney, 2011). Nevertheless, none of the studies analyzed compared NIR and MIR spectroscopy in the same soil samples (Knox et al., 2015).
In general, MIR is considered superior to Vis-NIR (Vohland et al., 2014), and the literature indicates that MIR is better than NIR and Vis-NIR for estimating soil carbon contents (McCarty et al., 2002; McCarty and Reeves III, 2006). However, this superiority has not been recognized in all studies (Madari et al., 2006; Ludwig et al., 2008; Michel et al., 2009). In this study, the best OC prediction in terms of R² was obtained in the MIR spectrum, but the best RPIQ and RPD values were obtained in Vis-NIR.
This may be because separating the contribution of each soil component in Vis-NIR spectra is challenging, owing to the complex nature of the soil matrix, with multiple overlapping spectral features, and to the strong collinearity among soil properties (Gobrecht et al., 2013). According to Knox et al. (2015), MIR spectroscopy has generally been shown to predict OC and total carbon more accurately than Vis-NIR models. These authors found R² values ranging from 0.58 to 0.87 for Vis-NIR, 0.87 to 0.96 for MIR, and 0.88 to 0.95 for Vis-NIR-MIR. However, they used a much larger number of samples than this study: 1,014 sites were sampled, with 696 samples used for calibration and 296 for validation.
Vohland et al. (2014) found R² values of 0.60 for OC in Vis-NIR and 0.78 in MIR, with RPD values of 1.58 and 2.12, respectively. Their analyses were performed on a set of 60 soil samples from arable land with different soil types and textures, developed from different bedrocks and in different landscape positions. Despite the small number of samples, that material was heterogeneous, unlike the samples in our study.
Shepherd and Walsh (2002) found an R² of 0.8 for OC using a spectral library with more than 1,000 samples, while Bashagaluke et al. (2015), predicting carbon, found an R² of 0.72 using 530 composite soil samples. Viscarra Rossel et al. (2006) found similar accuracy (R² = 0.73) for a validation set of 118 samples in an 18-ha area in Australia. Daniel et al. (2003) found an R² of 0.86 for OM prediction in a study in Thailand, using artificial neural networks on the Vis-NIR spectrum.
Arachchi et al. (2016) found the highest correlation between measured and predicted carbon in the MIR range among the spectra tested, with R² between 0.63 and 0.85. Kania and Gruba (2016) attempted to predict total carbon using NIR spectra and found calibration R² values of 0.80 and 0.48 and validation values of 0.03 and 0.22, lower than those found in this study. Summers et al. (2011), predicting OC in the 400-2,500 nm range with 228 samples, achieved an R² of 0.57 with an RPD of 1.8. Reeves III and Smith (2009) predicted OC from MIR and NIR spectra and found R² values of 0.58 in MIR and 0.53 in NIR.
Determination of organic carbon or OM is generally feasible, but confounding factors such as particle size, soil color, and soil type, among others, may cause problems in calibration development (Reeves III, 2010). It has also been specifically observed that the Walkley-Black procedure (Walkley and Black, 1934) for OC can be problematic due to non-linearity in the measured values; however, it is the most commonly used method (Malley et al., 2004). These factors could explain why the values found here are slightly below those of some studies in the literature, with R² above 0.8 and RPD and RPIQ higher than 2.0.
Infrared spectroscopy is well adapted to and widely used for predicting soil OC (Bellon-Maurel et al., 2010; Reeves III, 2010). Soil OC absorbs directly in the infrared region because this region is highly sensitive to groups such as C-H, C-O, and C-N that prevail in OM (Soriano-Disla et al., 2014). According to Ben-Dor et al. (1999), OH and CH groups dominate the NIR region, and electronic transitions dominate the visible portion of the electromagnetic spectrum.
OM includes the living biomass of plants and vegetation remnants (Bartholomeus et al., 2008), and OC is an indicator of the organic matter content, since it is one of the main components of OM (Steiner et al., 2011). According to Beyer et al. (2001), OC comprises biochemical components such as chlorophyll, cellulose, pectin, starch, lignin, and humic acids, which influence reflectance in the visible (400-700 nm) and near-infrared (700-1,400 nm) regions of the electromagnetic spectrum. Hartmann and Appel (2006) explained the variation in cellulose concentrations through the NIR spectrum, and Viscarra Rossel and Hicks (2015) applied Vis-NIR spectroscopy to predict carbon fractions.
Theoretically, the OM, due to its complexity, is spectrally active in practically the entire NIR region (Ben-Dor and Banin, 1990), but it is often reported that organic matter signals in this region may be weak (Viscarra Rossel and McBratney, 1998) as there may be overlapping spectral characteristics of some minerals and organic matter (Ben-Dor and Banin, 1990;Viscarra Rossel and McBratney, 1998).Bands of 1,744, 1,870, and 2,052 nm were important for organic carbon predictions according to Ben-Dor and Banin (1990).
In this study, the best prediction was found in the MIR spectrum, whose most important prediction variables are presented in figure 2. The most relevant bands were 18,193 and 16,879 nm, followed by bands 17,247 and 17,326 nm.
For the phosphorus variable (P), the values of R 2 , RMSE, RPD, and RPIQ are shown in table 4. The R 2 values were the lowest among the properties modeled in this study.
Abdi et al. (2016) reported that the prediction of available P from Vis-NIR spectra is influenced by the type of extracting solution used in the laboratory soil analysis. According to Minasny et al. (2009), available P extracted by either the bicarbonate or the Bray method is not well predicted. Van Groenigen et al. (2003) found adequate Vis-NIR accuracy when available P was extracted by the Olsen method, whereas Nduwamungu et al. (2009) showed that P extracted with Mehlich-3 (M-3) was poorly predicted by Vis-NIR. Soil chemical extractions that alter the balance between phases may complicate the interpretation of the results even further (Viscarra Rossel et al., 2006). Historically, the soil system and its quality have been evaluated through this type of laboratory analysis (Viscarra Rossel et al., 2006).
In the MIR spectrum, no satisfactory R² value was found. In the Vis-NIR spectrum, the R² value was 0.57 (SNV-PLSR, RMSE = 3.09). Maleki et al. (2006) observed differences in Vis-NIR spectral reflectance with variation in P content and hypothesized that P correlates indirectly with the near-infrared through different soil components.
The highest RPIQ value was found in Vis-NIR-MIR, at 2.04 (SNV-ANN), followed by Vis-NIR, at 1.92 (SNV-PLSR), and MIR, at 1.64 (SGD-linear SVM). The highest RPD, 1.53, was found in the Vis-NIR range (SNV-PLSR). Thus, by the RPIQ index, the Vis-NIR-MIR model was classified as excellent and the others as reasonable; by the RPD index, the Vis-NIR model was classified as reasonable.
Regarding the RPIQ and RPD values for P, PLSR performed best in Vis-NIR and, together with radial GPR, in MIR (for the RPIQ index), whereas RF and ANN stood out in Vis-NIR-MIR.
Among the phosphorus preprocessing methods, SGD stood out in the MIR range, with most RPIQ values higher than 1.4, and SNV stood out in the other ranges, reaching an RPIQ higher than 2.0 in Vis-NIR-MIR. As for RPD, only the Vis-NIR range presented values above 1.4, with SNV, which stood out as the best preprocessing method for this variable. According to Malley et al. (2004), prediction of P from NIR spectra is less frequent. Possible causes are the nature of the element, its interaction with other elements, the extraction method used (Chang et al., 2001), sample set heterogeneity, and calibration.
In contrast, Niederberger et al. (2015) obtained results with adequate accuracy in the NIR spectrum and found that models based on organic P fractions outperformed those based on inorganic P, because organic compounds are more easily excited in the NIR spectrum. Abdi et al. (2012) found R² values between 0.7 and 0.8 when predicting total P from the Vis-NIR spectrum to map element contents in an extremely sandy soil in Canada, attributing these values to the correlation between total P and organic matter. Viscarra Rossel et al. (2006) found R² values of 0.06 (Vis), 0.01 (NIR), 0.20 (MIR), and 0.02 (Vis-NIR-MIR) when modeling available P. Janik et al. (1998) modeled available P contents with the MIR spectrum and found an R² of 0.07, whereas Daniel et al. (2003) found an R² of 0.81 with Vis-NIR.
Lee et al. (2003) predicted phosphorus in 540 soil samples from four major soil types in Florida in the 400 to 2,500 nm range; using PLSR, they obtained R² values ranging from 0.52 to 0.66. Knox et al. (2015) found R² values ranging from 0.51 to 0.95 for different spectral ranges, while Minasny et al. (2009) did not find satisfactory values for prediction of available P in the MIR spectrum, and McCarty and Reeves III (2006) also did not consider P prediction in this range to be satisfactory.
NIR can satisfactorily predict some properties, but results have been inadequate or highly variable from study to study for mineral forms of Ag, Al, Cd, Cu, Co, Fe, K, N, P, Pb, Na, Ni, Se, Si, and Zn, and for pH (Reeves III and Smith, 2009). These authors found R² values of 0.85 and 0.09 for P in the MIR and NIR spectra, respectively. Bogrekci and Lee (2007) found coefficients of determination from 0.76 to 0.93 in the near-infrared region using PLSR for total P and P extracted by Mehlich-1, respectively, and values between 0.61 and 0.83 for water-soluble P using the visible region of the spectrum.
Considering the argument of Bellon-Maurel and McBratney (2011) that it is preferable to evaluate the RPIQ index rather than the RPD, this study established satisfactory models (RPIQ and/or RPD higher than 1.4) and showed excellent prediction capability (RPIQ higher than 2.0) for P, OC, and clay; these models can be used to classify soils according to their properties, as suggested by Shepherd and Walsh (2002).
In the 1980s and 1990s, quantification surveys were performed with only a few wavelengths (more specific ones, varying according to the purpose, but always in the range of 1 to 30 bands), since no software was powerful enough to analyze large amounts of data. After 2000, with the advent of new and powerful computers and programs, this became possible, and PLSR can now analyze many wavelengths simultaneously (e.g., 2,500 bands) and evaluate which wavelengths make higher or lower contributions to the quantification of a given element. This upgrade in software and computers allowed a great leap in quantification methodologies, since all bands are analyzed and each one is weighted, thereby dealing with the effects of collinearity. This led to an increase in the number of articles published in the field of chemometrics, as reviewed by Nocita et al. (2015).

CONCLUSIONS
Clay, organic carbon, and extractable P were quantified with R² values of 0.69, 0.65, and 0.57, respectively.
The Vis-NIR spectral range was the best for clay and P, whereas MIR was the best for organic carbon. Combining all bands (Vis-NIR-MIR) increased R² for clay and P relative to the MIR range.
The MSC, CR, and SNV preprocessing methods were the most efficient for predicting clay, OC, and P, respectively, whereas the PLSR (OC and P) and SVM (clay) methods gave the best predictions and are thus recommended for modeling these properties in the study area.

Figure 1. Important variables for clay prediction in the Vis-NIR (Visible and Near Infrared) spectrum using Multiplicative Scatter Correction (MSC) preprocessing and the Support Vector Machine (SVM) model.

Figure 2. Important variables for prediction of organic carbon in the MIR (Medium Infrared) spectrum using Continuum Removal (CR) preprocessing and the Partial Least Squares Regression (PLSR) model.

Table 2. Values of R², RMSE, RPD (ratio of performance/prediction to deviation), and RPIQ (ratio of performance/prediction to interquartile range) for the clay property found for the different models and preprocessing methods in the MIR, Vis-NIR, and Vis-NIR-MIR spectra. ABS = Absorbance; CR = Continuum Removal; MSC = Multiplicative Scatter Correction; SGD = Savitzky-Golay Derivative; SNV = Standard Normal Variate; PLSR = Partial Least Squares Regression; ANN = Artificial Neural Network; RF = Random Forest; GPR linear and radial = Gaussian Process Regression; SVM = Support Vector Machine; MIR = Medium Infrared; Vis-NIR = Visible and Near Infrared; Vis-NIR-MIR = Visible, Near, and Medium Infrared.