Estimation of soil phosphorus availability via visible and near-infrared spectroscopy

Spectroscopic techniques have great potential to evaluate soil properties. However, questions remain regarding the applicability of spectroscopy to analyze soil phosphorus (P) availability, especially in tropical soils with low nutrient contents. Therefore, this study evaluated whether P availability in soil and its pools (labile, moderately labile and non-labile) can be estimated via Vis-NIR spectroscopy based on intra-field calibration. We used soils from two locations: a plot experiment that received phosphate fertilizers (Field-A) and a cultivated field sampled on a grid (Field-B). Spectra of the soil samples were obtained by diffuse reflectance in the visible and near-infrared (Vis-NIR) range. Predictive models for P availability and for the labile, moderately labile and non-labile pools of P in soil were obtained via partial least squares (PLS) regression; classification modeling was performed via Soft Independent Modeling of Class Analogy (SIMCA) on three P availability levels in order to overcome the limitation of quantifying P via Vis-NIR spectroscopy. We found that when P content is isolated as the only variable (Field-A), Vis-NIR spectroscopy does not allow estimating P pools in the soil. In addition, quantification of available P in the soil via predictive modeling has limitations in tropical soils. On the other hand, estimating P content in soil through classes of availability is a feasible and promising alternative.


Introduction
Phosphorus (P) supply to plants is an essential factor to ensure proper crop development and high yields (Ziadi et al., 2013). In tropical soils, such as Oxisols, phosphate fertilization generally requires attention due to adsorption reactions of the element with soil constituents, which act as a sink rather than a source of P for plants (Barbieri et al., 2009; Novais and Smyth, 1999). P availability for plants can be studied by dividing soil P into fractions, which provides knowledge on soil dynamics and guides soil fertilization in an economically and environmentally sound manner (Cross and Schlesinger, 1995). The methodologies commonly used are based on sequential extractions with chemical reagents (e.g. Hedley fractionation; Hedley et al., 1982); however, this laboratory procedure is time-consuming, expensive, laborious and prone to mistakes (Cécillon et al., 2009).
In this context, Vis-NIR spectroscopy is a promising approach to evaluate soil properties, because a single measurement allows inferring several properties (Nocita et al., 2015; Soriano-Disla et al., 2014; Wetterlind et al., 2008). However, its use to estimate nutrient availability in the soil has limitations, since soil is a complex and variable matrix involving interactions between mineral and organic materials (Mouazen et al., 2007; Shepherd and Walsh, 2007). Furthermore, results on prediction of P availability via Vis-NIR spectroscopy are not yet sufficient to ensure satisfactory performance in most cases (Abdi et al., 2016; McCarty and Reeves, 2006; Terra et al., 2015). Therefore, questions remain regarding the applicability of Vis-NIR spectroscopy to analyze P in the soil, especially in tropical soils with low content of organic P, due to their low P availability and the complexity of P interactions with the minerals in this soil type.
Niederberger et al. (2015) obtained excellent results predicting labile, moderately labile and non-labile pools of P in soils, arguing for the potential of this approach. However, the authors used a soil classified as Entisol (Soil Survey Staff, 2003), characterized by a low-to-medium weathering level of the parent material, which favors greater availability of P compared to Oxisols. In addition, as the soil was sampled over an area of 8,100 ha, the samples had different physicochemical properties. Thus, the objective of our study was to evaluate whether soil P availability and its pools (labile, moderately labile and non-labile) can be estimated in Oxisols via Vis-NIR spectroscopy, including the use of a classification modeling technique based on intra-field calibration.

Soil sampling and laboratory analyses
The soil samples used in this study were collected from two experimental sites of commercial cultivation of sugarcane. The first site (Field-A) was located in Agudos, SP (22°33'22" S, 49°06'15" W, 715 m altitude), with the soil classified as Arenic Ustox, according to the Soil Taxonomy System (Soil Survey Staff, 2003), and described as a Dystrophic Red-Yellow Latosol (Latossolo Vermelho Amarelo Distrófico) according to the Brazilian Soil Classification, with sandy loam texture. Soil sampling was carried out in a small area of the field, totaling 1.5 ha, where a plot experiment was installed to test phosphate fertilizers (variation of sources and rates of triple phosphate or natural phosphate from Gafsa, whether or not associated with filter cake from the production of ethanol). The samples were collected from sugarcane planting rows at depths of 0-0.1 m, 0.1-0.2 m and 0.2-0.4 m, resulting in 90 samples. The variation of fertilizer sources, rates and sampling depths aimed to obtain wide variation of P contents in the soil fractions studied, to support predictive modeling via spectroscopy.
Note, Soil and Plant Nutrition. Spectroscopy for soil P estimation. Sci. Agric. v.77, n.5, e20180295, 2020
The second site (Field-B) was located in Tabatinga, SP (21°38'6" S, 49°39'7" W, 490 m altitude), with two soils classified as Ustox according to the Soil Taxonomy System (Soil Survey Staff, 2003); according to the Brazilian Soil Classification, these soils are described as Dystrophic Red-Yellow Latosol (Latossolo Vermelho Amarelo Distrófico) and Dystrophic Red Latosol (Latossolo Vermelho Distrófico), with sandy and clay loam textures. Samples were collected at a depth of 0-0.2 m on a regular grid with 100 m spacing, resulting in 238 samples. This spatial grid provided samples with high variability in all soil properties, not only in P content.
For soil samples from Field-A, besides the usual chemical analysis performed in a commercial laboratory, where available P is quantified via extraction with anion exchange resin (Table 1), sequential extraction was performed to determine P fractions in the soil according to their availability, as proposed by Hedley et al. (1982), with modifications made by Condron et al. (1985). The P quantification results were grouped into three pools: labile (organic and inorganic labile fractions extracted via 0.5 mol L⁻¹ NaHCO₃), moderately labile (organic and inorganic moderately labile fractions extracted via 0.1 mol L⁻¹ NaOH and 1.0 mol L⁻¹ HCl) and non-labile (organic and inorganic non-labile fractions extracted via 0.5 mol L⁻¹ NaOH). All results are expressed as mg dm⁻³, the standard procedure at Brazilian soil laboratories, with P measurements based on soil volume rather than soil mass. Regarding particle size, only one sample was collected in the experimental site (area of 900 m²) for soil characterization, resulting in 859 g kg⁻¹ of sand, 16 g kg⁻¹ of silt and 125 g kg⁻¹ of clay. The results for P pools in the soil from Field-A (Table 1) were used only for predictive modeling, due to the reduced number of samples (n = 90).
For the soil from Field-B, the results of available P obtained via the anion exchange resin method (Table 2) were used for both predictive and classification modeling procedures.

Vis-NIR Soil Spectroscopy
For the analyses via spectroscopy, the first step was sample preparation. Samples from Field-A were dried at 45 °C and sieved through a 2 mm mesh (Nanni and Demattê, 2006; Udelhoven et al., 2003). Samples from Field-B were dried at 45 °C and sieved through a 0.25 mm mesh to increase homogeneity and avoid the influence of soil texture, since the sampling covered a whole field; moreover, this procedure allowed comparison of performance with the study conducted by Niederberger et al. (2015). Spectra were collected by diffuse reflectance using the FieldSpec 4 Standard-Res spectroradiometer (Analytical Spectral Devices Inc., Boulder, Colorado, USA), which operates within the 350-2500 nm range with a spectral resolution of 3.0 nm for 350-1000 nm and 10.0 nm for 1001-2500 nm, providing a 1.0 nm resolution via software correction, and were recorded as absorbance values. We used an accessory for soil readings (MugLight) with its own light source (100 W halogen bulb).
[Table footnote: Asym. = asymmetry; σ = standard deviation; CV% = coefficient of variation; OM = organic matter; CEC = cation-exchange capacity; BS = base saturation.]
Each soil sample was divided into three parts. Three spectra were obtained from each part, each resulting from the mean of ten scans. Finally, the mean of the nine spectra was calculated for each sample. The white reference (Spectralon®) was read every 15 min.
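The averaging described above can be sketched as follows. This is only an illustration using the Python standard library; the function names and the nested-list layout (subsamples containing replicate spectra) are assumptions, not part of the original workflow.

```python
from statistics import fmean


def mean_spectrum(spectra):
    """Average equally long spectra wavelength by wavelength."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_bands = len(spectra[0])
    if any(len(s) != n_bands for s in spectra):
        raise ValueError("all spectra must share the same wavelength grid")
    # zip(*spectra) groups the values recorded at the same wavelength.
    return [fmean(band) for band in zip(*spectra)]


def sample_spectrum(parts):
    """Collapse the replicate spectra of all subsamples into one spectrum.

    `parts` is a list of subsamples (three in the text), each holding its
    replicate spectra (three in the text), giving nine spectra per sample.
    """
    all_spectra = [spectrum for part in parts for spectrum in part]
    return mean_spectrum(all_spectra)
```

Because every part contributes the same number of spectra, the mean of the nine spectra equals the mean of the three part means.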

Data analysis and modeling
In the modeling stage, the first procedure was the division of training and validation sets. To that end, 70 % of samples were selected for training and 30 % for external validation, resulting in 63 calibration samples and 27 validation samples for Field-A and 167 and 71 samples for Field-B. This procedure was performed using the Kennard-Stone algorithm (Kennard and Stone, 1969), ensuring the homogeneous and representative selection of both sets based on the spectra.
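A minimal sketch of the Kennard-Stone selection is given below, assuming Euclidean distances computed directly on the spectra; the text does not state which distance or which data space (raw spectra or scores) was used, so both are assumptions. The function name is hypothetical.

```python
import math


def kennard_stone(spectra, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(spectra)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(a, b) for b in spectra] for a in spectra]
    # Start with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # min_dist[i]: distance from candidate i to its nearest selected sample.
    min_dist = {i: min(dist[i][first], dist[i][second]) for i in remaining}
    while len(selected) < n_select:
        # Add the candidate that lies farthest from everything selected so far.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[i][nxt])
    return selected
```

With the proportions reported above, `n_select` would be 63 for Field-A and 167 for Field-B, and the samples not selected would form the external validation set.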
For P quantification (predictive modeling), we used partial least squares (PLS) regression, with the P pools in the soil (Field-A) and the available P content in the soil (Field-B) as response variables. Due to the low performance in P prediction, we also divided the P content into three levels of availability. We used Field-B for this classification, due to the larger number of samples. We conducted the classification via Soft Independent Modeling of Class Analogy (SIMCA), with the classes low (0-12 mg dm⁻³), medium (13-30 mg dm⁻³) and high (>30 mg dm⁻³) P availability. These intervals were determined based on the table of interpretation limits of P contents in soils described by Raij et al. (1997).
The soil spectra were mean-centered (MC) and preprocessed to correct non-linearity, scattering, particle size effect, baseline and noise (Stenberg et al., 2010). To that end, the following pre-processing procedures were tested alone and in combination: multiplicative scatter correction (MSC), standard normal variate (SNV), first (1D) and second (2D) derivatives, and Savitzky-Golay smoothing (SMT). The last three methods were applied with windows ranging from 5 to 25 points at intervals of 2 points. The number of PLSR components in each model, as well as the best preprocessing of the spectra, was defined via leave-one-out cross-validation, seeking a high coefficient of determination (R²) and a low root mean square error of prediction (RMSE). The selected pre-processing procedures were MSC + 1D (window of 25 points) with 7 components for available P (Field-B); no preprocessing, with 5 components for labile P and 7 components for non-labile P; and MSC with 3 components for moderately labile P (Field-A). In the classification procedure, 2D (7-point window) was used. Finally, the quality of the predictive models for the external validation dataset was evaluated through the parameters R² and RPD (ratio of performance to deviation) (Williams, 1987). Detection and quantification limits were defined, respectively, as the lowest content that can be detected by the equipment and the lowest content that the method is capable of quantifying (Rambo et al., 2013). These values were calculated according to the equations proposed by Shrivastava and Gupta (2011). Classification models were evaluated according to the following figures of merit: Selectivity, Sensitivity, Accuracy, False Positive Rate (FPR) and Kappa coefficient.
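The class assignment can be illustrated with a short helper. The intervals come from the text; because they are stated in whole numbers (0-12, 13-30, >30), the handling of fractional values such as 12.5 is an assumption made here, and the function name is hypothetical.

```python
def p_availability_class(p_resin):
    """Map resin-extractable P (mg dm-3) to an availability class.

    Boundaries follow the intervals in the text; treating values up to and
    including 12 as "low" and up to and including 30 as "medium" is an
    assumption for values that fall between the stated whole numbers.
    """
    if p_resin < 0:
        raise ValueError("P content cannot be negative")
    if p_resin <= 12:
        return "low"
    if p_resin <= 30:
        return "medium"
    return "high"
```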
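Two of the scatter corrections named above, SNV and MSC, can be sketched with the standard library alone. This is a generic illustration, not the software used in the study: the use of the sample standard deviation in SNV and of the mean spectrum as the MSC reference are common conventions assumed here, and the function names are hypothetical.

```python
from statistics import fmean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own
    mean and sample standard deviation (n - 1 denominator, assumed)."""
    m = fmean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (the mean spectrum
    by default), x = intercept + slope * reference, and then corrected as
    (x - intercept) / slope.
    """
    if reference is None:
        reference = [fmean(band) for band in zip(*spectra)]
    ref_mean = fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        spec_mean = fmean(spectrum)
        # Ordinary least-squares slope and intercept against the reference.
        slope = sum(
            (r - ref_mean) * (v - spec_mean)
            for r, v in zip(reference, spectrum)
        ) / ref_ss
        intercept = spec_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in spectrum])
    return corrected
```

In a validation setting, the reference spectrum computed from the training set would normally be passed explicitly through `reference` so that validation samples are corrected against the same baseline.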

Quantification of P pools in soil
Quantification of P pools in the soil via spectral data in the Vis-NIR region faces several limitations (Figure 1). Models obtained for all P pools were classified as category E by the RPD (RPD values shown in Figure 1), that is, with poor and unreliable performance to predict the variables of interest (Viscarra Rossel et al., 2006).
Although Niederberger et al. (2015) obtained highly efficient prediction models for soil P pools (RPD classified as A or B), we believe that the contrasting results were due to differences in the methodologies used. These authors used data from samples collected over 8,100 ha in China, where several soil properties changed from one sampling point to another, rather than P concentration alone. This may have aided the construction of calibration models, since there was variation in other soil properties that directly influence the spectrum while having a specific relation with P pools. Thus, this variation may govern P availability in the soil and influence the spectra obtained, which enables the modeling of P pools. In contrast, the soil used in our study was collected in a single field (Field-A), varying only in the experimental plot and, consequently, in P content. In addition, the narrow range of P availability in the samples (Table 1) may have also contributed to the low quality of the models. Thus, Vis-NIR spectroscopy is not recommended for the analysis of P pools in the soil, since it is not sensitive to variations in P content alone.
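For reference, the two validation statistics used here can be computed as in the sketch below. It assumes RPD is the standard deviation of the reference values divided by the root mean square error of the predictions, and that R² is computed as one minus the ratio of residual to total sum of squares; the text does not state which variants were used, so both are assumptions, as are the function names.

```python
import math
from statistics import fmean, stdev


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(fmean(squared))


def rpd(observed, predicted):
    """Sample standard deviation of the reference values divided by RMSE."""
    return stdev(observed) / rmse(observed, predicted)


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_residual / SS_total."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

An RPD close to 1 means the prediction error is about as large as the natural spread of the reference values, so the model adds little beyond predicting the mean.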

Quantification of soil P availability
As sampling of a single plot experiment did not generate satisfactory results, since the only factor that changed between samples was the P content, we tested spatial sampling (Field-B), because the variation of other soil properties could allow the creation of predictive models from Vis-NIR spectra (Oliveira et al., 2015). However, this approach also showed limitations. The model was classified as category E by the RPD value (RPD values are shown in Figure 2), that is, with poor and unreliable performance to predict the variables of interest (Viscarra Rossel et al., 2006). The main difficulties in obtaining good predictions of available P via spectroscopy are that this fraction is related to the soil solution and the chemistry of the soil matrix (Janik et al., 1998) and does not present a direct spectral response (Stenberg et al., 2010; Oliveira et al., 2015). Still, Coutinho et al. (2019) argue that the level of available P is related to iron and aluminum oxyhydroxides in the soil, allowing its indirect prediction via spectroscopy. Furthermore, the low contents found in highly weathered soils (Oxisols) might compromise its detection even more. A high proportion of organic P could improve the spectral response and allow better modeling performance, because organic compounds can be more easily excited by irradiation (Niederberger et al., 2015; Oliveira et al., 2015). However, most tropical soils present low organic matter content, as we found in our study (Table 1), and therefore tend to show a low proportion of organic P, limiting the efficiency of soil spectroscopy.
Concentrations close to or below the detection and/or quantification limits of the spectroscopic technique can prevent identification of the property of interest and, consequently, its prediction, resulting in a model with low performance (Shrivastava and Gupta, 2011). In this context, the values obtained for the detection and quantification limits were 0.4 and 1.2 mg dm⁻³, respectively. Thus, these limits did not impair the modeling, since most samples showed higher P levels (Table 2).
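As an illustration, the widely used forms of these limits are LOD = 3.3σ/S and LOQ = 10σ/S, where σ is the standard deviation of the response and S is the slope of the calibration. The sketch below assumes these forms; how σ and S were obtained from the multivariate model is not stated in the text, and the function name is hypothetical.

```python
def detection_and_quantification_limits(sigma, slope):
    """Return (LOD, LOQ) using the 3.3 * sigma / S and 10 * sigma / S forms.

    sigma: standard deviation of the response (assumed input).
    slope: calibration slope S (assumed input), in response units per
           concentration unit.
    """
    if slope == 0:
        raise ValueError("calibration slope must be non-zero")
    lod = 3.3 * sigma / slope
    loq = 10 * sigma / slope
    return lod, loq
```

Under these forms the ratio LOQ/LOD is 10/3.3, roughly 3, which is consistent with the reported values of 0.4 and 1.2 mg dm⁻³.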
Regarding the poor performance of the models, it is important to highlight the quality of the reference method. In general, there are uncertainties in the wet-chemical soil analyses used in laboratories to determine nutrient availability. Cantarella et al. (2015) reported in their certification process that the amount of discrepant results for P availability is relatively high (coefficient of variation ~16 %) after more than 2,000 determinations originating from 122 soil analysis laboratories. This deviation in the reference data may mask the actual availability of P in the soil, impairing fertilization prescriptions. This low accuracy of the reference data may explain the poor performance of the predictive modeling. Thus, as the reference values show accuracy limitations, some discrepancy is expected to occur when obtaining quantifications via spectroscopic models (Coûteaux et al., 2003).

Classification models for P available in the soil
As the quantification/prediction of available P was unsatisfactory, we turned to classification models (Field-B) and found that classifying the P content into availability classes is a promising approach. The classification model presented substantial performance (category B) according to the Kappa coefficient (0.614), which indicates the agreement between predicted and observed classes (Landis and Koch, 1977). This was corroborated by the classifier accuracy of ~89 %, which means that almost 90 % of the samples (validation dataset) were assigned to the correct P availability class.
We used ranges of P availability to establish the classes in order to facilitate the learning of the models, because the classes are categorical variables rather than absolute numbers (quantification). Thus, the classification model only has to assign samples correctly to a few P availability classes, while the prediction model must predict the P availability value of each sample accurately. This contributed to better results for classification (substantial performance according to the Kappa index) than for the previous predictive modeling (unreliable performance according to the RPD value). Thus, Vis-NIR spectroscopy is capable of identifying broad classification levels of available P.
The aforementioned classification approach is promising, since soil fertilization could be based on fertilizer prescription tables, which traditionally indicate the application rate based on the soil nutrient availability class. Establishing classes could be useful for precision agriculture, considering that the creation of variable-rate prescription maps shows several uncertainties (errors) due to data interpolation (Mueller et al., 2004). Therefore, the creation of maps based on a few levels of P availability could improve the applicability of this precision agriculture approach, which will be tested by our research group in further studies.
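Accuracy and the Kappa coefficient can both be derived from a confusion matrix, as in the minimal sketch below. It uses the standard Cohen's kappa definition, which is assumed to be the one applied here; the function name is hypothetical and the layout (rows as observed classes, columns as predicted classes) is a convention chosen for the example.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) from a square confusion matrix.

    confusion[i][j] counts samples of observed class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

Kappa discounts the agreement expected by chance, so it is lower than the raw accuracy whenever chance agreement is above zero. On the Landis and Koch (1977) scale, values from 0.61 to 0.80 are labelled substantial.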

Conclusion
Intra-field quantification of phosphorus availability in soils by Vis-NIR spectroscopy via predictive modeling has limitations in tropical soils with low P content in the organic form.
In addition to the effect of low P content, we showed that when P content is isolated as the only variable, Vis-NIR spectroscopy does not allow estimating P pools in the soil, which hinders its use.
On the other hand, the use of Vis-NIR spectroscopy to estimate the soil P content through availability classes is a promising approach, since fertilization prescriptions tend to follow this classification.