Estimate of carbon stock in the soil via diffuse reflectance spectroscopy (vis/nir) air and orbital remote sensing

ABSTRACT Current procedures for determining soil organic carbon (SOC) content are costly, time-consuming, and generate polluting chemical waste. Therefore, developing new protocols that use aerial and orbital remote sensing and diffuse reflectance spectroscopy (DRS) to digitally map the soil organic carbon stock (CS) is essential for promoting research on and monitoring of SOC in Brazilian soils. To this end, three commercial crop areas in the Mid-North region of Mato Grosso were studied. Samples were collected from the 0 to 30 cm layer, and SOC was determined by the dry combustion method and estimated through DRS in the visible to near-infrared region (Vis-NIR-SWIR, 350-2500 nm). Aerial remote sensing images were obtained with the Carcará II® Unmanned Aerial Vehicle carrying a MicaSense® multispectral camera (RGB + NIR + RedEdge). The orbital sensors used were the Sentinel 2® and Planet® satellites. This study showed that soil carbon stock values could be predicted using different modeling approaches based on field and laboratory spectral measurements. Predictive models to estimate SOC can be established using remote and proximal sensing, thus allowing a better understanding of spatial patterns of SOC in crop fields.


INTRODUCTION
Faced with rising greenhouse gas (GHG) emissions and global warming, there is growing international interest in improving soil management to increase soil organic carbon (SOC), thus contributing to climate change mitigation. International initiatives such as '4 per 1000', launched during COP 21 with the aspiration of increasing global stocks of soil organic matter by 0.4% per year as compensation for global greenhouse gas emissions from anthropogenic sources, have brought to the fore the importance of accurately estimating the soil carbon stock (MINASNY et al., 2017). The challenge posed by global warming has imposed an environmental agenda based on agricultural practices that aim to improve the resilience of global agroecosystems, and it will be decisive in future multilateral agreements involving the ability of soils to sequester carbon. According to the Intergovernmental Panel on Climate Change, the global implementation of best agricultural and livestock production practices could provide 20% to 40% of the GHG mitigation needed to meet the objective of the Paris Agreement, which is to limit global warming to between 1.5°C and 2°C by the end of this century (IPCC, 2022).
To define which best practices should be adopted, it is first necessary to accurately estimate the amount of carbon that soils can store. Quantifying SOC via dry combustion is considered the international reference method. However, it demands considerable financial resources, since elemental analysis equipment is expensive to acquire and maintain. VERRA, one of the main internationally recognized certifiers of carbon credit projects, allows in its VM0042 methodology the use of emerging technologies for quantifying SOC via remote sensing, as long as the measurement uncertainties are known (VERRA, 2020).
Although technologies to accurately measure soil C concentrations and stocks are already available in some parts of the world (ANGELOPOULOU et al., 2019; PAUSTIAN et al., 2019; VISCARRA-ROSSEL et al., 2022), information is often fragmented and data availability is often limited. There is still a need to develop innovative solutions that help laboratories rapidly characterize soils and adopt quality control with lower costs and ecosystem impacts (SMITH et al., 2012).
Soil spectroscopy emerged as an alternative to wet chemistry and has already proven a reliable tool for determining soil organic carbon. However, soil laboratories still do not widely use this technology in their routines, mainly due to the lack of standards and protocols, spectral libraries, and professionals specializing in chemometrics to build robust spectral models (POPPIEL et al., 2022). The accuracy of spectral models varies with factors such as soil characteristics and the quantity and range of environmental covariates of the samples used in model calibration and validation (DEMATTÊ et al., 2016; RAMIREZ-LOPEZ et al., 2014; WIGHT; ASHWORTH; ALLEN, 2016). In this context, aerial and orbital remote sensing (RS) has become a substantial alternative for the planning and rational use of the Earth's natural resources (ALVARENGA et al., 2005), being used to monitor forests, combat deforestation, and increase agricultural yield. Associated with geoprocessing techniques, studies on estimating plant biomass and mapping SOC stand out (DUNCAN et al., 2018; FATOYINBO et al., 2018; BONFATTI et al., 2016; ANGELOPOULOU et al., 2019; BANGELESA et al., 2020).
Thus, there is a need for techniques that can measure the amount of SOC stored in the soil on a large scale, with good accuracy and at low cost. In this context, the present study aimed to evaluate the potential of estimating soil carbon using aerial, orbital, and proximal sensing techniques in commercial crop areas in the state of Mato Grosso.

Study area
The present study was conducted in the State of Mato Grosso, in the Center-South and Mid-North macro-regions, in the municipalities of Diamantino (Aterrado farm) and Lucas do Rio Verde (Capuaba and Palminha farms) (Figure 1). Aterrado farm is located in the municipality of Diamantino (14º24'31'' S, 56º26'46'' W, altitude of 269 m). Over the last four years, the area has been cultivated under a succession system, with soybean as the first crop and corn as the second crop. In 2019, second-crop corn was intercropped with brachiaria (Urochloa genus). The soil in the sampled area is a Latossolo Vermelho Amarelo Distrófico típico with medium texture, according to the Brazilian Soil Classification System (SiBCS) (SANTOS et al., 2013). Capuaba (13°15'8.47" S, 56°04'52.58" W, altitude of 425 m) and Palminha (13°27'2.37" S, 56°04'39.56" W, altitude of 437 m) farms are located in the municipality of Lucas do Rio Verde, geographically very close to each other. Both are on soils with more limited drainage, classified according to the SiBCS (SANTOS et al., 2013) as Latossolo Vermelho-Amarelo distrófico plintossólico, with very clayey texture (Capuaba farm) and clayey texture (Palminha farm). At Palminha farm (Figure 1B), the area has been cultivated under a succession system over the last four years, with soybean as the first crop and corn as the second crop. Capuaba farm (Figure 1C) has management that is more compatible with sustainable practices: a no-tillage system with crop rotation has been established for many years (Figure 2). According to the Köppen classification, the climate is humid tropical (Aw type), with an average temperature of 24.4 °C and an average annual rainfall of 1791 mm (ROCHA et al., 2018).
The study areas have distinctive characteristics, covering different levels of organic matter, fertility, and soil texture (Table 1). These areas were chosen to capture greater variability of soil attributes.

Sample collection and design
In the cultivated area, sampling for SOC analysis was carried out in the 0-30 cm layer at a sampling density of 1 sample per 5 ha. The number of samples varied with the size of each farm: 30 samples at Aterrado farm, 31 at Palminha farm, and 26 at Capuaba farm. The samples were collected in December 2020 during the soybean cycle (V6 to V8), concentrating the sampling between the plant rows. For the regional Vis-NIR DRS model, another set of 149 samples from plots neighboring the Aterrado farm study area, collected and analyzed in 2019, was used. For the general model, another set of 150 samples from different locations in the state of Maranhão, collected and analyzed in early 2021, was used. Reference samples were collected at the same depth, in triplicate, in the native forest (reference) of the respective study areas (three points per farm), and these samples were used to correct the soil layer according to soil bulk density. The disturbed samples were collected with a Dutch auger, homogenized in a plastic bucket, and packed in duly identified plastic bags. Subsequently, they were dried in the shade at room temperature (air-dried fine earth), crushed, and sieved through a 2 mm mesh. SOC was then quantified by the dry combustion method in a LECO® elemental analyzer, and the samples were later read in the Veris Vis-NIR reflectance spectrometer (Veris Technologies). The undisturbed samples were collected with a 100 cm³ ring (Kopecky) sampler in the center of the layer. Subsequently, they were dried in an oven at 105ºC for 48 hours to determine soil bulk density (Db) (Equation 1).
The carbon stock in the soil was calculated by correcting the soil layer using the soil bulk density of the nearest native vegetation (Equation 2).
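Equations 1 and 2 did not survive in this version of the text. As a hedged reconstruction, the forms below are the ones commonly used for bulk density and for carbon stock corrected to the soil mass of a reference area; they are an assumption and may differ in notation or detail from the authors' equations. The symbols M_s (oven-dry soil mass, g), V (ring volume, cm³), e (layer thickness, cm), and the subscripts "ref" (native vegetation) and "area" (cultivated area) are introduced here for illustration only.

```latex
% Assumed form of Equation 1: soil bulk density (g cm^{-3})
D_b = \frac{M_s}{V} \tag{1}

% Assumed form of Equation 2: carbon stock (Mg ha^{-1}) with SOC in g kg^{-1},
% D_b in g cm^{-3}, and the layer thickness e (cm) adjusted by the ratio between
% the bulk density of the reference (native vegetation) and the cultivated area
e_{adj} = \frac{D_{b,ref}}{D_{b,area}} \, e
\qquad
CS = \frac{SOC \times D_{b,area} \times e_{adj}}{10} \tag{2}
```

The divisor 10 follows from the units: one hectare of soil with thickness e (cm) and density D_b (Mg m⁻³) weighs 100 × D_b × e Mg, and SOC in g kg⁻¹ is a fraction of 1/1000 of that mass.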

Spectral Measurements in the Laboratory
The Vis-NIR reflectance spectrometer used was the Veris 3150® (Veris Technologies Inc., Salina, Kansas, USA), capable of acquiring soil reflectance information in two ranges of the electromagnetic spectrum: visible (Vis), from 350 to 700 nm, and near infrared (NIR), from 700 to 2223 nm, with a spectral resolution of 8 nm. Readings were taken in bench mode. The reference plates supplied with the equipment were used for calibration. The equipment was calibrated every 50 samples or whenever there was a sign of failure in the reading (red or orange).

Air Remote Sensing
An Unmanned Aerial Vehicle (UAV), the Carcará II® from Santos Lab, was used to obtain images by aerial remote sensing. It carried a MicaSense RedEdge multispectral sensor with five bands: blue, green, red, red edge, and near infrared (NIR). The flight to obtain the images was performed over the soybean crop at growth stages between V8 and R1.

Orbital Remote Sensing
The orbital sensors used were the Sentinel 2® and Planet® satellites. The Sentinel 2® satellite has a multispectral sensor with 13 spectral bands of high and medium spatial resolution (10, 20, and 60 m), 12 bits of radiometric resolution (ESA, 2022), and high temporal resolution (10 days, or 5 days with two satellites), which ensures the continuity of data needed for general land monitoring (VAN DER MEER; VAN DER WERFF; VAN RUITENBEEK, 2014). The PlanetScope constellation has around 120 satellites generating images of the planet daily. Its imagery has an average spatial resolution of 3 m, a radiometric resolution of 12 bits, and 4 spectral bands in the visible and infrared (PLANET, 2022). The Planet® images were acquired free of charge through Planet's education and research program (PLANET, 2022).

Image processing
Eighty-seven (87) Planet® images and 52 Sentinel 2® images were acquired for each study area, covering the period from February 2018 to December 2020. The interval between satellite images was approximately 10 days, depending on the availability of cloud-free images. From the orbital images, the Normalized Difference Vegetation Index (NDVI) values of the pixels at each sampling point were extracted to generate mathematical models for predicting carbon in the soil.
The study areas were flown over with the Carcará II® during the 2020/2021 soybean season, at growth stages from V8 to R1. Images in the red, green, blue, NIR, and RedEdge bands were used for modeling. After processing the images and preparing the orthomosaics, data were extracted at the soil collection points for the five bands and for the following vegetation indices: Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Normalized Difference Red Edge Index (NDRE), Chlorophyll Index - Red-Edge (CIRE), Atmospherically Resistant Vegetation Index (ARVI), Visible Atmospherically Resistant Index (VARI), and Green Chlorophyll Index (GCI).
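The text does not give the formulas used for the vegetation indices. The sketch below computes the seven indices from the five band reflectances at one sampling point, using definitions that are commonly cited in the literature. These formulas are an assumption (in particular, ARVI is written with the atmospheric weighting factor set to 1), and the function name and reflectance values are hypothetical.

```python
def vegetation_indices(blue, green, red, red_edge, nir):
    """Return vegetation indices from surface reflectances (0-1) at one point.

    Formulas are the commonly cited definitions, assumed here because the
    source text does not state them.
    """
    red_blue = 2.0 * red - blue  # ARVI red-blue term with gamma = 1 (assumption)
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "CIRE": nir / red_edge - 1.0,
        "ARVI": (nir - red_blue) / (nir + red_blue),
        "VARI": (green - red) / (green + red - blue),
        "GCI": nir / green - 1.0,
    }


# Hypothetical reflectances for a pixel under a green soybean canopy.
indices = vegetation_indices(blue=0.04, green=0.08, red=0.05, red_edge=0.20, nir=0.45)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

In practice the same function would be applied to the band values extracted from the orthomosaic at each soil collection point, and the resulting indices would enter the models as covariates together with the five bands.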

Calibration and Validation of models
Seventy percent (70%) of the samples with conventional and spectral analyses were allocated to calibration and 30% to validation. Spectra in the Vis-NIR and SWIR wavelength range were extracted from the Veris 3150® Vis-NIR reflectance spectrometer. In the calibration stage, different pre-processing and modeling methods were tested to obtain the models with the best fit and accuracy. Models were generated for each farm individually, for a regional set composed of samples from all three farms, and for a general set comprising the samples from the three areas and the spectral library of the company Santos Lab. The models were generated using the AlradSpectra graphical interface (DOTTO et al., 2019) in the R software (R CORE TEAM, 2017). The tests were performed with the Support Vector Machine (SVM), Gaussian Process Regression (GPR), Partial Least Squares Regression (PLSR), and multiple linear regression (MLR) models, with cross-validation. The relationship between observed and predicted values was assessed with the coefficient of determination (R²), the root mean square error (RMSE), and the Ratio of Performance to InterQuartile distance (RPIQ).
The RMSE measures the average error between the observed and predicted values; the smaller its value, the better the model. High RPIQ values indicate a strong concentration of estimates around conditional means. The R² indicates how much of the total variation of the response variable can be explained by the predictor variables that make up the model. Saeys, Mouazen and Ramon (2005) classified models based on R² values: models with R² between 0.50 and 0.65 make it possible to distinguish high from low concentrations, R² between 0.66 and 0.80 is acceptable, R² from 0.81 to 0.90 is good, and R² above 0.90 indicates excellent prediction.
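The text does not state how samples were assigned to the calibration and validation subsets. As a minimal sketch, assuming a simple random assignment (other schemes, such as spectrally guided selection, are equally plausible), the 70/30 split of the 87 field samples could look like this. The function name and the seed are hypothetical.

```python
import random


def split_calibration_validation(sample_ids, calibration_fraction=0.7, seed=42):
    """Randomly split sample identifiers into calibration and validation lists."""
    ids = list(sample_ids)
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(ids)
    n_calibration = round(calibration_fraction * len(ids))
    return ids[:n_calibration], ids[n_calibration:]


# 87 = 30 (Aterrado) + 31 (Palminha) + 26 (Capuaba) field samples.
calibration, validation = split_calibration_validation(range(87))
print(len(calibration), len(validation))  # prints "61 26"
```

Fixing the seed makes the subsets reproducible, which matters when several model types are compared on the same calibration and validation samples.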
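The three statistics and the R² classes described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by the modeling software: R² is computed here as 1 minus the ratio of residual to total sum of squares (some tools report the squared correlation instead), RPIQ is taken as the interquartile distance of the observed values divided by the RMSE, and the label for R² below 0.50 is hypothetical because the text does not name that class. The sample values are invented.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Interquartile distance of the observed values divided by the RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def classify_r2(r2):
    """Map R² (rounded to two decimals) to the classes cited in the text."""
    r2 = round(r2, 2)
    if r2 > 0.90:
        return "excellent"
    if r2 >= 0.81:
        return "good"
    if r2 >= 0.66:
        return "acceptable"
    if r2 >= 0.50:
        return "distinguishes high from low"
    return "below classified range"  # hypothetical label, not from the text


# Invented SOC values (g/kg) for illustration only.
observed = [10, 12, 14, 16, 18]
predicted = [11, 12, 13, 17, 18]
r2 = r_squared(observed, predicted)
print(f"R2={r2:.3f} ({classify_r2(r2)}), "
      f"RMSE={rmse(observed, predicted):.3f}, RPIQ={rpiq(observed, predicted):.3f}")
```

Because RPIQ scales the error by the spread of the observed data, it allows models fitted to sets with different SOC ranges, such as a single farm and the regional set, to be compared more fairly than with RMSE alone.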

RESULTS AND DISCUSSION
The results in Table 2 showed that the SOC values ranged from 5.7 g kg⁻¹ (crop field) to 43.6 g kg⁻¹ (reference), with an average value of 16.2 g kg⁻¹. For the CS, values ranged from 20 Mg ha⁻¹ (crop field) to 98.1 Mg ha⁻¹ (reference), with an average value of 46.8 Mg ha⁻¹. The values found are consistent with those found in other studies in Brazil (BAYER et al., 2006; MAIA et al., 2010; MIRANDA et al., 2016).

The statistical parameters of each model for each area are shown in Table 3. The results varied among the sample sets and the variables studied. This variation in statistical parameters is due to the different characteristics of each sample group, which influence both the spectral curve and the prediction models (VISCARRA-ROSSEL et al., 2016). Likewise, a study carried out in South Africa argues that, for spectroscopy to be used in predicting soil attributes, a good calibration against the conventional method is necessary, generally requiring a large sample set for the method to be effective (VAN VUUREN; MEYER; CLAASSENS, 2006).
Using the partial least squares regression model (PLSR), of the five sample sets, three obtained good results for SOC in g kg⁻¹ (Regional and General), one for soil bulk density (Db) (Regional), and one for soil organic carbon stock (CS) (Aterrado farm). Summers et al. (2011), also using PLSR, obtained R² values of 0.57 for SOC at validation, and 0.68 for the percentage of SOC, with a sample set of 303 samples of various soil types in Rio Grande do Sul (DOTTO et al., 2014). Such results are similar to those obtained in the present study (R² = 0.67), which used a regional set of 322 samples.
The model presented R² above 0.5 for Palminha farm and for the Regional and General sets. For Db, the model was reasonable for the General and Regional sets. For CS, only Aterrado farm reached R² values above 0.5. The MLR model is considered the simplest form of linear regression analysis and has proven effective for determining soil organic carbon, as Bayer et al. (2012) mentioned, reaching an R² of 0.74 with 164 samples. In the present study, an R² of 0.90 was obtained for Capuaba farm for calibration and 0.54 for validation, using 28 samples, while for Aterrado farm, an R² of 0.72 was obtained for calibration and 0.81 for validation with 33 samples. Viscarra-Rossel and Behrens (2010), using 1104 soil samples from Australia, reached R² of 0.81, values higher than those found in this study.
The SVM model had good results for the General and Regional sets for SOC in percentage (R² Cal 0.73 and 0.77; R² Val 0.62 and 0.67, respectively) and for SOC in g kg⁻¹ (R² Cal 0.73 and 0.76; R² Val 0.62 and 0.68, respectively). It also had good results for the percentage of SOC at Capuaba farm (R² Cal 0.74 and R² Val 0.87). The Support Vector Machine (SVM), proposed by Boser et al. (1992), is a nonlinear method widely used in multivariate classification and calibration problems (KOVACEVIC et al., 2009). In general, the GPR model performed better than the other models in predicting the percentage of SOC. This model also had satisfactory results for SOC in g kg⁻¹ and Db in the regional and general sets. The GPR model produces non-parametric regressions (DOTTO et al., 2018), equivalent to interpolation by kriging in geostatistics, except that it uses information from spectral data rather than from geographic coordinates (RAMIREZ-LOPEZ et al., 2013).
As the GPR model showed the best performance for % SOC across the farms, the predicted carbon was compared with the elemental analyzer results (Figure 3). At Palminha farm (Figure 3A), the model obtained a lower R² (0.49) than at the other farms and, therefore, a lower predictive capacity, overestimating the % SOC values. At Aterrado farm (Figure 3B), the % SOC results were more similar to the measured values and varied less. This is due to the better distribution of values along the perfect fit line, the smaller root mean square error, and the variability of carbon content in that area. At Capuaba farm (Figure 3C), the % SOC values were underestimated to a greater degree than at Aterrado farm. An advantage of the GPR model over other machine learning approaches is that it models both the expectation and the variance of the predicted variable, thus making it possible to map the prediction uncertainty. In addition, it can estimate the noise in the input data and account for it to avoid overfitting (BALLABIO et al., 2019). Dotto et al. (2019), using the GPR model, obtained R² of 0.81 in calibration and 0.73 in validation for SOC and concluded that GPR should be considered as a SOC prediction method for Vis-NIR spectroscopy.
The bias calculated for the GPR model was -1.64, 7.67, and 14.10 for the Capuaba, Aterrado, and Palminha farms, respectively. The greatest bias, at Palminha farm, indicates the greatest error, which may explain the mismatch between the maps of predicted and observed values. The dispersion of the observed and predicted values, when each model was applied only to the spectral curves of the respective observed values, is shown for each study area in Figure 4, where estimated values above the observed values indicate overestimation by the model and values below them indicate underestimation.
In general, there were several spectral ranges of importance, which is justified by the fact that SOC comes from organic matter and absorbs across the entire spectral range (BEN-DOR; BANIN, 1995), being associated with hydroxyl groups, sulfates, carbonates, and combinations of water and carbon dioxide present in the composition of organic matter (CLARK; RENCZ, 1999).
The most important spectral bands in the study were from 400 to 800 nm and from 1400 to 1800 nm. Viscarra-Rossel et al. (2016) observed a correlation between the 500 to 850 nm spectral range and SOC, corroborating this study. Absorptions around 1400 nm may be related to water molecules in clay minerals and to cellulose in organic residues (MOURA-BUENO, 2018). Jiang et al. (2017) state that the regions at 400-800 nm, around 1900 nm, and at 2000-2350 nm are important in estimating SOC by Vis-NIR, as observed in this study, where the range from 1400 to 1800 nm was decisive for predicting SOC at Aterrado farm, which may explain the greater accuracy of the maps for this area.
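The text does not define how the bias reported for the GPR model was computed. A common definition, assumed here, is the mean of predicted minus observed values, so that a positive bias indicates overestimation; this sign convention is consistent with the overestimation described for Palminha farm and the underestimation described for Capuaba farm. The function name and sample values are hypothetical.

```python
def mean_bias(observed, predicted):
    """Mean of (predicted - observed); positive values indicate overestimation."""
    differences = [p - o for o, p in zip(observed, predicted)]
    return sum(differences) / len(differences)


# Invented values for illustration: predictions run slightly high on average.
observed = [10.0, 12.0, 14.0]
predicted = [11.0, 14.0, 14.0]
print(mean_bias(observed, predicted))  # prints 1.0
```

Unlike the RMSE, the bias keeps the sign of the errors, so it separates a systematic shift of the predictions from random scatter around the perfect fit line.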

Orbital Remote Sensing
In general, the orbital remote sensors used in this study (Planet® and Sentinel 2®) showed lower performance for predicting SOC and CS than Vis-NIR. Such results are expected, considering the distance between the satellite and the Earth's surface, the atmospheric layer, and the agents that dissipate electromagnetic radiation (DEMATTÊ; TOLEDO; SIMÕES, 2004).
For the Planet® satellite, Capuaba farm obtained two satisfactory R² results: from the MLR model for carbon in percentage and from the PLSR model for CS. The other two farms did not present reliable results for predicting the studied variables (Table 4). For the Sentinel 2® satellite, Capuaba farm obtained R² of 0.69 and 0.50 for calibration and validation, respectively, in the PLSR model. Aterrado farm presented satisfactory results for the GPR and SVM models for SOC in g kg⁻¹ (Table 5).
Satellite images have been used for various scientific and environmental applications (HUANG; ROY, 2021). Asner et al. (2017) used the Planet® satellite to classify land cover, while Csillik et al. (2019) used it to map CS in a tropical forest. More accurate results for SOC were observed using hyperspectral images, with a neural network and multivariate regression, but evaluating soils without the plant component, that is, exposed soil (MCCARTY et al., 2010). However, in agricultural areas of the Cerrado, soil exposure has been infrequent since the 2000s. Thus, the results of the present study, even with lower accuracies than those obtained with exposed soils, show the potential of canopy NDVI time series to reflect the CS in the soil. Zhang et al. (2019) obtained prediction accuracies similar to those presented here, and better prediction performance when using NDVI time series instead of single-date images. This finding highlights the importance of CS for the vegetative vigor of plants and, consequently, the ability of this indirect parameter to explain CS variations in the soil.
Jaber and Al-Qinna (2011) also obtained low accuracy in predicting SOC with orbital remote sensing, and the authors attributed this result to the weak correlation between total carbon and the reflectance of the satellite bands. The density of the samples used in generating the models is also a possible reason for the low accuracy of the models (GUO et al., 2018).
He et al. (2021), working with the Sentinel 2® satellite and using various vegetation indices, found the best prediction with an R² of 0.38. Wiesmeier et al. (2019) found R² of 0.21, and Guo et al. (2021) obtained R² of 0.48 in calibration and 0.17 in validation using the PLSR model. All these results are similar to those found in this study.

Air Remote Sensing
The airborne remote sensor used in this work presented satisfactory results only for the regional group with the SOC variable (g kg⁻¹) and the SVM model (Table 6). The better performance of the SVM and GPR models in the regional group can be explained by the greater amount of data that make up the model and the greater variation in the values of the covariates. Gilliot et al. (2017), flying over a 13-hectare area in Versailles with exposed soil and using a multispectral camera, obtained R² from 0.80 to 0.90 for SOC. These results indicate that, given the limited temporal coverage of airborne sensors, flights carried out when the soil is exposed provide better predictive capability for SOC mapping.
This research demonstrates the need for further studies on predicting SOC by orbital and aerial remote sensing. Proximal sensing through Vis-NIR spectroscopy showed satisfactory results for most sets and models in each area under study, demonstrating a high level of reliability. Accuracy was greater with regional or general databases, since the greater variability in soil characteristics (mineralogy, particle size, carbon content, among others) gives the models a wider range.

CONCLUSIONS
This study emphasized the importance of using remote sensors for predicting SOC and CS and identified some limitations in predicting SOC and CS from airborne and orbital remote sensors. SOC estimation with different modeling techniques and various pre-treatment algorithms shows that the predictive ability of spectral data depends on the machine learning algorithm and the remote sensors used. There is enormous potential in this field, with immediate applications of techniques that maximize the mapping and monitoring of carbon stock in the soil.

Figure 2. Crop rotation cycle of the last four harvests at Capuaba farm.
O. C. O. FARIA et al.

Figure 3. Soil organic carbon variation maps predicted by the GPR model of the three farms (A: Palminha, B: Aterrado, C: Capuaba) compared to that measured by the dry combustion method.

Figure 4. Data dispersion concerning the perfect fit line.
SOC: Soil Organic Carbon; SP: Statistical Parameter; CS: Soil Organic Carbon Stock; Cal: calibration; Val: validation; Cap: Capuaba; Pal: Palminha; Reg: Regional; GPR: Gaussian Process Regression; SVM: Support Vector Machine; MLR: multiple linear regression.
Biney et al. (2021), using a MicaSense camera with two sensors (RGB + multispectral), the last band being a thermal infrared sensor, coupled to a fixed-wing aircraft, found R² ranging from 0.11 to 0.29 using an SVM model and a database composed of local samples. The results obtained by Biney et al. (2021) are similar to those found in the present study when modeling with restricted or local databases.

Table 1. Contents of nutrients and particle-size composition of soils in the study areas.

Table 3. Coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to interquartile distance (RPIQ) for samples analyzed in Vis-NIR.
SOC: Soil Organic Carbon; Db: Soil Bulk Density; CS: Soil Organic Carbon Stock; Cal: calibration; Val: validation; Cap: Capuaba farm; Pal: Palminha farm; At: Aterrado farm; Reg: Regional; Ge: General; GPR: Gaussian Process Regression; SVM: Support Vector Machine; PLSR: Partial Least Square Regression; MLR: multiple linear regression.

Table 4. Coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to interquartile distance (RPIQ) for samples analyzed on the Planet® orbital remote sensor.
Mossoró, v. 36, n. 3, p. 675-689, Jul.-Sep., 2023

Table 5. Coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to interquartile distance (RPIQ) values for samples analyzed on the Sentinel 2® remote orbital sensor.

Table 6. Coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to interquartile range (RPIQ) values for samples analyzed in the airborne remote sensor.