Mapping a hydrophysical soil property through a comparative analysis of local and global

: Current available soil information allows building baselines to improve research, such as sustainable resource management; however, its use requires analysis of accuracy and precision that describes specific variables on local and global scales. Therefore, this study evaluated differences in the spatial distribution of water retention capacity (WRC) of the soil at a depth of 0.3 m, calculated from local general soil surveys and the global gridded soil information system (SoilGrids), using detailed or semi-detailed soil surveys as a reference, in two regions of Colombia (A and B). The qualitative and statistical analyses evaluated differences in WRC surfaces generated by the information sources. Neither information sources described WRC accurately, achieving correlations between -0.15 and 0.49 and average absolute errors between 9.65 and 19.52 mm for zones A and B, respectively. However, studies on the local scale remain within the ranges observed in the most detailed local studies. The use of products on the global scale is subject to regional analyses; nevertheless, they can be included as a covariate in digital soil mapping studies on more detailed scales.


Introduction
Soil moisture is a fundamental environmental property for physical and biological processes of terrestrial systems (Dobriyal et al., 2012;Kearney and Maino, 2018;Rodríguez-Iturbe, 2000). Due to its importance for agricultural systems and their production, information on the spatial distribution of the amount of water in the soil is an input for modeling ecohydrological processes on the landscape scale (Selle et al., 2006).
The water storage capacity is defined as the water layer that the soil can store and that the plants can use and it is calculated as the difference between water content at field capacity (FC) and the permanent wilting point (PWP) (Kirkham, 2014). Moreover, this capacity depends on other soil properties, such as texture, bulk density (BD), real density (RD), and organic carbon (OC) (Malagón, 2015;Mikutta et al., 2006), features that establish soil quality and productivity (Burk and Dalgliesh, 2013). However, water content at FC and PWP have been poorly mapped (Wösten et al., 2001;Wösten et al., 2013) due to intensive field and laboratory work that their estimation requires.
Global initiatives, such as GlobalSoilMap.net (Arrouays et al., 2017) and more recently SoilGrids (Hengl et al., 2017), have required identification of soil characteristics at specific depths. These characteristics provide global predictions, with a spatial resolution of 250 m on a specific number of soil properties (e.g., OC, BD, pH, and textural fraction). They are generated through remote sensors using assembled machine learning models. Both initiatives have stressed the need to develop maps based on detailed soil information in order to improve environmental and agricultural planning and monitoring (Ramos et al., 2017;Sánchez et al., 2009).
In Colombia, soil mapping is conventional, mostly products of soil surveys carried out by the Instituto Geográfico Agustín Codazzi (IGAC), which show a series of delineated polygons based on qualitative soil characteristics called cartographic soil units (CSU). They portray soil heterogeneity and describe structural patterns across the landscape (Lin, 2003).
This study evaluated differences in the spatial distribution of water retention capacity (WRC) of the soil at a depth of 0.3 m, calculated based on local general (1:100,000 or 100K) soil surveys and the global gridded soil information system (SoilGrids), using detailed (1:10,000, or 10K) or semi-detailed (1:25,000, or 25K) soil surveys as a reference, in two regions of Colombia.

Study site
Two sites were considered in this study. Area A covers 83,596 ha, corresponding to the flat zones of the Bogota savanna region (Department of Cundinamarca, Colombia), between latitudes 4°38.8' and 5°8.1' N and longitudes 73°47.1' and 74°23.8' W. The average altitude of this area is 2,572 m, with an annual rainfall that varies between 591 and 1,352 mm, and the annual average temperature is between 11.7 °C and 13.7 °C. Geomorphologically, there is predominance of a landscape with plains of lacustrine origin with contributions of volcanic ashes of different degrees of alteration. There are also reliefs of terraces and small valleys, and in lesser proportion there are hills, fans, and accumulation glacis that form a hilly landscape (IGAC, 2012a) ( Figure 1A). Area B covers 45,494 ha that corresponds to the irrigation district of the Zulia River (Department of Norte de Santander, Colombia), located between latitudes 8°4.6' and 8°23.0' N and longitudes 72°22.1' and 72°35.5' W. The area has an average altitude of 78 m, with annual rainfall between 1,500 and 3,000 mm and an average annual temperature between 26.0 °C and 28.0 °C. In this site, two landscapes were identified: 1) rolling hills, whose relief includes hills, soft hills, and small valleys, and 2) valleys, including flood plain relief (active and inactive), small valleys, and alluvial terraces. The parent material comprises fine-to-coarse alluvium in the valley landscape and sedimentary rocks (sandstone and claystone) in the rolling hills landscape (IGAC, 2015) ( Figure 1B).

Software and data
We used software R version 3.3.2 with the RStudio© version 1.1.423 interface and QGIS Desktop© version 2.18.16. The following digital files were used: soil maps on a scale of 1:10,000 of the flat areas of the Bogota savanna region in vector format (IGAC, 2012a), a soil map on a scale of 1:25,000 of the irrigation district of the Zulia River in vector format (IGAC, 2015), two soil maps on a scale of 1:100,000 of the departments of Cundinamarca and Norte de Santander in vector format (IGAC, 2012b), raster layers of FC and PWP at a resolution of 250 m (Hengl et al., 2017), and detailed, semi-detailed, and general soil profiles in vector format (IGAC, 2012a;IGAC, 2012b;IGAC, 2015).

Methods
The WRC calculated on a general scale (100K) and SoilGrids were compared with the values generated on detailed scales (25K or 10K). For these values, the parameterization of spatial and thematic information, WRC calculation at a given depth, and model validation were employed, as shown below.

A. Parameterization of spatial and thematic information
Soil maps in vector format were projected and rasterized at a spatial resolution of 50 m, and the grid value was adjusted according to the effective mapping scale (Hengl, 2006). The raster layers of SoilGrids were extracted corresponding to the moisture content in a volumetric fraction at field capacity (FC) (-31.6 kPa or pF 2.5) and permanent wilting point (PWP) (-1,500 kPa or pF 4.2) at three standard layer: sl2 (0.05 m), sl3 (0.15 m), and sl4 (0.30 m). The SoilGrids layers were clipped according to the study sites and were projected and resampled to obtain the defined spatial resolution.

B. Calculation of the WRC
In the case of the SoilGrids layers, WRC was calculated at 0.3 m, applying map algebra operations in QGIS. The calculation consisted of the difference at each depth between FC and PWP, all based on the previously parameterized layers (Equation 1) (Silva et al., 2014). The calculation of FC and PWP via SoilGrids was based on the pedotransference function (PTF) developed in tropical soils by Hodnett and Tomasella (2002) and Wösten et al. (2013), which is based on variables such as the content of sand, silt, clay, BD, OC, pH, and cation exchange capacity (CEC).
where: N is the number of standard layers, D is the standard layer depth, FC is the field capacity in the standard layer, and PWP is the permanent wilting point in the standard layer.
The modal profile for each CSU was identified from vector layers retrieved from the detailed and semi-detailed scales, where soil consociations predominated. Once the modal profile was defined, the volumetric moisture retention capacity at 0.30 m was calculated according to the physical properties recorded in the technical reports of the soil surveys considered. The calculated WRC value for the modal profile was assigned to the entire CSU using QGIS.
For vector layers on a general scale (100K), in which associations predominate, calculation of the retention capacity was the product of the weighted average of modal profiles that represent each CSU. However, as on this scale the studies do not record the physical properties of all the modal profiles, FC and PWP was calculated using the PTF in SoilGrids.

C. Validation of methods
Statistical functions and the goodness of fit of graphs between the observed (obtained in the soil studies on detailed and semi-detailed scales) and the predicted values (obtained in studies on a general scale and by SoilGrids) were used to evaluate WRC values obtained.
For the quantitative statistics, we used the mean error (ME), absolute mean error (AME), and mean square error (MSE), whose optimal values are 0; furthermore, the optimal values for the root-mean-square error (RMSE) or its normalized value (NRMSE) are the standard deviation and 0 %, respectively (Brus et al., 2011;Yapo et al., 1996). Additionally, the Levene test was carried out with Ho assuming equal variances and Student t-tests with Ho assuming equal means. On the other hand, at the spatial level, the error percentage of each of the methods was incorporated, based on the percentage difference between the observed and predicted values.

A. WRC behavior in both study sites
The WRC for the first 30 cm of soil depth in the sites evaluated can be seen in Figures   corresponded to Inceptisols (Soil Survey Staff, 2014) of a fine family under intensive livestock use. The zones with higher WRC values corresponded to Andisols of a fine family, where the allophane material presence favors moisture retention due to the formation of organo-mineral complexes (Mikutta et al., 2006), which give a porous soil structure (Chevallier et al., 2010). In area B, the zones with the lowest WRC values were sandy-to coarse-textured soils, which did not exhibit high moisture retention, due to their high number of pores and low content of organic matter (Malagón et al., 2015). In contrast, the soils that exhibited the highest WRC corresponded to Inceptisols and Entisols of the clay family, with higher OM content, but with a degradation process of their soil structure due to the mechanization work carried out in intensive rice cultivation fields under irrigation.

B. Comparison of surfaces
In area A, correlations of -0.007 and -0.149 were obtained for the general scale of 100K and SoilGrids, respectively (Table 1), which indicate a low-to-null correlation with the general study and an inverse correlation with SoilGrids. In the case of the semidetailed study of area B, correlations of 0.489 and 0.242 were determined for 100K and SoilGrids, respectively. The study of soils in area B, with a scale of 1:25,000, is a greater generalization of soils compared to the study at a scale of 1:10,000 in area A. Fewer correlations for area A were expected, according to Zhang et al. (2016), who maintained that any soil property has a higher probability of error as the study scale decreases.
In the two study sites considered, SoilGrids surfaces exhibited a lower correlation compared to 100K. This is due to the spatial resolution and the uncertainty increase of WRC surface of SoilGrids, which was not measured directly but calculated using the PTF proposed by Hodnett and Tormasella (2002), incorporating the contents of sand, silt, clay, OC, BD, CEC, and pH as independent variables (Leenaars et al., 2018). Further, Hengl et al. (2017) described coefficients of determination between 64.5 and 83.4. Figures 3A and 3B show a conditional quantile analysis for each of the methods used. The blue line indicates a perfect model, where the observed values are equal to predicted values. In area A, WRC values higher than 150 mm were recorded; however, when observing the quantiles and percentiles of the predicted values, both 100K and SoilGrids were never greater than 80 mm. These results indicate that outputs of 100K and SoilGrids underestimate WRC values in the detailed study. In contrast, in area B, predicted values for WRC of both 100K and SoilGrids were within the range of the observed values.
The histograms of original and predicted variables are shown in Figures 3A and 3B, where estimated WRC for both 100K and SoilGrids were different from the original variable in the two study areas. These results were corroborated by performing a Levene test, where equal variances were assumed, obtaining a p-value lower than 2.2 e -16 and a Student t-test assuming equal means with a p value also lower than 2.2 e -16 . For site A, the WRC estimate was generalized and concentrated around 60 to 70 mm in the case of 100K and between 40 and 50 mm for SoilGrids, indicating that the latter was the method that would most underestimate WRC in site A (ME 5.829 mm). This is expected if we consider that within site A, there are soils of the Andisol order with allophane materials that favor WRC, and although SoilGrids uses the soil orders  coverage where general aspects such as crops, forests, and pastures, among others, are reported (Hengl et al., 2017). Figures 4A and 4B show the areas that exhibited a greater error in the estimation of WRC for area A using 100K and SoilGrids. They were located to the southwest of the evaluated area, where WRC was low and the result of a soil compaction phase due to its intensive use in livestock systems. The areas mentioned above were reported and mapped on a 1:10,000 scale but not on a 100K scale, due to their generality. In the case of SoilGrids, a covariate that indicates soil compaction phase was not included. Additionally, SoilGrids surfaces show a greater area with underestimated values using 100K. Ramos et al. (2017) found similar results, indicating differences between 55 % and 75 % of the study site when comparing properties, such as CEC and OC, respectively.
according to FAO as one of its covariables (Hengl et al., 2017), the resolution and the discontinuity of these soils do not facilitate their mapping on a global scale.
For area B, using both 100K and SoilGrids, some estimated WRC values were outside the range of the original variable. This condition was more pronounced in the estimation with SoilGrids, where the values were concentrated around 60 mm, that is, this corresponds to an extreme, indicating that for this study area, WRC with SoilGrids was overestimated (ME -19.522 and AME of 19.526). These results are attributed to the specific process and local continuous degradation scale of the soil structure in the site, due to intensive irrigated rice cultivation activities. Further, this activity was not analyzed on a global scale and with the resolution of the covariates used by SoilGrids, because it uses surface  For study area B, the zones that exhibited the most significant error can be observed in Figures 5A and  5B. In the case of SoilGrids, the areas with the highest error correspond to soils with sandy, loamy sands and sandy loam textures, which are characterized by having low water retention (Malagón et al., 2015). These areas represent a minor zone and specific conditions within the study site, since they correspond to soils formed in a hilly relief from coarse-grained sedimentary rocks. In the case of 100K, these same sites exhibited an error between 0 % and 50 %, a product of the soil study scale used, where a landscape with rolling hills was identified, mapped, and characterized in the study.

Conclusions
The comparison of two WRC surfaces generated from general soil surveys and digital soil mapping dates shows that the best approximations use the most similar origin scales. The results showed better goodness of fit of parameters between a semi-detailed (area B) and a general study than those shown between a detailed (area A) and a general study. Although SoilGrids data do not have a similar scale, input data and covariates employed allow inferring that its surfaces are situated between an exploratory and a general study, that is, different from those generated by a detailed or a semidetailed study.
The surfaces obtained with the two methods in the areas evaluated differ from the original variable, with the results of SoilGrids tending to underestimate (area A) or overestimate (area B) WRC. In contrast, the results obtained with 100K exhibit much better data distribution. Although the results indicate that SoilGrids are not adequate for describing WRC in specific areas and should be used for larger areas, it is recommended as a data supplement (or a covariate) of obtained from more detailed studies to create spatial prediction models.