Soil variables as auxiliary information in spatial prediction of shallow water table levels for estimating recovered water volume

Spatial data became increasingly utilized in many scientific fields due to the accessibility of monitoring data from different sources. In the case of hydrological mapping, measurements of external environmental conditions, such as soil, climate, vegetation, are often available in addition to the measurements of water characteristics. An integrated modelling approach capable to incorporate multiple input data sets that may have heterogeneous geometries and other error characteristics can be achieved using geostatistical techniques. In this study, different physical hydric properties of soils extensively sampled and topography were used as auxiliary information for making optimal, point-level inferences of water table depths in forest areas. We used data from 48 wells in the Bauru Aquifer System in the Santa Bárbara Ecological Station (EEcSB), in the municipality of Aguas de Santa Bárbara in São Paulo State, Brazil. Using the resistance of soil to penetration and topography as auxiliary variables helped reduce prediction errors. With the generated maps, it was possible to estimate the volumes of water recovered from the water table in two periods during the monitoring period. These values showed that 30% of the recovered volume would be sufficient for a three-month supply of water for a population of 30,000 inhabitants. Therefore, this raises the possibility of using areas such as the EEcSB as strategic supplies in artificial recharging management.


INTRODUCTION
The future of natural resources has been the subject of much reflection in relation to the modern way of life, with the imminent scarcity of water resources among the greatest causes for concern.Shallow groundwater systems are vitally important to humankind as a source of water, for maintaining river discharge, and for sustaining wetland and riparian ecosystems.When shallow groundwater systems are unconfined, they are vulnerable to anthropogenic contamination.It is therefore important for hydrogeologists and natural resource managers to understand the processes of the unsaturated zone that link human activity at the soil surface with the underlying groundwater, and vice versa (DILLON; SIMMERS, 1998).Thus, the strategic importance of groundwater resources has increased, stimulating the development of efficient methods for their measurement and monitoring.Such techniques must be capable of assessing the quantity of groundwater resources and the seasonality effects and possible alterations in climatic conditions.This understanding would promote a balance of the interests around the multiple functions attributed to groundwater, making the knowledge regarding the very important spatiotemporal dynamics of the groundwater (VON ASMUTH; KNOTTERS, 2004).
Robust data collection is a fundamental requirement for a monitoring system intended to reflect a representative assessment of the state of natural resources (BAALOUSHA, 2010).Geospatial data often demonstrate incompatible heterogeneities with each other.For Cao et al. (2014) it can happens in terms of data nature (continuous or categorical), spatial support (areal or point-reference data), spatial scales, and sample locations (missing values).Considering also a complex spatial dependence and inter-dependence structures among spatial variables, these incompatibilities or heterogeneities render fusing these diverse sources of spatial information a rather challenging problem.Nguyen et al. (2012) regard data fusion as an inference problem: given two heterogeneous input datasets with different statistical characteristics, how do we optimally estimate the quantity of interest, and obtain uncertainty measures associated with these inferences?It would be ideal to fuse these diverse information efficiently to achieve a comprehensive perspective (CAO et al., 2014).
One technique for efficient and precise monitoring of groundwater is to use stochastic methods.These are able to assemble information on aquifers and the spatiotemporal variation of water table depths, allowing decisions to be made within the context of the strategic management of water resources (KNOTTERS; BIERKENS, 2001).Groundwater dynamics, which are governed by the systemic combination of natural and anthropic factors, require the use of data models to explain their complexity (KRESIC; MIKSZEWSKI, 2013).
Geostatistics is a branch of statistics that allows the simulation of the spatiotemporal distributions of variables that define the quantity and quality of natural resources (SOARES, 2006).Even though geostatistical models are technically complex and strict, they have been used widely because of their capacity to predict a variable with precision and to calculate the uncertainties involved (KITANIDIS, 1997;YAMAMOTO;LANDIM, 2015).The use of geostatistics for monitoring aquifers might facilitate data collection by minimizing hindrances caused by costs, natural limitations, and time (KITANIDIS, 1997), providing spatiotemporal analyses that benefit the management of groundwater resources.
Geostatistical models that consider correlations between variables in the estimation of a variable of interest might produce better estimations if the correlation is sufficiently strong and there is an inherent physical meaning behind it.Remote monitoring tools or even the detailed collection of low-cost sampling variables have been used successfully for this purpose by applying interpolators such as cokriging (AHMADI; SEDGHAMIZ, 2008), universal kriging (KAMBHAMMETTU; ALLENA; KING, 2011; MANZIONE; MARCUZZO; WENDLAND, 2012), and kriging with external drift (DESBARATS et al., 2002;PETERSON et al., 2011).The present work considered several characteristics and physical properties of the environment as secondary variables that could help predict shallow water table depths in areas belonging to the Cerrado biome and in forest crops in the Santa Bárbara Ecological Station (EEcSB) in São Paulo State, Brazil.The objective was to test the efficacy of soil granulometry, topography, soil humidity, and resistance to penetration as auxiliary variables when measuring the spatial variability of water table depths in an area of the Bauru Aquifer System (BAS) in comparison with the managed state in the EEcSB.Subsequently, the volume of water recovered during the rainy season was estimated and the exploitation capacity of the aquifer was evaluated using cokriging as the geostatistical interpolator.

Study area and available data
The EEcSB is located at 24°48´S, 49°13´E, near highway SP 261 (Km 58) in the municipality of Águas de Santa Bárbara in São Paulo State, Brazil.The EEcSB includes an area of 2,712 ha of native vegetation (Cerrado, swamps, and riverine forests), interspaced with exotic species resulting from reforestation (e.g., pine and eucalyptus).The existing crop configuration was initiated within the context of the state policy for reforestation during the 1960s.The total area of the EEcSB comprises 4,371 ha designated as belonging to State Forest and the Ecological Station.The study area is located above the Bauru Aquifer System (BAS), which is an aquifer in Upper Cretaceous sandstones, with a regional extension that occupies the geomorphological region of the occidental plateau of São Paulo State, in the sedimentary basin of Paraná.The outcropping surface of the BAS extends over more than 96,000 km 2 , representing an important source of water for the municipalities in São Paulo State.In the EEcSB, the aquifer units are represented by the Marília and Adamantina aquifers.
In our study, we used water table depths observed between September 5, 2014 and May 31, 2016, from 48 monitoring wells distributed within the EEcSB (Figure 1), with a minimum depth of 4 and maximum of 8 meters.The wells were constructed only for monitoring the groundwater level.
These measurements were divided in two groups: 1) 32 wells with data collected between September 2014 and August 2015, and 2) 48 wells with data collected between August 2015 and May 2016.Two periods of analysis were selected: (P1) November 21, 2014 to May 5, 2015 and (P2) October 16 to December 3, 2015, that included the period of water table depth increase during the wettest part of the hydrological year.These measurement periods encompassed the lowest level of water table depth in the period, were close to the start of the hydrological year, and included the peak elevation of water table depth in the rainy season (Figure 2).
As data from the monitoring wells are scarce, values of the physical hydric properties of the soil and topography were used to assist in the interpolation of the water table depth.The physical hydric properties used in the present study were soil granulometry (percentage of sand (SAN) and clay (CLA)), hydraulic conductivity (K), soil moisture (SM), and soil resistance to penetration (RP).Soil samplings and SM and RP measurements were obtained in November and December 2015, during the wet period.
Granulometry was determined by the collection of soil samples, followed by laboratory analyses to determine the sand, silt, and clay fractions.Overall, 113 soil samples were collected at depths of 50 cm.A granulometry analysis was performed using sand fractionation to determine the value of K based on the Hazen method (FETTER, 2001).The value of SM was measured using a time-domain reflectometer on 70 sampling areas.On these same areas, RP was measured using an automatic electronic meter.These variables were collected in the suroundings of the basins; the irregularity of the sampling mesh was due to restricted access to the available routes in the EEcSB.Those values were interpolated and tendency surfaces representing the variables of interest were created.Figures 3 and 4 show maps of the auxiliary variables selected for the interpolation of water table depth for the EEcSB area, based on the survey data.These maps were generated by kriging the collected data.Santarosa (2016) presents in detail the procedures performed; however, a brief outline is presented in this article.

Geographical data spatial analysis
For the spatial analysis of the geographical data, the formalism of the geostatistical application was considered according to Kitanidis (1997), which follows three steps: (1) exploratory data analysis, (2) parameter estimation, and (3) model validation.
Figure 3. Auxiliary variables considered for water table depth estimations in the Santa Bárbara Ecological Station.

5/13
The exploratory data analysis involves performing statistical calculations to describe the characteristics of the sampling group.This analysis includes measurements of dispersion and central tendency, data normality tests, identification of outliers, and data transformation.
Parameter estimation refers to the adjustment of theoretical semivariogram models to the experimental spatial correlation of the regionalized variable (YAMAMOTO; LANDIM, 2015).The estimation of the weight of each measurement should be sought objectively to reflect the true structure of the spatial correlation.Soil variables as auxiliary information in spatial prediction of shallow water table levels for estimating recovered water volume

6/13
An experimental variogram consists of the finite variance of a group of data at a predefined distance (lag) (HENGL, 2009).A semivariogram is calculated based on the arithmetical average of the squared differences between pairs of spots (Z(x i )-Z(x i + h)) separated by a vector h, as described in Equation 1: Once the hypothesis of second-order stationarity is satisfied, the model that best explains the evaluated phenomenon is selected, summarizing the main patterns of spatial continuity.At this moment, the continuity degrees, anisotropy relations, and other properties of the spatial process are identified by variography.
If the random functions are the basis of the selected model, this will condition the process of value estimation for the variable identified as the attribute under evaluation in non-sampled areas (SOARES, 2006).
In the case of two correlated variables, cokriging is used and a cross variogram is calculated, considering Z 1 and Z 2 as stationary random variables (Equation 2): Once the semivariogram is defined, geostatistical estimation is performed using interpolators.To incorporate the uncertainties associated with estimation, and to consider the structural continuity of a certain phenomenon under evaluation, geostatistical methodologies make a spatial inference about a variable that has not been sampled (Z(x o )) at a certain location x o , based on a linear combination of values measured for the same variable (Z(x α )) located at position x α (Equation 3): Soares ( 2006) mentions that weighting (λ α ) has the role of reflecting lower or higher proximities of the sample structures (Z(x α )) relative to the point to be estimated (Z(x o )), and that it should have the effect of dissociating biased groups.
In the case of cross variograms, different variables grouped together based on their spatial correlation can be estimated using cokriging.This is a multivariate extension of the kriging method, which for each a sampled location seeks to obtain a vector of values instead of a single value (YAMAMOTO; LANDIM, 2015).Cokriging is suitable for situations in which a second variable with higher sample density can be incorporated to help estimate the main variable.The secondary variable is incorporated in the estimation model, by considering the existence of correlation between the two variables (GOOVAERTS, 1997;SOARES, 2006).The main variable Z 1 (x i ) is known at N 1 sampled locations, and the secondary variable Z 2 is sampled at N 2 locations.Therefore, variable Z 1 (x o ) in a non-sampled location x o can be described by the linear interaction between the main and secondary variables (Equation 4): Weights a i and b j are distributed based on the spatial dependency of each of the variables on each other and on their cross correlations.An optimal estimator cannot be tendentious and it should present minimal variance (neither overestimating nor underestimating values) with maximum reliability in the estimations (ISAAKS; SRIVASTAVA, 1989).
Cokriging, as the estimation of a regionalized variable through two or more variables with the purpose of improving local predictions, considers additional information contributed by a variable different from that being predicted.The use of this method should be preferred when the main objective is to reduce prediction variance.The incorporation of an auxiliary variable might also add physical meaning and help circumvent operational and financial limitations in surveys involving groundwater and the spatial prediction of water table levels.Generally, in such a process, data on topography, soil use and occupation, soil characteristics, and other variables that might be associated with groundwater dynamics are used (BETTÚ;FERREIRA, 2005;DESBARATS et al., 2002;HOOSHMAND et al., 2011;MANZIONE;MARCUZZO;WENDLAND, 2012;PETERSON et al., 2011;ROCHA et al., 2009).
Based on a comparison of the application of kriging and cokriging on a group of data for mapping groundwater levels, Ahmadi and Sedghamiz (2008) concluded that cokriging, using the different responses of the water table depths to distinct climatic conditions as auxiliary variables, provided results that were more precise than achieved using kriging alone.
Finally, yet importantly, the validation of the model may be performed by cross validation.In this procedure, the value estimated for one of the measured points is ignored.Instead, its value is estimated based on the remaining values, and the process is repeated for all measurements.With the group of actual values (measured) and those estimated for the group of data under analysis, statistical analyses are performed to assess the quality of the utilized model (SOARES, 2006).
As the formalism of geostatistical analysis consists of steps, its modeling is regarded as excellent and impartial (i.e., non-tendentious), making spatial analysis richer by allowing the prediction of values for non-sampled areas and measuring the quality of the estimation (YAMAMOTO; LANDIM, 2015).In the cross validation, a good estimation should present a value of the Mean Standardized (MS) close to zero, the lowest possible Root Mean Square (RMS), an Average Standard Error (ASE) close to the RMS, and a value of the RMS Standardized (RMSS) close to 1 (JOHNSTON et al., 2001).The parameters ASE, RMSS, and the coefficient of determination (R 2 ) were considered crucial in the verification of the quality of the interpolation of the water table levels.
Correctly assessing the variability in prediction can be verified when the ASEs are close to the RMS prediction errors.If the ASEs are greater than the RMS prediction errors, the variability of the prediction is overestimated; if the ASEs are less than the RMS prediction errors, the variability of the prediction is underestimated.Alternatively, the RMSS errors should be close to 1 if the prediction standard errors are valid.If the RMSS errors are greater than 1, the variability of the prediction is underestimated; if the RMSS errors are less than 1, the variability of the prediction is overestimated.
The exploratory data analysis required by the geostatistical method, variography, data interpolation, and cross validation were all performed using the Geostatistical Analyst package of the Geographical Information System ArcGIS (JOHNSTON et al., 2001).

Estimation of the recovered volume by the aquifer during the rainy seasons
In the present study, one of the applications for the mapping of water table levels was the calculation of the recovered volume, performed by map algebra based on the surfaces given by the interpolation of the water table depth and structured from both time points of water table depth recovery (P1 and P2).For this purpose, Equation 5, which was adapted from Manzione et al. (2007), was used: ( ) where the volume of recharge (VR) of each recovery was calculated by the variation of the water table depth (unit: m), which was given by the difference between the initial and final levels (WTD f -WTD i ) in the evaluated period, multiplied by the values of the area (A) of each pixel (unit: m 2 ) and the effective porosity (η e ).Here, a value of effective porosity of 10% was used.This represents an average value of the effective porosity in the BAS (i.e., 5-15%), and it is close to the inferior limit of effective porosity for quartz sand soils (varying between 12% and 18%).This chosen value took into account data generalization considering both the soil and rock layers; i.e., the effective porosity value should be neither too low (simulating only the rock condition) nor too high (referring only to the condition of the soil).

Exploratory analysis
The summary of statistical measurements displayed in Table 1 reveals the differences between measurements made at the beginning of the hydrological year (with lower levels) and those that represent the peak elevation of water table depth.
The measured values did not show normal behavior, i.e., the asymmetry and kurtosis values diverged from the expected limits of 0 and 3, respectively.
When using an auxiliary variable, it should be correlated with the main variable.Therefore, the values for the coefficient of determination between the average water table depth and the auxiliary variables utilized were calculated (Table 2).
The R 2 values reveal the degree of dependence between the main variable and the secondary variables.In general the secondary variables showed moderate to strong correlation with the main variable ranging from 0.45 to 0.86.Some variations in the measuring methods and the complexity of water table dynamics can cause distortions in the spatial analysis, making erratic results and not reproducing the expected effect in the model even with a high correlation between target and ancillary variables.The behavior of RP can also have influence from land use and sampling scale.In this way, RP just present good results in the central part of the study area as showed at Figure 5.

Variographic analysis
With regard to the variographic parameters, it is important to verify the variance of the calculated values by the difference between the sill (C) and the nugget effect (C 0 ), with the expectation that the acquired values with cokriging would be lower than those obtained using ordinary kriging alone (Table 3).
The best results for the auxiliary variables in the interpolation were obtained using the SRTM for the measurement made on 11/21/2014.For the measurements on 05/05/2015, RP, K, and SM provided the best results.For the measurements made on 10/16/2015, RP and SAN were the most efficient, whereas ALT, RP, SAN, and SRTM provided the best results for the measurements made on 12/03/2015.

Cross validation
To analyze the interpolation quality, the values of ASE, RMSS, and R 2 were considered determinant in the cross validation evaluation (Table 4).
Analysis of the resulting ASEs verified that most of the auxiliary variables were efficient by decreasing the error to a value lower than that found using kriging.For the measurements made on 11/21/2014, 05/05/15, and 10/16/2015, all variables were efficient with alternation in the granulometry values (SAN and CLA).For the interpolation of the measurements made on 12/03/2015, only ALT, RP, CLA, and SRTM were able to reduce the ASE.The RMSS was considered adequate, although it varied above or below the ideal limit (should be close to 1), possibly contributing to the increased error in some predictions or influencing the lower error reduction.The R 2 verification showed punctuated improvement in the predictions.However, the prediction model showed moderate to low accuracy, reflecting the level of dependence between the measured and predicted values.Soil variables as auxiliary information in spatial prediction of shallow water table levels for estimating recovered water volume 10/13 Another way to verify the quality of the interpolations was to use maps incorporating the standard error of the interpolation (Figure 5), which made it possible to identify the best interpolations by cross-referencing with previously analyzed information.Therefore, it was possible to observe that the auxiliary variables ALT, RP, and SRTM gave the most consistent results in the interpolated measurements.
This differential behavior of the auxiliary variables in the results is because each soil property used presents different spatial variability behavior, necessitating different strategies of obtaining data for each auxiliary variable.Another fact considered is the variability in each of the methods of obtaining the values of the physical hydric properties of the soil.
The difference found between the measurements of the water table depth reveals the difference in the behavior of the fluctuation of the water level in each well under the influence of climatic dynamics and the characteristics of the aquifer.Another phenomenon that caused behavioral change in the parameters of the spatial prediction was the difference in the way the sample set related to each auxiliary variable.
In undisturbed groundwater systems, climatological conditions can be considered the only factor.This approach can be considered in the interpretation of the spatial variation of groundwater dynamics, because spatial differences in groundwater dynamics are determined by the spatial variation of the system properties, while the temporal variation is driven by the dynamics of the input into the system (VON ASMUTH; KNOTTERS, 2004).
The maps generated using cokriging and ALT were used to calculate the volumes recovered in the analyzed period.Using topographic data as auxiliary variables for the interpolation of water table depths, Rocha et al. ( 2009) also achieved better geostatistical modeling.By incorporating topographic data in the spatial analysis of groundwater, they observed an important decrease in the sample variation marked by the reduction of the variogram sill.
The results show that the application of the auxiliary variables and data fusion from different scales in groundwater mapping can favor and improve of the quality of the information provide for water management.It also could replicated similar environmental conditions to those found in the field trips at EEcSB, providing information to explore the seasonal variation of the water volume in areas with an imminent or permanent situation of water resources scarcity.

Extractable water volume
The best maps were selected (Figures 6 and 7) to calculate the volume of water that was recovered in studied periods P1 and P2.
Using this methodology, calculated the volumes of recovered water (VR) for each of the two periods (VR P1 and VR P2 ).The recharge volume was calculated according to the use of Equation 5and compared to the volume precipitated in the period.The total height of precipitation responsible for level recovery was 890 mm for VR P1 and 470 mm for VR P2 (total: 1360 mm).This value, converted to a volume relative to the surface of the evaluated basins (2549 hectares), represents a precipitated volume of 34,666,400.00 m 3 .Thus, the total recovered value is equivalent to 10% of the rainfall and it represents a recharge of 140 mm (Table 5).
Comparison of both recovery periods shows that rainfall was more widely distributed but with lower recharge in VR P1 , whereas in VR P2 , rainfall was more concentrated, resulting in greater recharge in a shorter period.It is possible to infer that the volume to be collected depends directly on precipitation behavior, because water table depth oscillation is very susceptible to precipitation events.
The average water consumption in the municipalities of Águas de Santa Barbara, Manduri, and Cerqueira Cesar, according to data from the National System of Information on Sanitation, is shown in Table 6.An evaluation of the calculated values based on the collection of 30% of the VR in the EEcSB basins is presented in Table 7.This volume would be sufficient to supply water to the municipalities for three months, even when considering a 20% loss in the water catchment and distribution system.It is only a simulation for the use of the seasonal oscillation of groundwater.
Average uncertainties were incorporated to the spatial prediction model in the extractable water volume calculations to measure the effects of the ASE in the method application.The considered variation refers to minimal deviation close to the monitoring wells, in the region where water table depth data collection are concentrated, since the uncertainty in areas where the prediction was extrapolated generates a higher variation than the natural oscillation verified in VRP1 and VRP2.This behavior exposes a limitation in the methodology application.Thus, it is necessary to undertake new experiments to test sampling configurations, because sampling optimization can reduce the uncertainties reducing the variance and minimizing errors, providing more accurate estimates for water resources management.

CONCLUSIONS
This study examined the efficacy of using different physical hydric variables of soils and topography as auxiliary depths in forest areas, based on geostatistical interpolation, and producing important information for the advancement of spatial data studies.The use of such information produced satisfactory results in the interpolation of water table depths, especially in cases when RP, ALT, and SRTM were used as secondary variables.
The use of cokriging as an interpolator gave superior results in comparison with those achieved by kriging alone, as seen from the variographic parameters in the cross validation and in the standard error maps.However, the uncertainty present in the spatial prediction exposes the limitation of the method.Thus, adaptations are required to promote changes in data collection to reduce uncertainties and to provide estimates that are more accurate for the management of water resources.
The generated maps constitute an important tool for the management of water resources in priority areas, where aquifers are particularly vulnerable both to anthropic factors and to the effects of climate change.Understanding the dynamics of water table depth oscillation might facilitate faster response in times of water scarcity, alleviating the impacts on society by including strategic reserves of short-term groundwater volumes during times of pronounced drought, as seen in 2014.

Figure 1 .
Figure 1.Location of the Santa Bárbara Ecological Station (EEcSB) and the positions of the monitoring wells.

Figure 2 .
Figure 2. Average water table depth and Precipitation in the Santa Bárbara Ecological Station (EEcSB) between September 5, 2014 and May 31, 2016.

Figure 4 .
Figure 4. Auxiliary variables considered for water table depth estimations in the Santa Bárbara Ecological Station.SRTM: Shuttle Radar Topography Mission.

Figure 5 .
Figure 5.Standard error map for all interpolations.RP: resistance to penetration; SAN: percentage of sand; CLA: percentage of clay; K: hydraulic conductivity; SM: soil moisture; ALT: altimetry of the topographic map; SRTM: altimetry of the SRTM product; KG: kriging.

Figure 6 .
Figure 6.Oscillation of water table depth in studied periods P1.

Figure 7 .
Figure 7. Oscillation of water table depth in studied periods P2.

Table 1 .
Descriptive statistics of measured water table depths.

Table 2 .
Coefficient of determination (R2) between water table depth and auxiliary variables.
ALT: altimetry of the topographic map; SRTM: altimetry of the SRTM product; SAN: percentage of sand; CLA: percentage of clay; K: hydraulic conductivity; RP: resistance to penetration; SM: soil moisture.Soil variables as auxiliary information in spatial prediction of shallow water table levels for estimating recovered water volume 8/13

Table 4 .
Cross validation for the interpolations regarding water table depths (m).

Table 5 .
Stored volumes of precipitation in the aquifer during the rainy season between November 2014 and December 2015.

Table 6 .
Water volume for the monthly supply of cities near the Santa Barbara Ecological Station.

Table 7 .
Extractable volume (m 3 ) and months of water supply.