Spatial soil sampling design using apparent soil electrical conductivity measurements

Soil sampling is an important stage of digital soil mapping. The objective of this study was to characterize the spatial design of soil sampling using soil apparent electrical conductivity (ECa) and its optimized spatial sampling. For the characterization, it was used the Spatial Simulated Annealing (SSA) technique, incorporated on the software SANOS 0.1, and the method of response surface sampling design on the software ESAP 2.35. The ECa was measured at 1,887 points in an area of 6 ha located in the northwestern region of Spain. The EM38-DD equipment (Geonics Limited 2005) was used at 2 depths: vertical dipole (1.5 m effective measurement depth) and horizontal dipole (0.75 m effective measurement depth). Semivariogram showed trend for ECa in vertical dipole (ECa-V) and ECa in horizontal dipole (ECa-H). Software SANOS 0.1 and ESAP 2.35 were used to obtain the 40-point sampling scheme, using the 2 schemes (SANOS and ESAP). ECa-V estimation values at the 1,887 points were calculated with residual ordinary kriging. The sampling scheme obtained from ESAP was better than with SANOS.

ABSTRACT: Soil sampling is an important stage of digital soil mapping.The objective of this study was to characterize the spatial design of soil sampling using soil apparent electrical conductivity (EC a ) and its optimized spatial sampling.For the characterization, it was used the Spatial Simulated Annealing (SSA) technique, incorporated on the software SANOS 0.1, and the method of response surface sampling design on the software ESAP 2.35.The EC a was measured at 1,887 points in an area of 6 ha located in the northwestern region of Spain.The EM38-DD equipment (Geonics Limited 2005) was used at 2 depths: vertical

Spatial soil sampling design using apparent soil electrical conductivity measurements inTRoduCTion
The soil sampling is an important stage of digital soil mapping (DSM).In DSM the principal idea is that soil properties have some sort of correlation with other environmental variables.These variables are known as secondary information in a regular grid or densely sampling, i.e. digital elevation models, geological maps, remote sensing images, apparent soil electrical conductivity (EC a ) or resistivity (McBratney et al. 2003).
EC a , measured by contact or by electromagnetic induction, is directly related with other physical and chemical soil properties.There are some factors that may affect the EC a : soil moisture, pore size distribution, salt content, particle size distribution, cation exchange capacity (CEC), clay mineral composition, pore geometry and tortuosity, concentration of dissolved electrolytes in soil water, amount and composition of colloids, and temperature of soil water (Sudduth et al. 2005;Kühn et al. 2008;Siqueira et al. 2015a).Several authors have used EC a to improve the estimation of other soil properties (McNeill 1980;Lesch et al. 2000;Schumann and Zaman 2003;Domsch and Giebel 2004;Sudduth et al. 2005;Amezketa 2007).Therefore, the use of EC a measured by electromagnetic induction is an important tool soil digital mapping, favoring the use of optimized sampling schemes.
Spatial sampling schemes are used to determine the sample location describing the spatial variability of determined variables.Besides the grid configuration, sampling density and distance are important factors for accurate spatial predictions (Minasny and McBratney 2007).Sampling schemes can be optimized using geostatistical methods as described by van Groenigen et al. (1999) as well as Diggle and Lophaven (2006).One of the main criticisms of geostatistical approach is the necessity of prior knowledge of attributed semivariogram.Other methods are based on geometric criteria, without semivariogram estimation, as it was proposed by Royle and Nychka (1998), or on statistical criteria, as the method proposed by Lesch et al. (1995), who used EC a data based on a multiple regression model to determine the samples' number and location.
Two complex mathematical models have been widely used to specify the optimum sampling schemes of soil and plant: SANOS 0.1 (van Groenigen et al. 1999) and Electrical Conductivity or Salinity, Sampling, Assessment and Prediction (ESAP) 2.35 software (Lesch et al. 2000).However, there is no study evaluating the accuracy of these methods in respect to the original data.SANOS 0.1 software has an open system that allows to determine any number of samples for a previously studied area, while the ESAP 2.35 software has limitations regarding the number of points to be optimized and it was developed specifically for soil electrical conductivity and salinity data.
Other authors have used the ESAP software to optimize the use of satellite images (Hunsaker et al. 2009).Ferreyra et al. (2002) used the SANOS software to reduce the number of soil moisture samples having as secondary tool for decision making the scaled semivariogram.Siqueira et al. (2015b) studied different sampling schemes and soil water contents.They observed that the choice of sampling scheme influenced the results, particularly in relation to small number of samples, preventing the use of geostatistical techniques and precision agriculture.
The objective of this study was to characterize the spatial design of soil sampling, using EC a and its optimized spatial sampling.For that, the Spatial Simulated Annealing (SSA) technique was used incorporated on SANOS 0.1 software, as well as the method of sampling design by response surface using ESAP 2.35 software.

MATERiAl And METHodS
The study area (6 ha) is located at Castro de Ribeiras de Lea, Lugo, Spain.The geographical coordinates are 43°09′49″N and 7°29′47″W, with average elevation of 410 m and average slope of 2% (Figure 1).The mean annual temperature is 11.2 °C and the mean annual rainfall is 930 mm (data: 1961 -1990) (Paz González et al. 1996).
Daily rainfall (mm) and crop reference evapotranspiration (mm) are shown in Figure 2. The soil area is classified according to FAO-ISRIC (1994) as Cambisol, with parent material of sediments from tertiary and quaternary periods (Paz González et al. 1997).Table 1 shows the particle size distribution and organic carbon.The organic matter content is higher in the horizon A p compared to the other soil horizons.Soil texture (size < 2 mm) is sandy-loam in the horizon A p , loam-clay-sandy in the horizon B w and clay in the horizon B tg .
The equipment consists of 2-unit measurements: one in a horizontal dipole (EC a -H), to provide an effective measurement depth of approximately 0.75 m, and another in a vertical dipole (EC a -V), with an effective measurement depth of approximately 1.5 m (McNeill 1980), which cover the complete root volume of a plant (Domsch and Giebel 2004).The principle of the EM38-DD device requires the instrument to be calibrated a new at each location and before each measurement (Geonics Limited 2005).The instrument must also be calibrated a new if the instrument's temperature changes because of alterations in the ambient temperature (Sudduth et al. 2001).Data were collected on March 14, 2008 and April 3, 2008 (Figure 3), using EM38-DD, a field computer and a GPS RTK StarFire (John Deere®) to georeference the electrical conductivity measurements.EC a -H and EC a -V were measured at 1,887 locations on March 14, 2008 and 1,751 locations on April 3, 2008.
Spatial analysis of data was carried out using geostatistical tools according to Vieira (2000).The experimental semivariogram, γ(h), of n spatial observations, Z(x i ), i = 1,..., n, can be estimated as: where: N(h) is the number of pairs of observations separated by a distance h.
Experimental semivariograms need to be fitted to some mathematical model which must meet the criteria of conditional positive definiteness (McBratney et al. 1982).
(1) Amongst all the variety of models which satisfy that condition, the fitting parameters that describe them are the nugget effect, C 0 , the sill, C 0 + C 1 (where C 1 is the structured variance coefficient to be defined later), and the range of spatial dependence a.
When the semivariogram of any particular variable does not stabilize at constant value for the sill, it is evident that the sill does not exist.Therefore, the stationarity of the mean may not be guaranteed because the variable increases in an unlimited way in some direction.In this situation, it is necessary to remove the trend before any geostatistical application based on the intrinsic hypothesis (Vieira 2000).One possible way to detrend the data is using a trend surface fitted to data through minimized squared deviations of the difference between the surface and the original data, producing a new residual variable (Vieira 2000).In this study, EC a (mS•m −1 ) showed some trend that was best fitted to a parabolic surface, Z trend , using the following equation: where: A 0 , A 1 , A 2 , A 3 , A 4 and A 5 are the fitting parameters.The residual variable, Z res , can be obtained by subtracting the estimated trend surface from the original values for each point: In Equation 3, the variable Z res is subsequently tested for the existence of a defined sill in the semivariogram.The kriging estimation is done on the residual values to which the estimated surface is added after the estimation is done.
As the semivariograms for the original data showed a very strong trend, violating the intrinsic hypothesis of geostatistics (Journel and Huijbregts 1978), parabolic trend surface equations were fitted to the data and subtracted from them.The residuals generated by the difference between originals and trend surface produced semivariograms that showed a very well-defined sill, and, for this reason, the residuals were used in the remaining analysis; the semivariogram model was fitted using the cross validation by means of the software Gstat (Pebesma 1992).
It was used the SANOS 0.1 software (van Groenigen et al. 1999) to locate the new sampling location points of EC a , with the criteria of optimization sample and the mitigation of ordinary kriging variance average in the study area, through a spatial simulation algorithm (SSA) that uses the fitting parameters of variable semivariograms.The parameters of the residuals in semivariogram models (EC a -V and EC a -H) were determined at 40 locations where EC a was measured.SSA is a continuous version of the discrete Simulated Annealing (SA) optimization method (Aarts and Korst 1989).The insensitivity of the algorithm to local extremes makes it very suitable for constrained optimization of spatial sampling schemes in the presence of complex pre-information.Consider the collection of possible sampling schemes consisting of n observations, S n , with a so-called fitness function φ(•):S n → View the MathML source+ to be minimized.Optimization starts with a random scheme S 0 ∈ S n .It then involves a sequence of proposed random perturbations S i + 1 of S 0 that have a probability P c (S i → S i+1 ) of being accepted.This transition probability is defined in the Metropolis criteria: (2) (3) (4) where: c denotes the positive control parameter, which is lowered as optimization progresses.
If S i+1 is accepted, it serves as a starting point for a next scheme S i+2 , and the process continues in a similar way (Aarts and Korst 1989).In SSA, random perturbations of the sampling scheme S i consist of transformations of randomly-chosen observations over a vector with random length and random direction.We will consider a vector δ in of n elements.At each step an element is selected at random, and it is assigned a random value.All other values are equal to 0. This vector has the following property: |δ in | → 0 when i → ∞.SSA can include earlier observations into the optimization by treating them as an integral but fixed part of the sampling scheme, i.e. with corresponding δ in values set equal to 0 for all i.Boundaries of the region and inaccessible sub-regions can be taken from a GIS file.
Until the present moment, several optimization criteria have been translated into a fitness function and applied in SSA.Two of those are important in this study: (a) MMSDcriterion, which aims a spreading of the observations over the entire research area by minimizing the distance between an arbitrarily chosen point and its nearest observation; (b) WM-criterion, which is taken from the literature and optimizes the fit of the realized distribution of point pairs for the experimental variogram to a pre-defined, ideal distribution (Russo 1984;Warrick and Myers 1987;Russo and Jury 1988).The desired distribution can be based upon expert judgment, allowing the user to give special attention to certain aspects of the variogram (for instance, the nugget).The minimization function is a simple sum of squares of the deviation between the desired distribution ζ* and the realized distribution ζS.
The ESAP 2.35 software (Lesch et al. 2000) was also used to determine the optimum position of the 40 sample points.For this, it was used the computational tool ESAP-RSSD (Response Surface Sampling Design) that uses a multiple linear regression model minimizing the chances of autocorrelation space between the surface of different models of response generated by the algorithm, as reported by Lesch et al. (1995) and Fitzgerald et al. (2006).In this method, optimized sampling scheme process begins with log-transformation (ln) of EC a measurements in vertical dipole (lnEC a -V) and in horizontal dipole (lnEC a -H), obtaining a variable with combined trend surface, considering the original location (x,y) (Lesch et al. 2000).Later, EC a -V = z 1 and EC a -H = z 2 are correlated with location coordinates of presumed surface.It is possible to minimize the values obtained in the prediction and estimate values that represent the different reply surfaces for each new configuration: where: x and y are the geographic coordinates; To choose the best location of EC a for the new sampling model, all the parameters have to be significantly different from zero (p > 0.05) and with the minimum predicted residual sum of squares.The spatial independence of residues was examined using the residual autocorrelation test of Moran (Brandsma and Ketellapper 1979;Lesch et al. 1995).ESAP-RSSD software lets the determination of 6; 12 and 20 points of optimized sampling scheme.Our study area was divided into 2 subareas (A and B).In each subarea the ESAP-RSSD software was used to determine the sampling scheme of 20 locations and to obtain, with the 2 subareas, 40 locations of sampling scheme for all the field.The SANOS 0.1 software does not have any restriction about the location number of sampling scheme in the field.
Using the sampling points chosen by SANOS and ESAP, and the measured values of EC a -V and EC a -H at those points, the semivariogram was performed using the residual values.The residual is the difference between the regionalized variable and the trend.The residual ordinary kriging was used to estimate residuals of 1,887 and 1,751 values of EC a -V measured with the EM38-DD (EC a -V mea ) with the Gstat software (Pebesma 1992).After the residual estimation, it was added to the trend estimation for each point, obtaining an estimate of EC a -V (EC a -V est ) from the 40 sampling points chosen by SANOS and ESAP.To estimate the performance of the 2 sampling schemes, the sum of squared errors (SSE) obtained with SANOS and ESAP was used in the Equation 7: semivariograms in both dates have had the same model, the spatial variability pattern was not the same.This difference is most important at the initial part of the semivariogram that represents the spatial variability at short distances.
The fitted parameters of semivariogram showed low spatial discontinuity between dates; it means that the nugget effect values (C 0 ) were low for all data sets (Figure 4).Range values (a) varied from 105 m for EC a -V in March 14, 2008 to 145 m for April 3, 2008.EC a -H data presented a low value of range for the second measurement date (40 m) in relation to the first one (40 m).The difference between EC a -V and EC a -H range values is mainly caused by the greater measurement volume in vertical dipole.This fact is responsible for a lower spatial discontinuity.
Sample scheme optimization using ESAP 2.35 and SANOS 0.1 software showed a different sample scheme for the 2 methods and for the 2 measurement dates (Figure 5).In SANOS the variation in semivariogram parameter values resulted in different sampling schemes.The SANOS software lets the choice of location number in the other side the ESAP software the user can only use 6; 12 or 20 sample locations.Therefore, if the user wants to use a greater location number in field, it is necessary to subdivide the fields in several subfields.
There is no agreement about the kriging variance minimization as measurement of local uncertainty.Goovaerts (1997) showed that kriging variance is mainly influenced by the spatial configuration of data.ESAP software uses the 2 measurements of soil apparent electrical conductivity (EC a -V and EC a -H) and their locations (Lesch et al. 2000).ESAP software has a simple use, because it is not necessary to know the variable semivariograms previously to sampling location optimization.
Experimental semivariograms were calculated using the measured EC a in the optimized sampling points (Figure 6).These semivariograms showed trends, which were removed, and residual semivariograms were obtained.

RESulTS And diSCuSSion
The data sets presented a log-normal distribution (Table 2).EC a -V showed the greatest variance values in both times, because the soil volume measurements in vertical dipole is greater than in the horizontal.In this way, the major differences are mainly caused by intrinsic soil factors as clay content, water content and dissolved cations in soil solution (Rhoades et al. 1976;Nadler and Frenkel 1980;McNeill 1980).The values of standard deviation (SD) and coefficient of variation (CV) followed the same pattern.
The EC a data showed trend, i.e. the semivariance values increase when distance increases (Figure 4).The observed trend for all EC a data was quadratic.In this case, the presence of trend is related to the area topography with a 2% slope (Figure 1), which contributes to water flow in downhill direction and causes the greatest values of water content in the lowest area in the field.According to Kuzyakova et al. (1997), soil topography is one of the main factors that affect the soil spatial variability; they described that landscape shape (concave, convex or lineal) has different spatial variability caused mainly by the different water flow in the soil.
The residual semivariograms of EC a -V were fitted to spherical model; meanwhile, EC a -H was fitted to the exponential one (Figure 4).Several authors have described that spherical model is the most common for soil and plant properties (McBratney and Webster 1986;Cambardella et al. 1994;Carvalho et al. 2002;Siqueira et al. 2008)        date.The reduction in the values of SSE may be related to the interaction between EC a and soil properties.The soil water content was more homogeneous in the second sampling date, and this fact causes EC a values more homogeneous.Statistical parameters showed that variance values were lower in March 14, 2008, mainly for EC a -H, and the CV was lower for the 2 dipole measurement modes EM38-DD (EC a -V and EC a -H) (Table 2).

Figure 1 .
Figure 1.Study area location and field digital elevation map.

Figure 2 .
Figure 2. Precipitation and potential crop reference evapotranspiration in the period of study.

Figure 3 .
Figure 3. Sampling scheme of soil apparent electrical conductivity.

Figure 4 .
Figure 4. Experimental and fitted model semivariogram of the studied variables slopes.

Figure 7 .
Figure 7. Map of the estimated values obtained by ordinary kriging for original data.

Figure 8 .
Figure 8. Map of the estimated values obtained by ordinary kriging for optimized sampling points.

Table 1 .
Soil analytical data for the study area.

Table 2 .
. The EC a -V semivariograms presented the same spatial variability pattern.Although the EC a -H Statistical parameters of soil apparent electrical conductivity in vertical and horizontal dipoles.