INTRODUCTION

The soil sampling is an important stage of digital soil mapping (DSM). In DSM the principal idea is that soil properties have some sort of correlation with other environmental variables. These variables are known as secondary information in a regular grid or densely sampling, i.e. digital elevation models, geological maps, remote sensing images, apparent soil electrical conductivity (*EC _{a}*) or resistivity (

^{McBratney et al. 2003}).

*EC _{a}*, measured by contact or by electromagnetic induction, is directly related with other physical and chemical soil properties. There are some factors that may affect the

*EC*: soil moisture, pore size distribution, salt content, particle size distribution, cation exchange capacity (CEC), clay mineral composition, pore geometry and tortuosity, concentration of dissolved electrolytes in soil water, amount and composition of colloids, and temperature of soil water (

_{a}^{Sudduth et al. 2005};

^{Kühn et al. 2008};

^{Siqueira et al. 2015a}). Several authors have used

*EC*to improve the estimation of other soil properties (

_{a}^{McNeill 1980};

^{Lesch et al. 2000};

^{Schumann and Zaman 2003};

^{Domsch and Giebel 2004};

^{Sudduth et al. 2005};

^{Amezketa 2007}). Therefore, the use of

*EC*measured by electromagnetic induction is an important tool soil digital mapping, favoring the use of optimized sampling schemes.

_{a}Spatial sampling schemes are used to determine the sample location describing the spatial variability of determined variables. Besides the grid configuration, sampling density and distance are important factors for accurate spatial predictions (^{Minasny and McBratney 2007}). Sampling schemes can be optimized using geostatistical methods as described by ^{van Groenigen et al. (1999)} as well as ^{Diggle and Lophaven (2006)}. One of the main criticisms of geostatistical approach is the necessity of prior knowledge of attributed semivariogram. Other methods are based on geometric criteria, without semivariogram estimation, as it was proposed by ^{Royle and Nychka (1998)}, or on statistical criteria, as the method proposed by ^{Lesch et al. (1995)}, who used *EC _{a}* data based on a multiple regression model to determine the samples’ number and location.

Two complex mathematical models have been widely used to specify the optimum sampling schemes of soil and plant: SANOS 0.1 (^{van Groenigen et al. 1999}) and Electrical Conductivity or Salinity, Sampling, Assessment and Prediction (ESAP) 2.35 software (^{Lesch et al. 2000}). However, there is no study evaluating the accuracy of these methods in respect to the original data. SANOS 0.1 software has an open system that allows to determine any number of samples for a previously studied area, while the ESAP 2.35 software has limitations regarding the number of points to be optimized and it was developed specifically for soil electrical conductivity and salinity data.

Other authors have used the ESAP software to optimize the use of satellite images (^{Hunsaker et al. 2009}). ^{Ferreyra et al. (2002)} used the SANOS software to reduce the number of soil moisture samples having as secondary tool for decision making the scaled semivariogram. ^{Siqueira et al. (2015b)} studied different sampling schemes and soil water contents. They observed that the choice of sampling scheme influenced the results, particularly in relation to small number of samples, preventing the use of geostatistical techniques and precision agriculture.

The objective of this study was to characterize the spatial design of soil sampling, using *EC _{a}* and its optimized spatial sampling. For that, the Spatial Simulated Annealing (SSA) technique was used incorporated on SANOS 0.1 software, as well as the method of sampling design by response surface using ESAP 2.35 software.

MATERIAL AND METHODS

The study area (6 ha) is located at Castro de Ribeiras de Lea, Lugo, Spain. The geographical coordinates are 43°09′49″N and 7°29′47″W, with average elevation of 410 m and average slope of 2% (Figure 1). The mean annual temperature is 11.2 °C and the mean annual rainfall is 930 mm (data: 1961 – 1990) (^{Paz González et al. 1996}).

Daily rainfall (mm) and crop reference evapotranspiration (mm) are shown in Figure 2. The soil area is classified according to ^{FAO-ISRIC (1994)} as Cambisol, with parent material of sediments from tertiary and quaternary periods (^{Paz González et al. 1997}). Table 1 shows the particle size distribution and organic carbon. The organic matter content is higher in the horizon A_{p} compared to the other soil horizons. Soil texture (size < 2 mm) is sandy-loam in the horizon A_{p}, loam-clay-sandy in the horizon B_{w} and clay in the horizon B_{tg}.

Horizons | Depth (m) |
C (%) |
Particle size distribution (USDA, %) |
|||
---|---|---|---|---|---|---|

Clay | Silt | Sand | Gravel | |||

A_{p} |
0.0 - 0.35 | 5.05 | 17.5 | 19.1 | 63.0 | 37.0 |

B_{w} |
0.35 - 0.70 | 0.72 | 19.2 | 20.7 | 59.1 | 44.8 |

B_{tg} |
+0.70 | 0.26 | 47.9 | 28.0 | 24.1 | - |

C = Total organic carbon.

The *EC _{a}* (mS∙m

^{−1}) was measured with an induction electromagnetic device EM38-DD (

^{Geonics Limited 2005}). The equipment consists of 2-unit measurements: one in a horizontal dipole (

*EC*), to provide an effective measurement depth of approximately 0.75 m, and another in a vertical dipole (

_{a}-H*EC*), with an effective measurement depth of approximately 1.5 m (

_{a}-V^{McNeill 1980}), which cover the complete root volume of a plant (

^{Domsch and Giebel 2004}). The principle of the EM38-DD device requires the instrument to be calibrated a new at each location and before each measurement (

^{Geonics Limited 2005}). The instrument must also be calibrated a new if the instrument’s temperature changes because of alterations in the ambient temperature (

^{Sudduth et al. 2001}). Data were collected on March 14, 2008 and April 3, 2008 (Figure 3), using EM38-DD, a field computer and a GPS RTK StarFire (John Deere®) to georeference the electrical conductivity measurements.

*EC*and

_{a}-H*EC*were measured at 1,887 locations on March 14, 2008 and 1,751 locations on April 3, 2008.

_{a}-VSpatial analysis of data was carried out using geostatistical tools according to ^{Vieira (2000)}. The experimental semivariogram, γ(*h*), of . spatial observations, *Z*(*x _{i}*),

*i*= 1,...,

*n*, can be estimated as:

where: *N*(*h*) is the number of pairs of observations separated by a distance *h*.

Experimental semivariograms need to be fitted to some mathematical model which must meet the criteria of conditional positive definiteness (^{McBratney et al. 1982}). Amongst all the variety of models which satisfy that condition, the fitting parameters that describe them are the nugget effect, C_{0}, the sill, C_{0} + C_{1} (where C_{1} is the structured variance coefficient to be defined later), and the range of spatial dependence *a*.

When the semivariogram of any particular variable does not stabilize at constant value for the sill, it is evident that the sill does not exist. Therefore, the stationarity of the mean may not be guaranteed because the variable increases in an unlimited way in some direction. In this situation, it is necessary to remove the trend before any geostatistical application based on the intrinsic hypothesis (^{Vieira 2000}). One possible way to detrend the data is using a trend surface fitted to data through minimized squared deviations of the difference between the surface and the original data, producing a new residual variable (^{Vieira 2000}). In this study, *EC _{a}* (mS∙m

^{−1}) showed some trend that was best fitted to a parabolic surface, Z

_{trend}, using the following equation:

where: A_{0}, A_{1}, A_{2}, A_{3}, A_{4} and A_{5} are the fitting parameters.

The residual variable, Z_{res}, can be obtained by subtracting the estimated trend surface from the original values for each point:

In Equation 3, the variable Z_{res} is subsequently tested for the existence of a defined sill in the semivariogram. The kriging estimation is done on the residual values to which the estimated surface is added after the estimation is done.

As the semivariograms for the original data showed a very strong trend, violating the intrinsic hypothesis of geostatistics (^{Journel and Huijbregts 1978}), parabolic trend surface equations were fitted to the data and subtracted from them. The residuals generated by the difference between originals and trend surface produced semivariograms that showed a very well-defined sill, and, for this reason, the residuals were used in the remaining analysis; the semivariogram model was fitted using the cross validation by means of the software Gstat (^{Pebesma 1992}).

It was used the SANOS 0.1 software (^{van Groenigen et al. 1999}) to locate the new sampling location points of *EC*_{a}, with the criteria of optimization sample and the mitigation of ordinary kriging variance average in the study area, through a spatial simulation algorithm (SSA) that uses the fitting parameters of variable semivariograms. The parameters of the residuals in semivariogram models (*EC _{a}-V* and

*EC*) were determined at 40 locations where

_{a}-H*EC*was measured. SSA is a continuous version of the discrete Simulated Annealing (SA) optimization method (

_{a}^{Aarts and Korst 1989}). The insensitivity of the algorithm to local extremes makes it very suitable for constrained optimization of spatial sampling schemes in the presence of complex pre-information. Consider the collection of possible sampling schemes consisting of

*n*observations, S

_{n}, with a so-called fitness function φ(∙):

*S*→ View the MathML source+ to be minimized. Optimization starts with a random scheme S

_{n}_{0}∈ S

_{n}. It then involves a sequence of proposed random perturbations S

_{i}+ 1 of S

_{0}that have a probability P

_{c}(S

_{i}→ S

_{i+1}) of being accepted. This transition probability is defined in the Metropolis criteria:

where: *c* denotes the positive control parameter, which is lowered as optimization progresses.

If *S _{i}*

_{+1}is accepted, it serves as a starting point for a next scheme

*S*

_{i}_{+2}, and the process continues in a similar way (

^{Aarts and Korst 1989}). In SSA, random perturbations of the sampling scheme

*S*consist of transformations of randomly-chosen observations over a vector with random length and random direction. We will consider a vector δ

_{i}_{in}of

*n*elements. At each step an element is selected at random, and it is assigned a random value. All other values are equal to 0. This vector has the following property: |δ

_{in}| → 0 when

*i*→ ∞. SSA can include earlier observations into the optimization by treating them as an integral but fixed part of the sampling scheme, i.e. with corresponding δ

_{in}values set equal to 0 for all

*i*. Boundaries of the region and inaccessible sub-regions can be taken from a GIS file.

Until the present moment, several optimization criteria have been translated into a fitness function and applied in SSA. Two of those are important in this study: (a) MMSD-criterion, which aims a spreading of the observations over the entire research area by minimizing the distance between an arbitrarily chosen point and its nearest observation; (b) WM-criterion, which is taken from the literature and optimizes the fit of the realized distribution of point pairs for the experimental variogram to a pre-defined, ideal distribution (^{Russo 1984}; ^{Warrick and Myers 1987}; ^{Russo and Jury 1988}). The desired distribution can be based upon expert judgment, allowing the user to give special attention to certain aspects of the variogram (for instance, the nugget). The minimization function is a simple sum of squares of the deviation between the desired distribution ζ* and the realized distribution ζS.

The ESAP 2.35 software (^{Lesch et al. 2000}) was also used to determine the optimum position of the 40 sample points. For this, it was used the computational tool ESAP-RSSD (Response Surface Sampling Design) that uses a multiple linear regression model minimizing the chances of autocorrelation space between the surface of different models of response generated by the algorithm, as reported by ^{Lesch et al. (1995)} and ^{Fitzgerald et al. (2006)}. In this method, optimized sampling scheme process begins with log-transformation (ln) of *EC _{a}* measurements in vertical dipole (ln

*EC*) and in horizontal dipole (ln

_{a}-V*EC*), obtaining a variable with combined trend surface, considering the original location (x,y) (

_{a}-H^{Lesch et al. 2000}). Later,

*EC*= z

_{a}-V_{1}and

*EC*= z

_{a}-H_{2}are correlated with location coordinates of presumed surface. It is possible to minimize the values obtained in the prediction and estimate values that represent the different reply surfaces for each new configuration:

where: *x* and *y* are the geographic coordinates; *z*_{1}*= b*_{1}[ln*EC _{a}-V* − average(ln

*EC*)]

_{a}-V*+ b*

_{2}[ln

*EC*average(ln

_{a}-H −*EC*)]; z

_{a}-H_{2}

*= b*

_{3}[ln

*EC*− average(ln

_{a}-V*EC*)]

_{a}-V*+ b*

_{4}[ln

*EC*− average(ln

_{a}-H*EC*)];

_{a}-H*b*

_{0},

*b*

_{1},

*b*

_{2},

*b*

_{3}and

*b*

_{4}are the estimated parameters;

*u =*(x

*−*min[

*x*])

*/k*;

*v =*(

*y −*min[

*y*])

*/k*;

*k*is the maximum value of (max[

*x*]

*−*min[

*x*])or (max[

*y*] − min[

*y*]).

To choose the best location of *EC _{a}* for the new sampling model, all the parameters have to be significantly different from zero (p > 0.05) and with the minimum predicted residual sum of squares. The spatial independence of residues was examined using the residual autocorrelation test of Moran (

^{Brandsma and Ketellapper 1979};

^{Lesch et al. 1995}). ESAP-RSSD software lets the determination of 6; 12 and 20 points of optimized sampling scheme. Our study area was divided into 2 subareas (

*A*and

*B*). In each subarea the ESAP-RSSD software was used to determine the sampling scheme of 20 locations and to obtain, with the 2 subareas, 40 locations of sampling scheme for all the field. The SANOS 0.1 software does not have any restriction about the location number of sampling scheme in the field.

Using the sampling points chosen by SANOS and ESAP, and the measured values of *EC _{a}-V* and

*EC*at those points, the semivariogram was performed using the residual values. The residual is the difference between the regionalized variable and the trend. The residual ordinary kriging was used to estimate residuals of 1,887 and 1,751 values of

_{a}-H*EC*measured with the EM38-DD (

_{a}-V*EC*

_{a}-V_{mea}) with the Gstat software (

^{Pebesma 1992}). After the residual estimation, it was added to the trend estimation for each point, obtaining an estimate of

*EC*(

_{a}-V*EC*

_{a}-V_{est}) from the 40 sampling points chosen by SANOS and ESAP. To estimate the performance of the 2 sampling schemes, the sum of squared errors (SSE) obtained with SANOS and ESAP was used in the Equation 7:

RESULTS AND DISCUSSION

The data sets presented a log-normal distribution (Table 2). *EC _{a}-V* showed the greatest variance values in both times, because the soil volume measurements in vertical dipole is greater than in the horizontal. In this way, the major differences are mainly caused by intrinsic soil factors as clay content, water content and dissolved cations in soil solution (

^{Rhoades et al. 1976};

^{Nadler and Frenkel 1980};

^{McNeill 1980}). The values of standard deviation (SD) and coefficient of variation (CV) followed the same pattern.

Date | Attribute (mS∙m ^{-1}) |
Mean | Variance | Standard deviation |
Coefficient of variation (%) |
Skewness | Kurtosis |
---|---|---|---|---|---|---|---|

03/14/2008 | EC_{a}-V |
10.48 | 4.42 | 2.10 | 20.07 | 0.527 | 0.124 |

EC_{a}-H |
14.10 | 0.60 | 0.77 | 5.53 | 0.065 | 1.810 | |

04/03/2008 | EC_{a}-V |
14.04 | 4.64 | 2.15 | 15.34 | 0.662 | 0.083 |

EC_{a}-H |
14.59 | 0.60 | 0.77 | 5.32 | 0.160 | 10.514 |

The *EC _{a}* data showed trend, i.e. the semivariance values increase when distance increases (Figure 4). The observed trend for all

*EC*data was quadratic. In this case, the presence of trend is related to the area topography with a 2% slope (Figure 1), which contributes to water flow in downhill direction and causes the greatest values of water content in the lowest area in the field. According to

_{a}^{Kuzyakova et al. (1997)}, soil topography is one of the main factors that affect the soil spatial variability; they described that landscape shape (concave, convex or lineal) has different spatial variability caused mainly by the different water flow in the soil.

The residual semivariograms of *EC _{a}-V* were fitted to spherical model; meanwhile,

*EC*was fitted to the exponential one (Figure 4). Several authors have described that spherical model is the most common for soil and plant properties (

_{a}-H^{McBratney and Webster 1986};

^{Cambardella et al. 1994};

^{Carvalho et al. 2002};

^{Siqueira et al. 2008}). The

*EC*semivariograms presented the same spatial variability pattern. Although the

_{a}-V*EC*semivariograms in both dates have had the same model, the spatial variability pattern was not the same. This difference is most important at the initial part of the semivariogram that represents the spatial variability at short distances.

_{a}-HThe fitted parameters of semivariogram showed low spatial discontinuity between dates; it means that the nugget effect values (*C*_{0}) were low for all data sets (Figure 4).Range values (*a*) varied from 105 m for *EC _{a}-V* in March 14, 2008 to 145 m for April 3, 2008.

*EC*data presented a low value of range for the second measurement date (40 m) in relation to the first one (40 m). The difference between

_{a}-H*EC*and

_{a}-V*EC*range values is mainly caused by the greater measurement volume in vertical dipole. This fact is responsible for a lower spatial discontinuity.

_{a}-HSample scheme optimization using ESAP 2.35 and SANOS 0.1 software showed a different sample scheme for the 2 methods and for the 2 measurement dates (Figure 5). In SANOS the variation in semivariogram parameter values resulted in different sampling schemes. The SANOS software lets the choice of location number in the other side the ESAP software the user can only use 6; 12 or 20 sample locations. Therefore, if the user wants to use a greater location number in field, it is necessary to subdivide the fields in several subfields.

There is no agreement about the kriging variance minimization as measurement of local uncertainty. ^{Goovaerts (1997)} showed that kriging variance is mainly influenced by the spatial configuration of data. ESAP software uses the 2 measurements of soil apparent electrical conductivity (*EC _{a}-V* and

*EC*) and their locations (

_{a}-H^{Lesch et al. 2000}). ESAP software has a simple use, because it is not necessary to know the variable semivariograms previously to sampling location optimization.

Experimental semivariograms were calculated using the measured *EC _{a}* in the optimized sampling points (Figure 6). These semivariograms showed trends, which were removed, and residual semivariograms were obtained. Residual semivariograms for the 40 optimized sampling locations by ESAP for the 2 dates were fitted to spherical model (Figure 6). The

*EC*residual semivariograms did not show spatial dependence in the studied scale for the first sampling date for the 2 used software (ESAP and SANOS).

_{a}-HFor the second sampling date (April 3, 2008) the *EC _{a}-H* residual semivariogram showed spatial dependence for the sampling scheme obtained by ESAP software and nugget effect with SANOS software. This finding shows that it is not spatial dependence between samples, or separation distance between samples is greater than the minimum radius of spatial variability areas (

^{Vieira 2000}). Nugget values (

*C*

_{0}) were low for all variables that showed spatial dependence. This fact indicates low variability of data. Range values (

*a*) increase 20 m for

*EC*residual from the first to the second sampling date in the sampling scheme obtained with ESAP. Meanwhile the range values of data obtained with SANOS remained stable.

_{a}-VThe *EC _{a}* spatial variability maps (

*EC*and

_{a}-V*EC*) for data measured in March 14, 2008 and April 3, 2008 showed a similar spatial pattern. Lower values of

_{a}-H*EC*were found in the northwestern area (Figure 7).

_{a}*EC*map was less smooth, because the samples are spatially correlated in a shorter distance than

_{a}-H*EC*, according to the range values (Figure 4).

_{a}-V*EC*had distance correlation between samples of 40 m, and

_{a}-H*EC*showed distance correlation around 105 – 145 m.

_{a}-VSpatial variability maps of *EC _{a}-V* and

*EC*made from the sampling schemes optimized by ESAP and SANOS (Figure 8) showed that maps obtained by ESAP were similar to the maps constructed using the measured values (Figure 7). This similarity is greater in the estimated maps for

_{a}-H*EC*.

_{a}-VThe values of *EC _{a}-V* calculated by ordinary kriging in the measurement locations using the data measured in these sampling schemes optimized by ESAP showed low values of SSE (Figure 8). The value of SSE for

*EC*in April 3, 2008 obtained with ESAP was greater. For this residual semivariogram the range value was low (65 m); this fact causes a smaller spatial variability area. The SSE values diminish from the first to the second sampling date. The reduction in the values of SSE may be related to the interaction between

_{a}-H*EC*and soil properties. The soil water content was more homogeneous in the second sampling date, and this fact causes

_{a}*EC*values more homogeneous. Statistical parameters showed that variance values were lower in March 14, 2008, mainly for

_{a}*EC*, and the CV was lower for the 2 dipole measurement modes EM38-DD (

_{a}-H*EC*and

_{a}-V*EC*) (Table 2).

_{a}-HCONCLUSION

Spatial variability of *EC _{a}* in the study area measured with vertical dipole (

*EC*) had the greatest spatial continuity according to nugget effect value. The map of spatial variability was smoother because of the greater range value, when compared to

_{a}-V*EC*measured in horizontal dipole (

_{a}*EC*). Sampling scheme optimization with ESAP 2.35 showed better results than with SANOS 0.1 software. Sampling locations calculated with SANOS 0.1 were able to estimate with best performance the

_{a}-H*EC*values in all locations measured in the studied area. ESAP software takes into account the measurement of

_{a}*EC*in vertical and horizontal dipoles together. Besides its use is easier, and the previous knowledge of semivariogram is not required when compared to SANOS.

_{a}