A Classification for a Geostatistical Index of Spatial Dependence

In geostatistical studies, spatial dependence can generally be described by means of the semivariogram or, in complementary form, with a single index followed by its categorization to classify the degree of such dependence. The objective of this study was to construct a categorization for the spatial dependence index (SDI) proposed by Seidel and Oliveira (2014) in order to classify spatial variability in terms of weak, moderate, and strong dependence. Theoretical values were constructed from different degrees of spatial dependence, which served as a basis for calculation of the SDI. In view of the form of distribution and SDI descriptive measures, we developed a categorization for posterior classification of spatial dependence, specific to each semivariogram model. The SDI categorization was based on its median and 3rd quartile, allowing us to classify spatial dependence as weak, moderate, or strong. We established that for the spherical semivariogram: SDI Spherical (%) ≤ 7 % (weak spatial dependence), 7 % < SDI Spherical (%) ≤ 15 % (moderate spatial dependence), and SDI Spherical (%) > 15 % (strong spatial dependence); for the exponential semivariogram: SDI Exponential (%) ≤ 6 % (weak spatial dependence), 6 % < SDI Exponential (%) ≤ 13 % (moderate spatial dependence), and SDI Exponential (%) > 13 % (strong spatial dependence); and for the Gaussian semivariogram: SDI Gaussian (%) ≤ 9 % (weak spatial dependence), 9 % < SDI Gaussian (%) ≤ 20 % (moderate spatial dependence), and SDI Gaussian (%) > 20 % (strong spatial dependence). The proposed categorization allows the user to transform the numerical values calculated for SDI into categories of variability of spatial dependence, with adequate power for explanation and comparison.


INTRODUCTION
The correct description of spatial dependence is fundamental for revealing both the degree of spatial continuity and the structure of variability of the spatial phenomenon studied (Seidel and Oliveira, 2014). For such a description, we can use the semivariogram alone, as a whole, or complement the semivariogram with a single spatial dependence index. According to Seidel and Oliveira (2014), given that the semivariogram is a highly informative descriptor, with abundant graphic detail, it may need a complementary single measure to summarize all the semivariogram information in regard to spatial dependence.
In geostatistical applications for soil science, agricultural engineering, forest science, and agriculture, among others, two consolidated single indexes have been used. The first index is the relative nugget effect (NE), initially suggested by Trangmar et al. (1985) and further explored in Cambardella et al. (1994), which relates the nugget effect parameter to the sill parameter. The second index, the degree of spatial dependence (SPD), less frequently used and presented in Biondi et al. (1994), relates the contribution parameter to the sill parameter. These two indexes are complementary and, according to Biondi et al. (1994), are useful for comparing different spatial dependence scenarios. Of these two summary indexes, NE has been the one chosen for most analyses of the spatial dependence structure in Brazil.
Since the indexes mentioned here do not consider all aspects of the semivariogram geometry, such as the influence of the range parameter, and given the relevance of using a single index, Seidel and Oliveira (2014) proposed the spatial dependence index (SDI). In addition to considering the contribution, nugget effect, and sill parameters, this index also considers the range parameter, the model factor (which reflects the specific shape of the adjusted curve), and the maximum distance between sampled points. According to Seidel and Oliveira (2014), the SDI presented useful results when applied to real data, and it can be used to substitute for or be combined with the previously existing indexes. However, when presenting the index, the authors did not propose a categorization of the SDI value scale. This limits its use for classification of the degree of spatial dependence, given that the user frequently expects to declare such a degree of dependence in non-numerical categories, such as weak, moderate, and strong.
The main reason for categorizing a numerical index is to allow for comparison and classification. In soil science, and more generally in the agricultural and environmental sciences, the task of comparing and classifying is fundamental in driving decisions and managing the systems under study. This study is justified by practices already consolidated in spatial research, given that in geostatistical studies an index is generally used to describe spatial dependence and is followed by use of the categorization to classify the degree of such dependence.
The objective of this study was to propose a categorization system for the spatial dependence index (SDI) proposed by Seidel and Oliveira (2014) to allow classification of spatial variability in terms of weak, moderate, and strong dependence.

MATERIALS AND METHODS
Rev Bras Cienc Solo 2016;40:e0160007
The first traditional index, conceptualized by Trangmar et al. (1985) and found in Cambardella et al. (1994), which expresses the relative nugget effect (NE), is given by the expression:
$$\mathrm{NE}\,(\%) = \left(\frac{C_0}{C_0 + C_1}\right) \times 100 \tag{1}$$
in which $C_0$ is the nugget effect and $C_1$ is the contribution, both semivariogram parameters. According to Cambardella et al. (1994), NE has the following classification: strong spatial dependence (NE (%) ≤ 25 %), moderate spatial dependence (25 % < NE (%) ≤ 75 %), and weak spatial dependence (NE (%) > 75 %). This categorization seems inspired by the statistical quartile concept, because Cambardella et al. (1994) did not use any real data analysis to arrive at this classification.
The second traditional index, the degree of spatial dependence (SPD), presented in Biondi et al. (1994), is given by the expression:
$$\mathrm{SPD}\,(\%) = \left(\frac{C_1}{C_0 + C_1}\right) \times 100 \tag{2}$$
in which $C_0$ is the nugget effect and $C_1$ is the contribution, both semivariogram parameters, as in the NE index. Adjusting the classification given by Cambardella et al. (1994), we have the following induced SPD classification: weak spatial dependence (SPD (%) ≤ 25 %), moderate spatial dependence (25 % < SPD (%) ≤ 75 %), and strong spatial dependence (SPD (%) > 75 %). We can observe that NE (%) = 100 % - SPD (%), that is, NE and SPD essentially provide the same information. Therefore, because of the equivalence between the two indexes, in this article we will not regard the NE index as essentially different from the SPD index. Moreover, the SPD measure has a geometric justification, shown by comparing spatial dependence areas, as seen in Seidel and Oliveira (2015).
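As a minimal sketch of the two traditional indexes, the Python snippet below computes NE and SPD as the nugget effect and the contribution expressed as percentages of the sill (C0 + C1), which is the reading implied by the relation NE (%) = 100 % - SPD (%). The function names and the parameter values are hypothetical and serve only as an illustration.

```python
def ne_percent(c0: float, c1: float) -> float:
    """Relative nugget effect: nugget as a percentage of the sill (C0 + C1)."""
    return 100.0 * c0 / (c0 + c1)


def spd_percent(c0: float, c1: float) -> float:
    """Degree of spatial dependence: contribution as a percentage of the sill."""
    return 100.0 * c1 / (c0 + c1)


def classify_spd(spd: float) -> str:
    """Induced SPD classes with cut-offs at 25 % and 75 %."""
    if spd <= 25.0:
        return "weak"
    if spd <= 75.0:
        return "moderate"
    return "strong"


# Hypothetical fitted parameters (illustrative values, not taken from the paper).
c0, c1 = 1.0, 4.0
ne = ne_percent(c0, c1)
spd = spd_percent(c0, c1)
print(ne, spd, classify_spd(spd))
```

Because the two indexes always add up to 100 %, classifying by SPD or by NE carries the same information, which is why only the SPD classifier is sketched here.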
The spatial dependence index (SDI), proposed by Seidel and Oliveira (2014), is given by the following expressions for the spherical, exponential, and Gaussian models, respectively:
$$\mathrm{SDI}_{\mathrm{Spherical}}\,(\%) = 0.375 \times \left(\frac{C_1}{C_0 + C_1}\right) \times \left(\frac{a}{0.5\,\mathrm{MD}}\right) \times 100 \tag{3}$$
$$\mathrm{SDI}_{\mathrm{Exponential}}\,(\%) = 0.317 \times \left(\frac{C_1}{C_0 + C_1}\right) \times \left(\frac{a}{0.5\,\mathrm{MD}}\right) \times 100 \tag{4}$$
$$\mathrm{SDI}_{\mathrm{Gaussian}}\,(\%) = 0.504 \times \left(\frac{C_1}{C_0 + C_1}\right) \times \left(\frac{a}{0.5\,\mathrm{MD}}\right) \times 100 \tag{5}$$
in which $C_0$ is the nugget effect, $C_1$ is the contribution, and $a$ is the practical range, all semivariogram parameters, as in the NE and SPD indexes, and 0.5MD is half of the maximum distance (MD) between sampled points. When the ratio $a/(0.5\,\mathrm{MD})$ results in a value greater than 1, it is truncated to 1, so that it assumes values only between zero and 1. In expressions 3, 4, and 5, we use 0.5MD since Seidel and Oliveira (2014), in their simulation study and application to real data, considered half of the largest distance between sampled points. In addition, the 0.5MD factor is inspired by practical recommendations for using pairs of points up to half of the largest sampling distance in order to estimate semivariances (Journel and Huijbregts, 2003; Landim, 2006; Olea, 2006; Soares, 2006). The constants 0.375 (for the spherical semivariogram), 0.317 (for the exponential semivariogram), and 0.504 (for the Gaussian semivariogram) are the respective values of the model factor (MF) of each of the three models. According to Seidel and Oliveira (2014), this MF is the constant that expresses the strength of the spatial dependence that the specific model can reach, given that the higher its value, the larger the strength of the spatial dependence of the model.
In order to build the SDI categorization, we considered 13 variations of the $C_1/(C_0 + C_1)$ component: 0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, and 1. Those values represent 0, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, and 100 % of the generic sill parameter, respectively. In addition, we considered 13 variations of the $a/(0.5\,\mathrm{MD})$ component: 0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, and 1. Those values represent 0, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, and 100 % of half of the generic maximum distance, respectively. Thus, the 13 variations of $C_1/(C_0 + C_1)$, combined with the 13 variations of $a/(0.5\,\mathrm{MD})$, generate the distribution of 169 values of SDI proposed in this study. In each model, the 169 theoretical values were multiplied by the value of the respective MF × 100 % to generate a specific distribution of the SDI corresponding to each of the three models.
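The SDI calculation described above can be sketched in Python, reading the index as the product of the model factor, the contribution-to-sill ratio, the range ratio a/(0.5 MD) truncated to 1, and 100. The function and argument names are hypothetical, and the numerical inputs are illustrative values rather than data from the paper.

```python
# Model factors (MF) reported for each semivariogram model.
MODEL_FACTOR = {"spherical": 0.375, "exponential": 0.317, "gaussian": 0.504}


def sdi_percent(model: str, c0: float, c1: float, a: float, max_dist: float) -> float:
    """SDI (%) = MF * C1/(C0 + C1) * min(1, a / (0.5 * MD)) * 100."""
    contribution_ratio = c1 / (c0 + c1)
    # The range ratio is truncated so that it stays between zero and 1.
    range_ratio = min(1.0, a / (0.5 * max_dist))
    return MODEL_FACTOR[model] * contribution_ratio * range_ratio * 100.0


# Hypothetical spherical fit: nugget 1, contribution 3, practical range 40,
# maximum distance between sampled points 200.
print(sdi_percent("spherical", 1.0, 3.0, 40.0, 200.0))

# A practical range beyond half the maximum distance is truncated.
print(sdi_percent("spherical", 1.0, 3.0, 150.0, 200.0))
```

With the truncation in place, the largest value the index can reach for a given model is MF × 100 %, attained when there is no nugget effect and the practical range is at least half of the maximum distance.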
First, we calculate the SDI for each model (spherical, exponential, and Gaussian), based on the 169 theoretical values. These 169 calculated values are then considered the data set over which we construct the boxplot graph to evaluate the form of distribution of the SDI values, and, subsequently, we categorize the SDI. To do this, we calculate the position measures: minimum, 1st quartile, median, 3rd quartile, and maximum. The intent is to construct the SDI categorization from the position measures calculated and generate the classification as weak, moderate, or strong spatial dependence. This procedure is based on the classification suggested by Cambardella et al. (1994), given that the limits SPD = 25 % and SPD = 75 % can correspond to the 1st and 3rd quartiles of the SPD index, respectively. This strategy of creating categories by establishing cut-off points is chosen based on the rationale that it is virtually impossible to establish what constitutes weak, moderate, or strong spatial dependence using data of real spatial variability, because of the infinite number of different scenarios and variables around the world (and also in Brazil) and the conflicting opinions of specialists in regard to the cut-off values among the categories. So, these cut-off points are suitable for all types of variables and phenomena, like the cut-off points of the SPD index, which uses the same rationale implicit in the criterion of Cambardella et al. (1994) for classification, because there is no other reasonable criterion.
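The procedure above can be sketched in Python: the 13 × 13 grid of theoretical components is scaled by MF × 100 for each model, and the position measures are read from the resulting 169 values. The quantile rule is an assumption (linear interpolation over the sorted data, selected here with `method="inclusive"`), since the text does not state which rule was used; the function names are hypothetical.

```python
import statistics

# The 13 levels used for both C1/(C0 + C1) and a/(0.5 MD).
LEVELS = [0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 1]
MODEL_FACTOR = {"spherical": 0.375, "exponential": 0.317, "gaussian": 0.504}


def theoretical_sdi(model: str) -> list[float]:
    """Return the 169 theoretical SDI (%) values for one model, sorted."""
    scale = MODEL_FACTOR[model] * 100.0
    return sorted(scale * s * r for s in LEVELS for r in LEVELS)


def position_measures(values: list[float]) -> dict[str, float]:
    """Minimum, quartiles, and maximum (quantile rule is an assumption)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


for model in MODEL_FACTOR:
    measures = position_measures(theoretical_sdi(model))
    # Rounded median and 3rd quartile are the candidate cut-off points.
    print(model, round(measures["median"]), round(measures["q3"]))
```

Under this assumed quantile rule, the rounded median and 3rd quartile come out as 7 and 15 for the spherical model, 6 and 13 for the exponential model, and 9 and 20 for the Gaussian model, in agreement with the cut-off points reported in this study.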
Finally, to exemplify the proposed categorization and classification of the SDI and to compare it with the classification based on the SPD index (whether or not the two criteria are equivalent), real data from papers in the soil science area were used. A search was made on the Scientific Electronic Library Online (SciELO Brazil) journal portal according to the following guidelines: we searched for papers published in the Revista Brasileira de Ciência do Solo (RBCS) in the period 2006-2015 in which spherical, exponential, and Gaussian models were fitted and which had some kind of information for obtaining the maximum distance from the sampling grid in order to calculate the SDI, with the aim of obtaining at least 100 values of the SDI for each model. If the total number of values for a model was not reached, a search was made in papers from other journals in the soil science area. The following papers were found: Simões et al. (2006) Santos et al. (2015), and Siqueira et al. (2015). These papers present real data dealing with different soil types, soil layers, spatial dependencies, and sizes of sampling grids. Thus, SDI exemplification and the assessment of classification equivalence with the SPD index were performed for different soil types, soil layers, and spatial dependencies. All procedures were performed in the R software (R Core Team, 2012).

RESULTS AND DISCUSSION
In order to verify the distribution of the theoretical values of the index, we used the boxplot graph as presented in figure 1, in which we can observe that the SDI has a positive asymmetric distribution (higher concentration of values at the lower extreme of the distribution). In contrast, the SPD index theoretically has an approximately symmetrical distribution. This positive asymmetry of the SDI results from consideration of the model factor, the range parameter, and the maximum distance of the grid in its calculation, creating a more realistic and conservative measure of spatial dependence.
Given that the SDI values (spherical, exponential, and Gaussian) exhibit a positive asymmetric distribution, they must be categorized with the median and the 3rd quartile as limits, in order to have a classification that is more coherent and in keeping with the form of the index distribution.
For the spherical semivariogram: SDI Spherical (%) ≤ 7 % → weak spatial dependence; 7 % < SDI Spherical (%) ≤ 15 % → moderate spatial dependence; SDI Spherical (%) > 15 % → strong spatial dependence.
For the exponential semivariogram: SDI Exponential (%) ≤ 6 % → weak spatial dependence; 6 % < SDI Exponential (%) ≤ 13 % → moderate spatial dependence; SDI Exponential (%) > 13 % → strong spatial dependence.
For the Gaussian semivariogram: SDI Gaussian (%) ≤ 9 % → weak spatial dependence; 9 % < SDI Gaussian (%) ≤ 20 % → moderate spatial dependence; SDI Gaussian (%) > 20 % → strong spatial dependence.
The Gaussian semivariogram model represents the highest potential strength of spatial dependence, with a higher MF (Seidel and Oliveira, 2014), which leads to a wider range of values for the SDI. The difference among the limits for the categories according to the different models is consistent with the semivariograms, and it is not natural to fix these limits for all possible models at the values proposed by Cambardella et al. (1994), because the spatial dependencies in those models are different. In this respect, the SDI categorization, with different limits according to the semivariogram model, is more reasonable than the SPD categorization.
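The model-specific cut-off points stated in this study can be applied with a small lookup, sketched below in Python. The names `SDI_CUTOFFS` and `classify_sdi` are hypothetical, and the SDI values passed in are illustrative.

```python
# (upper limit of "weak", upper limit of "moderate") in SDI (%), per model.
SDI_CUTOFFS = {
    "spherical": (7.0, 15.0),
    "exponential": (6.0, 13.0),
    "gaussian": (9.0, 20.0),
}


def classify_sdi(model: str, sdi: float) -> str:
    """Classify an SDI (%) value as weak, moderate, or strong dependence."""
    weak_max, moderate_max = SDI_CUTOFFS[model]
    if sdi <= weak_max:
        return "weak"
    if sdi <= moderate_max:
        return "moderate"
    return "strong"


# The same numerical value can fall into different classes depending on the model.
for model in SDI_CUTOFFS:
    print(model, classify_sdi(model, 8.0))
```

An SDI of 8 % is moderate under the spherical and exponential limits but weak under the Gaussian limits, which illustrates why the limits are not shared across models.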
Based on real data obtained from selected papers, it was possible to obtain the distributions of SDI for spherical, exponential, and Gaussian models, respectively (Figure 1). In accordance with evaluation of figure 1, the proposed theoretical behavior based on the 169 theoretical values and the behavior observed from the real data are similar, highlighting the same positive asymmetry of the SDI in both cases, as well as the approximately equal values of the median and the 3rd quartile in both cases.
The SDI was calculated and its distribution evaluated for different soil types, different soil layers, and different degrees of spatial dependence, for the spherical model (Figure 2), the exponential model (Figure 3), and the Gaussian model (Figure 4).
In all cases there was a positive asymmetric distribution of SDI. Furthermore, the median values ranged from 9 to 13 in the spherical case, from 9 to 15 in the exponential case, and from 6 to 16 in the Gaussian case. The 3rd quartile values, for their part, ranged from 12 to 19 in the spherical case, from 13 to 21 in the exponential case, and from 17 to 29 in the Gaussian case. This shows that the real data, evaluated in different situations of soil types, soil layers, and spatial dependencies, behave similarly to the proposed classification based on these measures, despite some variation between the theoretical values of the SDI and the values obtained from the sampled papers.
In addition to these results, we applied the current classification proposal from the SDI to the data of the research papers and made a comparison with the previous classification, given by Cambardella et al. (1994), as shown in Table 1.
Most of the classifications changed (58.9 %) (Table 1). The spherical model had the fewest changes (50.5 %) and the Gaussian model showed the most (73.6 %). Also, it can be observed that the weak classifications of spatial dependence increased and the moderate classifications decreased. Strong classifications did not show many changes. This is consistent with the positive asymmetric shape of the SDI. These changes between the SDI classification and the classification of Cambardella et al. (1994) show that the two classifications are not equivalent.
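The kind of comparison summarized above can be sketched in Python: each fitted semivariogram is classified once by the SPD cut-offs (25 % and 75 %) and once by the model-specific SDI cut-offs, and the share of fits whose class differs is reported. The fitted parameters below are hypothetical and are not the data from the surveyed papers; the function names are likewise illustrative.

```python
MODEL_FACTOR = {"spherical": 0.375, "exponential": 0.317, "gaussian": 0.504}
SDI_CUTOFFS = {"spherical": (7.0, 15.0), "exponential": (6.0, 13.0),
               "gaussian": (9.0, 20.0)}


def label(value: float, weak_max: float, moderate_max: float) -> str:
    """Map a percentage to weak / moderate / strong using two cut-offs."""
    if value <= weak_max:
        return "weak"
    return "moderate" if value <= moderate_max else "strong"


def spd_class(c0: float, c1: float) -> str:
    return label(100.0 * c1 / (c0 + c1), 25.0, 75.0)


def sdi_class(model: str, c0: float, c1: float, a: float, max_dist: float) -> str:
    ratio = min(1.0, a / (0.5 * max_dist))  # truncated range ratio
    sdi = MODEL_FACTOR[model] * (c1 / (c0 + c1)) * ratio * 100.0
    return label(sdi, *SDI_CUTOFFS[model])


def percent_changed(fits: list[tuple[str, float, float, float, float]]) -> float:
    """Percentage of fits whose SPD class differs from their SDI class."""
    changed = sum(spd_class(c0, c1) != sdi_class(m, c0, c1, a, md)
                  for m, c0, c1, a, md in fits)
    return 100.0 * changed / len(fits)


# Hypothetical fits: (model, C0, C1, practical range a, maximum distance MD).
fits = [
    ("spherical", 1.0, 4.0, 30.0, 200.0),
    ("gaussian", 1.0, 1.0, 100.0, 200.0),
    ("exponential", 9.0, 1.0, 50.0, 200.0),
    ("spherical", 1.0, 9.0, 120.0, 200.0),
]
print(percent_changed(fits))
```

In this illustrative set, the first fit is strong by SPD but moderate by SDI because its range is short relative to the grid, which is the kind of shift the SDI is designed to capture.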
Because of the short time the scientific community has had to consider the use of the SDI, few studies have used it yet. Pazini et al. (2015) were among the first to use the SDI (exponential and Gaussian forms) to quantify the degree of spatial dependence, but with no classification defined at the time of producing the score. However, with the categorization proposed in this study, it is possible to classify the spatial dependence obtained in the study conducted by Pazini et al. (2015). Considering that SDI Exponential (%) = 22.3 %, it would be classified as indicating strong spatial dependence (SDI Exponential (%) > 13 %); SDI Gaussian (%) = 8.9 % would indicate weak spatial dependence (SDI Gaussian (%) ≤ 9 %); and SDI Gaussian (%) of 44.5 to 50.4 % would indicate strong spatial dependence (SDI Gaussian (%) > 20 %).

CONCLUSIONS
The SDI categorization was based on its median and 3rd quartile over a wide set of possible values, allowing us to create a classification for spatial dependence as weak, moderate, or strong.
The proposed categorization allows the user to transform the numerical values calculated for the SDI into categories of degree of spatial variability, with adequate power for explanation and comparison.

Table 1. Cambardella's classification (Cambardella et al., 1994) and the current classification of spatial dependence, proposed in this study, applied to real data, for the spherical, exponential, and Gaussian models (1)
(1) General: 41.1 % of the data showed no change and 58.9 % showed a change in classification; Exponential: 47.3 % of the data showed no change and 52.7 % showed a change in classification; Gaussian: 26.4 % of the data showed no change and 73.6 % showed a change in classification; Spherical: 49.5 % of the data showed no change and 50.5 % showed a change in classification.