COMPARISON MEASURES OF MAPS GENERATED BY GEOSTATISTICAL METHODS

This study uses several measures derived from the error matrix for comparing two thematic maps generated with the same sample set. The reference map was generated with all the sample elements and the map set as the model was generated without the two points detected as influential by the analysis of local influence diagnostics. The data analyzed refer to the wheat productivity in an agricultural area of 13.55 ha considering a sampling grid of 50 x 50 m comprising 50 georeferenced sample elements. The comparison measures derived from the error matrix indicated that despite some similarity on the maps, they are different. The difference between the estimated production by the reference map and the actual production was of 350 kilograms. The same difference calculated with the mode map was of 50 kilograms, indicating that the study of influential points is of fundamental importance to obtain a more reliable estimative and use of measures obtained from the error matrix is a good option to make comparisons between thematic maps.


INTRODUCTION
In the early 90s, began to be developed technologies to manage the spatial variability associated with aspects of agricultural production.Productivity varies spatially, and to determine the causes of this variation is the challenge that faces the precision agriculture.The processing and the integration of data, in precision agriculture are usually made in order to select and model the variables that best explain the productivity (JOHANN et al., 2002).
The techniques of localized application and varying rates of agricultural inputs with the help of geostatistics has been one of the areas to focus efforts in recent research (CORA & BERALDO, 2006, SOUZA et al., 2007), and facing this, it is desirable the study of methodologies that allows the comparison of thematic maps generated by different techniques, because only a visual comparison is subjective and important differences between maps can be ignored.
The aim of this study is to use measures derived from the error matrix to compare two thematic maps constructed by geostatistical methods, being one map generated with all samples and defined as a reference map and the other map is generated without two points detected as influential by the diagnostic technique of local influence and defined as model.

MATERIALS AND METHODS
The experimental area covers a grain production field with 13.55 hectares, located in the city of Salto do Lontra -PR, which was cultivated with the wheat cultivar IAPAR 78.In the monitored area, was defined a regular sampling grid of 50 x 50 m a total of 50 georeferenced locations (Figure 1 (A)).FIGURE 1. (A) Location of the 50 sampling points and (B) Area plots division, the gray area was not used in the experiment.
In Figure 1 (B) it is observed that the area was divided into nine plots to facilitate mechanical harvesting, making possible to determine the wheat real production.The area shaded in Figure 1 (B) represents a region of rocks that is not monitored in the experiment.In the sampling points were collected the wheat plots in an area of 1 m 2 .These plots were threshed, weighed and converted into t ha -1 .To perform the geostatistical analysis it was used the software R (R DEVELOPMENT CORE TEAM, 2009) and its GeorR package (RIBEIRO JUNIOR & DIGGLE, 2001).
To model data with a spatial structure, it was considered a Gaussian stochastic process {Z(s), sS},with , being 2  two-dimensional Euclidean space.Supposing that the data, Z(s 1 ),...,Z(s n ),of this process are recorded in known spatial locations s i (i = 1,..., n),and generated by the model Z(s i ) = µ(s i ) +  (s i ),where the deterministic terms µ(s i ) and stochastic (s i ) may depend on the spatial location where Z(s i ) was obtained.It is assumed that the stochastic error  (.) has mean zero, E[ (s i )] = 0,that the variation between points in the space is determined by some covariance function C(s i , s u )=COV[ (s i ),  (s u )] and that for some known functions of s, as x 1 (s),...,x p (s),the mean is defined as (  ,being, β 1 ,...,β p unknown parameters to be estimated.In an equally form and in a matrix form, we have Z = X β + ε,being, E (ε) = 0 (null vector) and the covariance matrix ofΣ= [(σ iu )], in which σ iu = C(s i , s u ).It is assumed that Σ is nonsingular that X has columns with full rank and that Z follows abnormal n-variant distribution with mean vector Xβ and covariance matrix Σ.
Considering the parametric form of the covariance matrix Σ= φ 1 I n + φ 2 R, (MARDIA & MARSHALL, 1984), where φ 1 is the parameter known as the nugget effect, φ 2 is known as the contribution parameter, R=R(φ 3 )=[(r iu )] is an nxn symmetric matrix, with diagonal elements r ii =1, i=1,...,n, where φ 3 is the range function (a) of the model, and I n is an nxn identity matrix.The parametric form of the covariance matrix Σ, occurs for various isotropic processes, where the covariance C(s i , s u ) is defined according to the covariance function C(h iu )=φ 2 r iu ,where h iu is the Euclidean distance between the points s i and s u .. In the covariance functions C(h iu ), the variance of the stochastic process Z is C(0)= φ 1 + φ 2 , and can be defined as semivariance γ(h)=C( 0

) -C(h).
In this study for the adjustment of a spatial model to the experimental semivarigram it was used the exponential, spherical and Gaussian models, for the parameters estimation it was used the ordinary least squares(OLS), the weighted least squares (WLS1),the maximum likelihood (ML) and the restricted maximum likelihood (RML) estimation methods (MARDIA& MASHALL, 1984).To choose the space model that best suits the semivariances it was used the cross-validation technique (VAUCLIN et al., 1983;FARACO et al., 2008) and to investigate the existence of influential points it was carried out a diagnostic analysis of local influence (BORSSOI et al., 2009).
For a set of observed data l(θ) is the log-likelihood function of the postulated model, where θ = (β, φ 1 , φ 2 , φ 3 ) T and that ω a disturbance vector belonging to a disturbance space Ω.And that l(θ/ω) is the logarithm of the likelihood function corresponding to the perturbed model by ωΩ.It is assumed that there is aω 0 Ω such that l(θ) = l(θ/ω 0 ), for all θ and that l (θ / ω) is twice differentiable on (θ T , ω T ) T .Consider the following perturbation scheme: Z ω = Z + ω, with ω = (ω 1 ,...,ω n ) T response perturbation vector and ω 0 = (0,...,0) T the point of no perturbation.With this perturbation scheme is intended to detect possible outliers in the data that affect the maximum likelihood estimator of θ.The ω perturbation influence in the ML estimator of the θ parameter vector can be evaluated by the likelihood difference, defined by LD , where  ˆis the ML estimator of θ of the assumption model and   ˆis the ML estimator of θ of the disturbed model.COOK (1986) proposed to study the local behavior of the LD (ω) around ω 0 , using the normal curve of LD (ω) in ω 0 in the direction of some unitary vector l, defined C l =2|l T Δ T L -1 Δl|, with ||l|| = 1, where L is the observed information matrix, evaluated in    , Δ is a matrix (p + q) xn given by In witch, Where the observed information matrix is L max is the eigenvector, normalized, associated with the largest eigenvalue, in modulus, from the matrix B = Δ T L -1 Δ.The graphic elements | L max | versus i (order) may reveal what kind of perturbation hat has the greatest influence in LD (ω) in the vicinity of ω 0 (COOK, 1986;BORSSOI et al., 2009).
Table 1 shows the generic form of an error matrix (GRINAND et al., 2008).In this matrix, the pixels of the reference map are quantified in columns while the pixels of the model map are quantified in the lines.Each matrix element represents the number of pixels belonging to the class C i , i=1,...,k, of the model map and the class C j , j=1,...,k, ofthe reference map.The elements of the main diagonal (when i = j) represent cases where the pixels have the same classification in both maps while the elements out of the diagonal represent the main erroneous classifications.Using the error matrix elements, the following indexes were calculated; the Global Accuracy (GA) (BACH et al., 2006), the user accuracy (UA i ) (ELLIS & PORTER-BOLLAND, 2008), the producer accuracy (PA i ) LIU et al., 2007), the Kappa ( K ˆ) (JENNESS&WYNNE, 2005, LI et al, 2008)and the Tau (T) (RIEGL & PURKIS, 2005), presented respectively from Equations ( 9) to (13).
Assemblage of the words Picture and Elements.Represents the smallest element of a map to which is possible to assign a value.In this study, the pixels will be defined as the matrix elements of the Kriger values.
The global accuracy (GA) is a statistical assessment used to measure the similarity between something real or of reference and an adjusted model, however, this measure is to be presented in association with other measures that take into account other values contained in the error matrix and not only the main diagonal elements.The user accuracy index (UA i ) represents the ratio between the number of pixels correctly classified in class i over the total number of pixels classified in class i of the model map and the producer accuracy index (PA i ) is an statistic that gives the probability of a pixel to be classified in class i if it really belong s to class i.
The Kappa index ( K ˆ) usually ranges between 0 and 1, with values close to 1 reflecting greater agreement (ALMEIDA & VIEIRA, 2008;LI et al, 2008).According to the classification of KRIPENDORFF (1980), K ˆ is classified with low accuracy if K ˆ<0.67, average accuracy if 0.67  K ˆ< 0.80 and high accuracy if K ˆ 0.80.The Tau index (T) is similar to Kappa, however, differentiates by using in its formulation the probabilities priori for each class.When these probabilities a priori are equal, we have pi = 1 / k, where k represents the number of classes.Another way to compare thematic maps is provided by measurements obtained from the confusion matrix per class (JENNESS & WYNNE, 2005), this matrix consists of four elements, , obtained from the error matrix.The sensitivity index S k = a k /(a k +c k ) is a measure that indicates the probability of a pixel in the model map be classified as belonging to class k if it actually belongs to class k in the reference map and is therefore an equivalent measure to the producer accuracy.The specificity index Table 2 presents some measurements of the total confusion matrix.The total sensitivity (S) and the total specificity (E) index are analogous in its versions per class; with the difference that now they show measures for the full map.The Matthews correlation coefficient (MCC) is a discrete version of the Pearson correlation coefficient and its values are found in the interval [-1, 1], that is the value 1 is equivalent to a perfect prediction, 0 is equivalent to a random prediction and -1 means an inverse prediction.

RESULTS AND DISCUSSION
The samplings exploratory analysis had an wheat average yield of 1.31 t ha-1, and the minimum production of 0.30 t ha -1 and a maximum of 2.40 t ha -1 with standard deviation of 0.54 t ha -1 and coefficient of variation (CV) of 41.40%, considering an heterogeneity among the sample values of the wheat productivity.
A variographic study was carried out using the whole sample and was concluded by the crossvalidation technique that the best fitted model was an exponential with a parameter estimation method of maximum likelihood (ML).
It was conducted a variographic study using the whole sample and found the cross-validation technique that best fitted model was an exponential with parameter estimation method of maximum likelihood (ML).Picture 2 (A) shows a graph of influence of the self-location vector |Lmax| versus i that it is found that the sampling elements 12 and 18 were considered influential, and thus can change some kind of decision in the construction of models geostatistical and /or construction of thematic maps.The figure 2(B) shows the location of influential points in the monitored area.Thus, it was decided to make a new geostatistical analysis by removing the points considered influential.The criterion of cross-validation indicated that the best adjusted model was the Gaussian with estimation method parameters restricted maximum likelihood (RML).Figure 3 shows the semivariograms adjusted with and without the influential points and their parameters (φ 1 , φ 2 , a).The maps were constructed using five classes, this number being determined in such a manner that enables both a visual identification of areas of productivity as convenience to perform localized applications of inputs, for a large number of intervals results in very small regions, making handling found unfeasible.It is observed in Figure 4 that the regions of greatest difference are the regions of influential points, however, various other regions differ, indicating that the withdrawal of influential points affected the maps as a whole.To better compare the maps, it is appropriate to quantify their 5423 pixels in an array of errors (Table 3), defining the map generated with all the points as the reference map and the map generated without influential points as the model map.In the matrix of the errors, the pixels of the reference map are measured in columns and the pixels of the model map are quantified in the lines.Thus, it has that from the 5423 pixels in each Semivariance Productivity (t há -1 ) No modified region map 537 pixels are in the first class of the reference map and 766 pixels are in the first class of the model map.This first information allows realizing an estimate of the wheat production in the area to be compared with the actual grain yield, collected and weighed by the manufacturer, which was 18.49 tons.The total area of 13.551 hectares was monitored and corresponds to 5423 pixels.Performing an arithmetic operation, it is found that the 537 pixels of the first class of the reference map represent an area of 1.3419 hectares.Multiplying this area by the midpoint of the first class (0.5778), it is obtained the estimate of 0.78 tons.Following the same procedure for the other classes, it is obtained the estimated production in tons of the area in Table 4.  4 that the production estimated by reference map was 18.84 tons, 350 kg above the real production and estimated production using the map model was 18.54 tons, 50 kg more than actual production.Thus, it is emphasized that the withdrawal of influential points was of great importance in the analysis because it allowed the estimated production with better precision.The matrix of errors is obtained from Table 3 that the EG index = 0.76, ie the overall accuracy was 76%.This value indicates that the thematic maps differ, because if they were equal, the overall accuracy would be 1.Accuracy can be investigated for each of the five classes, using the contents of Table 5.It is noted from Table 5 that the AU 4 index indicates that 37.9% of the pixels that belong to the class C 4 on the model map belong to other classes in the reference map and the index AP 4 indicates that the probability of a pixel to be classified in class C 4 of the model map if it really belongs to this class in the reference map is 0.7097.Using the elements of Table 3 calculated the Kappa index = 0.69 and considering the equality of the a priori probabilities of each class it was calculated the index Tau T = 0.70.These indexes were close, indicating an average accuracy of the maps, according to the classification of the Kappa (KRIPENDORFF, 1980).Table 6 presents the measurements of the confusion matrices of each class C i .It stands out with the class C 5 is the lowest sensitivity and highest specificity, so the probability of a pixel to be classified in class C 5 on the model map if this pixel belongs to the class C 5 of the reference map, is the lowest if compared to other classes and the probability of a pixel of the model map does not present yield in the range [2.024, 2.437] as the pixel has no productivity in the same range reference map is greater when compared with other classes.It is observed that the biggest mistake of over estimating (TFP k ) occurs in the class C 3 and the largest error of omission (TFN k ) occurs in the class C 5 .Using the elements of the matrix of errors in Table 3, it is obtained the following complete confusion matrix: a = 4113, b =1310, c =1 310 and d =20 382.Using these values obtains S =0.76, E =0.94 and CCM= 0.70.An important feature of the total sensitivity index (S) is that it equals the rate of global accuracy (GA), so the overall capacity on the model map correctly classified pixels is 0.76.The total level of specificity (E) indicates that the capacity on the model map to avoid incorrect treating is 0.94.The correlation of Matthews's coefficient (CCM) obtained was 0.70 and as this index is value 1 if the maps are the same, it appears that the model map differs as compared with the generated map with all points.
An important issue that must be addressed is to define a priori the number of classes, as the increase of these implies a decrease in the similarity between the maps.For example, considering 7, 10, 15 and 20 classes, it is obtained respectively indices of GA= 0.72, GA =0.61, GA = 0.50, GA =0.40 that are smaller than the overall accuracy considering five classes.Considering that one of the functions of the thematic map is to indicate locations for a localized management, it is important to choose a number of classes that makes this work possible.MONMOIER (1993) recommends that the number of ideal classes is around five classes and never more than seven classes, because it is preferable to represent with few groups with symbols of areas graphically stable rather than risk a fine representation of difficult visualization.

CONCLUSIONS
The results showed that the measures derived from the error matrix if the confusion matrix is appropriate to compare thematic maps, as they provide global values and still allow making comparisons by class.Considering that the wheat production is estimated by the model map close to actual production obtained, it is concluded that it is fundamentally important to carry out research on influential points, and the technique of local influences appropriate for that function.
the probability of a pixel not belonging to the class kin the reference map being classified as not belonging to class k in the model map.The false positive rate TFP k = b k /(b k +d k ) represents the commission errors and indicates the proportion of pixels that do not belong to class k in the reference map that are classified as belonging to class k in the model map.The false negative rate TFN k = c k /(a k +c k ) represents the omission errors and indicates the proportion of pixels that belong to the class k of the reference map and were classified in other classes in the model map.Based on these measures, it is possible to compare individually the classes of the model map with the classes of the reference map.For a global comparison of the maps, JENNESS & WYNNE(2005) present the total confusion matrix, in which the values ) are considered estimation errors.The errors of type b are also known as commission or overestimation errors, while errors of type c are known as omission errors.

FIGURE 2 .
FIGURE 2. Diagnostic graph |L max | x i; (B) Localization of the influential points.

FIGURE 4 .
FIGURE 4. (A) Thematic map of the wheat's productivity using 50 sampling units in the interpolation and (B) Thematic map of the wheat's productivity using 48 sampling units in the interpolation.

TABLE 1 .
Generic error matrix of k x k order.
K:class number; C i : class I; n i .: total of pixels in C i class of the model map; n. i: : total of pixels in C j class of the reference map.

TABLE 2 .
Measures derived from total confusion matrix.

TABLE 3 .
Error matrix of wheat's productivity maps.

TABLE 4 .
Estimated production in each subtitle class.

TABLE 5 .
User (AU i ) and producer (AP i ) accuracy indexes

TABLE 6 .
Measures obtained from the confusion matrix of each class.
k :Sensitivity, E k :Specificity, TFP k : False positive rate e TFN k: False negative rate.